edges.averaging.averaging

Functions for averaging arrays.

This module provides several averaging and binning methods, because a careful average must be computed differently in different situations. Spectra may be binned over three axes: nights, LST/GHA, and frequency. Each axis requires a slightly different averaging method to keep the average unbiased in the presence of flags.

edges.averaging.averaging.bin_array_unweighted(x: ndarray, size: int = 1) → ndarray

Simple unweighted mean-binning of an array.

Parameters:

x – The array to be binned. Only the last axis will be binned.
size – The size of the bins.

Notes

The last axis of x is binned. It is assumed that the coordinates corresponding to x are regularly spaced, so the final average just takes size values and averages them together.

If the length of the last axis is not divisible by size, the trailing values are left out.

Examples

Simple 1D example:

>>> x = np.array([1, 1, 2, 2, 3, 3])
>>> bin_array_unweighted(x, size=2)
[1, 2, 3]

The last remaining values are left out:

>>> x = np.array([1, 1, 2, 2, 3, 3, 4])
>>> bin_array_unweighted(x, size=2)
[1, 2, 3]
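
The binning described in the Notes can be sketched with a reshape. This is a minimal illustration using only NumPy, not the module's actual implementation; the helper name bin_last_axis_sketch is hypothetical.

```python
import numpy as np


def bin_last_axis_sketch(x, size=1):
    """Hypothetical helper: unweighted mean over consecutive groups of `size`."""
    x = np.asarray(x)
    n_bins = x.shape[-1] // size
    # Drop trailing values that do not fill a complete bin.
    trimmed = x[..., : n_bins * size]
    # Split the last axis into (n_bins, size) and average within each bin.
    return trimmed.reshape(*x.shape[:-1], n_bins, size).mean(axis=-1)


binned = bin_last_axis_sketch(np.array([1, 1, 2, 2, 3, 3, 4]), size=2)
binned_2d = bin_last_axis_sketch(np.arange(10).reshape(2, 5), size=2)
```

Because only the last axis is reshaped, leading axes (for example, time) pass through unchanged.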

edges.averaging.averaging.bin_data(data: ndarray, residuals: ndarray | None = None, weights: ndarray | None = None, bins: list[ndarray | slice] | None = None, axis: int = -1) → tuple[ndarray, ndarray, ndarray]

Bin data, in an unbiased way if possible.

This uses the estimator from memo #183: http://loco.lab.asu.edu/wp-content/uploads/2020/10/averaging_with_weights.pdf.

Parameters:

data – The data to be binned.
residuals – The residuals of the data, if known. If not provided, and weights is non-uniform, the average will be biased.
weights – The weights of the data. If not provided, assume all weights are unity.
bins – The bins into which to bin the data. If not provided, assume a single bin encompassing all the data. Each element should be either an array that indexes into the axis over which to bin, or a slice object.
axis – The axis over which to bin.

Returns:

data – The binned data.
weights – The weights of the binned data.
residuals – The binned residuals (if provided).
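
To see why residuals matter when weights are non-uniform, consider the toy bin below. It is a sketch under a stated assumption: that the bias is reduced by averaging a smooth model (data minus residuals) without weights and averaging only the residuals with weights. This is an illustrative reading, not necessarily the exact estimator of memo #183, and all variable names are hypothetical.

```python
import numpy as np

# Toy bin: a steep "model" plus small residuals; the last two samples are flagged.
model = np.array([0.0, 2.0, 4.0, 6.0])
residuals = np.array([0.1, -0.1, 0.1, -0.1])
data = model + residuals
weights = np.array([1.0, 1.0, 0.0, 0.0])

# Naive weighted mean: pulled towards the unflagged (low) end of the bin.
naive = np.sum(data * weights) / np.sum(weights)

# Assumed alternative: unweighted model mean plus weighted residual mean.
model_mean = model.mean()
resid_mean = np.sum(residuals * weights) / np.sum(weights)
corrected = model_mean + resid_mean
```

Here the naive estimate is about 1.0, while the corrected estimate is about 3.0, the value of the model at the centre of the bin.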

Get bin edges given input coordinates and a simple description of the binning.

Parameters:

coords – The input co-ordinates to bin. These must be regular and monotonically increasing.
bins – The bin edges (lower inclusive, upper exclusive). If an int, use that many coordinates per bin, starting from the first coordinate. If a float, use equally spaced bin edges, starting from the start of coords and ending past the end of coords. If an array, it is assumed to be the bin edges. If not provided, assume a single bin encompassing all the data.
start – Where to start the bin edges when bins is an int or float. Defaults to the first coordinate minus half of the median coordinate difference.
stop – Where to stop the bin edges when bins is an int or float. Defaults to the last coordinate plus half of the median coordinate difference.

Returns:

np.ndarray – The bin edges.

Notes

This function is robust to the input coordinates being astropy Quantities, and will return a quantity if the coordinates are.
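
The default start and stop described above can be worked through with plain NumPy. This is a sketch only: it assumes that a float bins value is the bin width, and the variable names are hypothetical.

```python
import numpy as np

coords = np.array([1.0, 2.0, 3.0, 4.0])

# Defaults: half a (median) coordinate step beyond the first and last coordinate.
half_step = np.median(np.diff(coords)) / 2
start = coords[0] - half_step
stop = coords[-1] + half_step

# Assumption: a float `bins` is the width of each bin.
width = 2.0
n_bins = int(np.ceil((stop - start) / width))
# Equally spaced edges that reach at least as far as `stop`.
edges = start + width * np.arange(n_bins + 1)
```

With these inputs, start is 0.5, stop is 4.5, and the edges are 0.5, 2.5 and 4.5.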

edges.averaging.averaging.get_binned_weights(x: ndarray, bins: ndarray, weights: ndarray | None = None, include_left: bool = True, include_right: bool = True) → ndarray

Get the total weight in each bin for a given vector.

Parameters:

x – The input co-ordinates (1D).
bins – The bin edges into which to bin x.
weights – Array with last dimension the same length as x. Input weights. Default is all ones.
include_left – Whether to include coordinates to the left of the minimum bin in the first bin.
include_right – Whether to include coordinates to the right of the maximum bin in the last bin. Note that for historical reasons this defaults to True, but it should probably be set to False for typical cases.

Returns:

weights – Output bin weights, with shape equal to the input weights, but with the last dimension replaced by len(bins) - 1.
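
The effect of include_left and include_right can be illustrated with np.digitize and np.bincount. This is a sketch for 1D weights only, not the module's implementation; the function name binned_weights_sketch is hypothetical.

```python
import numpy as np


def binned_weights_sketch(x, edges, weights=None, include_left=True, include_right=True):
    """Hypothetical 1D helper: total weight falling into each bin."""
    x = np.asarray(x)
    edges = np.asarray(edges)
    w = np.ones(x.shape) if weights is None else np.asarray(weights, dtype=float)
    n_bins = len(edges) - 1
    # Bin index per coordinate: -1 below the first edge, n_bins at or above the last.
    idx = np.digitize(x, edges) - 1
    if include_left:
        idx[idx < 0] = 0
    if include_right:
        idx[idx >= n_bins] = n_bins - 1
    # Coordinates still outside the edges are dropped.
    keep = (idx >= 0) & (idx < n_bins)
    return np.bincount(idx[keep], weights=w[keep], minlength=n_bins)


x = np.array([0, 1, 2, 3, 4, 5])
edges = np.array([1, 3, 5])
with_outliers = binned_weights_sketch(x, edges)
without_outliers = binned_weights_sketch(x, edges, include_left=False, include_right=False)
```

In this example, the coordinates 0 and 5 lie outside the edges; they are counted in the outer bins only when the corresponding include flag is set.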

edges.averaging.averaging.weighted_mean(data: ndarray, weights: ndarray | None = None, axis: int = -1, fill_value: float = nan, keepdims: bool = False) → tuple[ndarray, ndarray]

Perform a careful weighted mean in which zero weights do not raise errors.

In this function, if the total weight is zero, fill_value is returned.

Parameters:

data (array-like) – The data over which the weighted mean is to be taken.
weights (array-like, optional) – Same shape as data, giving the weights of each datum.
axis (int, optional) – The axis over which to take the mean.
fill_value (float, optional) – The value to fill in where the sum of the weights is zero.
keepdims (bool, optional) – Whether to keep the original dimensions of data (i.e. have an axis with length one for axis).

Returns:

avg – The weighted mean over axis, where elements with zero total weight are set to fill_value.
weights – The sum of the weights over axis.
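
The behaviour described above, returning fill_value where the total weight is zero, can be sketched with a masked division. This is an illustration with plain NumPy, not the module's implementation; the name weighted_mean_sketch is hypothetical and keepdims is omitted for brevity.

```python
import numpy as np


def weighted_mean_sketch(data, weights=None, axis=-1, fill_value=np.nan):
    """Hypothetical helper: weighted mean that tolerates zero total weight."""
    data = np.asarray(data, dtype=float)
    w = np.ones_like(data) if weights is None else np.asarray(weights, dtype=float)
    wsum = np.sum(w, axis=axis)
    num = np.sum(data * w, axis=axis)
    # Start from fill_value and divide only where the total weight is positive.
    avg = np.full(np.shape(wsum), fill_value, dtype=float)
    np.divide(num, wsum, out=avg, where=wsum > 0)
    return avg, wsum


avg, wsum = weighted_mean_sketch(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    weights=[[1.0, 1.0, 2.0], [0.0, 0.0, 0.0]],
)
```

The first row averages to 2.25 with total weight 4; the second row is fully flagged, so it receives fill_value (NaN here) and total weight 0, with no division warning.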

edges.averaging.averaging.weighted_sum(data: ndarray, weights: ndarray | None = None, normalize: bool = False, axis: int = -1, fill_value: float = nan, keepdims: bool = False) → tuple[ndarray, ndarray]

Perform a careful weighted sum.

This routine is ‘careful’ in that it allows performing sums when the total weight is zero, setting those values to a user-defined filling value.

Parameters:

data (array-like) – The data over which the weighted sum is to be taken.
weights (array-like, optional) – Same shape as data, giving the weights of each datum. By default, all weights are unity.
normalize (bool, optional) – If True, normalize weights so that the maximum weight is unity.
axis (int, optional) – The axis over which to take the sum.
fill_value (float, optional) – The value to fill in where the sum of the weights is zero.
keepdims (bool, optional) – Whether to keep the original dimensions of data (i.e. have an axis with length one for axis).

Returns:

datasum – The weighted sum over axis, where elements with zero total weight are set to fill_value.
sumweights – The sum of the weights over axis.
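
A weighted sum with the normalize option might look like the sketch below. It assumes that normalization divides by the global maximum weight (the source does not say whether the maximum is global or per axis); the name weighted_sum_sketch is hypothetical and keepdims is omitted.

```python
import numpy as np


def weighted_sum_sketch(data, weights=None, normalize=False, axis=-1, fill_value=np.nan):
    """Hypothetical helper: weighted sum with a fill value for zero total weight."""
    data = np.asarray(data, dtype=float)
    w = np.ones_like(data) if weights is None else np.asarray(weights, dtype=float)
    if normalize and w.max() > 0:
        # Assumption: scale by the global maximum so the largest weight is unity.
        w = w / w.max()
    sumweights = np.sum(w, axis=axis)
    datasum = np.sum(data * w, axis=axis)
    # Replace sums that carry no weight with the fill value.
    datasum = np.where(sumweights > 0, datasum, fill_value)
    return datasum, sumweights


total, total_weight = weighted_sum_sketch(
    [[1.0, 2.0], [3.0, 4.0]],
    weights=[[2.0, 4.0], [0.0, 0.0]],
    normalize=True,
)
```

After normalization the first row has weights 0.5 and 1, giving a sum of 2.5 with total weight 1.5; the second row has zero total weight and is set to fill_value.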

edges.averaging.averaging.weighted_variance(data: ndarray, nsamples: ndarray | None = None, avg: ndarray | None = None, **kwargs)

Calculate a careful weighted variance.

Simply calculates the weighted mean of [(data - mean)/sigma]^2 over the data, where the weights are 1/sigma^2. This is useful for computing the expected standard deviation when the intrinsic variance and number of samples of each datum are known.

Parameters:

data (array-like) – The data over which to calculate the variance.
nsamples – The number of samples corresponding to each datum. These will be used as weights. Default is all unity.
avg – The weighted average of the data over the given axis. By default, compute this internally.

Returns:

std – The weighted variance of the data over the given axis.
sumweights – The sum of the nsamples**2 over the given axis.