Skip to content

Add monitor normalization #252

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
193 changes: 193 additions & 0 deletions src/ess/reduce/correction.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,193 @@
# SPDX-License-Identifier: BSD-3-Clause
# Copyright (c) 2025 Scipp contributors (https://github.com/scipp)
"""Correction algorithms for neutron data reduction."""

import scipp as sc

from .uncertainty import UncertaintyBroadcastMode, broadcast_uncertainties


def normalize_by_monitor_histogram(
detector: sc.DataArray,
*,
monitor: sc.DataArray,
uncertainty_broadcast_mode: UncertaintyBroadcastMode,
) -> sc.DataArray:
"""Normalize detector data by a histogrammed monitor.

First, the monitor is clipped to the range of the detector

.. math::

\\bar{m}_i = m_i I(x_i, x_{i+1}),

where :math:`m_i` is the monitor intensity in bin :math:`i`,
:math:`x_i` is the lower bin edge of bin :math:`i`, and
:math:`I(x_i, x_{i+1})` selects bins that are within the range of the detector.

The detector bins :math:`d_i` are normalized according to

.. math::

d_i^\\text{Norm} = \\frac{d_i}{\\bar{m}_i} \\Delta x_i
\\frac{\\sum_j\\,\\bar{m}_j}{\\sum_j\\,\\Delta x_j}

where :math:`\\Delta x_i` is the width of monitor bin :math:`i` (see below).
This normalization leads to a result that has the same
unit as the input detector data.

Monitor bin :math:`i` is chosen according to:

- *Histogrammed detector*: The monitor is
`rebinned <https://scipp.github.io/generated/functions/scipp.rebin.html>`_
to the detector binning. This distributes the monitor weights to the
detector bins.
- *Binned detector*: The monitor value for bin :math:`i` is determined via
:func:`scipp.lookup`. This means that for each event, the monitor value
is obtained from the monitor histogram at that event coordinate value.

This function is based on the implementation in
`NormaliseToMonitor <https://docs.mantidproject.org/nightly/algorithms/NormaliseToMonitor-v1.html>`_
of Mantid.

Parameters
----------
detector:
Input detector data.
Must have a coordinate named ``monitor.dim``, that is, the single
dimension name of the monitor.
monitor:
A histogrammed monitor.
Must be one-dimensional and have a dimension coordinate, typically "wavelength".
uncertainty_broadcast_mode:
Choose how uncertainties of the monitor are broadcast to the sample data.

Returns
-------
:
``detector`` normalized by ``monitor``.

See also
--------
normalize_by_monitor_integrated:
Normalize by an integrated monitor.
"""
dim = monitor.dim

clipped = _clip_monitor_to_detector_range(monitor=monitor, detector=detector)
coord = clipped.coords[dim]
delta_w = coord[1:] - coord[:-1]
total_monitor_weight = broadcast_uncertainties(
clipped.sum() / delta_w.sum(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nansum, maybe?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where would NaN values come from? And if there are any, should we instead mask them? See discussion above.

prototype=clipped,
mode=uncertainty_broadcast_mode,
)
delta_w *= total_monitor_weight
norm = broadcast_uncertainties(
clipped / delta_w, prototype=detector, mode=uncertainty_broadcast_mode
)

if detector.bins is None:
return detector / norm
return detector.bins / sc.lookup(norm, dim=dim)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See other comment regarding using the reciprocal.



def normalize_by_monitor_integrated(
detector: sc.DataArray,
*,
monitor: sc.DataArray,
uncertainty_broadcast_mode: UncertaintyBroadcastMode,
) -> sc.DataArray:
"""Normalize detector data by an integrated monitor.

The monitor is integrated according to

.. math::

M = \\sum_{i=0}^{N-1}\\, m_i (x_{i+1} - x_i) I(x_i, x_{i+1}),

where :math:`m_i` is the monitor intensity in bin :math:`i`,
:math:`x_i` is the lower bin edge of bin :math:`i`, and
:math:`I(x_i, x_{i+1})` selects bins that are within the range of the detector.

Parameters
----------
detector:
Input detector data.
monitor:
A histogrammed monitor.
Must be one-dimensional and have a dimension coordinate, typically "wavelength".
uncertainty_broadcast_mode:
Choose how uncertainties of the monitor are broadcast to the sample data.

Returns
-------
:
`detector` normalized by a monitor.

See also
--------
normalize_by_monitor_histogram:
Normalize by a monitor histogram without integration.
"""
clipped = _clip_monitor_to_detector_range(monitor=monitor, detector=detector)
coord = clipped.coords[clipped.dim]
norm = (clipped * (coord[1:] - coord[:-1])).data.sum()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to ask:

  • Why ignore masks?
  • Do we need nansum?
    But then I wondered if both of that is anyway invalid in the integrated case (unless the detector is masked in the same range)? So maybe the real question should be:

If there are masks (or nans) and we integrate, we should:

  • Apply the mask to the detector.
  • Mask the nan ranges (of the monitor) in the detector
  • Do not ignore the the mask in the integration.
  • Use nansum.

Or am I missing something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are right. We need the union of the detector and monitor masks.

norm = broadcast_uncertainties(
norm, prototype=detector, mode=uncertainty_broadcast_mode
)
return detector / norm
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe minor in practice (because we are memory-bandwidth bound), but since division is much slower multiplication, use detector * sc.reciprocal(norm) or something similar.

Or is broadcast_uncertainties already returning a large array? Then we should take the reciprocal before that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or is broadcast_uncertainties already returning a large array? Then we should take the reciprocal before that

It may.

But detector * sc.reciprocal(norm) has to allocate an intermediate array. I doubt that that is faster than division.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, due to the uncertainties broadcast. I had assumed the monitor term would always be small.



def _clip_monitor_to_detector_range(
*, monitor: sc.DataArray, detector: sc.DataArray
) -> sc.DataArray:
dim = monitor.dim
if not monitor.coords.is_edges(dim):
raise sc.CoordError(
f"Monitor coordinate '{dim}' must be bin-edges to integrate the monitor."
)

# Prefer a bin coord over an event coord because this makes the behavior for binned
# and histogrammed data consistent. If we used an event coord, we might allow a
# monitor range that is less than the detector bins which is fine for the vents,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# monitor range that is less than the detector bins which is fine for the vents,
# monitor range that is less than the detector bins which is fine for the events,

# but would be wrong if the detector was subsequently histogrammed.
if dim in detector.coords:
det_coord = detector.coords[dim]

# Mask zero-count bins, which are an artifact from the rectangular 2-D binning.
# The wavelength of those bins must be excluded when determining the range.
Comment on lines +158 to +159
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes the assumption that we are not in detector space any more, but something like two_theta, as we did in essdiffraction?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment does. But the code does not. Do you think we shouldn't mask in all cases?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should not. Detector pixels can have true zeros (maybe unlikely with significant background?), but that should not limit the wavelength range?

if detector.bins is None:
mask = detector.data == sc.scalar(0.0, unit=detector.unit)
else:
mask = detector.data.bins.size() == sc.scalar(0.0, unit=None)
lo = (
sc.DataArray(det_coord[dim, :-1], masks={'zero_counts': mask}).nanmin().data
)
hi = sc.DataArray(det_coord[dim, 1:], masks={'zero_counts': mask}).nanmax().data

elif dim in detector.bins.coords:
det_coord = detector.bins.coords[dim]
# No need to mask here because we have the exact event coordinate values.
lo = det_coord.nanmin()
hi = det_coord.nanmax()

else:
raise sc.CoordError(
f"Missing '{dim}' coordinate in detector for monitor normalization."
)

if monitor.coords[dim].min() > lo or monitor.coords[dim].max() < hi:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi is the largest event, don't we need hi < mon.max for lookup to do its job?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. That is what this condition checks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is, shouldn't it raise if hi is not less then the max? Currently it checks of max is less than hi, which is not the same, or is it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you asking whether the error condition should be monitor.coords[dim].max() <= hi?

This would not be correct for histogrammed detectors:

def test_normalize_by_monitor_histogram_aligned_bins_hist() -> None:

For binned detectors, it is also fine as long as there is a bin-coord:

def test_normalize_by_monitor_histogram_aligned_bins_w_event_coord() -> None:

If there is no bin-coord, then it gets tricky. I am not sure the code does the right thing in this case. E.g., consider

def test_normalize_by_monitor_histogram_aligned_bins_wo_bin_coord() -> None:
    detector = (
        sc.DataArray(
            sc.array(dims=['w'], values=[0, 10, 30], unit='counts'),
            coords={'w': sc.arange('w', 3.0, unit='Å')},
        )
        .bin(w=sc.array(dims=['w'], values=[0.0, 2, 3], unit='Å'))
        .drop_coords('w')
    )
    monitor = sc.DataArray(
        sc.array(dims=['w'], values=[5.0, 6.0], unit='counts'),
        coords={'w': sc.array(dims=['w'], values=[0.0, 2, 3], unit='Å')},
    )
    normalized = normalize_by_monitor_histogram(
        detector,
        monitor=monitor,
        uncertainty_broadcast_mode=UncertaintyBroadcastMode.fail,
    )

    expected = (
        sc.DataArray(
            sc.array(dims=['w'], values=[0.0, 44 / 3, 55 / 3], unit='counts'),
            coords={'w': sc.arange('w', 3.0, unit='Å')},
        )
        .bin(w=sc.array(dims=['w'], values=[0.0, 2, 3], unit='Å'))
        .drop_coords('w')
    )

    sc.testing.assert_identical(normalized, expected)

I would expect this to work based on the event coords. But hi is computed to be in this case which removes the upper monitor bin. This is one of the issues I was trying to avoid with this implementation but apparently failed. I am not sure how to best handle this case.

raise ValueError(
f"Cannot normalize by monitor: The {dim} range of the monitor "
f"({monitor.coords[dim].min().value} to {monitor.coords[dim].max().value}) "
f"is smaller than the range of the detector ({lo.value} to {hi.value})."
)
Comment on lines +181 to +185
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we deal with cases where the range is different, like I encounter in essdiffraction/beamlime? Above it seems not even a wavelength mask is taken into account, that is I this is just a dead end?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can extend this to take masks into account. But in general, you should slice your data to be within the monitor range.

Note that this has not changed compared to the implementation in ESSdiffraction.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really? The only way I could make essdiffraction work for me was by masking a wavelength range, or else I would get exceptions in the monitor normalization.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But in general, you should slice your data to be within the monitor range.

You who? Where do the workflows do this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a separate step. It should be clearly visible that we automatically limit the wavelength range of the data.


if detector.bins is None:
# If we didn't rebin to the detector coord here, then, for a finer monitor
# binning than detector, the lookup table would extract one monitor value for
# each detector bin and ignore other values lying in the same detector bin.
# But integration would pick up all monitor bins.
return monitor.rebin({dim: det_coord})
return monitor[dim, lo:hi]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good but can you briefly outline which code was simply copied over and which code is new? Thanks :-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing is copied without change. But normalize_by_monitor_integrated is unchanged apart from extracting _clip_monitor_to_detector_range. However, the latter has changed as I described in my initial comment.

normalize_by_monitor_histogram now also clips the data. And the actual normalisation has changed. It used to be https://github.com/scipp/essdiffraction/blob/c9c29dd6155cdb0f9fd201e6c9d6923a754b1e37/src/ess/powder/correction.py#L57-L67, i.e., only det / mon without extra scaling factors.

Loading
Loading