
Annotations.orig_time not properly read from csv files #13108

Open
tsbinns opened this issue Feb 10, 2025 · 1 comment · May be fixed by #13109

@tsbinns
Contributor

tsbinns commented Feb 10, 2025

Description of the problem

I came across this today when annotating some artefacts in data using the current dev branch. I generated the annotations using the raw data browser and saved them, but when I read them back in and applied them to the raw data, the annotations had been shifted towards the start of the recording.

Turns out the annotations I'd read in had orig_time=None, even though this information was present in the annotations before saving and was present in the csv file that had been saved. In short, the datetime format saved in the csv files by Annotations.save() is sometimes not compatible with the datetime format expected when initialising an Annotations object, causing orig_time to default to None.

I did some digging and found no problems in <=v1.6; this seems to have been introduced in v1.7 with #12289.

The orig_time saved in the csv file is read fine by read_annotations() as the onset time of the first annotation. This gets passed to an Annotations object, where _handle_meas_date() attempts to convert the orig_time string into a datetime object according to ISO8601 using datetime.strptime (here meas_date=orig_time). If the orig_time string does not conform to this format, a ValueError is raised and orig_time gets set to None (L1006-1011).

mne-python/mne/annotations.py

Lines 997 to 1029 in e4cc4e2

def _handle_meas_date(meas_date):
    """Convert meas_date to datetime or None.

    If `meas_date` is a string, it should conform to the ISO8601 format.
    More precisely to this '%Y-%m-%d %H:%M:%S.%f' particular case of the
    ISO8601 format where the delimiter between date and time is ' '.
    Note that ISO8601 allows for ' ' or 'T' as delimiters between date and
    time.
    """
    if isinstance(meas_date, str):
        ACCEPTED_ISO8601 = "%Y-%m-%d %H:%M:%S.%f"
        try:
            meas_date = datetime.strptime(meas_date, ACCEPTED_ISO8601)
        except ValueError:
            meas_date = None
        else:
            meas_date = meas_date.replace(tzinfo=timezone.utc)
    elif isinstance(meas_date, tuple):
        # old way
        meas_date = _stamp_to_dt(meas_date)
    if meas_date is not None:
        if np.isscalar(meas_date):
            # It would be nice just to do:
            #
            #     meas_date = datetime.fromtimestamp(meas_date, timezone.utc)
            #
            # But Windows does not like timestamps < 0. So we'll use
            # our specialized wrapper instead:
            meas_date = np.array(np.modf(meas_date)[::-1])
            meas_date *= [1, 1e6]
            meas_date = _stamp_to_dt(np.round(meas_date))
        _check_dt(meas_date)  # run checks
    return meas_date


What happens since v1.7 is that the datetime format of annotations saved in the csv files sometimes does not conform to this ISO8601 format. Specifically, the accepted format ('%f') allows at most 6 decimal places for the sub-second info (e.g. 2025-02-10 10:50:20.123456), but the times saved in the csv files can have >6 decimal places (e.g. 2025-02-10 10:50:20.123456789). When _handle_meas_date() checks whether orig_time complies with this format, the extra decimal places prevent a match in datetime.strptime, the ValueError is raised, and orig_time gets incorrectly set to None.
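
The failure mode is easy to reproduce in isolation with just the standard library:

from datetime import datetime

fmt = "%Y-%m-%d %H:%M:%S.%f"

# %f matches 1 to 6 digits, so up to 6 decimal places parse fine
print(datetime.strptime("2025-02-10 10:50:20.123456", fmt))

# >6 decimal places leave unconverted characters behind and raise
# ValueError: unconverted data remains: 789
datetime.strptime("2025-02-10 10:50:20.123456789", fmt)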

These extra decimal places get written to the csv file because in #12289, Annotations.to_data_frame() was updated to allow for different time formats, introducing a call to mne.utils.dataframe._convert_times() and pd.to_timedelta (L49). If the onset times of the annotations have <=6 decimal places, the times returned from pd.to_timedelta have at most 6 decimal places, compatible with the accepted format. However, if the onset times have >6 decimal places, pd.to_timedelta returns times with >6 decimal places, since pandas timedeltas have nanosecond resolution. This dataframe of times is what gets saved to the csv file.

def _convert_times(times, time_format, meas_date=None, first_time=0):
    """Convert vector of time in seconds to ms, datetime, or timedelta."""
    # private function; pandas already checked in calling function
    from pandas import to_timedelta

    if time_format == "ms":
        times = np.round(times * 1e3).astype(np.int64)
    elif time_format == "timedelta":
        times = to_timedelta(times, unit="s")
    elif time_format == "datetime":
        times = to_timedelta(times + first_time, unit="s") + meas_date
    return times
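
The precision difference is easy to see with pandas alone:

import pandas as pd

# <=6 decimal places: microsecond precision, compatible with '%f'
print(pd.to_timedelta(5.012345, unit="s"))
# 0 days 00:00:05.012345

# >6 decimal places: pandas keeps nanosecond resolution
print(pd.to_timedelta(5.0123456789, unit="s"))
# 0 days 00:00:05.012345679 (9 decimal places; the last digit
# depends on float rounding)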


I can't see a case in the unit tests where onset times have >6 decimals, so it could easily have slipped under the radar.

Below is some code to reproduce this. It runs fine on v1.6, but v1.7+ fails when the number of decimal places in the onset time is >6.

Steps to reproduce

import mne
import numpy as np
from datetime import datetime, timezone

# Create orig_time for annotations
orig_time = datetime.now()
orig_time = orig_time.replace(tzinfo=timezone.utc)


def check_orig_time_roundtrip(onset: float):
    # Create annotations object
    annotations = mne.Annotations(
        onset=[onset], description=["bad"], duration=[1.0], orig_time=orig_time
    )
    # Save and reload annotations
    annotations.save("test_annotations.csv", overwrite=True)
    annotations_read = mne.read_annotations("test_annotations.csv")
    print(f"Time stored in original annotations object : {annotations.orig_time}")
    print(f"Time stored in loaded annotations object   : {annotations_read.orig_time}")
    if annotations_read.orig_time is None:
        raise TypeError(f"Bad onset = {onset}")


onset = 5.0123456789  # >6 decimal places

for n in range(10):
    check_orig_time_roundtrip(onset=np.round(onset, decimals=n))

Link to data

No response

Expected results

Annotations.orig_time should always get read from the csv file and not be assigned None.

Actual results

Annotations.orig_time gets assigned None in v1.7+ when the annotation onset times have >6 decimal places, which causes >6 decimal places to be written to the onset times in the csv file.

Additional information

What is the best way to handle this?

  • Sanitise the onset times from Annotations.to_data_frame("datetime") to have at most 6 decimals?
  • Only sanitise in dataframes being written to a csv?
  • Sanitise the datetime formats when reading from the csv?
  • Or should orig_time handling in Annotations initialisation be more lenient to >6 decimals? (but then it wouldn't be ISO8601 compliant)

In any case, perhaps a RuntimeWarning could be raised when an orig_time string is set to None to reduce chances of people accidentally misaligning annotations in their data.
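
For illustration, here is a minimal sketch of the truncate-on-read option combined with such a warning. _parse_orig_time is a hypothetical helper, not the actual fix in #13109:

import re
import warnings
from datetime import datetime, timezone

ACCEPTED_ISO8601 = "%Y-%m-%d %H:%M:%S.%f"


def _parse_orig_time(orig_time_str):
    # hypothetical sketch: drop any digits beyond the 6 that '%f' accepts
    sanitized = re.sub(r"(\.\d{6})\d+", r"\1", orig_time_str)
    try:
        orig_time = datetime.strptime(sanitized, ACCEPTED_ISO8601)
    except ValueError:
        warnings.warn(
            f"orig_time {orig_time_str!r} could not be parsed and will be "
            "set to None; annotations may be misaligned when applied to "
            "raw data.",
            RuntimeWarning,
        )
        return None
    return orig_time.replace(tzinfo=timezone.utc)


print(_parse_orig_time("2025-02-10 10:50:20.123456789"))
# 2025-02-10 10:50:20.123456+00:00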

@tsbinns tsbinns added the BUG label Feb 10, 2025
@drammock
Member

Excellent detective work. Given that this problem has been around for a couple of releases, I think we should assume there are some .csv annotations out there that have >6 decimals, which makes me lean towards a truncate-on-read approach (rather than truncate-on-write). It might make sense to also add a call to this: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timestamp.isoformat.html#pandas.Timestamp.isoformat to _convert_times() after L49.
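
For reference, a rough sketch of how Timestamp.isoformat could cap the precision on the write side, assuming its sep/timespec arguments behave like datetime.isoformat's (the actual placement inside _convert_times() would differ, since it operates on a whole Series of times):

import pandas as pd

ts = pd.Timestamp("2025-02-10 10:50:20") + pd.to_timedelta(5.0123456789, unit="s")
print(ts)
# 2025-02-10 10:50:25.012345679 -- 9 decimal places

# timespec="microseconds" caps the output at 6 decimal places,
# making it parseable by '%Y-%m-%d %H:%M:%S.%f' again
print(ts.isoformat(sep=" ", timespec="microseconds"))
# 2025-02-10 10:50:25.012345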

@tsbinns tsbinns linked a pull request Feb 10, 2025 that will close this issue