
Annotations.orig_time not properly read from csv files #13108

Open
tsbinns opened this issue Feb 10, 2025 · 1 comment · May be fixed by #13109

@tsbinns
Contributor

tsbinns commented Feb 10, 2025

Description of the problem

I came across this today when annotating some artefacts in data using the current dev branch. I generated the annotations using the raw data browser and saved them, but when I read them back in and applied them to the raw data, the annotations had been shifted towards the start of the recording.

Turns out the annotations I'd read in had orig_time=None, even though this information was present in the annotations before saving and was present in the csv file that had been saved. In short, the datetime format saved in the csv files by Annotations.save() is sometimes not compatible with the datetime format expected when initialising an Annotations object, causing orig_time to default to None.

I did some digging and found no problems in <=v1.6; this seems to have been introduced in v1.7 with #12289.

The orig_time saved in the csv file is read fine by read_annotations() as the onset time of the first annotation. This gets passed to an Annotations object, where _handle_meas_date() attempts to convert the orig_time string into a datetime object according to ISO8601 using datetime.strptime (here meas_date=orig_time). If the orig_time string does not conform to this format, a ValueError is raised and orig_time gets set to None (L1006-1011).

mne-python/mne/annotations.py

Lines 997 to 1029 in e4cc4e2

def _handle_meas_date(meas_date):
    """Convert meas_date to datetime or None.

    If `meas_date` is a string, it should conform to the ISO8601 format.
    More precisely to this '%Y-%m-%d %H:%M:%S.%f' particular case of the
    ISO8601 format where the delimiter between date and time is ' '.
    Note that ISO8601 allows for ' ' or 'T' as delimiters between date and
    time.
    """
    if isinstance(meas_date, str):
        ACCEPTED_ISO8601 = "%Y-%m-%d %H:%M:%S.%f"
        try:
            meas_date = datetime.strptime(meas_date, ACCEPTED_ISO8601)
        except ValueError:
            meas_date = None
        else:
            meas_date = meas_date.replace(tzinfo=timezone.utc)
    elif isinstance(meas_date, tuple):
        # old way
        meas_date = _stamp_to_dt(meas_date)
    if meas_date is not None:
        if np.isscalar(meas_date):
            # It would be nice just to do:
            #
            #     meas_date = datetime.fromtimestamp(meas_date, timezone.utc)
            #
            # But Windows does not like timestamps < 0. So we'll use
            # our specialized wrapper instead:
            meas_date = np.array(np.modf(meas_date)[::-1])
            meas_date *= [1, 1e6]
            meas_date = _stamp_to_dt(np.round(meas_date))
        _check_dt(meas_date)  # run checks
    return meas_date


What happens since v1.7 is that the datetime format of annotations saved in the csv files sometimes does not conform to this ISO8601 format. Specifically, the accepted format ('%f') allows at most 6 decimal places for the sub-second info (e.g. 2025-02-10 10:50:20.123456), but the times saved in the csv files can have >6 decimal places (e.g. 2025-02-10 10:50:20.123456789). When _handle_meas_date() checks whether orig_time complies with this format, the extra decimal places prevent a match in datetime.strptime, the ValueError is raised, and orig_time gets incorrectly set to None.
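
The failure mode is easy to reproduce in isolation with just the standard library:

from datetime import datetime

fmt = "%Y-%m-%d %H:%M:%S.%f"

# %f matches 1 to 6 digits, so up to 6 decimal places parse fine
print(datetime.strptime("2025-02-10 10:50:20.123456", fmt))

# >6 decimal places leave unconverted characters behind and raise
# ValueError: unconverted data remains: 789
datetime.strptime("2025-02-10 10:50:20.123456789", fmt)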

These extra decimal places get written to the csv file because in #12289, Annotations.to_data_frame() was updated to allow for different time formats, introducing a call to mne.utils.dataframe._convert_times() and pd.to_timedelta (L49). If the onset times of the annotations have <=6 decimal places, the times returned from pd.to_timedelta have at most 6 decimal places, compatible with the accepted format. However, if the onset times have >6 decimal places, pd.to_timedelta returns times with >6 decimal places, since pandas timedeltas have nanosecond resolution. This dataframe of times is what gets saved to the csv file.

def _convert_times(times, time_format, meas_date=None, first_time=0):
    """Convert vector of time in seconds to ms, datetime, or timedelta."""
    # private function; pandas already checked in calling function
    from pandas import to_timedelta

    if time_format == "ms":
        times = np.round(times * 1e3).astype(np.int64)
    elif time_format == "timedelta":
        times = to_timedelta(times, unit="s")
    elif time_format == "datetime":
        times = to_timedelta(times + first_time, unit="s") + meas_date
    return times
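
The precision difference is easy to see with pandas alone:

import pandas as pd

# <=6 decimal places: microsecond precision, compatible with '%f'
print(pd.to_timedelta(5.012345, unit="s"))
# 0 days 00:00:05.012345

# >6 decimal places: pandas keeps nanosecond resolution
print(pd.to_timedelta(5.0123456789, unit="s"))
# 0 days 00:00:05.012345679 (9 decimal places; the last digit
# depends on float rounding)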


I can't see a case in the unit tests where onset times have >6 decimals, so it could easily have slipped under the radar.

Below is some code to reproduce this. It runs fine on v1.6, but v1.7+ fails when the number of decimal places in the onset time is >6.

Steps to reproduce

import mne
import numpy as np
from datetime import datetime, timezone

# Create orig_time for annotations
orig_time = datetime.now()
orig_time = orig_time.replace(tzinfo=timezone.utc)


def check_orig_time_roundtrip(onset: float):
    # Create annotations object
    annotations = mne.Annotations(
        onset=[onset], description=["bad"], duration=[1.0], orig_time=orig_time
    )
    # Save and reload annotations
    annotations.save("test_annotations.csv", overwrite=True)
    annotations_read = mne.read_annotations("test_annotations.csv")
    print(f"Time stored in original annotations object : {annotations.orig_time}")
    print(f"Time stored in loaded annotations object   : {annotations_read.orig_time}")
    if annotations_read.orig_time is None:
        raise TypeError(f"Bad onset = {onset}")


onset = 5.0123456789  # >6 decimal places

for n in range(10):
    check_orig_time_roundtrip(onset=np.round(onset, decimals=n))

Link to data

No response

Expected results

Annotations.orig_time should always get read from the csv file and not be assigned None.

Actual results

Annotations.orig_time gets assigned None in v1.7+ when the annotation onset times have >6 decimal places, which causes >6 decimal places to be written to the onset times in the csv file.

Additional information

What is the best way to handle this?

  • Sanitise the onset times from Annotations.to_data_frame("datetime") to have at most 6 decimals?
  • Only sanitise in dataframes being written to a csv?
  • Sanitise the datetime formats when reading from the csv?
  • Or should orig_time handling in Annotations initialisation be more lenient to >6 decimals? (but then it wouldn't be ISO8601 compliant)

In any case, perhaps a RuntimeWarning could be raised when an orig_time string is set to None to reduce chances of people accidentally misaligning annotations in their data.
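
For illustration, here is a minimal sketch of the truncate-on-read option combined with such a warning. _parse_orig_time is a hypothetical helper, not the actual fix in #13109:

import re
import warnings
from datetime import datetime, timezone

ACCEPTED_ISO8601 = "%Y-%m-%d %H:%M:%S.%f"


def _parse_orig_time(orig_time_str):
    # hypothetical sketch: drop any digits beyond the 6 that '%f' accepts
    sanitized = re.sub(r"(\.\d{6})\d+", r"\1", orig_time_str)
    try:
        orig_time = datetime.strptime(sanitized, ACCEPTED_ISO8601)
    except ValueError:
        warnings.warn(
            f"orig_time {orig_time_str!r} could not be parsed and will be "
            "set to None; annotations may be misaligned when applied to "
            "raw data.",
            RuntimeWarning,
        )
        return None
    return orig_time.replace(tzinfo=timezone.utc)


print(_parse_orig_time("2025-02-10 10:50:20.123456789"))
# 2025-02-10 10:50:20.123456+00:00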

@tsbinns tsbinns added the BUG label Feb 10, 2025
@drammock
Member

Excellent detective work. Given that this problem has been around for a couple of releases, I think we should assume there are some .csv annotations out there that have >6 decimals, which makes me lean towards a truncate-on-read approach (rather than truncate-on-write). It might make sense to also add a call to this: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timestamp.isoformat.html#pandas.Timestamp.isoformat to _convert_times() after L49.
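
For reference, a rough sketch of how Timestamp.isoformat could cap the precision on the write side, assuming its sep/timespec arguments behave like datetime.isoformat's (the actual placement inside _convert_times() would differ, since it operates on a whole Series of times):

import pandas as pd

ts = pd.Timestamp("2025-02-10 10:50:20") + pd.to_timedelta(5.0123456789, unit="s")
print(ts)
# 2025-02-10 10:50:25.012345679 -- 9 decimal places

# timespec="microseconds" caps the output at 6 decimal places,
# making it parseable by '%Y-%m-%d %H:%M:%S.%f' again
print(ts.isoformat(sep=" ", timespec="microseconds"))
# 2025-02-10 10:50:25.012345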

@tsbinns tsbinns linked a pull request Feb 10, 2025 that will close this issue