Description of the problem
Came across this today when annotating some artefacts in data using the current dev branch. Generated the annotations using the raw data browser and saved them, but when reading them back in and applying to the raw data, the annotations had been shifted towards the start of the recording.
Turns out the annotations I'd read in had orig_time=None even though this information was present in the annotations before saving and was present in the csv file that had been saved. In short, the datetime format saved in the csv files by Annotations.save() is sometimes not compatible with the datetime format expected when initialising an Annotations object, causing it to default to None.
Did some digging and found no problems in <=v1.6. Seems this was introduced in v1.7 with #12289.
The orig_time saved in the csv file is read fine by read_annotations() as the onset time of the first annotation. This gets passed to an Annotations object, where _handle_meas_date() attempts to convert the orig_time string into a datetime object according to ISO8601 using datetime.strptime (here meas_date=orig_time). If the orig_time string does not conform to this format, a ValueError is raised and orig_time gets set to None (L1006-1011; from mne-python/mne/annotations.py, lines 997 to 1029 at e4cc4e2):
# But Windows does not like timestamps < 0. So we'll use
# our specialized wrapper instead:
meas_date = np.array(np.modf(meas_date)[::-1])
meas_date *= [1, 1e6]
meas_date = _stamp_to_dt(np.round(meas_date))
_check_dt(meas_date)  # run checks
return meas_date
What happens since v1.7 is that the datetime format of annotations saved in the csv files can sometimes not conform to the ISO8601 format. Specifically, ISO8601 expects 6 decimal places for the sub-second info (e.g. 2025-02-10 10:50:20.123456), but the times saved in the csv files can have >6 decimal places (e.g. 2025-02-10 10:50:20.123456789). When _handle_meas_date() checks if orig_time complies with ISO8601, the extra decimal places prevent a regex match in datetime.strptime, the ValueError is raised, and orig_time gets incorrectly set to None.
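For illustration, here is a standalone sketch of that parsing behaviour; the format string mirrors the ISO8601 pattern that _handle_meas_date() relies on (the exact pattern in MNE may differ slightly):

from datetime import datetime

fmt = "%Y-%m-%d %H:%M:%S.%f"  # %f matches at most 6 digits

# 6 decimal places: parses fine
print(datetime.strptime("2025-02-10 10:50:20.123456", fmt))

# 9 decimal places: the trailing "789" is left unconsumed, so
# strptime raises ValueError ("unconverted data remains: 789")
try:
    datetime.strptime("2025-02-10 10:50:20.123456789", fmt)
except ValueError as err:
    print(err)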
These extra decimal places get written to the csv file because in #12289, Annotations.to_data_frame() was updated to allow for different time formats, introducing a call to mne.utils.dataframe._convert_times() and pd.to_timedelta (L49; see mne-python/mne/utils/dataframe.py, lines 39 to 50 at e4cc4e2). If the onset times of the annotations have <=6 decimal places, the times returned from pd.to_timedelta will have 6 decimal places, in line with ISO8601. However, if the onset times have >6 decimal places, pd.to_timedelta returns times with >6 decimal places, and this dataframe of times is what gets saved to the csv file.
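A quick standalone demonstration of that precision behaviour (a sketch of the effect, not the actual _convert_times() code):

import pandas as pd

orig_time = pd.Timestamp("2025-02-10 10:50:20", tz="UTC")

# <=6 decimal places in the onset: the datetime string has 6 decimals
print(orig_time + pd.to_timedelta(5.012345, unit="s"))
# 2025-02-10 10:50:25.012345+00:00

# >6 decimal places: pandas keeps nanosecond resolution, so the
# datetime string has 9 decimals and no longer parses with %f
print(orig_time + pd.to_timedelta(5.0123456789, unit="s"))
# 2025-02-10 10:50:25.012345679+00:00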
I can't see a case in the unit tests where onset times have >6 decimals, so it could easily have slipped under the radar.
Below is some code to reproduce this. It runs fine on v1.6, but v1.7+ fails when the number of decimal places in the onset time is >6.
Steps to reproduce
import mne
import numpy as np
from datetime import datetime, timezone

# Create orig_time for annotations
orig_time = datetime.now()
orig_time = orig_time.replace(tzinfo=timezone.utc)


def check_orig_time_roundtrip(onset: float):
    # Create annotations object
    annotations = mne.Annotations(
        onset=[onset], description=["bad"], duration=[1.0], orig_time=orig_time
    )
    # Save and reload annotations
    annotations.save("test_annotations.csv", overwrite=True)
    annotations_read = mne.read_annotations("test_annotations.csv")
    print(f"Time stored in original annotations object : {annotations.orig_time}")
    print(f"Time stored in loaded annotations object   : {annotations_read.orig_time}")
    if annotations_read.orig_time is None:
        raise TypeError(f"Bad onset = {onset}")


onset = 5.0123456789  # >6 decimal places
for n in range(10):
    check_orig_time_roundtrip(onset=np.round(onset, decimals=n))
Link to data
No response
Expected results
Annotations.orig_time should always get read from the csv file and not be assigned None.
Actual results
Annotations.orig_time gets assigned None in v1.7+ when the onset times of the annotations have >6 decimal places, since this causes datetimes with >6 decimal places to be saved in the csv file.
Additional information
What is the best way to handle this?
- Sanitise the onset times from Annotations.to_data_frame("datetime") to have at most 6 decimals?
- Only sanitise in dataframes being written to a csv?
- Sanitise the datetime formats when reading from the csv?
- Or should orig_time handling in Annotations initialisation be more lenient to >6 decimals? (But then it wouldn't be ISO8601 compliant.)
In any case, perhaps a RuntimeWarning could be raised when an orig_time string is set to None to reduce chances of people accidentally misaligning annotations in their data.
Excellent detective work. Given that this problem has been around for a couple of releases, I think we should assume there are some .csv annotations out there that will have >6 decimals, which makes me lean towards a truncate-on-read approach (rather than truncate-on-write). It might make sense to also add a call to this: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timestamp.isoformat.html#pandas.Timestamp.isoformat to _convert_times() after L49.
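For concreteness, here is a minimal sketch of the truncate-on-read idea; the helper name _sanitize_orig_time and the regex-based trimming are illustrative only, not MNE's actual fix. The write-side alternative hinted at above (capping the precision via pd.Timestamp.isoformat) is shown at the end:

import re
from datetime import datetime, timezone

import pandas as pd


def _sanitize_orig_time(orig_time_str: str) -> datetime:
    """Trim sub-microsecond digits so the string parses as ISO8601.

    Hypothetical helper for illustration, not part of MNE-Python.
    """
    # Keep at most 6 fractional-second digits (strptime's %f limit)
    trimmed = re.sub(r"(\.\d{6})\d+", r"\1", orig_time_str)
    return datetime.strptime(trimmed, "%Y-%m-%d %H:%M:%S.%f").replace(
        tzinfo=timezone.utc
    )


print(_sanitize_orig_time("2025-02-10 10:50:20.123456789"))
# 2025-02-10 10:50:20.123456+00:00

# Write-side alternative: format timestamps with Timestamp.isoformat,
# which can cap the fractional-second precision at microseconds
ts = pd.Timestamp("2025-02-10 10:50:20.123456789")
print(ts.isoformat(sep=" ", timespec="microseconds"))
# 2025-02-10 10:50:20.123456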