
Commit 0bee71e

msgpack: support tzindex in datetime
Support non-zero tzindex in the datetime extended type. If both tzoffset and tzindex are specified, tzindex takes priority (same as in Tarantool [1]).

pytz [2] is used to build timezone info. The Tarantool index to Olson name map and the inverted one are built with the gen-timezones.sh script, based on the tarantool/go-tarantool script [3]. All Tarantool unique and alias timezones are present in the pytz.all_timezones list. Only the following abbreviated timezones from Tarantool are present in pytz.all_timezones (version 2022.2.1):
- CET
- EET
- EST
- GMT
- HST
- MST
- UTC
- WET

pytz does not natively support abbreviated timezones because they can be ambiguous [4-6]. Tarantool itself does not support ambiguous abbreviated timezones either:

```
Tarantool 2.10.1-0-g482d91c66

tarantool> datetime.new({tz = 'BST'})
---
- error: 'builtin/datetime.lua:477: could not parse ''BST'' - ambiguous timezone'
...
```

If an ambiguous timezone is specified, an exception is raised.

The Tarantool header timezones.h [7] provides a map of all abbreviated timezones with category info (all ambiguous timezones are marked with the TZ_AMBIGUOUS flag) and offset info. We parse this info to build a pytz.FixedOffset() timezone for each Tarantool abbreviated timezone that pytz does not support natively. A warning is raised in this case.

Tarantool does not know the following pytz (version 2022.2.1) timezones:
- CST6CDT
- EST5EDT
- Etc/GMT+1
- Etc/GMT+10
- Etc/GMT+11
- Etc/GMT+12
- Etc/GMT+2
- Etc/GMT+3
- Etc/GMT+4
- Etc/GMT+5
- Etc/GMT+6
- Etc/GMT+7
- Etc/GMT+8
- Etc/GMT+9
- Etc/GMT-1
- Etc/GMT-10
- Etc/GMT-11
- Etc/GMT-12
- Etc/GMT-13
- Etc/GMT-14
- Etc/GMT-2
- Etc/GMT-3
- Etc/GMT-4
- Etc/GMT-5
- Etc/GMT-6
- Etc/GMT-7
- Etc/GMT-8
- Etc/GMT-9
- Europe/Kyiv
- MET
- MST7MDT
- PST8PDT

These are utility timezones or new synonyms. For each timezone not supported by Tarantool, we use the tzoffset data from the pytz object instead. A warning is raised in this case.

1. https://www.tarantool.io/en/doc/latest/reference/reference_lua/datetime/new/
2. https://pypi.org/project/pytz/
3. https://github.com/tarantool/go-tarantool/blob/5801dc6f5ce69db7c8bc0c0d0fe4fb6042d5ecbc/datetime/gen-timezones.sh
4. https://stackoverflow.com/questions/37109945/how-to-use-abbreviated-timezone-namepst-ist-in-pytz
5. https://stackoverflow.com/questions/27531718/datetime-timezone-conversion-using-pytz
6. https://stackoverflow.com/questions/30315485/pytz-return-olson-timezone-name-from-only-a-gmt-offset
7. https://github.com/tarantool/tarantool/9ee45289e01232b8df1413efea11db170ae3b3b4/src/lib/tzcode/timezones.h
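Below is a minimal sketch of the priority rule described above. It uses only the standard library. The function name `pick_tz_source` and the index value are hypothetical. The decoder in this commit works differently in two ways: it raises `MsgpackError` for unknown indices, and it builds pytz objects instead of returning names.

```python
from typing import Dict, Optional, Union


def pick_tz_source(tzindex: int, tzoffset: int,
                   index_to_name: Dict[int, str]) -> Optional[Union[str, int]]:
    """Return the zone name when tzindex is set, else the offset in minutes."""
    if tzindex != 0:
        # tzindex takes priority; a non-zero tzoffset is ignored here.
        # An unknown index raises KeyError in this sketch.
        return index_to_name[tzindex]
    if tzoffset != 0:
        return tzoffset
    # Neither field is set: no timezone information.
    return None


# Hypothetical index; real indices come from Tarantool's timezones.h.
names = {42: "Europe/Moscow"}
print(pick_tz_source(42, 180, names))  # Europe/Moscow
print(pick_tz_source(0, 180, names))   # 180
```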
1 parent 2e695b7 commit 0bee71e


5 files changed, +2093 -5 lines changed


tarantool/msgpack_ext_types/datetime.py

+100 -5
@@ -3,6 +3,9 @@
 import pandas
 import pytz
 
+import tarantool.msgpack_ext_types.datetime_timezones.timezones as tt_timezones
+from tarantool.error import MsgpackError, MsgpackWarning, warn
+
 # https://www.tarantool.io/ru/doc/latest/dev_guide/internals/msgpack_extensions/#the-datetime-type
 #
 # The datetime MessagePack representation looks like this:
@@ -55,6 +58,59 @@
 def get_int_as_bytes(data, size):
     return data.to_bytes(size, byteorder=BYTEORDER, signed=True)
 
+def get_tz_offset(obj):
+    utc_offset = obj.tz.utcoffset(obj)
+    # There is no precision loss since pytz.FixedOffset is in minutes
+    return utc_offset.days * MIN_IN_DAY + utc_offset.seconds // SEC_IN_MIN
+
+def get_tt_timezone(obj):
+    # Tarantool 2.10 (commit 9ee45289e01232b8df1413efea11db170ae3b3b4)
+    # do not support the following pytz (version 2022.2.1) timezones
+    # - CST6CDT
+    # - EST5EDT
+    # - Etc/GMT+1
+    # - Etc/GMT+10
+    # - Etc/GMT+11
+    # - Etc/GMT+12
+    # - Etc/GMT+2
+    # - Etc/GMT+3
+    # - Etc/GMT+4
+    # - Etc/GMT+5
+    # - Etc/GMT+6
+    # - Etc/GMT+7
+    # - Etc/GMT+8
+    # - Etc/GMT+9
+    # - Etc/GMT-1
+    # - Etc/GMT-10
+    # - Etc/GMT-11
+    # - Etc/GMT-12
+    # - Etc/GMT-13
+    # - Etc/GMT-14
+    # - Etc/GMT-2
+    # - Etc/GMT-3
+    # - Etc/GMT-4
+    # - Etc/GMT-5
+    # - Etc/GMT-6
+    # - Etc/GMT-7
+    # - Etc/GMT-8
+    # - Etc/GMT-9
+    # - Europe/Kyiv
+    # - MET
+    # - MST7MDT
+    # - PST8PDT
+    #
+    # They are transformed to tzoffset based on pytz info.
+
+    tzoffset = get_tz_offset(obj)
+
+    if not obj.tz.zone in tt_timezones.timezoneToIndex:
+        warn(f'pytz timezone {obj.tz} is not supported by Tarantool, '
+             f'using tzoffset={tzoffset} instead', MsgpackWarning)
+
+        return 0, tzoffset
+
+    return tt_timezones.timezoneToIndex[obj.tz.zone], tzoffset
+
 def encode(obj):
     seconds = obj.value // NSEC_IN_SEC
     nsec = obj.value % NSEC_IN_SEC
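`get_tz_offset()` depends on how `timedelta` normalizes negative durations: it stores them as `days=-1` plus positive `seconds`. `get_tt_timezone()` falls back to `tzindex=0` for zones missing from `timezoneToIndex`. The sketch below covers both steps with the standard library. The function names are hypothetical, and a plain dict stands in for the generated map.

```python
import warnings
from datetime import timedelta
from typing import Dict, Tuple

MIN_IN_DAY = 24 * 60
SEC_IN_MIN = 60


def offset_minutes(utc_offset: timedelta) -> int:
    # timedelta stores negative values as days=-1 plus positive seconds,
    # so both fields are needed to recover e.g. -300 for UTC-5.
    return utc_offset.days * MIN_IN_DAY + utc_offset.seconds // SEC_IN_MIN


def tt_timezone(zone: str, utc_offset: timedelta,
                name_to_index: Dict[str, int]) -> Tuple[int, int]:
    tzoffset = offset_minutes(utc_offset)
    if zone not in name_to_index:
        # Unknown to Tarantool: send tzindex=0 and rely on tzoffset alone.
        warnings.warn(f"{zone} is not supported by Tarantool, "
                      f"using tzoffset={tzoffset} instead")
        return 0, tzoffset
    return name_to_index[zone], tzoffset


# Etc/GMT+5 is UTC-5 under the POSIX sign convention used by tz names.
print(offset_minutes(timedelta(hours=-5)))  # -300
print(tt_timezone("Etc/GMT+5", timedelta(hours=-5), {}))  # (0, -300), plus a warning
```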
@@ -64,11 +120,9 @@ def encode(obj):
 
     if obj.tz is not None:
         if obj.tz.zone is not None:
-            raise NotImplementedError
+            tzindex, tzoffset = get_tt_timezone(obj)
         else:
-            utc_offset = obj.tz.utcoffset(0)
-            # There is no precision loss since pytz.FixedOffset is in minutes
-            tzoffset = utc_offset.days * MIN_IN_DAY + utc_offset.seconds // SEC_IN_MIN
+            tzoffset = get_tz_offset(obj)
 
     bytes_buffer = get_int_as_bytes(seconds, SECONDS_SIZE_BYTES)
 
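With this change, `encode()` can emit a non-zero tzindex alongside tzoffset. The sketch below shows how such a tail might be packed. It assumes the layout from the Tarantool MessagePack extensions page referenced at the top of the file:

- 8-byte seconds, followed by 4-byte nsec, 2-byte tzoffset and 2-byte tzindex;
- all fields signed and little-endian;
- the tail omitted when nsec, tzoffset and tzindex are all zero.

`pack_datetime` is a hypothetical helper, not the library's encoder.

```python
import struct


def pack_datetime(seconds: int, nsec: int = 0,
                  tzoffset: int = 0, tzindex: int = 0) -> bytes:
    # Mandatory part (assumed layout): signed 64-bit seconds, little-endian.
    data = struct.pack("<q", seconds)
    # Optional tail (assumed layout): int32 nsec, int16 tzoffset, int16 tzindex,
    # written only when at least one of them is non-zero.
    if nsec or tzoffset or tzindex:
        data += struct.pack("<ihh", nsec, tzoffset, tzindex)
    return data


# Hypothetical tzindex 42 with a +180 minute offset.
packed = pack_datetime(0, tzoffset=180, tzindex=42)
print(len(packed))  # 16
print(struct.unpack("<qihh", packed))  # (0, 0, 180, 42)
```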
@@ -83,6 +137,46 @@ def get_bytes_as_int(data, cursor, size):
     part = data[cursor:cursor + size]
     return int.from_bytes(part, BYTEORDER, signed=True), cursor + size
 
+def is_ambiguous_tz(tt_tzinfo):
+    return (tt_tzinfo['category'] & tt_timezones.TZ_AMBIGUOUS) != 0
+
+def get_pytz_timezone(tzindex):
+    # https://raw.githubusercontent.com/tarantool/tarantool/9ee45289e01232b8df1413efea11db170ae3b3b4/src/lib/tzcode/timezones.h
+    #
+    # There are several possible timezone types in Tarantool.
+    # Abbreviated timezones are a bit tricky since they could be ambiguous.
+    # Tarantool itself do not support creating datetime with ambiguous timezones:
+    #
+    # Tarantool 2.10.1-0-g482d91c66
+    #
+    # tarantool> datetime.new({tz = 'BST'})
+    # ---
+    # - error: 'builtin/datetime.lua:477: could not parse ''BST'' - ambiguous timezone'
+    # ...
+    #
+    # pytz version 2022.2.1 do not support most of Tarantool abbreviated timezones
+    # (except for CET, EET EST, GMT, HST, MST, UTC, WET). Since Tarantool sources
+    # provide offset info for abbreviated timezones, we use pytz.FixedOffset instead.
+    #
+    # https://stackoverflow.com/questions/30315485/pytz-return-olson-timezone-name-from-only-a-gmt-offset
+    if not tzindex in tt_timezones.indexToTimezone:
+        raise MsgpackError(f'Unknown tzindex {tzindex}')
+    tzname = tt_timezones.indexToTimezone[tzindex]
+
+    try:
+        tzinfo = pytz.timezone(tzname)
+    except pytz.exceptions.UnknownTimeZoneError:
+        tt_tzinfo = tt_timezones.timezoneAbbrevInfo[tzname]
+
+        if is_ambiguous_tz(tt_tzinfo):
+            raise MsgpackError(f'Failed to decode datetime {tzname} with ambiguous timezone')
+
+        tzinfo = pytz.FixedOffset(tt_tzinfo['offset'])
+        warn(f'Abbreviated timezone {tzname} is not supported by pytz, '
+             f'using {tzinfo} instead', MsgpackWarning)
+
+    return tzinfo
+
 def decode(data):
     cursor = 0
     seconds, cursor = get_bytes_as_int(data, cursor, SECONDS_SIZE_BYTES)
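`get_pytz_timezone()` tries a named zone first. It falls back to `timezoneAbbrevInfo` only when pytz does not know the name, and it rejects abbreviations whose category carries `TZ_AMBIGUOUS`. The same fallback chain can be sketched with the standard library, with these substitutions:

- `lookup` is a hypothetical stand-in for `pytz.timezone`;
- `datetime.timezone` replaces `pytz.FixedOffset`;
- `ValueError` replaces `MsgpackError`;
- the abbreviation data is illustrative, not taken from timezones.h.

```python
import warnings
from datetime import timedelta, timezone, tzinfo
from typing import Callable, Dict

TZ_AMBIGUOUS = 0x08  # same value as in the generated timezones.py


def resolve_zone(name: str, lookup: Callable[[str], tzinfo],
                 abbrev_info: Dict[str, dict]) -> tzinfo:
    try:
        # First ask the tz library; unknown names raise KeyError here
        # (pytz's UnknownTimeZoneError is a KeyError subclass).
        return lookup(name)
    except KeyError:
        info = abbrev_info[name]
        if info["category"] & TZ_AMBIGUOUS:
            raise ValueError(f"ambiguous timezone {name}") from None
        # Non-ambiguous abbreviation: fall back to a fixed offset in minutes.
        fixed = timezone(timedelta(minutes=info["offset"]), name)
        warnings.warn(f"{name} is not supported, using fixed offset {fixed}")
        return fixed


# Illustrative data, not copied from timezones.h.
known = {"UTC": timezone.utc}
abbrevs = {"MSK": {"offset": 180, "category": 0x02},
           "BST": {"offset": 60, "category": 0x02 | TZ_AMBIGUOUS}}
print(resolve_zone("MSK", known.__getitem__, abbrevs).utcoffset(None))  # 3:00:00
```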
@@ -100,7 +194,8 @@ def decode(data):
 
     tzinfo = None
     if (tzindex != 0):
-        raise NotImplementedError
+        tzinfo = get_pytz_timezone(tzindex)
+        return pandas.to_datetime(total_nsec, unit='ns').replace(tzinfo=pytz.utc).tz_convert(tzinfo)
     elif (tzoffset != 0):
         tzinfo = pytz.FixedOffset(tzoffset)
         return pandas.to_datetime(total_nsec, unit='ns').replace(tzinfo=pytz.utc).tz_convert(tzinfo)
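Both branches of `decode()` first tag the timestamp as UTC and then convert it. The stored instant is preserved and only the wall-clock representation changes; localizing instead would reinterpret the value as local time. The sketch below is a rough standard-library equivalent, with these assumptions:

- `datetime` is used in place of pandas, so nanoseconds are truncated to microseconds;
- `to_aware` is a hypothetical name;
- unlike the original, it returns the UTC value when no timezone is given.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_aware(seconds: int, nsec: int, tz: Optional[tzinfo]) -> datetime:
    # Treat the stored value as a UTC instant (like .replace(tzinfo=pytz.utc)).
    # stdlib datetime keeps microseconds, so sub-microsecond nsec is dropped.
    utc = EPOCH + timedelta(seconds=seconds, microseconds=nsec // 1000)
    # Convert (like .tz_convert): same instant, different wall-clock time.
    return utc.astimezone(tz) if tz is not None else utc


moscow = timezone(timedelta(minutes=180))
print(to_aware(0, 0, moscow).isoformat())  # 1970-01-01T03:00:00+03:00
```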
@@ -0,0 +1,69 @@
+#!/usr/bin/env bash
+set -xeuo pipefail
+
+SRC_COMMIT="9ee45289e01232b8df1413efea11db170ae3b3b4"
+SRC_FILE=timezones.h
+DST_FILE=timezones.py
+
+[ -e ${SRC_FILE} ] && rm ${SRC_FILE}
+wget -O ${SRC_FILE} \
+    https://raw.githubusercontent.com/tarantool/tarantool/${SRC_COMMIT}/src/lib/tzcode/timezones.h
+
+# We don't need aliases in indexToTimezone because Tarantool always replace it:
+#
+# tarantool> T = date.parse '2022-01-01T00:00 Pacific/Enderbury'
+# ---
+# ...
+# tarantool> T
+# ---
+# - 2022-01-01T00:00:00 Pacific/Kanton
+# ...
+#
+# So we can do the same and don't worry, be happy.
+
+cat <<EOF > ${DST_FILE}
+# Automatically generated by gen-timezones.sh
+
+TZ_UTC = 0x01
+TZ_RFC = 0x02
+TZ_MILITARY = 0x04
+TZ_AMBIGUOUS = 0x08
+TZ_NYI = 0x10
+TZ_OLSON = 0x20
+TZ_ALIAS = 0x40
+TZ_DST = 0x80
+
+indexToTimezone = {
+EOF
+
+grep ZONE_ABBREV ${SRC_FILE} | sed "s/ZONE_ABBREV( *//g" | sed "s/[),]//g" \
+    | awk '{printf("\t%s : %s,\n", $1, $3)}' >> ${DST_FILE}
+grep ZONE_UNIQUE ${SRC_FILE} | sed "s/ZONE_UNIQUE( *//g" | sed "s/[),]//g" \
+    | awk '{printf("\t%s : %s,\n", $1, $2)}' >> ${DST_FILE}
+
+cat <<EOF >> ${DST_FILE}
+}
+
+timezoneToIndex = {
+EOF
+
+grep ZONE_ABBREV ${SRC_FILE} | sed "s/ZONE_ABBREV( *//g" | sed "s/[),]//g" \
+    | awk '{printf("\t%s : %s,\n", $3, $1)}' >> ${DST_FILE}
+grep ZONE_UNIQUE ${SRC_FILE} | sed "s/ZONE_UNIQUE( *//g" | sed "s/[),]//g" \
+    | awk '{printf("\t%s : %s,\n", $2, $1)}' >> ${DST_FILE}
+grep ZONE_ALIAS ${SRC_FILE} | sed "s/ZONE_ALIAS( *//g" | sed "s/[),]//g" \
+    | awk '{printf("\t%s : %s,\n", $2, $1)}' >> ${DST_FILE}
+
+cat <<EOF >> ${DST_FILE}
+}
+
+timezoneAbbrevInfo = {
+EOF
+
+grep ZONE_ABBREV ${SRC_FILE} | sed "s/ZONE_ABBREV( *//g" | sed "s/[),]//g" \
+    | awk '{printf("\t%s : {\"offset\" : %d, \"category\" : %s},\n", $3, $2, $4)}' >> ${DST_FILE}
+echo "}" >> ${DST_FILE}
+
+rm timezones.h
+
+python validate.py
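The grep/sed/awk pipeline above extracts positional macro arguments:

- for ZONE_ABBREV: the index, offset, quoted name and category;
- for ZONE_UNIQUE and ZONE_ALIAS: the index and name.

Aliases go only into the name-to-index map. Below is a rough Python equivalent that builds the same three maps. It assumes the macro lines look like the hypothetical excerpt in the code, with argument order inferred from the awk field numbers. The indices and categories in that excerpt are illustrative.

```python
import re
from typing import Dict, Tuple

# Flag values as emitted in the generated timezones.py header.
FLAGS = {"TZ_UTC": 0x01, "TZ_RFC": 0x02, "TZ_MILITARY": 0x04,
         "TZ_AMBIGUOUS": 0x08, "TZ_NYI": 0x10, "TZ_OLSON": 0x20,
         "TZ_ALIAS": 0x40, "TZ_DST": 0x80}

ABBREV_RE = re.compile(
    r'ZONE_ABBREV\(\s*(\d+)\s*,\s*(-?\d+)\s*,\s*"([^"]+)"\s*,\s*([\w|\s]+)\)')
# Only the first two arguments (index, name) are used, as in the awk calls.
UNIQUE_RE = re.compile(r'ZONE_UNIQUE\(\s*(\d+)\s*,\s*"([^"]+)"')
ALIAS_RE = re.compile(r'ZONE_ALIAS\(\s*(\d+)\s*,\s*"([^"]+)"')


def parse_category(expr: str) -> int:
    # Evaluate an expression like "TZ_RFC|TZ_AMBIGUOUS" into a bit mask.
    flags = 0
    for part in expr.split("|"):
        part = part.strip()
        flags |= FLAGS[part] if part in FLAGS else int(part, 0)
    return flags


def parse_timezones(header: str) -> Tuple[Dict[int, str], Dict[str, int], Dict[str, dict]]:
    index_to_tz: Dict[int, str] = {}
    tz_to_index: Dict[str, int] = {}
    abbrev_info: Dict[str, dict] = {}
    for m in ABBREV_RE.finditer(header):
        idx, name = int(m[1]), m[3]
        index_to_tz[idx] = name
        tz_to_index[name] = idx
        abbrev_info[name] = {"offset": int(m[2]), "category": parse_category(m[4])}
    for m in UNIQUE_RE.finditer(header):
        index_to_tz[int(m[1])] = m[2]
        tz_to_index[m[2]] = int(m[1])
    for m in ALIAS_RE.finditer(header):
        # Aliases go only into the name -> index map, as in the script.
        tz_to_index[m[2]] = int(m[1])
    return index_to_tz, tz_to_index, abbrev_info


# Hypothetical header excerpt; indices and categories are illustrative.
sample = '''
ZONE_ABBREV(  8,  180, "MSK", TZ_RFC)
ZONE_ABBREV( 25,   60, "BST", TZ_RFC|TZ_AMBIGUOUS)
ZONE_UNIQUE(300, "Pacific/Kanton")
ZONE_ALIAS(300, "Pacific/Enderbury")
'''
index_to_tz, tz_to_index, abbrev_info = parse_timezones(sample)
print(index_to_tz[tz_to_index["Pacific/Enderbury"]])  # Pacific/Kanton
```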
