Commit e7aab2b

msgpack: support tzindex in datetime
Support non-zero tzindex in datetime extended type. If both tzoffset and
tzindex are specified, tzindex takes priority (same as in Tarantool [1]).

pytz [2] is used to build timezone info. The Tarantool index to Olson name
map and its inverse are built with the gen_timezones.sh script, which is
based on the tarantool/go-tarantool script [3]. All Tarantool unique and
alias timezones are present in the pytz.all_timezones list. Only the
following abbreviated timezones from Tarantool are present in
pytz.all_timezones (version 2022.2.1):
- CET
- EET
- EST
- GMT
- HST
- MST
- UTC
- WET

pytz does not natively support abbreviated timezones because they can be
ambiguous [4-6]. Tarantool itself does not support ambiguous abbreviated
timezones:

```
Tarantool 2.10.1-0-g482d91c66
tarantool> datetime.new({tz = 'BST'})
---
- error: 'builtin/datetime.lua:477: could not parse ''BST'' - ambiguous timezone'
...
```

If an ambiguous timezone is specified, an exception is raised.

The Tarantool header timezones.h [7] provides a map of all abbreviated
timezones with category info (all ambiguous timezones are marked with the
TZ_AMBIGUOUS flag) and offset info. We parse this info to build a
pytz.FixedOffset() timezone for each Tarantool abbreviated timezone that
pytz does not support natively. Since we explicitly store
tarantool_tzindex, no info is lost on msgpack conversion.

Tarantool does not know the following pytz version 2022.2.1 timezones:
- CST6CDT
- EST5EDT
- Etc/GMT+1
- Etc/GMT+10
- Etc/GMT+11
- Etc/GMT+12
- Etc/GMT+2
- Etc/GMT+3
- Etc/GMT+4
- Etc/GMT+5
- Etc/GMT+6
- Etc/GMT+7
- Etc/GMT+8
- Etc/GMT+9
- Etc/GMT-1
- Etc/GMT-10
- Etc/GMT-11
- Etc/GMT-12
- Etc/GMT-13
- Etc/GMT-14
- Etc/GMT-2
- Etc/GMT-3
- Etc/GMT-4
- Etc/GMT-5
- Etc/GMT-6
- Etc/GMT-7
- Etc/GMT-8
- Etc/GMT-9
- Europe/Kyiv
- MET
- MST7MDT
- PST8PDT

These are utility timezones or new synonyms. For each timezone not
supported by Tarantool, we use the tzoffset data from the pytz object
instead. A warning is raised in this case.

1. https://www.tarantool.io/en/doc/latest/reference/reference_lua/datetime/new/
2. https://pypi.org/project/pytz/
3. https://github.com/tarantool/go-tarantool/blob/5801dc6f5ce69db7c8bc0c0d0fe4fb6042d5ecbc/datetime/gen-timezones.sh
4. https://stackoverflow.com/questions/37109945/how-to-use-abbreviated-timezone-namepst-ist-in-pytz
5. https://stackoverflow.com/questions/27531718/datetime-timezone-conversion-using-pytz
6. https://stackoverflow.com/questions/30315485/pytz-return-olson-timezone-name-from-only-a-gmt-offset
7. https://github.com/tarantool/tarantool/9ee45289e01232b8df1413efea11db170ae3b3b4/src/lib/tzcode/timezones.h
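The fallback for abbreviated timezones described in the message can be sketched with the standard library alone. This is a minimal illustration, not the connector's code: it uses `datetime.timezone` in place of `pytz.FixedOffset`, and `ABBREV_INFO` is a hypothetical two-entry excerpt whose offsets and categories are illustrative (only the `TZ_AMBIGUOUS` value and the fact that `BST` is ambiguous come from this commit).

```python
from datetime import timedelta, timezone

TZ_AMBIGUOUS = 0x08  # flag value written by the generator script in this commit

# Hypothetical excerpt of an abbreviation table; offsets are minutes from UTC.
# The real table is generated from Tarantool's timezones.h.
ABBREV_INFO = {
    "MSK": {"offset": 180, "category": 0},
    "BST": {"offset": 60, "category": TZ_AMBIGUOUS},
}


def resolve_abbreviation(name):
    """Build a fixed-offset tzinfo for an abbreviation, rejecting ambiguous ones."""
    info = ABBREV_INFO[name]
    if info["category"] & TZ_AMBIGUOUS:
        raise ValueError(f"ambiguous timezone {name!r}")
    return timezone(timedelta(minutes=info["offset"]), name)


msk = resolve_abbreviation("MSK")
print(msk.utcoffset(None))
```

Calling `resolve_abbreviation("BST")` raises `ValueError` in this sketch, which mirrors the error Tarantool reports for the same input.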
1 parent 7c5ef89 commit e7aab2b

7 files changed, +2194 -13 lines changed

CHANGELOG.md

+1
@@ -11,6 +11,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - UUID type support (#202).
 - Datetime type support and tarantool.Datetime type (#204).
 - Offset in datetime type support (#204).
+- Timezone in datetime type support (#204).
 
 ### Changed
 - Bump msgpack requirement to 1.0.4 (PR #223).

tarantool/msgpack_ext/types/datetime.py

+138-13
@@ -1,6 +1,9 @@
 import pandas
 import pytz
 
+import tarantool.msgpack_ext.types.timezones as tt_timezones
+from tarantool.error import MsgpackError, MsgpackWarning, warn
+
 # https://www.tarantool.io/ru/doc/latest/dev_guide/internals/msgpack_extensions/#the-datetime-type
 #
 # The datetime MessagePack representation looks like this:
@@ -43,13 +46,6 @@
 SEC_IN_MIN = 60
 MIN_IN_DAY = 60 * 24
 
-def compute_offset(dt):
-    if dt.tz is None:
-        return 0
-
-    utc_offset = dt.tz.utcoffset(dt)
-    # There is no precision loss since pytz.FixedOffset is in minutes
-    return utc_offset.days * MIN_IN_DAY + utc_offset.seconds // SEC_IN_MIN
 
 def get_bytes_as_int(data, cursor, size):
     part = data[cursor:cursor + size]
@@ -58,6 +54,118 @@ def get_bytes_as_int(data, cursor, size):
 def get_int_as_bytes(data, size):
     return data.to_bytes(size, byteorder=BYTEORDER, signed=True)
 
+def compute_offset(dt):
+    utc_offset = dt.tz.utcoffset(dt)
+    # There is no precision loss since pytz.FixedOffset is in minutes
+    return utc_offset.days * MIN_IN_DAY + utc_offset.seconds // SEC_IN_MIN
+
+def get_tz_as_offset(dt, tarantool_tz=None):
+    tzoffset = compute_offset(dt)
+    tzindex = 0
+    if tarantool_tz is not None:
+        tzindex = tt_timezones.timezoneToIndex[tarantool_tz]
+    return tzoffset, tzindex
+
+def get_tarantool_timezone(dt):
+    # Tarantool 2.10 (commit 9ee45289e01232b8df1413efea11db170ae3b3b4)
+    # does not support the following pytz (version 2022.2.1) timezones:
+    # - CST6CDT
+    # - EST5EDT
+    # - Etc/GMT+1
+    # - Etc/GMT+10
+    # - Etc/GMT+11
+    # - Etc/GMT+12
+    # - Etc/GMT+2
+    # - Etc/GMT+3
+    # - Etc/GMT+4
+    # - Etc/GMT+5
+    # - Etc/GMT+6
+    # - Etc/GMT+7
+    # - Etc/GMT+8
+    # - Etc/GMT+9
+    # - Etc/GMT-1
+    # - Etc/GMT-10
+    # - Etc/GMT-11
+    # - Etc/GMT-12
+    # - Etc/GMT-13
+    # - Etc/GMT-14
+    # - Etc/GMT-2
+    # - Etc/GMT-3
+    # - Etc/GMT-4
+    # - Etc/GMT-5
+    # - Etc/GMT-6
+    # - Etc/GMT-7
+    # - Etc/GMT-8
+    # - Etc/GMT-9
+    # - Europe/Kyiv
+    # - MET
+    # - MST7MDT
+    # - PST8PDT
+    #
+    # They are transformed to tzoffset based on pytz info.
+    tzoffset = compute_offset(dt)
+
+    if dt.tz.zone not in tt_timezones.timezoneToIndex:
+        warn(f'pytz timezone {dt.tz} is not supported by Tarantool, '
+             f'using tzoffset={tzoffset} instead', MsgpackWarning)
+
+        return tzoffset, 0
+
+    return tzoffset, tt_timezones.timezoneToIndex[dt.tz.zone]
+
+def get_tarantool_tz_data(dt, tarantool_tz=None):
+    if dt.tz is None:
+        return 0, 0
+
+    if dt.tz.zone is not None:
+        return get_tarantool_timezone(dt)
+    else:
+        return get_tz_as_offset(dt, tarantool_tz)
+
+def is_ambiguous_tz(tt_tzinfo):
+    return (tt_tzinfo['category'] & tt_timezones.TZ_AMBIGUOUS) != 0
+
def get_pytz_timezone(tzindex=None, tzname=None):
129+
# https://raw.githubusercontent.com/tarantool/tarantool/9ee45289e01232b8df1413efea11db170ae3b3b4/src/lib/tzcode/timezones.h
130+
#
131+
# There are several possible timezone types in Tarantool.
132+
# Abbreviated timezones are a bit tricky since they could be ambiguous.
133+
# Tarantool itself do not support creating datetime with ambiguous timezones:
134+
#
135+
# Tarantool 2.10.1-0-g482d91c66
136+
#
137+
# tarantool> datetime.new({tz = 'BST'})
138+
# ---
139+
# - error: 'builtin/datetime.lua:477: could not parse ''BST'' - ambiguous timezone'
140+
# ...
141+
#
142+
# pytz version 2022.2.1 do not support most of Tarantool abbreviated timezones
143+
# (except for CET, EET EST, GMT, HST, MST, UTC, WET). Since Tarantool sources
144+
# provide offset info for abbreviated timezones, we use pytz.FixedOffset instead.
145+
#
146+
# https://stackoverflow.com/questions/30315485/pytz-return-olson-timezone-name-from-only-a-gmt-offset
147+
if tzname is not None:
148+
if tzname not in tt_timezones.timezoneToIndex:
149+
raise ValueError(f'Unknown Tarantool timezone "{tzname}"')
150+
elif tzindex is not None:
151+
if tzindex not in tt_timezones.indexToTimezone:
152+
raise MsgpackError(f'Unknown tzindex {tzindex}')
153+
tzname = tt_timezones.indexToTimezone[tzindex]
154+
else:
155+
raise ValueError('Pass tzindex or tzname')
156+
157+
try:
158+
tzinfo = pytz.timezone(tzname)
159+
except pytz.exceptions.UnknownTimeZoneError:
160+
tt_tzinfo = tt_timezones.timezoneAbbrevInfo[tzname]
161+
162+
if is_ambiguous_tz(tt_tzinfo):
163+
raise MsgpackError(f'Failed to decode datetime {tzname} with ambiguous timezone')
164+
165+
tzinfo = pytz.FixedOffset(tt_tzinfo['offset'])
166+
167+
return tzinfo
168+
61169
def msgpack_decode(data):
62170
cursor = 0
63171
seconds, cursor = get_bytes_as_int(data, cursor, SECONDS_SIZE_BYTES)
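The arithmetic in `compute_offset` relies on how Python normalizes a negative `timedelta`: `days` goes negative while `seconds` stays non-negative. A small standalone sketch of the same formula, with the hypothetical helper name `offset_in_minutes` and constants copied from the diff, shows why the sum still yields the signed offset in minutes.

```python
from datetime import timedelta

SEC_IN_MIN = 60
MIN_IN_DAY = 60 * 24


def offset_in_minutes(utc_offset):
    """Same formula as compute_offset, applied to a plain timedelta."""
    return utc_offset.days * MIN_IN_DAY + utc_offset.seconds // SEC_IN_MIN


# timedelta(hours=-5) is stored as days=-1, seconds=68400,
# so the result is -1440 + 1140 = -300 minutes.
west = timedelta(hours=-5)
print(west.days, west.seconds, offset_in_minutes(west))
print(offset_in_minutes(timedelta(hours=5, minutes=30)))
```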
@@ -74,7 +182,8 @@ def msgpack_decode(data):
     total_nsec = seconds * NSEC_IN_SEC + nsec
 
     if (tzindex != 0):
-        raise NotImplementedError
+        tzinfo = get_pytz_timezone(tzindex=tzindex)
+        dt = pandas.to_datetime(total_nsec, unit='ns').replace(tzinfo=pytz.utc).tz_convert(tzinfo)
     elif (tzoffset != 0):
         tzinfo = pytz.FixedOffset(tzoffset)
         dt = pandas.to_datetime(total_nsec, unit='ns').replace(tzinfo=pytz.utc).tz_convert(tzinfo)
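The branch order in this hunk is what gives tzindex priority over tzoffset. The selection rule can be isolated in a few lines; the function name, the return shape, and the one-entry index map below are hypothetical and only illustrate the ordering, not the decoder's actual return values.

```python
def choose_timezone_source(tzoffset, tzindex, index_to_name):
    """Pick the field a decoder would trust, checking tzindex first."""
    if tzindex != 0:
        if tzindex not in index_to_name:
            raise ValueError(f"Unknown tzindex {tzindex}")
        return ("name", index_to_name[tzindex])
    if tzoffset != 0:
        return ("offset", tzoffset)
    # Neither field is set; what the decoder does here is outside this hunk.
    return ("none", None)


# Illustrative map; the real one is generated from timezones.h.
index_to_name = {947: "Europe/Moscow"}

# Both fields are set, so the named zone wins over the raw offset.
print(choose_timezone_source(180, 947, index_to_name))
print(choose_timezone_source(180, 0, index_to_name))
```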
@@ -85,33 +194,39 @@ def msgpack_decode(data):
     return dt, tzoffset, tzindex
 
 class Datetime(pandas.Timestamp):
-    def __new__(cls, *args, **kwargs):
+    def __new__(cls, *args, tarantool_tz=None, **kwargs):
         dt = None
         if len(args) > 0:
             if isinstance(args[0], bytes):
                 dt, tzoffset, tzindex = msgpack_decode(args[0])
             elif isinstance(args[0], Datetime):
                 dt = pandas.Timestamp.__new__(cls, *args, **kwargs)
                 tzoffset = args[0].tarantool_tzoffset
+                tzindex = args[0].tarantool_tzindex
 
         if dt is None:
             dt = super().__new__(cls, *args, **kwargs)
-            tzoffset = compute_offset(dt)
+            tzoffset, tzindex = get_tarantool_tz_data(dt)
+
+        if tarantool_tz is not None:
+            tzinfo = get_pytz_timezone(tzname=tarantool_tz)
+            dt = pandas.Timestamp.replace(dt, tzinfo=tzinfo)
+            tzoffset, tzindex = get_tarantool_tz_data(dt, tarantool_tz)
 
         dt.__class__ = cls
         dt.tarantool_tzoffset = tzoffset
+        dt.tarantool_tzindex = tzindex
         return dt
 
     def msgpack_encode(self):
         seconds = self.value // NSEC_IN_SEC
         nsec = self.value % NSEC_IN_SEC
-        tzoffset = 0
-        tzindex = 0
 
         if isinstance(self, Datetime):
             tzoffset = self.tarantool_tzoffset
+            tzindex = self.tarantool_tzindex
         else:
-            tzoffset = compute_offset(self)
+            tzoffset, tzindex = get_tarantool_tz_data(self)
 
         buf = get_int_as_bytes(seconds, SECONDS_SIZE_BYTES)
 
@@ -137,3 +252,13 @@ def tz_convert(self, *args, **kwargs):
     def tz_localize(self, *args, **kwargs):
         dt = super().tz_localize(*args, **kwargs)
         return Datetime(dt)
+
+    def tarantool_tz_convert(self, tarantool_tz):
+        tzinfo = get_pytz_timezone(tzname=tarantool_tz)
+        dt = super().tz_convert(tzinfo)
+        return Datetime(dt, tarantool_tz=tarantool_tz)
+
+    def tarantool_tz_localize(self, tarantool_tz):
+        tzinfo = get_pytz_timezone(tzname=tarantool_tz)
+        dt = super().tz_localize(tzinfo)
+        return Datetime(dt, tarantool_tz=tarantool_tz)
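On the encode side, a named zone that Tarantool does not know is downgraded to a plain offset with tzindex 0, and a warning is emitted. A rough standard-library sketch of that rule follows; `TimezoneFallbackWarning` is a hypothetical stand-in for the connector's `MsgpackWarning`, and the `known` map is a one-entry illustration rather than the generated table.

```python
import warnings


class TimezoneFallbackWarning(UserWarning):
    """Hypothetical stand-in for the connector's MsgpackWarning."""


def tz_data_for_encoding(zone_name, offset_minutes, timezone_to_index):
    """Return (tzoffset, tzindex); unknown zones keep only the offset."""
    if zone_name not in timezone_to_index:
        warnings.warn(
            f"timezone {zone_name} is not supported by Tarantool, "
            f"using tzoffset={offset_minutes} instead",
            TimezoneFallbackWarning,
        )
        return offset_minutes, 0
    return offset_minutes, timezone_to_index[zone_name]


# Illustrative map; the real one is generated from timezones.h.
known = {"Europe/Moscow": 947}
print(tz_data_for_encoding("Europe/Moscow", 180, known))
```

Passing a zone from the unsupported list, such as `Europe/Kyiv`, returns the offset with index 0 and triggers the warning, so the zone name itself does not survive a round trip in that case.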
@@ -0,0 +1,9 @@
+from tarantool.msgpack_ext.types.timezones.timezones import (
+    TZ_AMBIGUOUS,
+    indexToTimezone,
+    timezoneToIndex,
+    timezoneAbbrevInfo,
+)
+
+__all__ = ['TZ_AMBIGUOUS', 'indexToTimezone', 'timezoneToIndex',
+           'timezoneAbbrevInfo']
@@ -0,0 +1,69 @@
+#!/usr/bin/env bash
+set -xeuo pipefail
+
+SRC_COMMIT="9ee45289e01232b8df1413efea11db170ae3b3b4"
+SRC_FILE=timezones.h
+DST_FILE=timezones.py
+
+[ -e ${SRC_FILE} ] && rm ${SRC_FILE}
+wget -O ${SRC_FILE} \
+    https://raw.githubusercontent.com/tarantool/tarantool/${SRC_COMMIT}/src/lib/tzcode/timezones.h
+
+# We don't need aliases in indexToTimezone because Tarantool always replaces them:
+#
+# tarantool> T = date.parse '2022-01-01T00:00 Pacific/Enderbury'
+# ---
+# ...
+# tarantool> T
+# ---
+# - 2022-01-01T00:00:00 Pacific/Kanton
+# ...
+#
+# So we can do the same and don't worry, be happy.
+
+cat <<EOF > ${DST_FILE}
+# Automatically generated by gen-timezones.sh
+
+TZ_UTC = 0x01
+TZ_RFC = 0x02
+TZ_MILITARY = 0x04
+TZ_AMBIGUOUS = 0x08
+TZ_NYI = 0x10
+TZ_OLSON = 0x20
+TZ_ALIAS = 0x40
+TZ_DST = 0x80
+
+indexToTimezone = {
+EOF
+
+grep ZONE_ABBREV ${SRC_FILE} | sed "s/ZONE_ABBREV( *//g" | sed "s/[),]//g" \
+    | awk '{printf("\t%s : %s,\n", $1, $3)}' >> ${DST_FILE}
+grep ZONE_UNIQUE ${SRC_FILE} | sed "s/ZONE_UNIQUE( *//g" | sed "s/[),]//g" \
+    | awk '{printf("\t%s : %s,\n", $1, $2)}' >> ${DST_FILE}
+
+cat <<EOF >> ${DST_FILE}
+}
+
+timezoneToIndex = {
+EOF
+
+grep ZONE_ABBREV ${SRC_FILE} | sed "s/ZONE_ABBREV( *//g" | sed "s/[),]//g" \
+    | awk '{printf("\t%s : %s,\n", $3, $1)}' >> ${DST_FILE}
+grep ZONE_UNIQUE ${SRC_FILE} | sed "s/ZONE_UNIQUE( *//g" | sed "s/[),]//g" \
+    | awk '{printf("\t%s : %s,\n", $2, $1)}' >> ${DST_FILE}
+grep ZONE_ALIAS ${SRC_FILE} | sed "s/ZONE_ALIAS( *//g" | sed "s/[),]//g" \
+    | awk '{printf("\t%s : %s,\n", $2, $1)}' >> ${DST_FILE}
+
+cat <<EOF >> ${DST_FILE}
+}
+
+timezoneAbbrevInfo = {
+EOF
+
+grep ZONE_ABBREV ${SRC_FILE} | sed "s/ZONE_ABBREV( *//g" | sed "s/[),]//g" \
+    | awk '{printf("\t%s : {\"offset\" : %d, \"category\" : %s},\n", $3, $2, $4)}' >> ${DST_FILE}
+echo "}" >> ${DST_FILE}
+
+rm timezones.h
+
+python validate_timezones.py
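To see what the `sed`/`awk` pipeline produces for one entry, the same text transformation can be mimicked in Python. The sample line below is an assumption about the shape of a `ZONE_ABBREV` entry in timezones.h (index, offset in minutes, quoted name, category flag), inferred from the field order the `awk` calls use; `strip_macro` is a hypothetical helper, not part of the commit.

```python
import re


def strip_macro(line, macro):
    """Mimic the two sed calls: drop 'MACRO(' plus spaces, then ')' and ','."""
    line = re.sub(re.escape(macro) + r"\( *", "", line)
    return re.sub(r"[),]", "", line).split()


# Assumed shape of a timezones.h entry; not copied from the header.
sample = 'ZONE_ABBREV(  1, -720, "Y", TZ_MILITARY)'
fields = strip_macro(sample, "ZONE_ABBREV")

# The three rows that the awk printf calls would emit for this entry.
index_to_timezone_row = f"\t{fields[0]} : {fields[2]},"
timezone_to_index_row = f"\t{fields[2]} : {fields[0]},"
abbrev_info_row = (
    f'\t{fields[2]} : {{"offset" : {int(fields[1])}, "category" : {fields[3]}}},'
)

print(index_to_timezone_row)
print(timezone_to_index_row)
print(abbrev_info_row)
```

Because the quotes around the name survive the `sed` step, the generated rows are already valid Python dictionary entries, and the category stays a bare identifier that resolves to the `TZ_*` constants written at the top of the generated file.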
