Commit 7d847c8

msgpack: support tzindex in datetime

Support non-zero tzindex in the datetime extended type. If both tzoffset and tzindex are specified, tzindex takes priority (same as in Tarantool [1]). pytz [2] is used to build timezone info. The Tarantool index to Olson name map and its inverse are built with the gen_timezones.sh script, which is based on the tarantool/go-tarantool script [3].

All Tarantool unique and alias timezones are present in the pytz.all_timezones list. Only the following abbreviated timezones from Tarantool are present in pytz.all_timezones (version 2022.2.1):
- CET
- EET
- EST
- GMT
- HST
- MST
- UTC
- WET

pytz does not natively support abbreviated timezones because they can be ambiguous [4-6]. Tarantool itself does not support ambiguous abbreviated timezones:

```
Tarantool 2.10.1-0-g482d91c66

tarantool> datetime.new({tz = 'BST'})
---
- error: 'builtin/datetime.lua:477: could not parse ''BST'' - ambiguous timezone'
...
```

If an ambiguous timezone is specified, an exception is raised.

The Tarantool header timezones.h [7] provides a map of all abbreviated timezones with category info (all ambiguous timezones are marked with the TZ_AMBIGUOUS flag) and offset info. We parse this info to build a pytz.FixedOffset timezone for each Tarantool abbreviated timezone that pytz does not support natively. Since we explicitly store tarantool_tzindex, no information is lost on msgpack conversion.

Tarantool does not know the following pytz version 2022.2.1 timezones:
- CST6CDT
- EST5EDT
- Etc/GMT+1
- Etc/GMT+10
- Etc/GMT+11
- Etc/GMT+12
- Etc/GMT+2
- Etc/GMT+3
- Etc/GMT+4
- Etc/GMT+5
- Etc/GMT+6
- Etc/GMT+7
- Etc/GMT+8
- Etc/GMT+9
- Etc/GMT-1
- Etc/GMT-10
- Etc/GMT-11
- Etc/GMT-12
- Etc/GMT-13
- Etc/GMT-14
- Etc/GMT-2
- Etc/GMT-3
- Etc/GMT-4
- Etc/GMT-5
- Etc/GMT-6
- Etc/GMT-7
- Etc/GMT-8
- Etc/GMT-9
- Europe/Kyiv
- MET
- MST7MDT
- PST8PDT

These are utility timezones or new synonyms. For each timezone not supported by Tarantool, we use tzoffset data from the pytz object info instead. A warning is raised in this case.

1. https://www.tarantool.io/en/doc/latest/reference/reference_lua/datetime/new/
2. https://pypi.org/project/pytz/
3. https://github.com/tarantool/go-tarantool/blob/5801dc6f5ce69db7c8bc0c0d0fe4fb6042d5ecbc/datetime/gen-timezones.sh
4. https://stackoverflow.com/questions/37109945/how-to-use-abbreviated-timezone-namepst-ist-in-pytz
5. https://stackoverflow.com/questions/27531718/datetime-timezone-conversion-using-pytz
6. https://stackoverflow.com/questions/30315485/pytz-return-olson-timezone-name-from-only-a-gmt-offset
7. https://github.com/tarantool/tarantool/9ee45289e01232b8df1413efea11db170ae3b3b4/src/lib/tzcode/timezones.h
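
The decoding rules described in the commit message can be sketched roughly as follows. This is a minimal stdlib-only illustration under stated assumptions, not the code from this commit: `resolve_timezone`, `INDEX_TO_NAME`, and `ABBREV_INFO` are hypothetical names, the table entries and index values are invented for the example (they are not copied from timezones.h), and `datetime.timezone` stands in for `pytz.FixedOffset`.

```python
from datetime import timedelta, timezone

TZ_AMBIGUOUS = 0x08  # same flag value the generator script in this commit emits

# Hypothetical miniature tables; the real ones are generated from timezones.h.
INDEX_TO_NAME = {1: "MSK", 2: "BST"}
ABBREV_INFO = {
    "MSK": {"offset": 180, "category": 0},
    "BST": {"offset": 60, "category": TZ_AMBIGUOUS},
}


def resolve_timezone(tzoffset, tzindex):
    """Return a tzinfo for a decoded (tzoffset, tzindex) pair, or None."""
    if tzindex != 0:
        # tzindex takes priority over tzoffset when both are set.
        if tzindex not in INDEX_TO_NAME:
            raise ValueError(f"Unknown tzindex {tzindex}")
        name = INDEX_TO_NAME[tzindex]
        info = ABBREV_INFO[name]
        if info["category"] & TZ_AMBIGUOUS:
            # Ambiguous abbreviations cannot be mapped to a single offset.
            raise ValueError(f"Ambiguous timezone {name}")
        return timezone(timedelta(minutes=info["offset"]), name)
    if tzoffset != 0:
        # Offsets are stored in minutes.
        return timezone(timedelta(minutes=tzoffset))
    return None  # no timezone information in the payload
```

With these toy tables, `resolve_timezone(120, 1)` yields the offset of the indexed zone rather than the 120-minute tzoffset, and `resolve_timezone(0, 2)` raises because the entry carries the ambiguous flag.
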
1 parent ad15365 commit 7d847c8

7 files changed: +2209 -13 lines changed

CHANGELOG.md

+1
@@ -11,6 +11,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - UUID type support (#202).
 - Datetime type support and tarantool.Datetime type (#204).
 - Offset in datetime type support (#204).
+- Timezone in datetime type support (#204).
 
 ### Changed
 - Bump msgpack requirement to 1.0.4 (PR #223).

tarantool/msgpack_ext/types/datetime.py

+145-13
@@ -1,6 +1,9 @@
 import pandas
 import pytz
 
+import tarantool.msgpack_ext.types.timezones as tt_timezones
+from tarantool.error import MsgpackError, MsgpackWarning, warn
+
 # https://www.tarantool.io/ru/doc/latest/dev_guide/internals/msgpack_extensions/#the-datetime-type
 #
 # The datetime MessagePack representation looks like this:
@@ -43,13 +46,6 @@
 SEC_IN_MIN = 60
 MIN_IN_DAY = 60 * 24
 
-def compute_offset(dt):
-    if dt.tz is None:
-        return 0
-
-    utc_offset = dt.tz.utcoffset(dt)
-    # There is no precision loss since pytz.FixedOffset is in minutes
-    return utc_offset.days * MIN_IN_DAY + utc_offset.seconds // SEC_IN_MIN
 
 def get_bytes_as_int(data, cursor, size):
     part = data[cursor:cursor + size]
@@ -58,6 +54,125 @@ def get_bytes_as_int(data, cursor, size):
 def get_int_as_bytes(data, size):
     return data.to_bytes(size, byteorder=BYTEORDER, signed=True)
 
+def compute_offset(dt):
+    utc_offset = dt.tz.utcoffset(dt)
+    # There is no precision loss since pytz.FixedOffset is in minutes
+    return utc_offset.days * MIN_IN_DAY + utc_offset.seconds // SEC_IN_MIN
+
+def get_tz_as_offset(dt, tarantool_tz=None):
+    tzoffset = compute_offset(dt)
+    tzindex = 0
+    if tarantool_tz is not None:
+        tzindex = tt_timezones.timezoneToIndex[tarantool_tz]
+    return tzoffset, tzindex
+
+def get_tarantool_timezone(dt, tarantool_tz=None):
+    # Tarantool 2.10 (commit 9ee45289e01232b8df1413efea11db170ae3b3b4)
+    # do not support the following pytz (version 2022.2.1) timezones
+    # - CST6CDT
+    # - EST5EDT
+    # - Etc/GMT+1
+    # - Etc/GMT+10
+    # - Etc/GMT+11
+    # - Etc/GMT+12
+    # - Etc/GMT+2
+    # - Etc/GMT+3
+    # - Etc/GMT+4
+    # - Etc/GMT+5
+    # - Etc/GMT+6
+    # - Etc/GMT+7
+    # - Etc/GMT+8
+    # - Etc/GMT+9
+    # - Etc/GMT-1
+    # - Etc/GMT-10
+    # - Etc/GMT-11
+    # - Etc/GMT-12
+    # - Etc/GMT-13
+    # - Etc/GMT-14
+    # - Etc/GMT-2
+    # - Etc/GMT-3
+    # - Etc/GMT-4
+    # - Etc/GMT-5
+    # - Etc/GMT-6
+    # - Etc/GMT-7
+    # - Etc/GMT-8
+    # - Etc/GMT-9
+    # - Europe/Kyiv
+    # - MET
+    # - MST7MDT
+    # - PST8PDT
+    #
+    # They are transformed to tzoffset based on pytz info.
+    tzoffset = compute_offset(dt)
+
+    # Abbreviated Tarantool timezones with zero offset are treated as
+    # UTC-zone timestamps.
+    if tarantool_tz is not None:
+        tzindex = tt_timezones.timezoneToIndex[tarantool_tz]
+    else:
+        if dt.tz.zone in tt_timezones.timezoneToIndex:
+            tzindex = tt_timezones.timezoneToIndex[dt.tz.zone]
+        else:
+            warn(f'pytz timezone {dt.tz} is not supported by Tarantool, '
+                 f'using tzoffset={tzoffset} instead', MsgpackWarning)
+
+            tzindex = 0
+
+    return tzoffset, tzindex
+
+def get_tarantool_tz_data(dt, tarantool_tz=None):
+    if dt.tz is None:
+        return 0, 0
+
+    if dt.tz.zone is not None:
+        return get_tarantool_timezone(dt, tarantool_tz)
+    else:
+        return get_tz_as_offset(dt, tarantool_tz)
+
+def is_ambiguous_tz(tt_tzinfo):
+    return (tt_tzinfo['category'] & tt_timezones.TZ_AMBIGUOUS) != 0
+
+def get_pytz_timezone(tzindex=None, tzname=None):
+    # https://raw.githubusercontent.com/tarantool/tarantool/9ee45289e01232b8df1413efea11db170ae3b3b4/src/lib/tzcode/timezones.h
+    #
+    # There are several possible timezone types in Tarantool.
+    # Abbreviated timezones are a bit tricky since they could be ambiguous.
+    # Tarantool itself do not support creating datetime with ambiguous timezones:
+    #
+    # Tarantool 2.10.1-0-g482d91c66
+    #
+    # tarantool> datetime.new({tz = 'BST'})
+    # ---
+    # - error: 'builtin/datetime.lua:477: could not parse ''BST'' - ambiguous timezone'
+    # ...
+    #
+    # pytz version 2022.2.1 do not support most of Tarantool abbreviated timezones
+    # (except for CET, EET EST, GMT, HST, MST, UTC, WET). Since Tarantool sources
+    # provide offset info for abbreviated timezones, we use pytz.FixedOffset instead.
+    #
+    # https://stackoverflow.com/questions/30315485/pytz-return-olson-timezone-name-from-only-a-gmt-offset
+    if tzname is not None:
+        if tzname not in tt_timezones.timezoneToIndex:
+            raise ValueError(f'Unknown Tarantool timezone "{tzname}"')
+    elif tzindex is not None:
+        if tzindex not in tt_timezones.indexToTimezone:
+            raise MsgpackError(f'Unknown tzindex {tzindex}')
+        tzname = tt_timezones.indexToTimezone[tzindex]
+    else:
+        raise ValueError('Pass tzindex or tzname')
+
+    try:
+        tzinfo = pytz.timezone(tzname)
+    except pytz.exceptions.UnknownTimeZoneError:
+        tt_tzinfo = tt_timezones.timezoneAbbrevInfo[tzname]
+
+        if is_ambiguous_tz(tt_tzinfo):
+            raise MsgpackError(f'Failed to decode datetime {tzname} with ambiguous timezone')
+
+        tzinfo = pytz.FixedOffset(tt_tzinfo['offset'])
+
+    return tzinfo
+
 def msgpack_decode(data):
     cursor = 0
     seconds, cursor = get_bytes_as_int(data, cursor, SECONDS_SIZE_BYTES)
@@ -74,7 +189,8 @@ def msgpack_decode(data):
     total_nsec = seconds * NSEC_IN_SEC + nsec
 
     if (tzindex != 0):
-        raise NotImplementedError
+        tzinfo = get_pytz_timezone(tzindex=tzindex)
+        dt = pandas.to_datetime(total_nsec, unit='ns').replace(tzinfo=pytz.utc).tz_convert(tzinfo)
     elif (tzoffset != 0):
         tzinfo = pytz.FixedOffset(tzoffset)
         dt = pandas.to_datetime(total_nsec, unit='ns').replace(tzinfo=pytz.utc).tz_convert(tzinfo)
@@ -85,33 +201,39 @@ def msgpack_decode(data):
     return dt, tzoffset, tzindex
 
 class Datetime(pandas.Timestamp):
-    def __new__(cls, *args, **kwargs):
+    def __new__(cls, *args, tarantool_tz=None, **kwargs):
         dt = None
         if len(args) > 0:
             if isinstance(args[0], bytes):
                 dt, tzoffset, tzindex = msgpack_decode(args[0])
             elif isinstance(args[0], Datetime):
                 dt = pandas.Timestamp.__new__(cls, *args, **kwargs)
                 tzoffset = args[0].tarantool_tzoffset
+                tzindex = args[0].tarantool_tzindex
 
         if dt is None:
             dt = super().__new__(cls, *args, **kwargs)
-            tzoffset = compute_offset(dt)
+            tzoffset, tzindex = get_tarantool_tz_data(dt)
+
+        if tarantool_tz is not None:
+            tzinfo = get_pytz_timezone(tzname=tarantool_tz)
+            dt = pandas.Timestamp.replace(dt, tzinfo=tzinfo)
+            tzoffset, tzindex = get_tarantool_tz_data(dt, tarantool_tz)
 
         dt.__class__ = cls
         dt.tarantool_tzoffset = tzoffset
+        dt.tarantool_tzindex = tzindex
         return dt
 
     def msgpack_encode(self):
         seconds = self.value // NSEC_IN_SEC
         nsec = self.value % NSEC_IN_SEC
-        tzoffset = 0
-        tzindex = 0
 
         if isinstance(self, Datetime):
             tzoffset = self.tarantool_tzoffset
+            tzindex = self.tarantool_tzindex
         else:
-            tzoffset = compute_offset(self)
+            tzoffset, tzindex = get_tarantool_tz_data(self)
 
         buf = get_int_as_bytes(seconds, SECONDS_SIZE_BYTES)
 
@@ -137,3 +259,13 @@ def tz_convert(self, *args, **kwargs):
     def tz_localize(self, *args, **kwargs):
         dt = super().tz_localize(*args, **kwargs)
         return Datetime(dt)
+
+    def tarantool_tz_convert(self, tarantool_tz):
+        tzinfo = get_pytz_timezone(tzname=tarantool_tz)
+        dt = super().tz_convert(tzinfo)
+        return Datetime(dt, tarantool_tz=tarantool_tz)
+
+    def tarantool_tz_localize(self, tarantool_tz):
+        tzinfo = get_pytz_timezone(tzname=tarantool_tz)
+        dt = super().tz_localize(tzinfo)
+        return Datetime(dt, tarantool_tz=tarantool_tz)
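
The encode side of the change above (turning a timezone into a `(tzoffset, tzindex)` pair) might be illustrated with the following stdlib-only sketch. It is an approximation under assumptions, not the commit's code: `offset_minutes`, `tz_data`, and `TIMEZONE_TO_INDEX` are hypothetical names, the index value is invented, and `warnings.warn` stands in for the project's own `warn` helper.

```python
import warnings
from datetime import timedelta

SEC_IN_MIN = 60
MIN_IN_DAY = 60 * 24

# Hypothetical subset of a name-to-index table; the index value is made up.
TIMEZONE_TO_INDEX = {"Europe/Moscow": 100}


def offset_minutes(utc_offset):
    """Convert a UTC offset timedelta to signed whole minutes."""
    # timedelta normalizes negative values to days=-1 plus positive seconds,
    # so both fields are needed to recover the signed minute count.
    return utc_offset.days * MIN_IN_DAY + utc_offset.seconds // SEC_IN_MIN


def tz_data(zone_name, utc_offset):
    """Return (tzoffset, tzindex) for a named zone and its UTC offset."""
    tzoffset = offset_minutes(utc_offset)
    if zone_name in TIMEZONE_TO_INDEX:
        return tzoffset, TIMEZONE_TO_INDEX[zone_name]
    # Unknown zone: keep only the numeric offset and report the fallback.
    warnings.warn(f"timezone {zone_name} is not known, "
                  f"using tzoffset={tzoffset} instead")
    return tzoffset, 0
```

For example, `timedelta(hours=-3)` is stored as `days=-1, seconds=75600`, so the formula gives `-1440 + 1260 = -180` minutes.
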
@@ -0,0 +1,9 @@
+from tarantool.msgpack_ext.types.timezones.timezones import (
+    TZ_AMBIGUOUS,
+    indexToTimezone,
+    timezoneToIndex,
+    timezoneAbbrevInfo,
+)
+
+__all__ = ['TZ_AMBIGUOUS', 'indexToTimezone', 'timezoneToIndex',
+           'timezoneAbbrevInfo']
@@ -0,0 +1,69 @@
+#!/usr/bin/env bash
+set -xeuo pipefail
+
+SRC_COMMIT="9ee45289e01232b8df1413efea11db170ae3b3b4"
+SRC_FILE=timezones.h
+DST_FILE=timezones.py
+
+[ -e ${SRC_FILE} ] && rm ${SRC_FILE}
+wget -O ${SRC_FILE} \
+    https://raw.githubusercontent.com/tarantool/tarantool/${SRC_COMMIT}/src/lib/tzcode/timezones.h
+
+# We don't need aliases in indexToTimezone because Tarantool always replace it:
+#
+# tarantool> T = date.parse '2022-01-01T00:00 Pacific/Enderbury'
+# ---
+# ...
+# tarantool> T
+# ---
+# - 2022-01-01T00:00:00 Pacific/Kanton
+# ...
+#
+# So we can do the same and don't worry, be happy.
+
+cat <<EOF > ${DST_FILE}
+# Automatically generated by gen-timezones.sh
+
+TZ_UTC = 0x01
+TZ_RFC = 0x02
+TZ_MILITARY = 0x04
+TZ_AMBIGUOUS = 0x08
+TZ_NYI = 0x10
+TZ_OLSON = 0x20
+TZ_ALIAS = 0x40
+TZ_DST = 0x80
+
+indexToTimezone = {
+EOF
+
+grep ZONE_ABBREV ${SRC_FILE} | sed "s/ZONE_ABBREV( *//g" | sed "s/[),]//g" \
+    | awk '{printf("\t%s : %s,\n", $1, $3)}' >> ${DST_FILE}
+grep ZONE_UNIQUE ${SRC_FILE} | sed "s/ZONE_UNIQUE( *//g" | sed "s/[),]//g" \
+    | awk '{printf("\t%s : %s,\n", $1, $2)}' >> ${DST_FILE}
+
+cat <<EOF >> ${DST_FILE}
+}
+
+timezoneToIndex = {
+EOF
+
+grep ZONE_ABBREV ${SRC_FILE} | sed "s/ZONE_ABBREV( *//g" | sed "s/[),]//g" \
+    | awk '{printf("\t%s : %s,\n", $3, $1)}' >> ${DST_FILE}
+grep ZONE_UNIQUE ${SRC_FILE} | sed "s/ZONE_UNIQUE( *//g" | sed "s/[),]//g" \
+    | awk '{printf("\t%s : %s,\n", $2, $1)}' >> ${DST_FILE}
+grep ZONE_ALIAS ${SRC_FILE} | sed "s/ZONE_ALIAS( *//g" | sed "s/[),]//g" \
+    | awk '{printf("\t%s : %s,\n", $2, $1)}' >> ${DST_FILE}
+
+cat <<EOF >> ${DST_FILE}
+}
+
+timezoneAbbrevInfo = {
+EOF
+
+grep ZONE_ABBREV ${SRC_FILE} | sed "s/ZONE_ABBREV( *//g" | sed "s/[),]//g" \
+    | awk '{printf("\t%s : {\"offset\" : %d, \"category\" : %s},\n", $3, $2, $4)}' >> ${DST_FILE}
+echo "}" >> ${DST_FILE}
+
+rm timezones.h
+
+python validate_timezones.py
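
For readers less familiar with the grep/sed/awk pipeline above, the `ZONE_ABBREV` extraction could be approximated in Python as below. This is only a sketch: `parse_abbrevs` and `ABBREV_RE` are hypothetical names, the macro layout `ZONE_ABBREV(index, offset, "name", category)` is inferred from the awk field order in the script, and the sample lines are illustrative rather than quoted from timezones.h.

```python
import re

# Assumed macro layout: ZONE_ABBREV(index, offset_minutes, "name", category)
ABBREV_RE = re.compile(
    r'ZONE_ABBREV\(\s*(\d+)\s*,\s*(-?\d+)\s*,\s*"([^"]+)"\s*,\s*([^)]+?)\s*\)'
)


def parse_abbrevs(header_text):
    """Build index-to-name and abbreviation-info maps from header text."""
    index_to_name = {}
    abbrev_info = {}
    for idx, offset, name, category in ABBREV_RE.findall(header_text):
        index_to_name[int(idx)] = name
        # The category is kept as the symbolic flag expression, just as the
        # shell script writes it verbatim into the generated module.
        abbrev_info[name] = {"offset": int(offset), "category": category}
    return index_to_name, abbrev_info


# Illustrative input only; not copied from the real header.
SAMPLE = '''
ZONE_ABBREV(  1, -720, "Y", TZ_MILITARY)
ZONE_ABBREV(  2,  180, "MSK", TZ_RFC)
'''

index_to_name, abbrev_info = parse_abbrevs(SAMPLE)
```

Unlike the shell version, this variant keeps the offset as an integer and does not depend on whitespace-separated fields, at the cost of assuming the four-argument macro shape.
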
