Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

datetime handling in function parameters #1

Closed
russbiggs opened this issue Jan 23, 2025 · 14 comments
Closed

datetime handling in function parameters #1

russbiggs opened this issue Jan 23, 2025 · 14 comments
Labels
enhancement New feature or request

Comments

@russbiggs
Copy link
Member

Currently we only accept PosixCt date for parameters such as datetime_from and datetime_to et al. This is fine but kind of obscures one really nice feature of the API which is the automatic inference of timezone based on location. In the API if you pass a date without TZ itll infer the timezone of the related location, making usre you dont have to guess or manually do that. with as.PosixCt this isnt really possible because without a TZ it assumes UTC and that is what is passed.

To solve this perhaps we accept PosixCt and a datetime string and do some validation around the ISO8601 format to help users pass a valid string.

@russbiggs russbiggs added the enhancement New feature or request label Jan 23, 2025
@AarshBatra
Copy link

AarshBatra commented Jan 25, 2025

That will be very useful. Here is a list of possible scenarios for expected inputs, which can then be validated using a custom function (e.g. validate_datetime_for_api) and a regex to create a valid ISO-8601 string from which the API can then automatically infer the time zone. If this looks like a good next step, let me know, I can try and give it a go:

1. POSIXct Input

  • If the input is a POSIXct object, the function:
    • Removes the timezone (if present) and returns the datetime as a string in ISO 8601 format without the timezone.
    • If the timezone is not specified in the POSIXct object, the function assumes it is in the local time zone and still converts it to an ISO 8601 format without the timezone.
    • Example:
Input (POSIXct with a timezone): validate_datetime_for_api(as.POSIXct("2025-01-25 12:30:00", tz = "America/New_York"), regex, ...)
# result: "2025-01-25T12:30:00"
Input (POSIXct without a time zone): validate_datetime_for_api(as.POSIXct("2025-01-25 12:30:00"), regex, ...)
#result: "2025-01-25T12:30:00"

2. ISO 8601 Input (With Timezone)

  • If the input is a valid ISO 8601 string that includes a timezone (e.g., "2025-01-25T12:30:00+05:30" or "2025-01-25T12:30:00Z"), the function:
    • Removes the timezone part (either Z or +hh:mm).
    • Returns the datetime string in ISO 8601 format without the timezone.
    • Example:
validate_datetime_for_api("2025-01-25T12:30:00+05:30")
#result: "2025-01-25T12:30:00"
validate_datetime_for_api("2025-01-25T12:30:00Z", regex, ...)
#result: "2025-01-25T12:30:00"

3. ISO 8601 Input (Without Timezone)

  • If the input is a valid ISO 8601 string without a timezone (e.g., "2025-01-25T12:30:00"), the function:
    • Returns the datetime string as is, in the correct ISO 8601 format without any modification.
    • Example:
validate_datetime_for_api("2025-01-25T12:30:00", regex, ...)
#result: "2025-01-25T12:30:00"

4. Date-Only Input (Without Time)

  • If the input is just a date string (e.g., "2025-01-25"), the function:
    • Assumes the time as 00:00:00 (midnight).
    • Appends the time part to the date string, converting it into a valid ISO 8601 datetime format: "2025-01-25T00:00:00".
    • A warning is issued to inform the user that time is assumed.
    • Example:
validate_datetime_for_api("2025-01-25", regex, ...)
Result: "2025-01-25T00:00:00"
Warning: Warning: No time specified. Assuming time as 00:00:00 for the given date.

5. Datetime String Without T
If the input is a datetime string in the form of "YYYY-MM-DD HH:MM:SS" (i.e., without T between date and time), the function:

  • Replaces the space (or any number of white space characters, etc) with a T, converting the datetime string into the correct ISO 8601 format.
  • Example:
validate_datetime_for_api("2025-01-25 12:30:00", regex, ...)
#result: "2025-01-25T12:30:00"
#message: Note: Time format without 'T' detected. Converting the datetime string to ISO 8601 format.

6. Invalid Input Format
If the input does not match any of the expected formats (e.g., missing both date and time, incorrect characters), the function:
Stops and returns an error message indicating the input format is invalid and guides the user to provide a valid input in one of the recognized formats.

@russbiggs
Copy link
Member Author

Thanks for this @AarshBatra.

To clarify some of the intentions here because you've assumed a couple things that differ from what i'm looking for:

When a timezone is provided, e.g. as a POSIXct as.POSIXct("2025-01-25 12:30:00", tz = "America/New_York") The timezone should not be stripped, we want to assume adding the tz this is intentional to the query. The problem arises in how as.POSIXct` defaults to UTC when none is provided, This create ambiguity on the intentionality. Is the user using the default to "intentionally" create a UTC date (i doubt this)? or are they trying to create a "bare" date.

The same would go for handling strings, if a user give a full ISO 8601 date, with tz, we dont want to strip the tz, we want to assumed intentionality. In the case where no TZ is provided in the string we assume they are passing a "local" date.

The intention of the API functionality is somewhat aiding in "laziness" i.e. without provided an explicit TZ you may not know the TZ (and dont have to!) so you're allowing the API to provide the sensible default. When a user explicitly indicate TZ in the query we want to respect that, even if it maybe doesn't make sense.

Given this this would change some of the proposed functionality you added, namely #1 and #2. Additionally if we support dates as strings I would want to provide pretty heavy handed validation, as i am for most other parameters, and make sure that only ISO-8601 strings can come through (this is a pretty complex regex because theres a lot of valid forms in the spec.)

If this makes sense I'm definitely open to a PR, happy to discuss and clarify more. Also note, I'm no R expert so i may be missing something in the as.POSIXct that could solve this as well.

@AarshBatra
Copy link

AarshBatra commented Jan 26, 2025

Thanks @russbiggs for clarifying the context.

I read the as.POSIXct documentation and found something that might simplify things. As you mentioned, it specifies: Dates without times are treated as being at midnight UTC.

But, it also specifies that if tz = "" (which is the default, i.e. no time zone specified), then the output is returned in the current time zone, which might be useful for us.

In my understanding, this would then imply that as.POSIXct() internally treats the date as midnight UTC. However, the output seems to automatically adjusts to the local time zone (tz = "") unless explicitly told to remain in UTC (tz = "UTC"). I tried this and this seems to be the case, can you try it? Also, if someone does explicitly request UTC, we in any case want to respect that intentionality. Does this make sense?

If yes, this means that if we continue with the current setup in which users are only allowed to input dates in POSIXct format (and no strings, or ISO8601, avoiding all that validation) than that might be all we need, no?

But, one improvement that can be made on top of that is perhaps a small function that as a first step validates the POSIXct format and outputs better error messages if a user incorrectly specifies the POSIXct date/datetime. For e.g. we can be explicit in the error message informing user the exact format which is needed, e.g. stop("Invalid date format. Please provide dates as POSIXct using as.POSIXct(datetime), where datetime is in the format '%Y-%m-%d %H:%M:%S'.") OR if only date is expected: stop("Invalid date format. Please provide dates as POSIXct using as.POSIXct(date) where date is in the format '%Y-%m-%d"). Time zone if explicitly mentioned is respected, otherwise it automatically defaults to the local time zone.

Also, according to the documentation, if a user intentionally specifies a timezone, but does so incorrectly the function then defaults to UTC with a warning message. Here again, a better warning message than the default one, can be written to inform the user explicitly about this default behavior.

That aside, if we want to allow for ISO8601 string inputs (i.e. something like "2024-01-01", instead of as.POSIXct("2024-01-01")) assuming that it provides better user experience, then yes a separate more comprehensive function with appropriate validation will be required. Even in the string input scenario, we'll have to impose a restriction in terms of standard like ISO8601, because as you noted, just simply allowing any string input would be extremely error prone in the validation part.

But, before we get into that bit, I wanted to check if the as.POSIXct() smaller date validation function approach with better error messages is all that you might be looking for at this point.

@russbiggs
Copy link
Member Author

Thanks for digging into as.POSIXct, this is generally the behavior i had seen as well. This does not address the problem i'm outlining in this issue though. In the API if you pass a date or datetime (one without a TZ) to a sensor query it will automatically infer that sensors timezone, e.g.

/v3/sensors/3920/measurements?datetime_from=2025-01-01&datetime_to=2025-01-07

This will query from 2025-01-01 00:00:00-07:00 to 2025-01-07 00:00:00-07:00(-7 being the UTC offset for sensor 3920). In the case of passing POSIXct this kind of thing is not possible. Doing as.POSIXct('2025-01-01') will turn into 2025-01-01 00:00:00 in my system timezone, and so not allowing this inferred timezone functionality.

Because it all gets converted to a string when eventually building the URL, the best option seems like also just allowing an ISO 8601 string to also be an option for the datetime parameters, this way we have a very transparent option without assumption around timezone like the posixct function.

@AarshBatra
Copy link

Ah, I see. Can you give me an example/confirm the below of what the API request URLs would look like in the following cases (assuming sensor 3920 located in UTC-07:00). This will help me understand next steps:

  • User correctly specifies date and time without a time zone (output defaults to UTC-07:00) in a non-ISO8601 format. Will the query look like:

    • /v3/sensors/3920/measurements?datetime_from=2025-01-01 12:00:00&datetime_to=2025-01-07 13:00:00
  • User correctly specifies date and time without a time zone (output defaults to UTC-07:00) in a ISO8601 format. Will the query look like:

    • /v3/sensors/3920/measurements?datetime_from=2025-01-01T12:00:00&datetime_to=2025-01-07T13:00:00
  • User correctly specifies date and time with a IST time zone (output respects intentionality and returns query data in IST format) in a ISO8601 format. In this case, the user is requesting data for the sensor 3920 in UTC +5:30 (IST), but the sensor is located in UTC-7. So, I am assuming in this scenario, the API will return data for 12 pm IST to 1 pm IST for sensor 3920 and not 12 pm Mountain time (UTC -07:00) to 1 pm Mountain Time (UTC -07:00), right? Also will the query look like:

    • /v3/sensors/3920/measurements?datetime_from=2024-12-02T13:00:00+05:30&datetime_to=2025-01-07T+5:30

Finally, when specifying the time zone, does the API request URL adds an additional &tz = to the URL string or does it mention the time zone within the datetime_from and datetime_to parameter? Also, does it mention time zone as an offset in hours (e.g. +05:30, or does it convert it to a string, i.e. IST and then pass it on in the URL)?

@russbiggs
Copy link
Member Author

For date, datetime and timezone info read this in the docs for the general thinking: https://docs.openaq.org/using-the-api/dates-datetimes

Heres examples of how i see the R API working and how it would translate to the REST API:

df <- list_sensor_measurements(3920, datetime_from = as.POSIXct('2025-01-01'))
#  https://api.openaq.org/v3/sensors/3920/measurements?datetime_from=2025-01-01T00:00:00-{user system time offset}

This is how POSIXct works but depending on user system time is goofy and likely not what a user wants/expects, unless the sensor in question just happens to be in the same timezone as the machine they are running the code (not a reliable way to working).

They can explicitly list TZ, but they would have to have prior knowledge of that sensors timezone:

df <- list_sensor_measurements(3920, datetime_from = as.POSIXct('2025-01-01', tz="America/Denver"))
#  https://api.openaq.org/v3/sensors/3920/measurements?datetime_from=2025-01-01T00:00:00-07:00

If we allow strings with some validation a user could do:

df <- list_sensor_measurements(3920, datetime_from = "2025-01-01")
#  https://api.openaq.org/v3/sensors/3920/measurements?datetime_from=2025-01-01

This is a better solution because it follows a predictable pattern, a date without time is assumed as midnight and the timezone automatically gets inferred to be local with respect to the sensor/locations. This is then functionally the same as: 2025-01-01T00:00:00-07:00 without having to type all that out.

This could extend further to datetime, still without timezone:

df <- list_sensor_measurements(3920, datetime_from = "2025-01-01T02:00:00")
#  https://api.openaq.org/v3/sensors/3920/measurements?datetime_from=2025-01-01T02:00:00

Again the users gets it in the local timezone of the sensor/location in question.

As you can see the string option isn't possible with as.POSIXct because it wants to explicitly set the TZ, whereas the API is OK with no TZ because it assumes local, unless explicitly specified by the user.

If a user explicitly set a timezone, we respect it. If they don't we allow it fall back to the documented behavior of the API.

@AarshBatra
Copy link

These examples are very helpful, thanks. Makes sense to only allow a string for the date time field, eliminating the issues that come with POSIXct, especially given how the API works.

Q about an edge case: What happens in the scenario in which a user explicitly specifies a time zone that is not the time zone of the sensor? I know there is probably no good reason for the user to do this.

I understand that we want to respect intentionality, but, in such a scenario I am assuming the API will either throw an error OR will return the requested data in the sensor's timezone with a warning of what it has done. Is that correct?

From the proposed date validation function's perspective, suppose a user passes datetime = "2024-01-01T12:00:00+5:30" , for sensor 3920. We know that the sensor is in -07:00, not +05:30, but nonetheless the ISO8601 string is in the correct format.

Within R we have no way initially to know what's wrong as this will be caught only when it goes to the API which will automatically try to infer timezone for sensor 3920 and find a discrepancy.

So, from the function's perspective do we simply pass on the string as valid to the API?

Or within R we can simply strip away timezone from the ISO8601 string if specified by a user, as the API in any case will later figure it out.

Also, are we open to completely eliminating POSIXct from the pipeline and only allow strings? Or do we still want to allow certain valid scenarios, e.g. in which a user mentions both date and time zone correctly for the sensor (even though that's redundant)?

I imagine allowing only strings and completely eliminating the option of POSIXct might be least error prone, with proper validation of course.

@russbiggs
Copy link
Member Author

The API does not care and will not stop you from passing a datetime in a timezone differing from the sensors actual timezone. It's also not the role of the API to warn you about this. We assume intention is correct. Follow a "were all adults here" mindset, we assume the users intention have a purpose.

We initially designed to use POSIX as the default type and will likely keep that, so no intention to remove for now. Passing strings especially ISO 8601 compliant ones has always proven a very poor default solution so we'd still like to support a "native datetime" format such as posixct, the strong option will simply be an imperfect fallback option.

If you'd like to submit a PR we might be able to better discuss these particulars with code and examples.

@russbiggs
Copy link
Member Author

To clarify these are very valid issues you raise. Once we iron out a solution these can be addressed in documentation and guides to show users any peculiarities

@AarshBatra
Copy link

The we're all adults mindset definitely simplifies things, when it comes to function design and is good advice in general, I completely agree.

But nonetheless, all of us make certain unintentional mistakes and interaction with APIs can be confusing at times and everyone is working with deadlines and often imperfect information. This is not ideal ofcourse the onus is on the user to properly read documentation, etc before interacting with the API.

But, in my little experience, I have found that obvious unintentional and detectable mistakes happen all the time. That's part of programming. If we as interface designers or programmers know about this small subset of obvious mistakes that one might unintentionally run into, then flagging useful warnings where possible is always worth it, that improves user experience.

Of course we cannot do this for all possible mistakes out there. But, useful warnings for some clear mistakes proves useful in my opinion.

Making this part of documentation later on sounds good.

Yes, I think a good next step will be a PR. I'll get on that soon. We can continue the discussion there then.

@russbiggs
Copy link
Member Author

@AarshBatra ive assigned this to you just for triaging, if you end up not being able to do a PR for, or just don't want to let me know.

@AarshBatra
Copy link

Got it. Sounds good Russ, thanks for assigning.

@russbiggs
Copy link
Member Author

@AarshBatra let's hold on this. After some additional thought and conversation outside of this issue i think I would like to keep this as just accepting date object for now and focus on better documenting the timezone issues outlined here.

@AarshBatra
Copy link

AarshBatra commented Jan 30, 2025

Sure, @russbiggs. If it helps, feel free to let me know if you'd like me to pick this up at a later point—no worries if not. I made a start and have something in the works, which could be useful down the line once it's ready. But I’ll hold off for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants