
Updates to support Geocaching Home Assistant integration feature updates #23

Merged · 48 commits into Sholofly:dev · Dec 14, 2024

Conversation

@marc7s (Contributor) commented Dec 12, 2024

Introduction

This PR adds the features required for our updates to the HA Geocaching integration. These updates are:

  • Add devices with entities for Trackables and Caches (such as name and location)
  • Add list of tracked Trackables and Caches
  • Add nearby caches search

Following is an explanation of the features we have added to the Geocaching integration.

Feature overview

For each model, we create a new device that groups all relevant entities together, in the same way the current integration does with the profile entities. For each cache added in HA, we therefore add a device for that specific cache, with entities that hold its data. The same goes for each trackable. In total there are now three device types: Profile (1 created for the integration), Cache (1 created for each nearby cache and 1 for each tracked cache) and Trackable (1 created for each tracked trackable).
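As a rough sketch of this structure (the enum and its names are illustrative, not the library's actual code):

```python
from enum import Enum

class GeocachingDeviceType(Enum):
    """The three device types described above (illustrative names)."""
    PROFILE = "profile"      # 1 created for the integration
    CACHE = "cache"          # 1 per nearby cache and 1 per tracked cache
    TRACKABLE = "trackable"  # 1 per tracked trackable
```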

[screenshot]

Trackable device

Each trackable contains 7 entities. These are:

  • Current cache code (reference code of the cache it currently resides in, or Unknown)
  • Current cache name (cache name of the cache it currently resides in, or Unknown)
  • Name (the name of the trackable)
  • Owner (the display name of the user who owns the trackable)
  • Release date (the date when the trackable was released)
  • Traveled distance (the total distance it has traveled, in kilometers)
  • A device tracker entity for its current location, so that it can be used with the map card

In addition to these entities, we also store its trackable journey (from the trackable log, filtered to only the events where it moved) in the extra attributes, which can be parsed and extracted using Jinja. See an example of this below.
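As a minimal sketch of the idea (the attribute and field names here are assumptions, not the integration's actual schema), the movement-filtered journey could be serialized into the attributes roughly like this, after which a Jinja template can read it via state_attr():

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class JourneyStep:
    """One movement event from the trackable log (illustrative)."""
    date: datetime
    lat: float
    lon: float

def extra_state_attributes(journey: list[JourneyStep]) -> dict:
    """Serialize the movement log so Jinja templates can parse it."""
    return {
        "journey": [
            {"date": step.date.isoformat(), "lat": step.lat, "lon": step.lon}
            for step in journey
        ]
    }
```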

[screenshot]

Cache device

Each cache contains 7 entities. These are:

  • Favorite points (the number of favorite points the cache has)
  • Found (Yes/No, whether the authenticated user has found this specific cache)
  • Found date (the date the authenticated user found this cache, or Unknown if they have not)
  • Hidden date (the date when the cache was hidden)
  • Name (the name of the cache)
  • Owner (the display name of the user who owns the cache)
  • A device tracker entity for its current location, so that it can be used with the map card

[screenshot]

Tracked trackables and caches

During configuration, the user can select one or more caches and trackables to track. These will then be returned as part of the GeocachingStatus object with their updated information. For example, a user can track their own hidden caches and their own trackables (or any caches and trackables they are interested in) and display that information in Home Assistant.
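As a hypothetical illustration of such a configuration (the class and field names are assumptions, not the library's exact API):

```python
from dataclasses import dataclass, field

@dataclass
class TrackingSettings:
    """Illustrative container for the reference codes to track."""
    # Sets rather than lists, so duplicate reference codes are dropped
    tracked_cache_codes: set[str] = field(default_factory=set)
    tracked_trackable_codes: set[str] = field(default_factory=set)

settings = TrackingSettings(
    tracked_cache_codes={"GC1234", "GC5678"},
    tracked_trackable_codes={"TB1ABC"},
)
```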

Configuration:
[screenshot]

Displaying a Trackable:
[screenshot]

Displaying a Cache:
[screenshot]

Nearby caches

We use the home location of the HA instance as the position, and search for caches within a radius around it. These are considered "nearby caches". As with the tracked caches, we generate a cache device for each nearby cache. In the configuration step, the user can adjust the radius and the maximum number of caches to generate devices for. A known limitation of this feature is that the nearby cache devices are generated during configuration, and will therefore not update dynamically should a new cache be placed nearby. We looked into supporting this but were unable to within our time span.
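A minimal sketch of how the search parameters could be assembled (function and parameter names are assumptions, not the exact library API):

```python
def nearby_search_params(home_lat: float, home_lon: float,
                         radius_km: float, max_caches: int) -> dict:
    """Build the query for a radius search around the HA home location."""
    return {
        "lat": home_lat,
        "lon": home_lon,
        "radius_km": radius_km,
        # the search endpoint returns at most 100 caches
        "take": max(0, min(max_caches, 100)),
    }

print(nearby_search_params(57.7, 11.97, 10.0, 250))  # take is capped to 100
```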

Configuration:
[screenshot]

Displaying the nearby caches:
[screenshot]

Additions to profile device

For the profile device, which holds the integration's current feature set, we have added two sensors: the total number of nearby caches, and the total distance traveled by all tracked Trackables.

Setup

To test, we have made changes to the ha-core repository. This PR is used with the geo branch of our fork, available here. We did not manage to get it working using just the hass --skip-pip-packages geocachingapi command; it would continue to use the downloaded version from PyPI instead of our local version of this repository. We therefore made some temporary changes during development, which we will revert before opening a PR for the ha-core repository. This is our first time developing for Home Assistant, so if you know of a better way to solve this you can ignore these instructions, but be aware that our geo branch contains temporary changes used to set up our development environments.

We used Dev Containers in Visual Studio Code to develop our features. In order to get everything working, we created a bind mount for the dev container, and then we had to:

  1. Clone this repository to $HOME/Repos/geocachingapi-python, see the bind mount
  2. Clone the forked ha-core repository and switch to the geo branch
  3. Inside the HA dev container, press Ctrl + Shift + P and run Dev Containers: Rebuild Container Without Cache
  4. In the HA dev container terminal, run pip3 install -e config/custom_libraries/geocachingapi
  5. In the HA dev container terminal, run hass --skip-pip-packages geocachingapi
  6. Set up the HA instance
  7. Add the Geocaching integration

The cards we used to display the information in the previous images are available here.

Thank you for the previous work on this library and HA integration. Please let us know if you have any questions or comments. If this PR gets merged, we will try to merge the forked ha-core geo branch afterwards, after reverting our temporary development changes. Merging this PR would of course warrant a new version, so let us know which version it should be before merging.

Credits

The feature updates to the Geocaching integration and the updates to this repository have been developed by:

marc7s and others added 30 commits (November 3, 2024):

  • …ot been able to test that it works yet, /Per&Albin
  • Fix trackable parsing and add missing fields
@reinder83 (Collaborator) commented:

Haven't checked it yet but first of all, what an incredible addition!

@Sholofly (Owner) commented:

Simply lovely! Thanks! This is exactly what I hoped for when I started this package. Very happy with it!
I will take a closer look at the code in the coming days.

One question I have had in mind since the beginning:
The HA integration is using one partner consumer key (managed by Nabu Casa) for the API connection. The API documentation has the following rate limits:

[screenshot of the documented rate limits]

Given the fact that 350 users are using the integration, and a maximum of 1 200 API calls per minute is allowed, how do we make sure not to hit that rate limit, which would make the integration unusable for all users for a minute, or even longer if we hit the limit too often?

And if we do hit the limit, how will the integration respond?

It's something that we probably should have built in from the beginning but with just some generic requests the need wasn't there.

What are your views on this? We can try to contact Groundspeak to stretch the limits, but I doubt they can do that. But maybe the influence of the largest open source community in the world can convince them to move ;)

Can you make a rough estimate of the expected calls per minute? It would help to get a better view of the (possible) problem and how to overcome it.

@reinder83 (Collaborator) commented:

In addition to @Sholofly's comment, there are also limitations on basic vs. premium members: non-premium members have more restrictions on what they can see or request, so there should be built-in checks for that. More information on this:

https://api.groundspeak.com/documentation#restrictions

This is one of the reasons we kept the initial implementation very basic: we didn't have checks for that yet.

@marc7s (Contributor, author) commented Dec 13, 2024

Thank you both for your comments.

As a note:

  • For caches, we only fetch lite caches, which are limited to 10 000 per basic user per day, rather than the 3 per day allowed for full caches. So there should not be any issues here
  • For trackables, I am not sure what the limits are. There is no section on trackable limits as there is for caches; there is only a section on trackable discoveries, which should be unrelated since we are only reading data, not discovering any trackables
  • For verifying the settings, we only fetch the referenceCode field, meaning these calls do not count towards the user's limit (although rate limiting should still apply)

Here is a breakdown of the API requests:
Each time the integration updates (when _async_update_data is called), we call the update() function on the Geocaching instance to retrieve a new GeocachingStatus.

Retrieving a new status initiates the following requests:

  1. Update the user (1 API call to the /users/me endpoint)
  2. Update tracked trackables, if enabled (1 API call to the /trackables endpoint)
    2.1: For each tracked trackable, update its journey data (TT API calls to the /trackables/___/trackablelogs endpoint, where TT is the number of tracked trackables)
  3. Update the tracked caches, if enabled (1 API call to the /geocaches endpoint)
  4. Update the nearby caches, if enabled (1 API call to the /geocaches/search endpoint)

So in total, each new status yields 4 + TT API calls, with a minimum of 1 call (tracked trackables, tracked caches and nearby caches all disabled). We can expect these calls to occur within the same minute, as they are made in succession when the update is triggered. TT is tied to each integration instance and that user's configuration, with a minimum of 1 and currently no explicit maximum. However, since pagination is not implemented, the API call itself imposes a limit: we do not supply the take parameter, so the API's default of 10 is used, giving TT an effective upper limit of 10. We should probably still add handling in the code that limits the number of tracked trackables and caches.

The minimum number of requests per status update is therefore 1, and the maximum is 4 + max(TT) = 14.
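To make the breakdown concrete, here is a small sketch of the per-update call count (a plain model of the list above, not library code):

```python
def calls_per_update(tracked_trackables: int, trackables_enabled: bool,
                     caches_enabled: bool, nearby_enabled: bool) -> int:
    """Total API calls per status update, following the breakdown above."""
    calls = 1  # 1: /users/me
    if trackables_enabled:
        calls += 1 + tracked_trackables  # 2: /trackables, 2.1: one log call each
    if caches_enabled:
        calls += 1  # 3: /geocaches
    if nearby_enabled:
        calls += 1  # 4: /geocaches/search
    return calls

assert calls_per_update(0, False, False, False) == 1  # minimum
assert calls_per_update(10, True, True, True) == 14   # maximum, TT capped at 10
```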

To expand on the API-imposed limits:

  1. 50 tracked caches (imposed by max number of reference codes allowed by /geocaches endpoint)
  2. 10 tracked trackables (imposed by the default take parameter of the /trackables endpoint)
  3. 100 nearby caches (imposed by the take parameter of the /geocaches/search endpoint, but also limited in code to a value between 0-100 currently)

Some quick estimates therefore yield the following (with the current update interval of one update per hour).

With the three rate limits as:

  • RL1: 60 calls per minute per user per method
  • RL2: 1 200 calls per minute per partner consumer key
  • RL3: 6 000 calls per minute per IP address

And assuming each integration is under a unique IP address, we get:

Minimum configuration (tracked caches, tracked trackables and nearby caches all disabled):

| Requests per user per hour | Subject to RL1 | Subject to RL2 | Subject to RL3 |
| --- | --- | --- | --- |
| 1 | No | Worst case: 1 200 users. Best case: 72 000 users | No |

Maximum configuration (50 tracked caches, 10 tracked trackables, 100 nearby caches):

| Requests per user per hour | Subject to RL1 | Subject to RL2 | Subject to RL3 |
| --- | --- | --- | --- |
| 14 | No | Worst case: 85 users. Best case: 5 142 users | No |

Notes about the calculations:

  1. RL1 would only be relevant if the calls per minute per user per method exceeded 60. A single update makes at most 14 calls, of which at most 10 go to the same endpoint, so it is not relevant here.
  2. RL3 follows the same reasoning, except the threshold is 6 000 calls per minute, and we assume each integration is bound to a unique IP address.
  3. The worst case is if all Geocaching integrations happen to update during the same minute, which is highly unlikely. I have not verified this, but my guess is that the update interval starts from the time the integration was configured, so the worst case assumes all integrations were individually configured during the same minute of the day.
  4. The best case is if all Geocaching integrations are distributed evenly among the minutes of the hourly update cycle (24 * 60 = 1 440 minutes per day, recurring every 60 minutes), placing the API calls optimally, again highly unlikely. This gives a best case user count of 60 * 1 200 / requests per user per update.
  5. The truth will of course lie somewhere between the best and worst case, but the important part is that if this integration scales massively and we are only allowed a single partner consumer key, we could approach the best case by programmatically scheduling the API calls optimally among all users. That would be a very large task, requiring all integration instances to somehow coordinate which minute each one updates in, but at least there is room to scale.
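For reference, a quick sketch of these estimates (assuming hourly updates, with all of a user's calls landing in a single minute):

```python
RL2_CALLS_PER_MINUTE = 1_200  # shared partner consumer key limit
MINUTES_PER_CYCLE = 60        # updates recur every hour

def worst_case_users(calls_per_update: int) -> int:
    """All integrations update during the same minute."""
    return RL2_CALLS_PER_MINUTE // calls_per_update

def best_case_users(calls_per_update: int) -> int:
    """Updates staggered evenly across the 60 minutes of the hour."""
    return MINUTES_PER_CYCLE * RL2_CALLS_PER_MINUTE // calls_per_update

print(worst_case_users(1), best_case_users(1))    # 1200 72000
print(worst_case_users(14), best_case_users(14))  # 85 5142
```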

All in all, I do not think it will be an issue at the moment, but I do think it would be a good idea to also add some limitations in the code for all of the API calls. I also think updating the data every hour is not really necessary for this integration. We could for example update every other hour, choosing the hours based on a coin flip per instance (tails is hours 1, 3, 5, 7…; heads is hours 2, 4, 6, 8…), which would halve the API calls and, importantly, the rates, allowing for double the users.
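A sketch of the coin-flip idea (illustrative only):

```python
import random

# The "coin flip", done once when the integration instance is configured:
parity = random.randint(0, 1)
update_hours = [hour for hour in range(24) if hour % 2 == parity]
print(update_hours)  # e.g. [0, 2, 4, ...] or [1, 3, 5, ...]
```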

These are of course my own interpretations of the restrictions; I may have misinterpreted them, so it would be good to double check that, and that my math looks reasonable. I have not spent much time digging into the restrictions, so I may also have missed something.

TLDR: We do not currently enforce any limits on the number of tracked caches or tracked trackables, but could easily do so. For nearby caches we do enforce limits in the code, currently allowing the entire span (0-100 caches). All three are affected by the API-imposed limitations described above. There are ways of enforcing limits or scheduling updates to let the integration scale further under a single partner key, which could be implemented with different levels of ambition.

@marc7s (Contributor, author) commented Dec 13, 2024

> In addition to @Sholofly's comment, there are also limitations on basic vs. premium members: non-premium members have more restrictions on what they can see or request, so there should be built-in checks for that. More information on this:
>
> https://api.groundspeak.com/documentation#restrictions
>
> This is one of the reasons we kept the initial implementation very basic: we didn't have checks for that yet.

If I understand these restrictions correctly, they should not be an issue for us, as we only fetch lite geocaches, with a limit of 10 000 per basic user per day. That is far above what is currently possible to reach (150 * 24 = 3 600 per day), with most of those being the nearby caches, which we can easily reduce from the current maximum of 100. The 100 nearby caches are fetched with a single API call, but as I understand the rules, fetching data about 100 caches in one call counts as 100 towards the daily 10 000. Nevertheless, premium memberships should not be needed for this use case, if I have understood the restrictions correctly.
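A back-of-envelope check of that daily quota in the maximum configuration:

```python
caches_per_update = 50 + 100  # tracked caches + nearby caches
updates_per_day = 24          # one update per hour
daily_lite_reads = caches_per_update * updates_per_day
print(daily_lite_reads)       # 3600, well below the 10 000/day basic limit
```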

@Sholofly (Owner) previously approved these changes Dec 13, 2024, leaving a comment:


I've read everything twice and couldn't find anything to correct. I must say I am not a Python developer, so that doesn't say much coming from me. You should probably ask a native Python programmer to review it too, if you haven't done that already.

@Sholofly (Owner) commented:

Brilliant change! Please let me know if you are planning to have this reviewed by a more experienced Python developer. Otherwise I will complete the PR.

I will create a new release, 0.3.0, which will automatically publish the package to PyPI.

Commit: … handling: Raise error if too many codes were configured in settings. Automatically remove duplicate codes
@marc7s (Contributor, author) commented Dec 14, 2024

@Sholofly I made some additions to this PR:

  • Handle limits in settings (raise an error if too many reference codes are passed in during configuration)
  • Handle limits during API calls (limit applicable API calls with the take parameter, even though the settings handling should already have caught this)
  • Automatically remove duplicate reference codes (changed from list to set)
  • Separated the min(max(… construct into a clamp function to make it more readable
  • Put the limits in a limits.py file where you can configure them

These changes were made to address my previous comments, and make it easy to change the limits down the line. There are now two imposed limits in the code:

  1. For the settings: raising an error if you try to configure the API with values above the limits
  2. For the API: setting the take parameter where possible

The limits are configurable in limits.py, so you can easily change them later, for example lowering the limits to allow for more users.
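A rough sketch of what such a limits.py could look like (the constant names and the clamp signature are assumptions based on the description above, not the exact file contents):

```python
# limits.py (illustrative)
MAX_TRACKED_CACHES = 50      # /geocaches accepts at most 50 reference codes
MAX_TRACKED_TRACKABLES = 10  # matches the take limit used for /trackables
MAX_NEARBY_CACHES = 100      # maximum take for /geocaches/search

def clamp(value: int, minimum: int, maximum: int) -> int:
    """The separated min(max(...)) construct, for readability."""
    return min(max(value, minimum), maximum)
```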

Regarding the review: we have previous experience in Python, even though it is not our main language. However, I am confident enough in these changes that I do not think they need a further review. We have tried to add comments and documentation where necessary to make the code more maintainable, and to not overuse Python-specific syntax, keeping it accessible for non-Python natives. I think most of the reviewing will take place on the HA side of things, so these changes may get revisited as part of that PR, which we will try to publish in the near future. So from our end, bumping the version number and merging is fine!

@Sholofly (Owner) left a comment:


Good addition!

@Sholofly merged commit 0aebbd4 into Sholofly:dev on Dec 14, 2024. 1 check passed.