
Allow passing custom estimation_options in localize_inloc.py #470


Open
wants to merge 1 commit into master

Conversation

@ayoussf commented Jul 25, 2025

Allows passing custom estimation_options to localize_inloc.py, while preserving the default behaviour when none are provided.

I have also modified the Markdown in pipeline_InLoc.ipynb to add further details. If this is unwanted, I can revert to the original Markdown.
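
For reference, a minimal sketch of the change, assuming a helper along these lines inside localize_inloc.py (the function name and exact signature here are illustrative, not the actual diff):

```python
import pycolmap

# Sketch: forward optional estimation options to pycolmap, keeping the
# previous behaviour (pycolmap defaults) when none are provided.
def estimate_pose(points2D, points3D, camera, estimation_options=None):
    if estimation_options is None:
        estimation_options = pycolmap.AbsolutePoseEstimationOptions()
    return pycolmap.estimate_and_refine_absolute_pose(
        points2D, points3D, camera, estimation_options=estimation_options
    )
```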

@ayoussf (Author) commented Jul 30, 2025

Hi @Phil26AT,

I wanted to share a few observations regarding recent pycolmap updates (v3.10.0 and later). Since these updates, I have been unable to reproduce InLoc's results, not only for SuperGlue and LightGlue but also across other models I've tested (e.g. LoFTR).

As an example, after freshly cloning the InLoc repository and running the provided notebook without making any changes, I obtained the following results with SuperGlue:

  • DUC1: 44.4 / 66.7 / 79.8
  • DUC2: 50.4 / 73.3 / 77.1

These numbers were fairly consistent across multiple machines.

For context: interestingly, when replacing pycolmap.estimate_and_refine_absolute_pose with poselib.estimate_absolute_pose, I saw improved results on DUC2 (see the sketch after the numbers below):

  • DUC1: 44.9 / 67.7 / 79.3
  • DUC2: 56.5 / 77.1 / 78.6
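
For illustration, the swap looks roughly like this; the camera dict layout and option keys follow poselib's Python bindings as I understand them, and the RANSAC threshold is a placeholder rather than a tuned value:

```python
import poselib

def localize_with_poselib(points2D, points3D, width, height, fx, fy, cx, cy):
    # Camera description in poselib's dict format (assumed layout).
    camera = {
        "model": "PINHOLE",
        "width": width,
        "height": height,
        "params": [fx, fy, cx, cy],
    }
    # Drop-in replacement for pycolmap.estimate_and_refine_absolute_pose:
    # RANSAC followed by non-linear refinement, returning (pose, info).
    pose, info = poselib.estimate_absolute_pose(
        points2D, points3D, camera,
        {"max_reproj_error": 12.0},  # placeholder threshold in pixels
        {},                          # refinement options left at defaults
    )
    return pose, info
```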

Moreover, I can still reliably reproduce results on Aachen v1.1, where I got the following with default settings:

SuperGlue:

  • Day: 90.3 / 96.2 / 99.4
  • Night: 76.4 / 90.1 / 100.0

LightGlue:

  • Day: 90.3 / 96.2 / 99.2
  • Night: 77.5 / 91.6 / 99.5

Thus, I am not entirely sure of the reason for this behaviour on the InLoc dataset specifically.

I hope this helps, and I'm happy to run additional tests if needed.

@Phil26AT (Collaborator) commented:

Hi @ayoussf, thank you for reopening this PR.

Thank you for reporting detailed statistics on InLoc, and great that Aachen v1.1 is reproducible again. I reran the pipeline (SP+SG) without changes and also got similar results. However, results with the temporal pairs (used in the leaderboard) were reproducible to within ~2%:

| Method | InLoc DUC1 | InLoc DUC2 | Retrieval |
|---|---|---|---|
| Leaderboard | 46.5 / 65.7 / 78.3 | 52.7 / 72.5 / 79.4 | NetVLAD top 40 |
| pycolmap 3.12.0 | 43.9 / 66.2 / 79.3 | 51.1 / 74.0 / 77.1 | NetVLAD top 40 |
| Leaderboard | 49.0 / 68.7 / 80.8 | 53.4 / 77.1 / 82.4 | NetVLAD top 40 (temporal) |
| pycolmap 3.12.0 | 47.0 / 68.2 / 80.3 | 52.7 / 77.9 / 80.2 | NetVLAD top 40 (temporal) |

Note that this dataset is fairly small and that pose estimation is non-deterministic (I get fluctuations of 1-2%), so small differences might not be significant.

I checked the changelog and realized that some default parameters in the estimation options changed, e.g. old vs. new. It might be worth trying those.
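
In case it helps others test this, overriding the options explicitly would look roughly like the sketch below; the attribute names follow pycolmap's RANSACOptions, and the values are placeholders to be replaced with the pre-change defaults from the diff, not the actual old values:

```python
import pycolmap

opts = pycolmap.AbsolutePoseEstimationOptions()
# Placeholder values: substitute the pre-3.11 defaults here.
opts.ransac.max_error = 12.0
opts.ransac.min_inlier_ratio = 0.01
opts.ransac.confidence = 0.9999
# With this PR, `opts` can then be passed to localize_inloc as
# estimation_options instead of relying on the library defaults.
```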

Could you maybe post the last pycolmap version where results were reproducible for you, together with the actual numbers?

I will also try to run the pipeline with older versions over the next few days.

@ayoussf (Author) commented Jul 31, 2025

@Phil26AT Thank you for the detailed response.

Unfortunately, I cannot pinpoint the exact pycolmap version I used, as I have since switched machines and no longer have access to the old environment to fully reproduce it. What I can say with certainty is that it was prior to v3.11.0.

Over the coming days, I will rerun the evaluations for SP+SG and SP+LG across pycolmap versions 0.6.0 through 3.12.3. Starting from v3.11.0, I will also include results using both the old and new estimation options for consistency. I will share the updated results in a new comment on this PR.

Lastly, I am aware there is a notebook for InLoc evaluation; however, if you prefer, I can create hloc/pipelines/InLoc and add a pipeline.py script to stay consistent with the other dataset evaluations (a rough sketch is below).
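
For reference, the script would essentially condense the notebook into the usual pipeline layout, roughly as follows (paths and config names taken from pipeline_InLoc.ipynb; treat this as a sketch rather than the final script):

```python
from pathlib import Path
from hloc import extract_features, match_features, localize_inloc

dataset = Path('datasets/inloc/')
pairs = Path('pairs/inloc/pairs-query-netvlad40.txt')  # NetVLAD top 40
outputs = Path('outputs/inloc/')
results = outputs / 'InLoc_hloc_superpoint+superglue_netvlad40.txt'

feature_conf = extract_features.confs['superpoint_inloc']
matcher_conf = match_features.confs['superglue']

features = extract_features.main(feature_conf, dataset, outputs)
matches = match_features.main(
    matcher_conf, pairs, feature_conf['output'], outputs)

localize_inloc.main(
    dataset, pairs, features, matches, results, skip_matches=20)
```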
