Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix request.url with full URI as SERVER_NAME #2458

Draft
wants to merge 3 commits into
base: 21.12LTS
Choose a base branch
from

Conversation

PromyLOPh
Copy link

Should fix #2457. Not sure why .strip(':') was suggested – we can just split at the correct token, though I’m leaving other places where this kind of splitting occurs untouched. I’m also keeping two currently failing tests commented out, because they are out of scope for this fix and probably require larger refactoring.

@PromyLOPh PromyLOPh requested a review from a team as a code owner May 18, 2022 12:57
@ahopkins
Copy link
Member

Not sure why .strip(':') was suggested

What if the SERVER_NAME is: //mystie.com intentionally leaving off the scheme? I think in that case it would erroneously be reported as http.

@PromyLOPh
Copy link
Author

My understanding was that you’d not prepend // in that case, but just use a bare hostname, i.e. example.com instead of //example.com.

@ahopkins
Copy link
Member

That might not be the case, especially if you wanted to build URLs that had // at the beginning without a scheme.

Consider your code:

{
  "url": "http://example.com/",
  "scheme": "http",
  "host": "example.com",
  "path": "/",
  "url_for": "http:////example.com/"
}

versus w/ strip

{
  "url": "//example.com/",
  "scheme": "",
  "host": "example.com",
  "path": "/",
  "url_for": "http:////example.com/"
}

As you suggested, with SERVER_NAME = "example.com" there is no way to get: //example.com.

Looking into this further, it looks like we also need to handle url_for regardless. Ideally that would be a single PR to handle both. Let me know if you can take this on as well.

@PromyLOPh
Copy link
Author

So what is SERVER_NAME then? Is it a URL? If so, wouldn’t it make sense to use urllib.parse.urlsplit to parse it?

I don’t have time to propose a fix for url_for right now unfortunately.

@ahopkins
Copy link
Member

No worries. It really should only be the domain, but that doesn't mean because of historical reasons we shouldn't be supporting the scheme too. // is legit. For this PR we can just close it with strip and url for will be another day.

@prryplatypus
Copy link
Member

prryplatypus commented May 24, 2022

The property documentation does say it uses "`config.SERVER_NAME` if in full URL format", which -according to the URI spec- requires a scheme to be defined. URIs starting with // are considered relative URIs, and are only valid "within the name space of another hierarchical URI", which would not be the case for the server's name.

Therefore, I personally agree with @PromyLOPh here in believing that splitting on :// would be the most accurate possibility without going overkill, and that url_for would be the method requiring some tweaking.

sanic/request.py Outdated Show resolved Hide resolved
@ahopkins
Copy link
Member

If that is the the decision we need to do some validation. As it stands there is certainly a use when you would need an empty scheme and we are not providing that. To be honest, I'm leaning more towards closing this PR because there is a deeper underlying issue that needs to be fixed and made consistent through the application and not just a band-aid in one location.

@Tronic
Copy link
Member

Tronic commented May 24, 2022

@ahopkins Do you mean for https vs. wss? Other than that I cannot really think of reasons to leave the scheme open. In particular, nothing should have external URLs without TLS anymore. This really needs a consistent fix (with support for nested external path) but frankly it has been a couple of years with no-one willing to do the fixing. My personal take is to cater to practical use cases rather than trying to read exactly RFCs say on URL parsing, and my practical cases would include hosting behind nginx proxy (meaning that scheme, hostname, port and path are different than internally, but Sanic should be able to produce wss for websocket URLs, even when Sanic connection is over plain http, with SERVER_NAME defined as https).

@ahopkins
Copy link
Member

Yup. Consistency is the pain point here. I am hesitant to add yet another minor fix, and will try and take a crack in the coming weeks at revamping this. It will likely cause some breaking changes. All the more reason to get it into this June release and not wait.

@PromyLOPh PromyLOPh force-pushed the fix-server-name-scheme branch from 5cfa763 to d41c478 Compare May 25, 2022 11:12
@PromyLOPh
Copy link
Author

I updated my code to use rstrip. A minimal fix for 21.12 would be appreciated, but in the end it’d be better if SERVER_NAME was not required behind a reverse proxy at all. The Forwarded header and ASGI’s root_path should be respected and used by url_for as well as the router.

@prryplatypus
Copy link
Member

prryplatypus commented May 26, 2022

As it stands there is certainly a use when you would need an empty scheme and we are not providing that.

Just came to mind that it's still possible to have an empty scheme by starting the URL with :// instead of //, isn't it?

EDIT: Ignore me; you meant leaving the scheme off, not leaving it blank.

ahopkins
ahopkins previously approved these changes Jun 29, 2022
@ahopkins ahopkins marked this pull request as draft September 19, 2022 19:06
@ChihweiLHBird
Copy link
Member

@ahopkins This seems simple and approved? Can we make it ready and rerun the CI? I think the main branch would also need it.

@ahopkins ahopkins self-requested a review November 6, 2022 09:34
@ahopkins
Copy link
Member

ahopkins commented Nov 6, 2022

I do not remember approving this, because I am not in agreement with this strategy yet.

I actually was thinking about this one yesterday while in the car. There is an absolute mess of inconsistent uses here and the whole API regarding naming, URL generation, domain setting, etc needs to be overhauled. I was going to propose a solution this week that would:

  • Leave current functionality in tact by creating a new config value, and a new url generator
  • The config would be something like config.PROXY_URL or config.BASE_URL. Something unused, unambiguous, and would use some of the setter logic on the config object to parse and validate what is set.
  • Leave url_for untouched to not break, but move it to deprecation. Probably a longer deprecation than our normal 2 release cycles.
  • Add a new method named something like generate_url or build_url that has a couple features:
    • explicit arguments (no more _external, _scheme, etc)
    • move path params to a params argument
    • works consistently with the new PROXY_URL value and also host headers when necessary

What else am I missing?

@ChihweiLHBird
Copy link
Member

@ahopkins

I am not familiar with the URL APIs in Sanic but this sounds like urllib.parse?

This PR seems like simple fix to an issue related to the feature that using config.SERVER_NAME as the scheme if in full URL format, right? Maybe this can be a temporary workaround before we largely refactoring the module?

@ahopkins
Copy link
Member

ahopkins commented Nov 6, 2022

I don't think it is a simple fix since it will be a breaking change.

@Tronic
Copy link
Member

Tronic commented Nov 7, 2022

  • Leave current functionality in tact by creating a new config value, and a new url generator
  • The config would be something like config.PROXY_URL or config.BASE_URL. Something unused, unambiguous, and would use some of the setter logic on the config object to parse and validate what is set.
  • Leave url_for untouched to not break, but move it to deprecation. Probably a longer deprecation than our normal 2 release cycles.
  • Add a new method named something like generate_url or build_url that has a couple features:

Just a brief comment now that my intuition says that all four of these are ideas to be avoided (just intuition, no reasoning yet). Certainly needs more thought and consideration. Would rather see some breaking evolution of existing facilities, e.g. via this PR, as the lesser evil than completely breaking it with a new/parallel system.

@ahopkins
Copy link
Member

Would rather see some breaking evolution of existing facilities

I am super reluctant to cause these breaking changes. The code that implements SERVER_NAME (and its lack) is not consistent. What is consistent, however, is how it has been used for the last few years. It is not a comfortable experience to force an update to all url_for implementations in a single release. And, trying to do it slowly and support both _server and `server can lead to some super off ambiguities.

Unfortunately we are stuck with a legacy API from 2017. I do not blame the original author or developers since the use case for building URLs with SERVER_NAME was not in scope. Nonetheless, we need to contend with how it was built over time.

If we try to "fix" the issue by changing how SERVER_NAME works then we will undoubtedly break applications. Perhaps there is a middle ground where we can implement a newer version of url_for if the developer opts into BASE_URL or something like that. TBH, SERVER_NAME is a terrible value anyway as it should (if anything) be used to set Server: Name headers.

I am open to suggestions on ways to implement this if anyone has another idea that will:

  • not cause a breaking change to existing applications that have not opted into a new pattern, possible pain points:
    • Sanic.url_for
    • Request.host
    • Request.scheme
    • etc
  • cleanup the url_for API to have proper type annotations for params (no more _external, _host, etc)

@Tronic
Copy link
Member

Tronic commented Nov 14, 2022

Possible approaches to start with:

  1. fix operation for situations where the current implementation is simply broken (full URL SERVER_NAME?)
  2. detect situations where there would be a potentially problematic breaking change and display a warning on log on how to change code to retain old or new functionality

Adding any parallel settings or url functions still doesn't seem any better than it was when I wrote about my intuition first. I am afraid that some degree of breaking cannot be avoided with this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Invalid request.url when full URI is used as SERVER_NAME
5 participants