-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Add Uri\WhatWg classes to ext/uri #18672
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General comments regarding the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some first comments before ending my day. I'm having some troubles with the amount of “indirection” / “function pointers” handling things in a very abstract fashion that I'm not sure is always necessary.
ext/uri/php_uri_common.h
Outdated
if (property_handler->write_func(new_internal_uri, property_zv, &errors) == FAILURE) { \ | ||
throw_invalid_uri_exception(new_internal_uri->handler, &errors); \ | ||
zval_ptr_dtor(&errors); \ | ||
zend_object_release(new_object); \ | ||
zval_ptr_dtor(property_zv); \ | ||
RETURN_THROWS(); \ | ||
} \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be cleaner if the write_func()
would throw the exception by itself. This gives it best control over the exception data.
If necessary a silent
parameter could be added, but extensions are already able to suppress any exceptions if they are undesired, so this is probably not necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't want to throw in the handlers themselves because of the internal API: different kinds of exceptions/errors may be used in different contexts, especially when code needs to be backward compatible. For example SOAP uses a SoapFault
with the "Unable to parse URL" message (https://github.com/php/php-src/blob/cf0c982b0ba937f4b94923c8fe4d541f0181a283/ext/soap/php_http.c#L467) to indicate an error, while the FILTER_VALIDATE_URL
filter uses the false
return value etc. Another example from the code of the RFC itself is the parse_uri
handler that you spotted below: it should generally throw an InvalidUriException
, except when it's used during unserialization.
That's why my reasoning is that handlers should be free of any exception throwing, and the caller should decide based on the context what to do with the error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's why my reasoning is that handlers should be free of any exception throwing, and the caller should decide based on the context what to do with the error.
When internal code throws an exception while an Exception is already active, it will automatically set the $previous
value to the active exception. This matches the:
It MUST set the
$previous
property to the original exception when doing so.
rule from the Throwable policy RFC (https://externals.io/message/127340) that is very likely to pass. Thus it is super easy to wrap the exception during unserialization (and probably also in Soap). You can see this pattern in action at:
php-src/ext/standard/password.c
Lines 87 to 91 in c4fba37
if (FAILURE == php_random_bytes_throw(ZSTR_VAL(buffer), ZSTR_LEN(buffer))) { | |
zend_value_error("Unable to generate salt"); | |
zend_string_release_ex(buffer, 0); | |
return NULL; | |
} |
Where the Random\RandomException
will be wrapped into the ValueError
.
Generally speaking, APIs that do not throw an Exception on error should become more rare going forward, thus throwing an exception is the "standard case". In case an exception is absolutely undesired either a silent
parameter would work or alternatively the exception can be suppressed with zend_clear_exception()
after extracting the message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First round, didn't check the tests yet
I addressed most review comments (that I marked with 👀 ). I'll continue tomorrow with the rest of the comments. Niels, please let me know if you'd prefer to resolve your comments yourself or I should do it. I'll resolve the obvious ones for Tim tomorrow. |
It's fine by me if you resolve them. If you'd like some extra hands helping with the coding just hmu. |
The includes are not on the system path, but should be taken from the build include paths.
I'll push the fix for the test failures later today |
Question btw: Why are the "Cannot recompose %s to string" exceptions of type \Exception instead of \Uri\UriException ? I tried to find this in the RFC but couldn't |
I'm not sure anymore why I used a regular exception, but maybe because the possible errors are not something people want to catch (since normally, recomposition should always be successful. If not, then there's some legitimate exceptional situation). Currently, this exception cannot be triggered, as neither lexbor, nor uriparser can fail (only memory errors can happen, but we bail out (I hope I used the correct term) then anyway). However, possible 3rd party implementations may fail (the example scenario I can bring up is when an URI parser is not integrated into PHP's memory handler), so I kept the error handling in place. BTW this practice was questioned by Tim here: #18672 (comment) |
ext/uri/php_uri.c
Outdated
PHPAPI void php_uri_instantiate_uri( | ||
INTERNAL_FUNCTION_PARAMETERS, const uri_handler_t *handler, const zend_string *uri_str, const zend_object *base_url_object, | ||
bool should_throw, bool should_update_this_object, zval *errors_zv | ||
) { | ||
zval errors; | ||
ZVAL_UNDEF(&errors); | ||
|
||
void *base_url = NULL; | ||
if (base_url_object != NULL) { | ||
uri_internal_t *internal_base_url = uri_internal_from_obj(base_url_object); | ||
URI_ASSERT_INITIALIZATION(internal_base_url); | ||
base_url = internal_base_url->uri; | ||
} | ||
|
||
void *uri = handler->parse_uri(uri_str, base_url, should_throw || errors_zv != NULL ? &errors : NULL); | ||
if (UNEXPECTED(uri == NULL)) { | ||
if (should_throw) { | ||
throw_invalid_uri_exception(handler, &errors); | ||
zval_ptr_dtor(&errors); | ||
RETURN_THROWS(); | ||
} else { | ||
if (pass_errors_by_ref(errors_zv, &errors) == FAILURE) { | ||
RETURN_THROWS(); | ||
} | ||
RETURN_NULL(); | ||
} | ||
} | ||
|
||
if (pass_errors_by_ref(errors_zv, &errors) == FAILURE) { | ||
RETURN_THROWS(); | ||
} | ||
|
||
uri_object_t *uri_object; | ||
if (should_update_this_object) { | ||
uri_object = Z_URI_OBJECT_P(ZEND_THIS); | ||
} else { | ||
if (EX(func)->common.fn_flags & ZEND_ACC_STATIC) { | ||
object_init_ex(return_value, Z_CE_P(ZEND_THIS)); | ||
} else { | ||
object_init_ex(return_value, Z_OBJCE_P(ZEND_THIS)); | ||
} | ||
uri_object = Z_URI_OBJECT_P(return_value); | ||
} | ||
|
||
uri_object->internal.handler = handler; | ||
uri_object->internal.uri = uri; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly to my remark regarding throw_invalid_uri_exception
, I have a feeling that trying to handle the constructor in a generic fashion internally leads to great complexity that we don't need and that is not helping folks. The external API is different for good reason, so trying to unify the internal API is going to result in very awkward logic.
My suggestion would be the following:
- Expose
lexbor_parse_uri()
directly: Users that know they need WHATWG semantics can then use the function directly and get improved type-safety. - Make the
parse_uri
handler a wrapper aroundlexbor_parse_uri()
: This one exposes a simple APIvoid *parse_uri(zend_string *uri, bool silent)
for generic internal usage where the exact implementation does not matter. IfNULL
is returned an error happened. - Make the constructor for
Uri\WhatWg\Url
uselexbor_parse_uri()
(and possibly fill_errors) directly.
My reasoning is that folks that use the “generic” API will not be able to meaningfully make use of the errors
value, since the possible errors depend on the URI implementation. The only thing they can do is pass it to the create_invalid_uri_exception
handler, which then is able to make use of it. We can thus avoid that layer of indirection.
Trying to handle things generically that are quite different in the details is also something that burned me in ext/random, specifically the seeding. That's why every engine now exposes engine-specific php_random_*_seed*
functions.
I was thinking something like this:
diff --git i/ext/uri/php_lexbor.c w/ext/uri/php_lexbor.c
index ba8f1440e0b..0703e7be228 100644
--- i/ext/uri/php_lexbor.c
+++ w/ext/uri/php_lexbor.c
@@ -537,27 +537,39 @@ void lexbor_request_shutdown(void)
lexbor_urls = 0;
}
-static void *lexbor_parse_uri(const zend_string *uri_str, const void *base_url, zval *errors)
+lxb_url_t *lexbor_parse_uri(const zend_string *uri_str, const void *base_url, zval *errors, bool silent)
{
lexbor_cleanup_parser();
+ zval z;
+ if (!silent && errors == NULL) {
+ errors = &z;
+ }
const lxb_url_t *lexbor_base_url = base_url;
lxb_url_t *url = lxb_url_parse(lexbor_parser, lexbor_base_url, (unsigned char *) ZSTR_VAL(uri_str), ZSTR_LEN(uri_str));
fill_errors(errors);
+ if (url == NULL && !silent) {
+ zval exception;
+
+ object_init_ex(&exception, uri_whatwg_invalid_url_exception_ce);
+
+ zval value;
+ ZVAL_STRING(&value, "URL parsing failed");
+ zend_update_property_ex(uri_whatwg_invalid_url_exception_ce, Z_OBJ(exception), ZSTR_KNOWN(ZEND_STR_MESSAGE), &value);
+ zval_ptr_dtor_str(&value);
+
+ zend_update_property(uri_whatwg_invalid_url_exception_ce, Z_OBJ(exception), ZEND_STRL("errors"), errors);
+
+ zend_throw_exception_object(&exception);
+ }
+
return url;
}
-static void lexbor_create_invalid_uri_exception(zval *exception_zv, zval *errors)
+static void *parse_uri(const zend_string *uri_str, bool silent)
{
- object_init_ex(exception_zv, uri_whatwg_invalid_url_exception_ce);
-
- zval value;
- ZVAL_STRING(&value, "URL parsing failed");
- zend_update_property_ex(uri_whatwg_invalid_url_exception_ce, Z_OBJ_P(exception_zv), ZSTR_KNOWN(ZEND_STR_MESSAGE), &value);
- zval_ptr_dtor_str(&value);
-
- zend_update_property(uri_whatwg_invalid_url_exception_ce, Z_OBJ_P(exception_zv), ZEND_STRL("errors"), errors);
+ return lexbor_parse_uri(uri_str, NULL, NULL, silent);
}
static void *lexbor_clone_uri(void *uri)
@@ -599,8 +611,7 @@ static void lexbor_free_uri(void *uri)
const uri_handler_t lexbor_uri_handler = {
.name = URI_PARSER_WHATWG,
- .parse_uri = lexbor_parse_uri,
- .create_invalid_uri_exception = lexbor_create_invalid_uri_exception,
+ .parse_uri = parse_uri,
.clone_uri = lexbor_clone_uri,
.uri_to_string = lexbor_uri_to_string,
.free_uri = lexbor_free_uri,
diff --git i/ext/uri/php_lexbor.h w/ext/uri/php_lexbor.h
index 24f4fbfffe5..e3ce19161b9 100644
--- i/ext/uri/php_lexbor.h
+++ w/ext/uri/php_lexbor.h
@@ -22,6 +22,8 @@
extern const uri_handler_t lexbor_uri_handler;
+lxb_url_t *lexbor_parse_uri(const zend_string *uri_str, const void *base_url, zval *errors, bool silent);
+
void lexbor_module_init(void);
void lexbor_module_shutdown(void);
diff --git i/ext/uri/php_uri.c w/ext/uri/php_uri.c
index 8493d0136cf..43dd8158beb 100644
--- i/ext/uri/php_uri.c
+++ w/ext/uri/php_uri.c
@@ -218,11 +218,9 @@ PHPAPI void php_uri_instantiate_uri(
base_url = internal_base_url->uri;
}
- void *uri = handler->parse_uri(uri_str, base_url, should_throw || errors_zv != NULL ? &errors : NULL);
+ lxb_url_t *uri = lexbor_parse_uri(uri_str, base_url, errors_zv != NULL ? &errors : NULL, !should_throw);
if (UNEXPECTED(uri == NULL)) {
if (should_throw) {
- throw_invalid_uri_exception(handler, &errors);
- zval_ptr_dtor(&errors);
RETURN_THROWS();
} else {
if (pass_errors_by_ref(errors_zv, &errors) == FAILURE) {
And the implementation for Rfc3986 would have a rfc3986_parse_uri()
that does not have an errors
parameter, since it can't emit detailed errors.
ext/uri/php_lexbor.c
Outdated
return SUCCESS; | ||
} | ||
|
||
static ZEND_RESULT_CODE init_idna(void) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggest being consistent about using ZEND_RESULT_CODE
or zend_result
static ZEND_RESULT_CODE init_idna(void) | |
static zend_result init_idna(void) |
ext/uri/php_lexbor.c
Outdated
|
||
zval_string_or_null_to_lexbor_str(value, &str); | ||
|
||
if ( lxb_url_api_search_set(lexbor_uri, lexbor_parser, str.data, str.length) != LXB_STATUS_OK) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if ( lxb_url_api_search_set(lexbor_uri, lexbor_parser, str.data, str.length) != LXB_STATUS_OK) { | |
if (lxb_url_api_search_set(lexbor_uri, lexbor_parser, str.data, str.length) != LXB_STATUS_OK) { |
ext/uri/php_uri.c
Outdated
RETURN_THROWS(); | ||
} | ||
|
||
zend_update_property(uri_whatwg_url_validation_error_ce, Z_OBJ_P(ZEND_THIS), ZEND_STRL("type"), type); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
zend_update_property(uri_whatwg_url_validation_error_ce, Z_OBJ_P(ZEND_THIS), ZEND_STRL("type"), type); | |
zend_update_property_ex(uri_whatwg_url_validation_error_ce, Z_OBJ_P(ZEND_THIS), ZSTR_KNOWN(ZEND_STR_TYPE), type); |
ext/uri/php_uri.c
Outdated
internal_uri->uri = internal_uri->handler->parse_uri(Z_STR_P(uri_zv), NULL, &errors); | ||
if (internal_uri->uri == NULL) { | ||
zend_throw_exception_ex(NULL, 0, "Invalid serialization data for %s object", ZSTR_VAL(object->ce->name)); | ||
zval_ptr_dtor(&errors); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm reading the API for internal_uri->handler->parse_uri()
correctly, the errors parameter can be set to NULL
if the errors are not going to be used. Here, they are not used, on either success or failure - suggest omitting them entirely
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very nice catch, thanks!
ext/uri/php_lexbor.c
Outdated
|
||
zval error_type; | ||
zend_enum_new(&error_type, uri_whatwg_url_validation_error_type_ce, error_str, NULL); | ||
zend_update_property(uri_whatwg_url_validation_error_ce, Z_OBJ(error), ZEND_STRL("type"), &error_type); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
zend_update_property(uri_whatwg_url_validation_error_ce, Z_OBJ(error), ZEND_STRL("type"), &error_type); | |
zend_update_property_ex(uri_whatwg_url_validation_error_ce, Z_OBJ(error), ZSTR_KNOWN(ZEND_STR_TYPE), &error_type); |
I don't immediately spot something that still needs changing. |
This makes sense to me :)
Yes, I would also prefer to merge it soon, so that the rest of the changes can also be continued (and of course this code can still be fine tuned later). First, I'd probably create a PR for the uriparser stuffs without the write handlers, than probably the internal API related code. I'm still thinking about Tim's suggestion above (#18672 (comment)). I'm currently playing with the idea. I like the fact that the exception handler wouldn't be needed anymore, but in the same time, all the write handlers should also take care of throwing the exception themselves. |
I don't have strong preference for either design decision. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay for me.
Please add an entry to UPGRADING.INTERNALS for the zend_exceptions.c zend_update_exception_properties API addition
Relates to #14461 and https://wiki.php.net/rfc/url_parsing_api