Description
Hi James, thanks for the continued work on maintaining ring.
I want to raise this issue in case anyone else runs into the same problem, and to gather some feedback.
In our project, we handle forms that are not UTF-8 encoded (Shift_JIS, to be specific), and we found that the params middleware garbles the incoming strings even though we use the wrap-params middleware with the Shift_JIS option, e.g. (param/wrap-params {:encoding "Shift_JIS"})
Looking deeper into the source code, I see that java.net.URLDecoder is used for parsing, and that java.net.URLDecoder does not decode some non-UTF-8 strings correctly, while org.apache.commons.codec.net.URLCodec does, as the snippet below illustrates.
;; "モジバケコワイ", URL encoded with Shift_JIS by the browser
"%83%82%83W%83o%83P%83R%83%8F%83C"
(import [java.net URLDecoder])
(URLDecoder/decode "%83%82%83W%83o%83P%83R%83%8F%83C" "Shift-JIS")
;; => "モ�W�o�P�Rワ�C"
(import [org.apache.commons.codec.net URLCodec])
(let [codec (URLCodec. "Shift-JIS")]
  (.decode codec "%83%82%83W%83o%83P%83R%83%8F%83C" "Shift-JIS"))
;; => "モジバケコワイ"
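As far as I can tell, the difference comes from Shift_JIS trail bytes that fall in the ASCII range. The browser leaves such a byte unescaped, so ジ (bytes 0x83 0x57) arrives as %83W. URLDecoder appears to decode each run of consecutive %XX escapes on its own, so the lone 0x83 becomes the replacement character and W is appended as a literal. A small sketch to inspect the bytes (hex-bytes is a hypothetical helper name, not part of ring):

```clojure
;; Hypothetical helper: hex representation of a string's bytes in a charset.
(defn hex-bytes [^String s ^String encoding]
  (mapv #(format "%02X" (bit-and % 0xFF)) (.getBytes s encoding)))

(hex-bytes "ジ" "Shift_JIS")
;; => ["83" "57"]

;; 0x57 is the ASCII code for W, which is why the browser sends "%83W".
(char 0x57)
;; => \W
```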
We came up with two workarounds for this issue:
- Use org.apache.commons.codec.net.URLCodec instead of java.net.URLDecoder for decoding URL-encoded parameters
- Use a form with an enctype of multipart/form-data so that the values are not percent-encoded, which avoids the problem entirely
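For reference, the first workaround boils down to turning the whole value into bytes before applying the charset. Below is a minimal JDK-only sketch of that idea, in case adding a dependency is not desirable. decode-form-value is a hypothetical name, this is not how ring or commons-codec is implemented, and it assumes the input is plain ASCII with well-formed %XX escapes (a malformed escape throws NumberFormatException).

```clojure
(import '[java.io ByteArrayOutputStream])

;; Hypothetical helper: percent-decode s into raw bytes first, then decode
;; the complete byte sequence with the given charset in one step.
(defn decode-form-value [^String s ^String encoding]
  (let [out (ByteArrayOutputStream.)
        n   (count s)]
    (loop [i 0]
      (when (< i n)
        (let [c (.charAt s i)]
          (cond
            ;; %XX escape: emit the byte it stands for
            (and (= c \%) (<= (+ i 3) n))
            (do (.write out (Integer/parseInt (subs s (inc i) (+ i 3)) 16))
                (recur (+ i 3)))

            ;; '+' stands for a space in application/x-www-form-urlencoded
            (= c \+)
            (do (.write out 0x20)
                (recur (inc i)))

            ;; unescaped characters contribute their own (ASCII) byte value
            :else
            (do (.write out (int c))
                (recur (inc i)))))))
    (String. (.toByteArray out) encoding)))

(decode-form-value "%83%82%83W%83o%83P%83R%83%8F%83C" "Shift_JIS")
;; => "モジバケコワイ"
```

Because the unescaped W, o, P, R and C are written as bytes alongside the escaped 0x83 bytes, each two-byte Shift_JIS character is intact by the time the charset decoder sees it.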
I would appreciate feedback on:
- Which of the above workarounds is preferable?
- Would you be interested in a PR that replaces the decoder used in the params middleware with org.apache.commons.codec.net.URLCodec?
Thanks!