Clarity about embedded HTML and escaping #96

Open
@spookylukey

Some examples use HTML snippets in the message, e.g. this one from http://projectfluent.org/fluent/guide/text.html:

description =
    Loki is a simple micro-blogging
    app written entirely in <i>HTML5</i>.
    It uses FTL to implement localization.

The question then is what happens when this is used. I would expect Fluent not to do any HTML escaping itself. It is therefore up to the bindings to always HTML-escape the entire returned string when it is inserted into the DOM (client-side) or into a chunk of HTML (server-side). If the message contains any interpolated user-supplied input, this is vital for correctness and security (XSS etc.), but in any case we should not expect translators to know HTML syntax and manually escape ampersands etc.

However, with the above message, the HTML tags would end up as &lt;i&gt;HTML5&lt;/i&gt;, which would be rendered as the literal text <i>HTML5</i> rather than an italicized HTML5. This is not what the example implies to me.
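To make the problem concrete, here is a minimal sketch in Python of what a binding that escapes everything would do to the message above. The `render_as_text` helper is hypothetical and not part of any Fluent implementation; only `html.escape` from the standard library is real.

```python
import html

# The message text as a binding might get it back from Fluent, unescaped.
description = (
    "Loki is a simple micro-blogging "
    "app written entirely in <i>HTML5</i>. "
    "It uses FTL to implement localization."
)


def render_as_text(message: str) -> str:
    """Hypothetical binding step: escape the whole string before it goes into HTML."""
    return html.escape(message)


escaped = render_as_text(description)
# The <i> tags are now entities, so a browser shows them as literal text.
print(escaped)
```

With this approach the translator's markup is treated exactly like user input, which is safe but makes the italic in the example impossible.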

Looking around in this repo, it seems the current consensus is in agreement with what I've outlined above (see projectfluent/play#2 for example), and therefore it is the examples that are misleading/confusing.

This leaves the problem of what happens when a translated string actually needs to embed HTML. This seems to be one solution: #16 (comment). A more lightweight but less robust solution I had been thinking about was a naming convention (e.g. any message id that ends in -html is treated as HTML, anything else is not).
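As a rough sketch of the naming-convention idea, assuming a binding that sees the message id, the pattern text, and the arguments separately: the `render_message` function and the use of `str.format` as a stand-in for Fluent's own interpolation are both assumptions made for illustration, not how any existing binding works.

```python
import html
from typing import Mapping

HTML_SUFFIX = "-html"  # the hypothetical convention: ids ending in -html hold markup


def render_message(message_id: str, pattern: str, args: Mapping[str, str]) -> str:
    """Return a string that is safe to insert into an HTML document."""
    if message_id.endswith(HTML_SUFFIX):
        # Markup written by the translator is kept as-is,
        # but every interpolated value is still escaped.
        safe_args = {name: html.escape(value) for name, value in args.items()}
        return pattern.format(**safe_args)
    # Plain-text message: format first, then escape the entire result.
    return html.escape(pattern.format(**args))


pattern = "Written in <i>HTML5</i> by {user}"
print(render_message("about-html", pattern, {"user": "<script>"}))
print(render_message("about", pattern, {"user": "A&B"}))
```

The important detail is that arguments are escaped in both branches, so an -html message cannot become an XSS vector through user input, and nothing is escaped twice.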

It is vital for this to be really well defined (and simple to implement), otherwise you end up with XSS, or double escaping, or being unable to embed HTML in translated messages. I'm considering an implementation in Elm, and the only practical way it would work would be to compile FTL messages to Elm functions. For this to work, we'd need to know for every message what type of output (text/HTML) it was returning so that it can have the correct type signature. I'm also considering a Python implementation that would integrate into a Django project, and we'd again need to know very explicitly whether something is returning HTML or plain text.
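To illustrate why the output type needs to be known per message, here is a minimal Python sketch in which compiled messages return either `str` or a wrapper type for markup. The `Html` class, `to_html`, and the two message functions are all hypothetical names invented for this example.

```python
import html
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Html:
    """Hypothetical wrapper marking a string as already-safe HTML."""
    markup: str


def to_html(value: Union[str, Html]) -> Html:
    """Escape plain text exactly once; pass existing Html through untouched."""
    if isinstance(value, Html):
        return value
    return Html(html.escape(value))


# What a compiler might emit for a plain-text message.
def greeting(user: str) -> str:
    return f"Hello, {user}!"


# What a compiler might emit for a message known to contain markup.
def description_html(user: str) -> Html:
    return Html(f"Written in <i>HTML5</i> by {to_html(user).markup}")


print(to_html(greeting("A & B")).markup)
print(to_html(description_html("A & B")).markup)
```

Because the return type says whether escaping has already happened, the template layer can call `to_html` on anything without risking double escaping. Django's `mark_safe` works along broadly similar lines.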
