Describe how to provide metadata #39

Closed
qsorix opened this issue Oct 13, 2014 · 14 comments

Comments

@qsorix

qsorix commented Oct 13, 2014

This issue is to define section "6.4. Metadata" of the protocol, i.e. how to provide additional meta information when uploading files.

I'm going to start with my use case. I think I need metadata to let a client specify what kind of file it is uploading, so that the server can trigger appropriate processing. Let's say clients can upload two types of files: measurement data and log files.

What do you think about including this in the file creation request, as POST data? E.g. filetype=measurements or filetype=logs?

POST /v1/files HTTP/1.1
Entity-Length: 3000
Content-Length: 21
Content-Type: application/x-www-form-urlencoded

filetype=measurements

Sending metadata in the POST request's body could cause conflicts if the protocol itself later wants to include metadata there, e.g. a checksum. A naming convention would avoid this: keys of metadata defined by the protocol must start with the prefix "tus-". Users remain free to pick any other names they want, but the namespace "tus-*" is reserved for future use by the protocol.
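As a rough illustration of the reserved-prefix idea, a server could separate protocol-level fields from user-defined ones by key prefix. This is only a sketch: the function name `split_metadata` and the field `tus-checksum` are hypothetical and not defined anywhere in the protocol.

```python
from urllib.parse import parse_qsl

# Assumption: the reserved prefix proposed in this comment.
RESERVED_PREFIX = "tus-"


def split_metadata(body: str):
    """Split form-encoded fields into (protocol, user) dicts by key prefix."""
    protocol, user = {}, {}
    for key, value in parse_qsl(body, keep_blank_values=True):
        # Keys in the reserved namespace belong to the protocol,
        # everything else is application-defined metadata.
        target = protocol if key.lower().startswith(RESERVED_PREFIX) else user
        target[key] = value
    return protocol, user


# Hypothetical request body mixing one user field and one reserved field.
protocol_fields, user_fields = split_metadata(
    "filetype=measurements&tus-checksum=abc123"
)
```

With this split, an application can never shadow a field the protocol introduces later, as long as it stays out of the reserved namespace.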

Your thoughts?

@tim-kos
Member

tim-kos commented Nov 26, 2014

Hm, this seems like a good suggestion. I don't think there is any header that we could use in order to transfer metadata in any meaningful way, so sending this in the POST request as POST fields makes sense.

Regarding the checksum: valid point. We would need to see how this is implemented, but a tus-* namespace could already do the trick.

@Acconut
Member

Acconut commented Dec 2, 2014

I don't think there is any header that we could use in order to transfer metadata in any meaningful way, so sending this in the POST request as POST fields makes sense.

I agree. It's not necessary to put all the stuff into headers if you have a body available.

Instead of providing a prefix we could provide a namespace, e.g. in case of application/x-www-form-urlencoded:

checksum=4g5...ht567&meta=filetype:measurements&meta=encoding:utf8

or

checksum=4g5...ht567&meta[]=filetype:measurements&meta[]=encoding:utf8

Instead of urlencoded we could also use another syntax, e.g. JSON:

{
    "checksum": "ug5...43t",
    "meta": {
        "filetype": "measurements",
        "encoding": "utf8"
    }
}
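To compare the two encodings, the sketch below parses the repeated `meta=key:value` form fields and the JSON variant into the same structure. The function names are hypothetical, and the checksum value `abc` is a placeholder rather than a real digest.

```python
import json
from urllib.parse import parse_qs


def parse_form(body: str) -> dict:
    """Parse 'checksum=...&meta=key:value&meta=key:value' into a nested dict."""
    fields = parse_qs(body)
    meta = {}
    for item in fields.get("meta", []):
        # Split each entry on the first colon only, so values may contain colons.
        key, _, value = item.partition(":")
        meta[key] = value
    return {"checksum": fields["checksum"][0], "meta": meta}


def parse_json(body: str) -> dict:
    """Parse the JSON variant into the same structure."""
    doc = json.loads(body)
    return {"checksum": doc["checksum"], "meta": dict(doc.get("meta", {}))}


form_result = parse_form("checksum=abc&meta=filetype:measurements&meta=encoding:utf8")
json_result = parse_json(
    '{"checksum": "abc", "meta": {"filetype": "measurements", "encoding": "utf8"}}'
)
```

Both calls yield the same dictionary. The form variant needs an extra convention (the colon) to carry key and value inside one field, while JSON expresses the nesting directly.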

@MarkMurphy

I think it IS necessary to put it in a header (or at least support that method) in order to support JSON APIs.

@Acconut
Member

Acconut commented Dec 3, 2014

@MarkMurphy Could you please explain your reasoning? I'm not able to follow it fully.
Which header are you referring to (maybe one for the file name)?

@MarkMurphy

@Acconut Sure. In the context of a JSON API where metadata is required by the server, how would you envision the client sending both binary data from a file and metadata in the same request?

Acconut added this to the 1.0 milestone Dec 16, 2014

@Acconut
Member

Acconut commented Dec 16, 2014

As I mentioned in #38 (comment), there is a problem with putting metadata in the headers:

Another point against putting metadata in the headers is that many servers (including nginx and Apache) limit the maximum size of headers, typically to a few kilobytes by default. If you have a lot of metadata this could be a problem, whereas the size of the body has no comparable hard limit.

Regardless of whether we go with the header or the body approach, we need to agree on one (or even multiple) formats. Going with query-string-encoded (application/x-www-form-urlencoded) data or a format similar to INI files would be the easiest since they are widely supported, but they don't provide much functionality. More flexible solutions would be JSON (or even XML), but we must consider what implementations can support.
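To make the header-size concern concrete, here is a rough sketch that estimates how many bytes a set of metadata would occupy on the wire if each pair were sent as its own header line. The 4096-byte limit, the `X-Meta` header name and the `key=value` layout are assumptions chosen for illustration; they are not part of the protocol, and actual limits depend on server configuration.

```python
# Assumption: an illustrative limit, not any server's documented default.
ASSUMED_HEADER_LIMIT = 4096


def header_bytes(name: str, value: str) -> int:
    """Size of one 'Name: value' header line including the trailing CRLF."""
    return len(f"{name}: {value}\r\n".encode("utf-8"))


def fits_in_headers(metadata: dict, limit: int = ASSUMED_HEADER_LIMIT) -> bool:
    """Check whether all pairs, sent as hypothetical X-Meta headers, stay under the limit."""
    total = sum(header_bytes("X-Meta", f"{k}={v}") for k, v in metadata.items())
    return total <= limit


small_ok = fits_in_headers({"filetype": "measurements", "encoding": "utf8"})
large_ok = fits_in_headers({"notes": "x" * 5000})
```

Small key-value sets fit comfortably, while a single large value already exceeds the assumed limit, which is the case the body approach would avoid.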

@Acconut
Member

Acconut commented Dec 22, 2014

@qsorix Any opinion?

@Acconut
Member

Acconut commented Jan 3, 2015

See #47 for the proposed changes.

@qsorix
Author

qsorix commented Jan 4, 2015

Again, sorry for a late response.

@Acconut, I don't have a strong opinion. A few random thoughts:

It is impossible to tell what kind of metadata will be required by different applications, so picking JSON over www-form-urlencoded or whatever makes little sense. If we cannot make a sensible decision, let's leave it undecided. (So I would not mention JSON in the protocol itself, I guess.)

Some metadata are useful to the protocol itself and generic enough that we can think about it. Examples mentioned already were the checksum and filename. Perhaps using dedicated headers for those limited cases is not a bad idea after all.

Your wording about an optional JSON body goes in the right direction, I think. However, I would maybe write something along these lines instead:

The TUS protocol will never use (bold declaration) the body of the POST request.
It is purposefully left empty so implementations can use it.

Implementations MAY use the body to transfer any kind of metadata about the file.
In such case, Content-Type should be set appropriately. The server MAY
decide to ignore or use this information to further process the request or to
reject it.

Implementations MUST NOT use the body of the POST request to send the file itself.
(I feel like such use would have to be standardized, and it contradicts the first
sentence, so should a need arise, a different end-point that accepts immediate uploads
shall be introduced.)

So that leaves the body for the implementations, e.g. to send a thumbnail of a huge file that is being uploaded. And in headers, we could just...

TUS-Metadata: checksum=md5:abcdef1234
TUS-Metadata: filename=foobar.txt

(Actually, I think that if we use headers, the format should be thought through carefully. Perhaps more like TUS-Metadata: <property> <base64 encoded value> or something similar. We can discuss the format further if anyone likes the general idea.)

These are just thoughts to maybe inspire other ideas. I'm not even sure if I like it myself.

@Acconut
Member

Acconut commented Jan 5, 2015

I'm aware that standardizing the different properties and their values is not possible and would be useless anyway, since they depend on the needs of each application. Checksums are a special case and shouldn't be included in the metadata since they deserve their own extension (see #7).

I, too, am not sure whether to choose the body or the headers. The latter solution has the problems of limited length and the need to escape special characters (e.g. by base64 encoding, as you mentioned). Putting metadata in the body solves both of these problems but prevents using the body for other purposes, e.g. uploading the first chunk of data (as @MarkMurphy suggested).

Frankly, JSON is not the best solution but the best I could find.

@Acconut
Member

Acconut commented Jan 20, 2015

After thinking about this and an internal discussion, I am currently moving towards the header version (against my "old" opinion). My main reasons are that it keeps the request body free for other uses and leads to a simpler metadata structure (key-value and not deeply nested).

So my last commit f76419c now introduces the Metadata header. I decided against using the TUS- prefix to maintain consistency with other headers (Offset, Entity-Length, etc).
In addition, a base64 encoded value is used to allow any character. Speaking of which, we may decide to base64 encode the key, too, or to disallow certain characters (e.g. spaces, newlines).
And finally, I'm not sure about separating key and value by a space or an equals sign, but I'm not religious about that.

@qsorix
Author

qsorix commented Jan 20, 2015

I like headers here. The option to use POST's body will still be open in the future.

Metadata without TUS- prefix is OK.

Base64 encoding of a value is a sensible step. Plain text is easier to debug, but being able to send arbitrary bytes may be important. https://www.ietf.org/rfc/rfc2047.txt solves the problem for email, but parsing "=?utf-8?b?Zm9v?=" makes implementation complicated. So I think base64 is the least bad option.

I am for keeping keys in plain text. I can't imagine this being a limitation and it will help debugging.

I would use a space as the separator. It is cleaner visually, it is the default for split() functions, and = appears in Base64 encoded values (at the end, as padding).
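The format discussed here (plain-text key, a space, then the base64 encoded value) can be sketched as follows. The helper names are hypothetical, and rejecting keys that contain a space is an assumption about how the open question on key characters might be resolved.

```python
import base64


def encode_metadata(key: str, value: bytes) -> str:
    """Build a 'key base64value' header value; the key stays in plain text."""
    if not key or " " in key:
        # Assumption: keys must be non-empty and free of the separator.
        raise ValueError("key must be non-empty and must not contain a space")
    return f"{key} {base64.b64encode(value).decode('ascii')}"


def decode_metadata(header_value: str):
    """Split on the first space only; '=' padding in the value is untouched."""
    key, encoded = header_value.split(" ", 1)
    return key, base64.b64decode(encoded)


line = encode_metadata("filename", b"foobar.txt")
key, value = decode_metadata(line)
```

Here `line` is `filename Zm9vYmFyLnR4dA==`. The trailing `==` is base64 padding, which shows why a space is a less ambiguous separator than an equals sign.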

@Acconut
Member

Acconut commented Jan 20, 2015

That's great to hear, especially your last finding:

= appears in Base64 encoded values (at the end, as padding).

The PR will be merged once enough time passes or we get more feedback.

@Acconut
Member

Acconut commented Jan 26, 2015

Merged #47.

Acconut closed this as completed Jan 26, 2015