e-mail messages with application/rtf body are imported as attachments, not message body #3897

@vsessink

While importing an e-mail archive in the (IMHO cursed) .PST format, I came across a mailbox in which every message carries its body as an application/rtf part:

Content-Type: application/rtf
Content-Transfer-Encoding: base64
Content-Disposition: attachment; 
        filename*=utf-8''rtf-body.rtf;
        filename="rtf-body.rtf"

Yep, that's right: the part is marked Content-Disposition: attachment, yet it is the actual e-mail body.

In Aleph, these messages show up as empty, with the rtf-body.rtf document listed as an attachment.

I tried to work around it by unpacking the mail archive manually with readpst and then fixing the messages with a small Python script that essentially replaces the RTF part with an HTML part. I used Python's email.parser and simply checked whether the first part's content type was application/rtf; if so, I piped that part through unrtf and repacked the message. Filthy, but it works for the mailbox itself.
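
For anyone wanting to try the same workaround, here is a minimal sketch of what such a script could look like. It is not the original script: the function names `replace_rtf_body` and `unrtf_to_html` are made up for illustration, and it assumes the `unrtf` binary is on the PATH and accepts `--html` followed by a file name.

```python
import subprocess
import tempfile
from email import message_from_bytes, policy
from typing import Callable


def unrtf_to_html(rtf: bytes) -> str:
    """Convert RTF to HTML by running the external `unrtf` tool (assumed installed)."""
    with tempfile.NamedTemporaryFile(suffix=".rtf") as tmp:
        tmp.write(rtf)
        tmp.flush()
        result = subprocess.run(
            ["unrtf", "--html", tmp.name],
            check=True,
            capture_output=True,
        )
    return result.stdout.decode("utf-8", errors="replace")


def replace_rtf_body(raw: bytes, convert: Callable[[bytes], str] = unrtf_to_html) -> bytes:
    """Return the message with a leading application/rtf part replaced by text/html.

    Only the first non-multipart part is inspected, mirroring the check
    described above; real attachments further down are left untouched.
    """
    msg = message_from_bytes(raw, policy=policy.default)
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no body of their own
        if part.get_content_type() == "application/rtf":
            html = convert(part.get_payload(decode=True))
            # set_content() clears the old payload and all Content-* headers,
            # so the misleading "Content-Disposition: attachment" goes away too.
            part.set_content(html, subtype="html")
        break  # stop after the first leaf part either way
    return msg.as_bytes()
```

Passing the converter in as a parameter keeps the repacking logic testable on machines without unrtf; in normal use `replace_rtf_body(raw_bytes)` is enough, with one call per message file written by readpst.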

This workaround does not help in Aleph, though: the MIME detection wizardry afterwards recognizes the repacked messages as text/html instead of message/rfc822, so the actual attachments of the message are no longer recognized.

The latter may count as a separate bug: IMHO, a message that starts with the following should not be detected as text/html.

Status: RO
User-Agent: none
From: "Firstname Lastname" <MAILER-DAEMON>
Subject: FW: Ticket 08-05
To:  Name (Company Name)
Date: Tue, 09 May 2022 14:41:50 +0000
Message-Id: <AM5PR04MB53161D02E214FEEC35C2156FB6466@AM0PR04MB9122.eurprd02.prod.outlook.com>
X-libpst-forensic-sender: /O=EXCHANGELABS/OU=EXCHANGE ADMINISTRATIVE GROUP (FYDIBOHF23SPDLT)/CN=RECIPIENTS/CN=09C4BB2213F35544FEBBBBF1FD14B522
MIME-Version: 1.0
Content-Type: multipart/mixed;
        boundary="--boundary-LibPST-iamunique-887075155_-_-"


----boundary-LibPST-iamunique-887075155_-_-
Content-Type: text/html; charset="utf-8"

<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=utf-8"><meta name=Generator content="Microsoft Word 15 (filtered medium)"><!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
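
To illustrate why I would expect message/rfc822 here, a rough sketch of a header-based check follows. This is not how Aleph or ingest-file detects MIME types; `looks_like_rfc822`, the list of typical fields and the `min_typical` threshold are all hypothetical choices of mine. The idea is only that a file opening with a block of well-formed header fields is an e-mail, whatever HTML appears further down.

```python
import re

# RFC 5322 field name: printable ASCII except ":", immediately followed by ":".
_HEADER_LINE = re.compile(rb"^[!-9;-~]+:")

# Hypothetical selection of fields that commonly appear in e-mail headers.
_TYPICAL_FIELDS = {
    b"from", b"to", b"subject", b"date",
    b"message-id", b"mime-version", b"received", b"content-type",
}


def looks_like_rfc822(data: bytes, min_typical: int = 2) -> bool:
    """Heuristic: does `data` start with an RFC 822 style header block?"""
    seen = set()
    for line in data.splitlines():
        if not line.strip():
            break  # a blank line ends the header block
        if line[:1] in (b" ", b"\t"):
            continue  # folded continuation of the previous header
        if not _HEADER_LINE.match(line):
            return False  # a non-header line before the first blank line
        seen.add(line.split(b":", 1)[0].lower())
    return len(seen & _TYPICAL_FIELDS) >= min_typical
```

On the message above this returns True, because the header block is well-formed and contains From, Subject, To, Date and more before the first blank line. A bare HTML file fails on its first line. Note that an mbox-style "From " separator line would also fail this check, so the sketch only covers single-message files such as the ones readpst writes.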

Labels: Moderate, bug, ingest-file