ParseException Content-Disposition filename spaces

Hello All,

I have an email generated from Thunderbird 1.5.0.9 (Windows/20061207) which contains an attachment whose filename has spaces.

JavaMail (1.4) throws a javax.mail.internet.ParseException on the MimeBodyPart.getDisposition() call:

javax.mail.internet.ParseException: Expected ';', got "-"
at javax.mail.internet.ParameterList.<init>(ParameterList.java:179)
at javax.mail.internet.ContentDisposition.<init>(ContentDisposition.java:87)
at javax.mail.internet.MimeBodyPart.getDisposition(MimeBodyPart.java:1039)
at javax.mail.internet.MimeBodyPart.getDisposition(MimeBodyPart.java:299)

The offending attachment part has a part header which looks like this:

Content-Disposition: inline;
    filename*0=Test - Test.pdf

Clearly the parser is failing because of the spaces in the filename. Thunderbird's Bugzilla (Bug 221028: https://bugzilla.mozilla.org/show_bug.cgi?id=221028) discusses this issue; however, the status of the bug is VERIFIED WONTFIX.

According to the Mozilla discussion, their implementation is conformant to the RFC:

...Just two days ago I was talking about this issue in #mozillazine with Christian Biesinger and Boris Zbarsky, and they said that this is the proper behaviour according to the RFC...

Whether it IS conformant to all relevant RFCs I don't know, but it seems it won't be changed any time soon.

This issue also exists for the Content-Type header in the same email, which has the same format:

Content-Type: application/pdf;
    name*0=Test - Test.pdf
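To see why a strict parser stops at the hyphen, here is a deliberately simplified sketch. It assumes the RFC 2045 rule that a parameter value is either a token (no spaces or special characters) or a quoted-string, and it only imitates the shape of JavaMail's error message; `StrictParams` is a hypothetical class, not JavaMail's `ParameterList` code. After reading the value `Test`, the next token is `-`, so the parser reports that it expected `;`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified strict parser for "type[/subtype]; name=value; ..." headers.
// A value must be an RFC 2045 token or a quoted-string. This is NOT JavaMail's code;
// it only mimics the shape of its "Expected ';', got ..." error.
final class StrictParams {
    // RFC 2045 tspecials; whitespace also ends a token.
    private static final String TSPECIALS = "()<>@,;:\\\"/[]?=";

    // Split a header into atoms, quoted-strings (kept with their quotes) and single specials.
    static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '"') {
                StringBuilder sb = new StringBuilder("\"");
                i++;
                while (i < s.length() && s.charAt(i) != '"') {
                    if (s.charAt(i) == '\\' && i + 1 < s.length()) i++; // drop the escape
                    sb.append(s.charAt(i++));
                }
                i++; // skip the closing quote
                out.add(sb.append('"').toString());
            } else if (TSPECIALS.indexOf(c) >= 0) {
                out.add(String.valueOf(c));
                i++;
            } else {
                int start = i;
                while (i < s.length() && !Character.isWhitespace(s.charAt(i))
                        && TSPECIALS.indexOf(s.charAt(i)) < 0) {
                    i++;
                }
                out.add(s.substring(start, i));
            }
        }
        return out;
    }

    // Parse the parameters; throws IllegalArgumentException on the first unexpected token.
    static Map<String, String> parse(String header) {
        List<String> t = tokenize(header);
        Map<String, String> params = new LinkedHashMap<>();
        int i = (t.size() > 2 && t.get(1).equals("/")) ? 3 : 1; // skip "type" or "type/subtype"
        while (i < t.size()) {
            expect(t, i++, ";");
            if (i >= t.size()) break;                 // allow a trailing ';'
            String name = t.get(i++);
            expect(t, i++, "=");
            String value = i < t.size() ? t.get(i++) : "";
            if (value.startsWith("\"")) value = value.substring(1, value.length() - 1);
            params.put(name.toLowerCase(), value);
        }
        return params;
    }

    private static void expect(List<String> t, int i, String want) {
        String got = i < t.size() ? t.get(i) : "<end>";
        if (!got.equals(want)) {
            throw new IllegalArgumentException("Expected '" + want + "', got \"" + got + "\"");
        }
    }
}
```

With the Thunderbird header, `StrictParams.parse("inline; filename*0=Test - Test.pdf")` throws with the message `Expected ';', got "-"`, while the quoted form `inline; filename*0="Test - Test.pdf"` parses cleanly.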

Anyone know if there is a way I can "safely" parse these headers? Is this a JavaMail bug?

Thanks

By jasonpolitesa at 2007-11-26 20:53:49
# 1

OK... For anyone having similar issues, this appears to have been fixed in the latest source tree for JavaMail (1.4.1ea).

But... this release appears to have some other problems (or valid changes?): it seems to fail to find all the parts in a message, so I'm not certain whether the problem is really fixed or is just being "avoided" because parts are being missed.

Mr Shannon, any ideas?

jasonpolitesa at 2007-7-10 2:20:20 > top of Java-index,Enterprise & Remote Computing,Enterprise Technologies...
# 2

This is a bug in Thunderbird; please file it. Parameter values containing spaces must be quoted.
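What "must be quoted" means in practice: under RFC 2045 a parameter value may appear bare only if it is a token, meaning it contains no spaces, control characters or tspecials; anything else has to be sent as a quoted-string, with `"` and `\` backslash-escaped. The following is a minimal sketch of such a formatter; `ParamQuoting` and its methods are hypothetical names, not part of JavaMail. It assumes plain US-ASCII values; non-ASCII filenames really need RFC 2231 encoding, which this sketch does not attempt.

```java
// Hypothetical helper: format a MIME parameter so that a strict parser accepts it.
final class ParamQuoting {
    // RFC 2045 tspecials; SPACE and control characters are also excluded from a token.
    private static final String TSPECIALS = "()<>@,;:\\\"/[]?=";

    // True if the value can be sent bare, i.e. it is a non-empty US-ASCII token.
    static boolean isToken(String value) {
        if (value.isEmpty()) return false;
        for (char c : value.toCharArray()) {
            if (c <= ' ' || c >= 0x7F || TSPECIALS.indexOf(c) >= 0) return false;
        }
        return true;
    }

    // Return the value unchanged if it is a token, otherwise as a quoted-string.
    static String quoteIfNeeded(String value) {
        if (isToken(value)) return value;
        StringBuilder sb = new StringBuilder("\"");
        for (char c : value.toCharArray()) {
            if (c == '"' || c == '\\') sb.append('\\'); // escape quote and backslash
            sb.append(c);
        }
        return sb.append('"').toString();
    }

    // Build a header value such as: inline; filename="Test - Test.pdf"
    static String header(String type, String name, String value) {
        return type + "; " + name + "=" + quoteIfNeeded(value);
    }
}
```

Note that under this rule an apostrophe is legal inside a token (`Jana's` is fine on its own); it is the spaces that force quoting.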

The bug you referenced seems to be complaining about the same thing you're complaining about: that Mozilla fails to parse headers with unquoted filenames containing spaces.

Apparently this is also a bug in Apple Mail. I have a contributed fix that works around this bug, but I haven't had a chance to push it out yet.

As for other bugs in 1.4.1ea, please tell me more. Failure to parse incorrect headers can cause all sorts of cascading failures, but if you're having a problem with valid messages, I'd like to see the details.

bshannona at 2007-7-10 2:20:20
# 3

I have created a workaround in my code which simply catches the exceptions thrown when extracting dispositions, content types, etc., and then handles the resulting nulls more gracefully. Not ideal, but good enough for now.
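A workaround of this kind might look like the sketch below, assuming each failing accessor is wrapped individually. `Lenient.orNull` and `HeaderCall` are hypothetical names; with JavaMail one would pass a method reference such as `part::getDisposition`, whose checked `MessagingException` (the superclass of `ParseException`) is caught here along with any other exception.

```java
import java.util.function.Consumer;

// Hypothetical helper: run a header accessor and return null instead of throwing.
final class Lenient {
    // Functional interface whose call may throw a checked exception,
    // e.g. a method reference such as part::getDisposition.
    @FunctionalInterface
    interface HeaderCall<T> {
        T call() throws Exception;
    }

    // Returns the accessor's result, or null if it throws. The exception is handed
    // to onError so malformed messages can be logged rather than silently ignored.
    static <T> T orNull(HeaderCall<T> accessor, Consumer<Exception> onError) {
        try {
            return accessor.call();
        } catch (Exception e) {
            onError.accept(e);
            return null;
        }
    }
}
```

Callers then have to treat a null disposition or content type as "unknown", which is the "handles nulls more gracefully" half of the workaround.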

As for the changes in JavaMail (1.4 to 1.4.1ea), I have narrowed down the use case. In 1.4, a call to getContent() on the root Part returns a MimeMultipart (when the email is multipart), but in 1.4.1ea it returns a javax.mail.util.SharedByteArrayInputStream. My code expects an instance of Multipart and thus fails with 1.4.1ea.

I have created a test case to reproduce this, consisting of a test class and a test email (which also shows the Thunderbird disposition problem).

Shall I send the test case to your Sun address?

jasonpolitesa at 2007-7-10 2:20:20
# 4

I'm seeing the same problem; the parser does not like the Content-Disposition field below.

_=_NextPart_003_01C77B90.D442C080
Content-Type: application/pdf;
    name="Scan1852.pdf"
Content-Transfer-Encoding: base64
Content-Description: Scan1852.pdf
Content-Disposition: filename="Scan1852.pdf";
    filename="Scan1852.pdf"

It throws:

javax.mail.internet.ParseException: Expected ';', got "="
at javax.mail.internet.ParameterList.<init>(ParameterList.java:179)
at javax.mail.internet.ContentDisposition.<init>(ContentDisposition.java:87)
at javax.mail.internet.MimeBodyPart.getFileName(MimeBodyPart.java:1099)
at javax.mail.internet.MimeBodyPart.getFileName(MimeBodyPart.java:509)
etc...

However, I don't see any reference to Thunderbird in the MIME headers. Should I be looking for that in the raw message, or did you ask the sender which email client he used?

Also, regarding your workaround: did you add this catch-and-ignore to the ContentDisposition class in JavaMail, or somewhere else? Can you provide any details?

i_program_java_all_daya at 2007-7-10 2:20:20
# 5
That's because that Content-Disposition is completely bogus! What software created that message?

Most likely what they meant was:

Content-Disposition: attachment;
    filename="Scan1852.pdf"
bshannona at 2007-7-10 2:20:20
# 6

Not sure what software created that particular one, but I just received another one that was created by AppleMail (which, as you mention above, has the same problem):

javax.mail.internet.ParseException: Expected ';', got "SHEET.PDF"
at javax.mail.internet.ParameterList.<init>(ParameterList.java:179)
at javax.mail.internet.ContentType.<init>(ContentType.java:100)
etc

i_program_java_all_daya at 2007-7-10 2:20:20
# 7
Looks like AppleMail is really broken. Sadly, this isn't the first time. Maybe someone could pass along a copy of the RFC to them?...
bshannona at 2007-7-10 2:20:20
# 8

Just received another. This one does not indicate AppleMail or Thunderbird anywhere in the headers.

--MailMan_Boundary
Content-Type: text/richtext; name=Jana's application 4 cell phone.rtf
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename=Jana's application 4 cell phone.rtf

e1xydGYxXGFuc2lcYW5zaWNwZzEyNTJcZGVmZjBcZGVmbGFuZzEwMzN7XGZvbnR0Ymx7XGYwXGZy
b21hblxmY2hhcnNldDAgVGltZXMgTmV3IFJvbWFuO317XGYxXGZzd2lzc1xmY2hhcnNldDAgVGFo
(etc....)

Which throws:

javax.mail.internet.ParseException: Expected ';', got "APPLICATION"
at javax.mail.internet.ParameterList.<init>(ParameterList.java:179)
at javax.mail.internet.ContentType.<init>(ContentType.java:100)

I assume in this case the mail client is not correctly escaping the single quote in the file name.

I've been reading the RFCs, but they seem very vague to me (I'm a noob to ABNF). In any case, if 1.4.1 has a workaround, I'll be trying out that release soon.


i_program_java_all_daya at 2007-7-10 2:20:20
# 9

I've had reports that both AppleMail and Thunderbird will create incorrect headers of that form. The problem is that the spaces in the filename require the filename to be quoted.

Using the latest JavaMail 1.4.1 development version available from the java.net Maven repository, you can set the System property "mail.mime.applefilenames" to "true" and JavaMail will try to work around this error.
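Enabling that workaround from code is a one-liner. The sketch below sets the property at startup, before any JavaMail class is used, on the assumption that JavaMail may read it only once (check the behaviour of the version you deploy); passing `-Dmail.mime.applefilenames=true` on the `java` command line should be equivalent. `MailStartup` is a hypothetical class name.

```java
// Hypothetical startup hook: enable JavaMail's workaround for unquoted filenames.
final class MailStartup {
    static final String APPLE_FILENAMES = "mail.mime.applefilenames";

    // Call early in main(), before any javax.mail class is loaded, because the
    // property may be read only once (assumption; verify for your JavaMail version).
    static void enableFilenameWorkaround() {
        System.setProperty(APPLE_FILENAMES, "true");
    }

    public static void main(String[] args) {
        enableFilenameWorkaround();
        // Prints: mail.mime.applefilenames=true
        System.out.println(APPLE_FILENAMES + "=" + System.getProperty(APPLE_FILENAMES));
        // ... open the Session / Store and read messages here ...
    }
}
```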

bshannona at 2007-7-10 2:20:20
# 10

Jon Postel's robustness principle (http://en.wikipedia.org/wiki/Robustness_Principle) raises the question of whether such a parameter should be the default behavior. JavaMail appears to be an excellent conservative producer (of SMTP messages), but not a liberal consumer (of IMAP, for example). IMHO, it would be better to accept these kinds of errors by default, and only reject them (throw exceptions) if a non-default parameter is set. Most mail processing applications need to read all incoming mail whenever possible, while I would think fewer require strict RFC adherence.

Of course, another problem is that when you are a good liberal consumer, you become the de facto standard, which then encourages more bad producers.
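The liberal-by-default, strict-on-request policy proposed here can be sketched as a parser with a `strict` switch. This is a hypothetical illustration of the idea, not how JavaMail implements `mail.mime.applefilenames`: in lenient mode an unquoted value simply runs to the next `;`, so `Test - Test.pdf` survives intact, while strict mode rejects the same input. The trade-off is visible in the code: the lenient rule silently misreads any bare value that itself contains a `;`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser illustrating "liberal by default, strict on request".
// Not JavaMail's implementation of mail.mime.applefilenames.
final class TolerantParams {
    private static final String TSPECIALS = "()<>@,;:\\\"/[]?=";

    // strict=false: an unquoted value runs to the next ';' (spaces allowed).
    // strict=true:  an unquoted value must be an RFC 2045 token.
    static Map<String, String> parse(String header, boolean strict) {
        Map<String, String> params = new LinkedHashMap<>();
        List<String> parts = splitOutsideQuotes(header);
        for (int k = 1; k < parts.size(); k++) {      // parts.get(0) is the type, not validated here
            String p = parts.get(k).trim();
            if (p.isEmpty()) continue;                 // trailing ';'
            int eq = p.indexOf('=');
            if (eq < 0) throw new IllegalArgumentException("Expected '=' in \"" + p + "\"");
            String name = p.substring(0, eq).trim().toLowerCase();
            String value = p.substring(eq + 1).trim();
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1); // escapes are not undone
            } else if (strict && !isToken(value)) {
                throw new IllegalArgumentException("Unquoted value is not a token: " + value);
            }
            params.put(name, value);
        }
        return params;
    }

    // Split on ';' characters that are not inside a quoted-string.
    private static List<String> splitOutsideQuotes(String s) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (inQuotes && c == '\\' && i + 1 < s.length()) {
                cur.append(c).append(s.charAt(++i));    // keep the escaped pair verbatim
                continue;
            }
            if (c == '"') inQuotes = !inQuotes;
            if (c == ';' && !inQuotes) {
                out.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        out.add(cur.toString());
        return out;
    }

    private static boolean isToken(String v) {
        if (v.isEmpty()) return false;
        for (char c : v.toCharArray()) {
            if (c <= ' ' || c >= 0x7F || TSPECIALS.indexOf(c) >= 0) return false;
        }
        return true;
    }
}
```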

i_program_java_all_daya at 2007-7-10 2:20:20
# 11

This is a game I can't win.

Some customers complain that JavaMail doesn't strictly enforce every requirement of the standards. Others complain that JavaMail doesn't work with every broken mail server out there.

Even though there are cases where JavaMail can be more tolerant, there are forms of brokenness that go beyond what can be handled at the JavaMail API level. Ultimately, the application using JavaMail has to make a decision about how much brokenness it wants to handle. Having made that decision it's relatively straightforward to configure JavaMail appropriately.

But for developers who don't think about this problem, I generally prefer that JavaMail doesn't try to hide the brokenness, to avoid propagating brokenness as you suggest.

bshannona at 2007-7-10 2:20:20
# 12

Absolutely, and please don't take my posting as a complaint; well over 99.9% of our emails are consumed flawlessly. I'm expressing more of a high-level philosophical argument (my opinion only) about why I think good software (JavaMail) should by default try to interoperate with not-so-good software. I'm basing this not on "technical correctness" but on the following unfortunate scenario:

One particular example would be certain products made by a company with the initials "MS". These products almost always hide brokenness (they act as highly liberal consumers); Outlook and Internet Explorer are the two primary examples. So a user (and unfortunately most users have never read, or cared about, an RFC) opens the email with Outlook, or the web page with Internet Explorer, etc. It "works". Now, we programmers certainly know better, but remember that we aren't buying or choosing products or services; we're offering them. The user then opens it with a Java-based product (stack trace). The user buys or chooses the MS-based product, the MS-based product retains its market stranglehold, commercial email providers test their products with Outlook, and in the end, brokenness is propagated anyway. It's kind of like reverse-Darwinism for software (survival of the most broken).

Some customers complain that JavaMail doesn't strictly enforce every requirement of the standards

As an SMTP producer, I might see that, but personally I can't recall ever seeing a single problem with outgoing email from JavaMail. As a client consumer, I'm not sure why they would take that philosophy, unless they are using JavaMail to test RFC compliance.

there are forms of brokenness that go beyond what can be handled at the JavaMail API level.

Of course, but I have yet to see a message that the "MS" client could not read. So perhaps they are working around some of these at a higher level (in the GUI, maybe?).

But for developers who don't think about this problem...

Most developers probably aren't using JavaMail to read over 25,000 emails a day. So, admittedly, I'm in a unique situation.

Having made that decision it's relatively straightforward to configure JavaMail appropriately.

Not really: one change required updating the jar file, and others require obscure uses of the API. And a developer or administrator is only going to make those changes after seeing the problem.

to avoid propagating brokenness as you suggest.

Brokenness will get propagated anyway (see reverse-Darwinism, or the Theory of De-evolution above).

I'm looking forward to trying out 1.4.1, and thanks again for all the assistance, a very thorough FAQ, and overall a great API. I think open-sourcing it was a good idea; JavaMail really is by far the best API that I have seen for this type of high-volume work.

i_program_java_all_daya at 2007-7-10 2:20:21