Subject: [PATCH] restore: Be more liberal in which data to accept.
Newsgroups: gmane.mail.notmuch.general
Date: 2011-10-29 10:40:07 GMT
From: Thomas Schwinge <thomas <at> schwinge.name>
There are ``Message-ID''s out in the wild that contain spaces.
---
Hi!
Carl, the main question for you is: does this break sup-import
operability?
Spammers are quite inventive in creating ``interesting'' Message-IDs.
Apparently, notmuch handles these fine internally, but it breaks a
dump/restore cycle:
$ notmuch restore < ~/tmp/Mail-notmuch_dump/dump
No filename given. Reading dump from stdin.
Warning: Ignoring invalid input line: 3791856948.991306994491 <at> m0.net
Received:fromdialup-62.215.274.4.dial1.stamford([62.215.274.4] ([...])
Warning: Ignoring invalid input line: PM200010:29:54 AM ([...])
Warning: Ignoring invalid input line: PM200010:51:48 AM ([...])
Warning: Ignoring invalid input line: PM200011:47:35 AM ([...])
Warning: Ignoring invalid input line: PM200011:48:46 AM ([...])
Warning: Ignoring invalid input line: PM200011:50:10 AM ([...])
Warning: Ignoring invalid input line: PM200012:21:05 AM ([...])
Warning: Ignoring invalid input line: PM200012:21:17 AM ([...])
Warning: Ignoring invalid input line: PM200012:21:18 AM ([...])
Warning: Ignoring invalid input line: PM200012:21:32 AM ([...])
Warning: Ignoring invalid input line: PM20001:48:38 PM ([...])
Warning: Ignoring invalid input line: PM20001:53:07 PM ([...])
Warning: Ignoring invalid input line: PM20004:01:48 AM ([...])
Warning: Ignoring invalid input line: PM20004:01:59 AM ([...])
Warning: Ignoring invalid input line: PM20004:10:44 AM ([...])
Warning: Ignoring invalid input line: PM20004:20:00 AM ([...])
Warning: Ignoring invalid input line: PM20005:06:50 PM ([...])
Warning: Ignoring invalid input line: PM20005:14:17 AM ([...])
Warning: Ignoring invalid input line: PM20005:32:15 PM ([...])
Warning: Ignoring invalid input line: PM20005:32:22 PM ([...])
Warning: Ignoring invalid input line: PM20005:33:05 PM ([...])
Warning: Ignoring invalid input line: PM20005:33:57 AM ([...])
Warning: Ignoring invalid input line: PM20006:24:12 AM ([...])
Warning: Ignoring invalid input line: PM20006:25:04 AM ([...])
Warning: Ignoring invalid input line: PM20006:25:49 AM ([...])
Warning: Ignoring invalid input line: PM20006:26:11 AM ([...])
Warning: Ignoring invalid input line: PM20007:05:34 PM ([...])
Warning: Ignoring invalid input line: PM2000PM 04:09:15 ([...])
Warning: Ignoring invalid input line: PM2000¿ÀÀü 11:07:41 ([...])
Warning: Ignoring invalid input line: PM2000¿ÀÈÄ 12:42:47 ([...])
Warning: Ignoring invalid input line: PM2000¿ÀÈÄ 12:42:48 ([...])
Warning: Ignoring invalid input line: PM2000¿ÀÈÄ 5:58:28 ([...])
Warning: Ignoring invalid input line: PM2000¿ÀÈÄ 6:30:51 ([...])
Warning: Ignoring invalid input line: Prospect Mailer 20000:37:04 ([...])
Warning: Ignoring invalid input line: Prospect Mailer 20000:37:09 ([...])
Warning: Ignoring invalid input line: Prospect Mailer 20000:37:11 ([...])
Warning: Ignoring invalid input line: Prospect Mailer 20000:37:12 ([...])
Warning: Ignoring invalid input line: Prospect Mailer 20000:37:45 ([...])
Warning: Ignoring invalid input line: Prospect Mailer 20000:38:10 ([...])
Thus, the sequence dump; remove all tags; restore is not nullipotent
(it does not leave the tags as they were), which it should be.
Especially noteworthy is probably the first one: it happens to have
gotten a Received line mangled into the Message-ID, and it ends with a
space character.
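To illustrate why such IDs get rejected, here is a minimal sketch in Python. It assumes a dump line has the shape `<message-id> (<tags>)`, as the warnings above suggest. The two patterns and the helper names are hypothetical and are not taken from notmuch's source; they only contrast an ID pattern that forbids whitespace with a more liberal one.

```python
import re

# Hypothetical patterns, for illustration only (not notmuch's actual code).
# STRICT forbids whitespace inside the Message-ID; LIBERAL accepts anything
# up to the last " (...)" group on the line.
STRICT = re.compile(r"^(\S+) \(([^)]*)\)$")
LIBERAL = re.compile(r"^(.+) \(([^)]*)\)$")


def format_dump_line(message_id, tags):
    """Render one dump line in the assumed '<message-id> (<tags>)' shape."""
    return "{} ({})".format(message_id, " ".join(tags))


def parse_dump_line(line, pattern):
    """Return (message_id, tags), or None if the pattern rejects the line."""
    match = pattern.match(line)
    if match is None:
        return None
    return match.group(1), match.group(2).split()


# A Message-ID containing spaces, and one ending in a space.
for message_id in ["Prospect Mailer 20000:37:04", "example@m0.net "]:
    line = format_dump_line(message_id, ["spam"])
    print(repr(line), parse_dump_line(line, STRICT), parse_dump_line(line, LIBERAL))
```

With the liberal pattern, formatting a line and parsing it again yields the original ID and tags, which is the round-trip property a dump/restore cycle needs.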
Some more from the freak show:
$MESSAGE_ID ([...])
%CUSTOM_CHAR[8-10]$%CUSTOM_CHAR[8-10]$%CUSTOM_CHAR[8-10]@%CUSTOM_DOMAIN.msn.com ([...])
%RNDDIGIT1025.%RNDDIGIT15%RNDLCCHAR15%RNDDIGIT110%RNDLCCHAR13@ ([...])
%RNDDIGIT1025.%RNDDIGIT15%RNDLCCHAR15%RNDDIGIT110ucp <at> yahoo.com ([...])
%RNDDIGIT1025.%RNDDIGIT15%RNDLCCHAR15%RNDDIGIT110vs <at> yahoo.com ([...])
%RNDDIGIT27eq52md1$9rg57p%RNDDIGIT14$277ts40lsh@%RNDWORD13ivo4068 ([...])
%RNDDIGIT27g10u874$3cqh62f%RNDDIGIT14$7fgo121wnwt@%RNDWORD13quw32712 ([...])
%RNDDIGIT27mog75vx711$541xqm480xc%RNDDIGIT14$031nq1pk@%RNDWORD13av2979 ([...])
%RNDDIGIT27nqf761drk7$7l4mza%RNDDIGIT14$96ijq17zq@%RNDWORD13b1779 ([...])
%RNDDIGIT27q0tcg10$94pcn1mw%RNDDIGIT14$7x77pztx@%RNDWORD13ny7619 ([...])
%RNDDIGIT27uiw866tv49$5c3rg%RNDDIGIT14$6jl43vv@%RNDWORD13uwh17820 ([...])
%RNDDIGIT27x966lug3$0pr016r%RNDDIGIT14$8ye15k@%RNDWORD13qps90907 ([...])
%RNDDIGIT310%RNDLCCHAR15%RNDDIGIT15%RNDLCCHAR15$%RNDDIGIT17%RNDDIGIT13%RNDLCCHAR13%RNDDIGIT13$%RNDDIGIT15%RNDLCCHAR13%RNDDIGIT13%RNDLCCHAR13%RNDDIGIT13@ ([...])
%RNDDIGIT310%RNDLCCHAR15%RNDDIGIT15%RNDLCCHAR15$%RNDDIGIT17%RNDDIGIT13%RNDLCCHAR13%RNDDIGIT13$%RNDDIGIT15%RNDLCCHAR13%RNDDIGIT13%RNDLCCHAR13%RNDDIGIT13 <at> bambi ([...])
%RNDDIGIT310%RNDLCCHAR15%RNDDIGIT15%RNDLCCHAR15$%RNDDIGIT17%RNDDIGIT13%RNDLCCHAR13%RNDDIGIT13$%RNDDIGIT15%RNDLCCHAR13%RNDDIGIT13%RNDLCCHAR13%RNDDIGIT13 <at> wheelchair ([...])
%RNDDIGIT520.%RNDDIGIT110.%RNDDIGIT110 <at> -%RNDLCCHAR13%RNDDIGIT13. ([...])
%RNDDIGIT520.%RNDDIGIT110.%RNDDIGIT110 <at> -hi3.yahoo.com ([...])
%RNDDIGIT520.%RNDDIGIT110.%RNDDIGIT110 <at> -xz24.yahoo.com ([...])
%RNDDIGIT520.%RNDDIGIT110.%RNDDIGIT110 <at> lutanist-%RNDLCCHAR13%RNDDIGIT13.msn.com ([...])
%RNDDIGIT520.%RNDDIGIT110.%RNDDIGIT110 <at> millipede-jfq402.yahoo.com ([...])
%RNDDIGIT520.%RNDDIGIT110.%RNDDIGIT110 <at> referenda-sgw04.yahoo.com ([...])
%RNDDIGIT715.h8OheY%RNDDIGIT28 <at> proffer5.o'brien%RNDDIGIT2yahoo.com ([...])
%RNDDIGIT715.jt36NNBvbF%RNDDIGIT28 <at> schematic5.myers%RNDDIGIT2yahoo.com ([...])
%RNDDIGIT715.wz394MICrdY%RNDDIGIT28 <at> agriculture6.city%RNDDIGIT2yahoo.com ([...])
%RNDLCCHAR13%RNDDIGIT13%RNDLCCHAR13%RNDDIGIT13-%RNDDIGIT520-%RNDDIGIT1035@%RNDDIGIT13 ([...])
%RNDLCCHAR13%RNDDIGIT13%RNDLCCHAR13%RNDDIGIT13-%RNDDIGIT520-%RNDDIGIT1035 <at> pontiac%RNDDIGIT13 ([...])
Someone needs to improve their scripting language abilities... But on
the other hand:
$ notmuch search --output=files -- 'id:"$MESSAGE_ID"' | wc -l
25
This goes along the lines of ``notmuch as a spam filter'': these are
different spam messages, but due to notmuch's Message-ID-based keying,
they are all coalesced into one.
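The coalescing effect can be sketched as follows. This is only a rough model in Python, assuming an index keyed by Message-ID; the function name and the file paths are made up for the example and do not reflect notmuch's internals.

```python
from collections import defaultdict


def group_by_message_id(messages):
    """Map each Message-ID to the list of files that carry it.

    `messages` is an iterable of (path, message_id) pairs.
    """
    index = defaultdict(list)
    for path, message_id in messages:
        index[message_id].append(path)
    return dict(index)


# Hypothetical mail files: three different spam messages share one literal ID.
messages = [
    ("spam/1", "$MESSAGE_ID"),
    ("spam/2", "$MESSAGE_ID"),
    ("spam/3", "$MESSAGE_ID"),
    ("inbox/1", "unique@example.org"),
]
index = group_by_message_id(messages)
print(len(index), len(index["$MESSAGE_ID"]))
```

Four files end up as two entries, because the key is the Message-ID rather than the file.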