From: Tom Christiansen <tchrist <at> perl.com>
Subject: with malice aforethought (Re: Unicode cheatsheet for Perl)
Newsgroups: gmane.comp.lang.perl.perl5.porters
Date: Sunday 26th February 2012 19:22:27 UTC
Christian Hansen  wrote
   on Tue, 21 Feb 2012 02:07:08 +0100:

>>> I would love for this to happen, I have advocated this on #p5p several
>>> times, but there is always the battle of  "backwards compatibility
>>> disease". About 10 months ago I reported a security issue regarding the
>>> relaxed UTF-8 implementation (still undisclosed and still exploitable)
>>> on the perl security mailing list.

Then we are currently in a security-through-obscurity situation, wherein
only overall ignorance of an exploit "protects" us.  That's not protection;
it's a vulnerability.  Would you estimate the vulnerability is severe
enough for us to consider whether in this particular case we should
consider issuing patches for old releases, like make a 5.12.5 or 5.10.2?

>> There is absolutely no need to remain compatible with security-related
>> bugs, and every reason not to.  Indeed, security is the only thing that
>> we ever issue patches to releases that are past their end-of-life

> I lack the political skills to make this happen, but I'm more than willing
> to provide the proper UTF-8 implementation for this (as defined by
> Unicode/ISO/IEC 10646:2011) we could always discuss the need for the
> invented meaning of relaxed. During my years as a professional programmer
> for several high profile financial institutions in Sweden, I have only
> encountered ill-formed UTF-8 through malicious attempts or clients that
> thought that they were sending UTF-8 but using ISO-8859-1, that's my
> experience, perhaps yours looks different?

My own experiences are finding the wrong encoding used by accident, not by
malicious intent.  The situation you mention is therefore outside of my own
experiences, which makes me all the more concerned about it.  I have
of corrupt data because of Java having the wrong defaults for what to do 
with wrong encodings.  It was a design mistake, but they locked themselves
into it forever and everyone keeps paying for that blunder.  Let's not
mimic their bad decisions.  Let's fix ours.

The thing I don't want is to have to tell people that they cannot trust
perl -C, that they cannot trust PERL_UNICODE, that they cannot trust use
utf8, that they cannot trust use open, that they cannot trust binmode,
that they cannot trust :encoding(UTF-8), and that the only thing they can
trust is laborious and error-prone manual encoding/decoding with FB_CROAK.

If that position is nonetheless correct, it drastically needs to be fixed.
Christian, I don't know what political skills you allude to as needed to
make this happen.  Political skills to achieve a consensus that backwards
compatibility with previous behavior known to be wrong is undesirable?

It seems to me that Python went through a transition where
errors changed from some sort of non-fatal to proper exceptions.  I don't
know what sort of conniptions they experienced there, since it's not a
backwards-compatible change.  But it doesn't have to be b-c, and probably
shouldn't be.  Jarkko is right.
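
For illustration only, here is roughly what "proper exceptions" look like
in current Python 3.  This is a minimal sketch and not anything posted in
this thread: the helper names decode_strict and classify and the sample
byte strings are made up for the example.  The samples cover the two cases
discussed above: ISO-8859-1 bytes mislabeled as UTF-8, and deliberately
ill-formed sequences of the sort an attacker might send.

```python
def decode_strict(raw: bytes) -> str:
    """Decode UTF-8, raising UnicodeDecodeError on any ill-formed input."""
    # errors="strict" is already the default; it is spelled out for clarity.
    return raw.decode("utf-8", errors="strict")


def classify(raw: bytes) -> str:
    """Return 'ok' if raw is well-formed UTF-8, otherwise 'rejected'."""
    try:
        decode_strict(raw)
        return "ok"
    except UnicodeDecodeError:
        return "rejected"


# Hypothetical sample inputs (not taken from any real exploit).
samples = {
    "well-formed e-acute": "\u00e9".encode("utf-8"),            # b'\xc3\xa9'
    "latin-1 sent as utf-8": "\u00e9".encode("iso-8859-1"),     # b'\xe9'
    "overlong slash": b"\xc0\xaf",     # non-shortest form of '/'
    "encoded surrogate": b"\xed\xa0\x80",  # U+D800 is not a scalar value
}

results = {name: classify(raw) for name, raw in samples.items()}

for name, verdict in results.items():
    print(f"{name}: {verdict}")
```

Only the first sample decodes; the other three raise instead of being
silently accepted or patched up.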

It's better to fix bugs than to document them, and it's better to document
than not.  Right now I'm very hazy on the real status of all this stuff,
and I am very uncomfortable with the idea of relentlessly charging ahead
toward a release like a freight train with no brakes.

Absolutely nothing depends upon any particular release date, but quite a
lot depends on correct behavior, especially if it is security-related.  I
know which one of those *I* consider immeasurably more important, but
Aristotle appears to be of the opposite opinion.  Is this the "political
will" problem you
