From: Rasmus Lerdorf <rasmus <at> lerdorf.com>
Subject: Re: [PHP-DEV] default charset confusion
Newsgroups: gmane.comp.php.devel
Date: Monday 12th March 2012 07:41:08 UTC
On 03/12/2012 12:10 AM, Stas Malyshev wrote:
> Hi!
>> What we really need is what we added in PHP 6. A runtime encoding ini
>> setting that is distinct from the output charset which we can use here.
>> That would allow people to fix all their legacy code to a specific
>> runtime encoding with a single ini setting instead of changing thousands
>> of lines of code. I propose that we add such a directive to 5.4.1 to
>> ease migration.
> One more charset INI setting? I'm not sure I like this. We have tons of
> INIs already, and adding a new one each time we change something makes
> both writing applications and configuring servers harder.
> But as the manual says, ISO-8859-1 and UTF-8 are the same for
> htmlspecialchars() - is it wrong? If yes, what exactly is the difference
> between old and new behavior? I tried to read #61354 but could make
> little sense out of it, it lacks an expected result and I have a hard
> time understanding what the problem is there. Could you explain?

Yes, it is a bit hard to understand from the bug report because
bugs.php.net is all utf-8, but we are talking about non utf-8 apps here.

This script should illustrate it (https://gist.github.com/2020502):

$gb2312 = iconv('UTF-8','GB2312','我是测试');
$string = "<pre><p>$gb2312</p></pre>";
echo htmlspecialchars($string);

If you run that in PHP 5.3 you get:

&lt;pre&gt;&lt;p&gt;���Dz���&lt;/p&gt;&lt;/pre&gt;

The garbage-like chars there - if you don't see them, see
https://gist.github.com/2020442 - are the expected output. In PHP 5.4
the output is nothing. The function recognizes that this is not valid
UTF-8 and discards the entire string.

Ignoring 5.4 for a second, if you do this in 5.3:

echo htmlspecialchars($string);
echo htmlspecialchars($string, NULL, "ISO-8859-1");
echo htmlspecialchars($string, NULL, "UTF-8");

You will see that the first two output the escaped string with the
GB2312 bytes intact within it, and the UTF-8 call returns an empty
string because it correctly recognizes that GB2312 is not UTF-8. We
don't have any such check for 8859-1, so yes, saying UTF-8 and 8859-1
are the same for htmlspecialchars() is wrong for PHP 5.3 as well as for
5.4.

And as expected, under 5.4, because the default is now the UTF-8
behaviour, only the second echo gives a result.

-Rasmus

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
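
For reference, the comparison in the message can be written as a short self-contained script. This is only a sketch, not part of the original post: it assumes PHP 5.4 or later with the iconv extension loaded, a source file saved as UTF-8, and the variable names $asUtf8, $asLatin1 and $asGb2312 are made up for illustration. It passes the charset explicitly on every call, so the result does not depend on the version's default.

```php
<?php
// Sketch only: assumes PHP >= 5.4, ext/iconv, and a UTF-8 encoded source file.
$gb2312 = iconv('UTF-8', 'GB2312', '我是测试');
$string = "<pre><p>$gb2312</p></pre>";

// UTF-8 mode validates the input. GB2312 bytes are not valid UTF-8,
// so the whole string is rejected and an empty string comes back.
$asUtf8 = htmlspecialchars($string, ENT_COMPAT, 'UTF-8');

// ISO-8859-1 mode has no validity check: the GB2312 bytes pass through
// untouched and only the markup characters are escaped.
$asLatin1 = htmlspecialchars($string, ENT_COMPAT, 'ISO-8859-1');

// Naming the actual charset of the data is the other option.
$asGb2312 = htmlspecialchars($string, ENT_COMPAT, 'GB2312');

var_dump($asUtf8 === '');
var_dump($asLatin1 === "&lt;pre&gt;&lt;p&gt;" . $gb2312 . "&lt;/p&gt;&lt;/pre&gt;");
var_dump(bin2hex($asGb2312));
```

Passing the third argument on each call is what lets legacy non-UTF-8 code keep its old output under 5.4; the cost is touching every call site, which is the migration burden discussed earlier in the thread.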