Subject: Re: Standards and localization (was Dot-mapping)
Newsgroups: gmane.ietf.idnabis
Date: 2007-12-13 02:42:42 GMT

"The dots" that are relevant are the full stops:

U+002E FULL STOP
U+3002 IDEOGRAPHIC FULL STOP

and, because of fullwidth/halfwidth cloning in East Asian character sets:

U+FF0E FULLWIDTH FULL STOP (explicitly fullwidth version of U+002E)
U+FF61 HALFWIDTH IDEOGRAPHIC FULL STOP (explicitly halfwidth version of U+3002)

These are exactly the full stops used in the IDNA2003 spec (http://ietf.org/rfc/rfc3490.txt):

   1) Whenever dots are used as label separators, the following
      characters MUST be recognized as dots: U+002E (full stop),
      U+3002 (ideographic full stop), U+FF0E (fullwidth full stop),
      U+FF61 (halfwidth ideographic full stop).

That's it. We shouldn't even be talking about "dots" here, because really what is at issue are these two full stops.

The reason for adding them to IDNA2003 was to make it easier for about a third of the world to enter URLs, because of the way that input methods work. They are kept in IDNAbis for the same reason, plus backwards compatibility with IDNA2003.

If we had wanted to extend this set to all the NFKC compatibility variants, then we would also add the following:

U+2024 ONE DOT LEADER
U+FE12 PRESENTATION FORM FOR VERTICAL IDEOGRAPHIC FULL STOP
U+FE52 SMALL FULL STOP

However, there is no need for that at all, since those characters will not be entered by accident on Chinese and Japanese computers.

I'm agnostic about where FULL STOP and IDEOGRAPHIC FULL STOP equivalence gets handled in the protocol stack, by the way. I leave that to others to sort out.

While lots of scripts have different kinds of terminal punctuation, of all shapes, which function somewhat similarly to FULL STOP in Western punctuation conventions, they don't look like dots, and as far as I know nobody is advocating that those start to appear as internet label delimiters.
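To make the set concrete, here is a minimal Python sketch, not taken from any IDNA implementation. It treats the four full stops quoted from RFC 3490 as label separators and uses the standard `unicodedata` module to show where the three extra NFKC variants would fold. The names `LABEL_SEPARATORS`, `split_labels` and `nfkc_target` are made up for illustration, and nothing here is meant to settle where in the protocol stack the mapping belongs.

```python
import unicodedata

# The four characters RFC 3490 says MUST be recognized as dots.
LABEL_SEPARATORS = "\u002E\u3002\uFF0E\uFF61"

# Hypothetical helper: map every separator to U+002E, then split.
_SEPARATOR_TABLE = {ord(ch): "." for ch in LABEL_SEPARATORS}


def split_labels(domain):
    """Split a domain name into labels on any of the four full stops."""
    return domain.translate(_SEPARATOR_TABLE).split(".")


def nfkc_target(ch):
    """Return the NFKC normalization of a single character."""
    return unicodedata.normalize("NFKC", ch)


# The extra compatibility variants that are deliberately NOT in the set.
EXTRA_NFKC_VARIANTS = "\u2024\uFE12\uFE52"
```

For example, `split_labels("www\u3002example\uFF0Ecom")` gives `['www', 'example', 'com']`. Under NFKC, U+2024 and U+FE52 fold to U+002E and U+FE12 folds to U+3002, so they are compatibility variants of the same two full stops, just not ones an input method is likely to produce by accident.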

I just want to emphasize the point that what you do about mapping full stops shouldn't be colored by the fear that a nonextensible specification for them will be broken and lead to cultural and political attacks on the specification.

> If the list is not extensible, we run
> into problems with scripts that have not been coded but whose
> users believe that their dots are equally important.

I will venture to assert that that is the null set.

Unicode 5.1 is adding the Cham, Kayah Li, Lepcha, Ol Chiki, Rejang, Saurashtra, Sundanese and Vai scripts -- all in modern use. Many of those scripts have "danda" punctuation, but none of them adds a baseline dot delimiter FULL STOP to the standard.

Unicode 5.2 will add the Tai Tham and Tai Viet scripts, and the same statement holds for those.

There are about a dozen more regional current-use scripts in the pipeline for eventual encoding, many of which have reasonably complete proposals to hand by now -- and as far as I know, none of them will add a baseline dot delimiter FULL STOP to the standard.

As for the various archaic scripts, those aren't going to be appropriate for IDNs in any case, and don't have "users" with cultural expectations, even if they did have baseline delimiter dots.

So we don't need to get distracted by worrying about extensibility issues for these particular delimiters.

--Ken