From: Katsumi Yamaoka <yamaoka <at> jpl.org>
Subject: Re: [BUG]What does this mean:"Mention that multibyte characters
Newsgroups: gmane.emacs.gnus.general
Date: 2004-10-20 05:36:48 GMT
>>>>> In <b9yacuji7t6.fsf <at> jpl.org> Katsumi Yamaoka wrote:
> I'll try translating Kenichi Handa's advice next time.
This is practice for my English composition. Whatever mistakes there
may be, the responsibility is mine.
>>>>> Katsumi Yamaoka wrote:
> With the following form, Emacs 21.3.50 returns non-nil, and 22.0.0
> returns nil. Could you let me know for reference what occurs there?
> (with-temp-buffer
>   (set-buffer-multibyte t)
>   (insert (string-as-unibyte "\200"))
>   (goto-char (point-min))
>   (search-forward (string-as-multibyte "\200") nil t))
;; Annotation by K.Y.:
;; At that time, I didn't yet know that the possible insertion forms
;; are `(insert ?\200)' and `(insert (format "%c" ?\200))'.
>>>>> Kenichi Handa wrote:
Even with Emacs 21.3.50, the above form returns nil in certain
language environments (e.g., Vietnamese).
First, you have to understand that a unibyte string is converted into a
multibyte string by `string-make-multibyte' when it is inserted into a
                            ^^^^
multibyte buffer. Therefore, the form
`(insert (string-as-unibyte "\200"))' is identical to the form
`(insert (string-make-multibyte (string-as-unibyte "\200")))'.
How `string-make-multibyte' converts the string depends on the language
environment. In Emacs 21, in the Latin-1 language environment, for
example, the string "\200" is converted into the character that
corresponds to \200 in the eight-bit-control charset, since the
primary charset latin-iso8859-1 doesn't contain \200.
Second, `string-as-multibyte' converts STRING into a multibyte
string, keeping its byte sequence as much as possible. It works
``as much as possible'', but this sometimes introduces differences. For
example, the string "\200" is converted into the byte sequence
"\236\240", which is a character in the eight-bit-control charset.
It is the same as the character that the above program inserted into
the buffer.
Consequently, in the Latin-1 language environment, for example, the
above program returned non-nil in Emacs 21.
On the other hand, in Emacs 22, since iso-8859-1, the primary charset
for Latin-1, contains \200, the form
  (insert (string-as-unibyte "\200"))
inserts the character U+0080 rather than the character that belongs
to eight-bit-control. However, `string-as-multibyte' always converts
\200 into the eight-bit-control character. This is why the program
returns nil.
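
The mismatch described above can be observed directly by comparing the
character that `insert' puts into the buffer with the character that
`string-as-multibyte' produces for the search string. The following is
only a minimal sketch: the function name is hypothetical and not from
this thread, it assumes an Emacs in which `string-as-unibyte' and
`string-as-multibyte' are still available, and the values it returns
depend on the Emacs version and the language environment, as explained
above.

```elisp
;; Hypothetical helper, not part of the original thread.
(defun my-compare-inserted-and-searched ()
  "Return (INSERTED SEARCHED SAME-P) for the byte \\200.
INSERTED is the character that `insert' actually puts into a
multibyte buffer; SEARCHED is the character that
`string-as-multibyte' produces for the search string."
  (let ((inserted (with-temp-buffer
                    (set-buffer-multibyte t)
                    (insert (string-as-unibyte "\200"))
                    (char-after (point-min))))
        (searched (aref (string-as-multibyte "\200") 0)))
    ;; SAME-P is non-nil only when both sides are the same character,
    ;; which is the condition for `search-forward' to succeed.
    (list inserted searched (= inserted searched))))
```

When the third element is nil, the search in the original form cannot
succeed, because the buffer and the search string hold different
characters.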
If you need to search for \200 after inserting it into a buffer, the
following approach, for example, works in both Emacs 21 and 22:
(with-temp-buffer
  (set-buffer-multibyte t)
  (insert (string-to-multibyte "\200"))
  (goto-char (point-min))
  (search-forward (string-to-multibyte "\200") nil t))
;; Annotation by K.Y.:
;; I didn't use that approach in the
;; `gnus-update-summary-mark-positions' function (which see).
`string-to-multibyte' always converts a string into characters that
belong to eight-bit-control or eight-bit-graphic, so the string it
produces will never match an ordinary string.
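
The approach recommended above, using `string-to-multibyte' on both
the inserted text and the search string, can be wrapped in a small
helper. This is only a sketch: the function name is hypothetical, and
the body simply parameterizes the form quoted above.

```elisp
;; Hypothetical helper name; the body follows the form quoted above.
(defun my-raw-byte-search-p (bytes)
  "Return non-nil if unibyte string BYTES is found after inserting it.
Both the inserted text and the search string are converted with
`string-to-multibyte', so both sides hold the same characters."
  (with-temp-buffer
    (set-buffer-multibyte t)
    (insert (string-to-multibyte bytes))
    (goto-char (point-min))
    (search-forward (string-to-multibyte bytes) nil t)))

;; Example call with the byte discussed in this thread.
(my-raw-byte-search-p "\200")
```

Because the same conversion is applied on both sides, the result should
not depend on the language environment.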
P.S.
In Emacs 21, the form
  `(insert (string-to-multibyte "\200"))'
does the same as the form
  `(insert ?\200)'
does. This is because there is no character corresponding to 128 in a
multibyte buffer, so it is treated as the raw byte that belongs to
eight-bit-control.
However, it differs in Emacs 22. Since the character corresponding to
128 exists as U+0080, `(insert ?\200)' inserts that character.
;; Annotation by K.Y.:
;; I am deeply grateful to Kenichi Handa. This contained all the
;; knowledge that I needed.