From: Chris Clark <Chris.Clark <at> ingres.com>
Subject: Re: Jython 2.5.1 and various encodings support - LookupError: unknown encoding
Newsgroups: gmane.comp.lang.jython.user
Date: Saturday 27th February 2010 00:53:31 UTC
Chris Clark wrote:
> Philip Jenvey wrote:
>   
>> #1066 is the main bug for this issue -- we just currently lack support
>> for the asian codecs like shiftjis. The ImportError in sample #2 is a
>> symptom of that. The same ImportError happens when you attempt to use the
>> codec but it's masked as a LookupError.
>>
>> Supporting these via the JVM's nio codecs is definitely doable but
>> nobody's gotten around to it yet.
>>   
>>     
>
> Is http://java.sun.com/j2se/1.4.2/docs/guide/nio/ the package you are
> referring to? I'm not a big Java guy but I may start hacking on a Python
> layer on top of this as an experiment/proof-of-concept. Presumably
> http://java.sun.com/j2se/1.4.2/docs/api/java/nio/charset/package-summary.html
> is what needs wrapping?
>   

I had some time this afternoon whilst waiting for some builds to
complete, so I started experimenting with using nio from Python, along
with a quick attempt at a shift_jis codec.

I'm seeking feedback on a very INCOMPLETE demo that is attached. Sample 
session:

C:\users\clach04\python\jython_character_encoding>c:\jython2.5.1\jython.bat
Jython 2.5.1 (Release_2_5_1:6813, Sep 26 2009, 13:47:54)
[Java HotSpot(TM) Client VM (Sun Microsystems Inc.)] on java1.6.0_02
Type "help", "copyright", "credits" or "license" for more information.
 >>> x=''
 >>> x.decode('shift_jis') # at this point there is a shift_jis.py in curdir
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: unknown encoding 'shift_jis'
 >>> import shift_jis # register the local module/encoding
 >>> x.decode('shift_jis')
u''
 >>>
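
For readers without the attachment, here is a minimal sketch of how importing a module can make a new encoding name resolvable, as in the session above. It is not the attached demo: it targets current CPython 3 so it can be run anywhere, it uses a one-entry table rather than nio, and the codec name `demo_sjis`, the ASCII passthrough, and all helper names are assumptions made for illustration. Like the demo, it only implements strict error handling.

```python
import codecs

NAME = "demo_sjis"  # hypothetical name, chosen so it cannot clash with a built-in codec

# Stand-in conversion table. The single pair is the one shown later in this
# post (U+3042 <-> 0x82 0xA0); a real codec would need the full mapping.
_ENCODE = {"\u3042": b"\x82\xa0"}
_DECODE = {v: k for k, v in _ENCODE.items()}


def _encode(text, errors="strict"):
    # 'errors' is accepted but ignored: strict behaviour only.
    out = bytearray()
    for pos, ch in enumerate(text):
        if ord(ch) < 0x80:          # assumption: ASCII passes through unchanged
            out.append(ord(ch))
        elif ch in _ENCODE:
            out.extend(_ENCODE[ch])
        else:
            raise UnicodeEncodeError(NAME, text, pos, pos + 1,
                                     "character not in demo table")
    return bytes(out), len(text)


def _decode(data, errors="strict"):
    # 'errors' is accepted but ignored: strict behaviour only.
    data = bytes(data)
    out = []
    i = 0
    while i < len(data):
        if data[i] < 0x80:          # assumption: ASCII passes through unchanged
            out.append(chr(data[i]))
            i += 1
        elif data[i:i + 2] in _DECODE:
            out.append(_DECODE[data[i:i + 2]])
            i += 2
        else:
            raise UnicodeDecodeError(NAME, data, i, i + 1,
                                     "byte not in demo table")
    return "".join(out), len(data)


def _search(name):
    # codecs calls this with the requested encoding name; None means "not mine".
    if name == NAME:
        return codecs.CodecInfo(name=NAME, encode=_encode, decode=_decode)
    return None


# Runs at import time, which is why "import shift_jis" is enough to register.
codecs.register(_search)

print("\u3042".encode(NAME))  # b'\x82\xa0'
```

The key point is that `codecs.register` is called as a side effect of the import, so the name is unknown before the import and resolvable after it.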

The demo has no support for the errors argument (or less strict
conversion options), it has imports in the middle of the script, and you
have to import each encoding you need. Right now there is only one, but
it would be easy to generate more from a template. I'm beginning to
wonder if it would simply be cleaner to use the CPython gencodec.py
script and generate its input from the CPython encodings. I've done this
for some Windows (single-byte) encodings that Python does not support,
by auto-generating tables from Windows codepages like cp708. The tables
would be pretty big though :-)
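
As a rough illustration of the table idea, the sketch below harvests a decoding table from CPython's own shift_jis codec by brute force. It runs on CPython 3, not Jython; the function name `build_decoding_table` is hypothetical, and it only collects the mapping. It does not write the input format that gencodec.py expects.

```python
def build_decoding_table(encoding="shift_jis"):
    """Map each valid 1- or 2-byte sequence to the text CPython decodes it to."""
    table = {}
    for lead in range(0x100):
        single = bytes([lead])
        try:
            # Bytes that stand alone (ASCII, half-width katakana) decode directly.
            table[single] = single.decode(encoding)
            continue
        except UnicodeDecodeError:
            pass
        # Otherwise treat the byte as a possible lead byte and try every trail byte.
        for trail in range(0x100):
            pair = bytes([lead, trail])
            try:
                table[pair] = pair.decode(encoding)
            except UnicodeDecodeError:
                continue
    return table


table = build_decoding_table()
print(len(table))                    # number of entries harvested
print(ascii(table[b"\x82\xa0"]))     # '\u3042'

# The encoding direction is the inverse mapping.
encoding_table = {text: raw for raw, text in table.items()}
```

Inverting the table this way silently keeps only one byte sequence when several decode to the same character, so a real generator would need a rule for picking the preferred one.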

I'm really looking for "yes, the nio-from-Python approach is worth
pursuing" or "this is stupid, you should stop now" comments. I'm pretty
sure that performance-wise this approach is not a good idea, but it is
infinitely faster than "doesn't work at all" :-)


Here is a slightly more realistic example:

C:\users\clach04\python\jython_character_encoding>c:\jython2.5.1\jython.bat
Jython 2.5.1 (Release_2_5_1:6813, Sep 26 2009, 13:47:54)
[Java HotSpot(TM) Client VM (Sun Microsystems Inc.)] on java1.6.0_02
Type "help", "copyright", "credits" or "license" for more information.
 >>> import shift_jis # register the local module/encoding
 >>> x = u"\u3042" # '3042 HIRAGANA LETTER A'
 >>> x.encode('shift_jis')
'\x82\xa0'
 >>> # hey! Looks like it matches http://demo.icu-project.org/icu-bin/convexp?conv=ibm-943_P15A-2003&b=82&s=ALL#layout

Finally, does anyone know how IronPython handles CJK (or do they simply 
make use of .NET strings)?

Chris
 