
Gammon Forum



Full Unicode support

It is now over 60 days since the last post. This thread is closed.



Posted by Nick Gammon   Australia  (23,133 posts)  Bio   Forum Administrator
Date Reply #60 on Mon 09 Jun 2008 12:20 AM (UTC)
Message
Yes I tried that, and the problem is not easily solved. For example, some things like the PCRE regexp-matcher don't use Unicode, they use 8-bit strings. It accepts UTF-8, but that means you need to convert back and forwards from wide strings to UTF-8. And then there are the MUDs, most of which send 8-bit text, not UTF-8 nor wide strings.
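
The round trip described above can be sketched roughly as follows. This is only an illustration in Python, not MUSHclient's code: Python's `re` module stands in for PCRE, UTF-16LE bytes stand in for a Windows wide string, and the function name `match_wide_text` and the sample text are hypothetical.

```python
import re
from typing import Optional

def match_wide_text(wide_bytes: bytes, pattern: bytes) -> Optional[bytes]:
    """Run an 8-bit (UTF-8) regex against text stored as UTF-16LE bytes."""
    # Wide string -> UTF-8 bytes, because the matcher works on 8-bit strings.
    utf8 = wide_bytes.decode("utf-16-le").encode("utf-8")
    m = re.search(pattern, utf8)
    if m is None:
        return None
    # Matched UTF-8 bytes -> back to the wide representation.
    return m.group(0).decode("utf-8").encode("utf-16-le")

# Hypothetical line of MUD output held internally as a wide string.
wide = "You see a caf\u00e9 here.".encode("utf-16-le")
hit = match_wide_text(wide, "caf\u00e9".encode("utf-8"))
```

Every match costs two conversions here, which is the overhead the post is pointing at.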

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Atltais   (8 posts)  Bio
Date Reply #61 on Mon 09 Jun 2008 12:47 AM (UTC)

Amended on Mon 09 Jun 2008 12:56 AM (UTC) by Atltais

Message
Isn't Unicode more or less backwards compatible with ASCII for characters up to 0x7F anyway? (Edit: well, UTF-8 is, that is. Windows uses UTF-16, which is a bit different and causes additional fun/grief.)

Granted, you would have to go from UTF16 to UTF8 for regexp, true enough.

For most MUDs it shouldn't be a problem as long as they don't use characters over 0x7F. If they do (a non-Unicode, non-English MUD, say), you end up with a bit of a problem, which, I suppose, is one reason why other clients aren't Unicode either.
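
To make the "over 0x7F" problem concrete, here is a small Python sketch. It assumes a hypothetical MUD that sends ISO-8859-1 (Latin-1) text; the sample lines and the helper `decode_mud_line` are made up for illustration.

```python
ascii_line = b"You are standing in a field."
latin1_line = b"Vous entrez dans le ch\xe2teau."  # 0xE2 is 'â' in Latin-1

# Pure ASCII bytes are already valid UTF-8 and decode to the same text.
assert ascii_line.decode("utf-8") == ascii_line.decode("ascii")

def decode_mud_line(raw: bytes, fallback: str = "latin-1") -> str:
    """Try UTF-8 first; fall back to an assumed legacy 8-bit encoding."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # 0xE2 followed by an ASCII letter is not a valid UTF-8 sequence,
        # so the Latin-1 line ends up here.
        return raw.decode(fallback)

decoded = decode_mud_line(latin1_line)
```

The fallback is a guess: the client has no way to know which 8-bit code page the MUD really uses, and some legacy byte sequences happen to be valid UTF-8 as well.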

Posted by Nick Gammon   Australia  (23,133 posts)  Bio   Forum Administrator
Date Reply #62 on Mon 09 Jun 2008 06:33 AM (UTC)
Message
Unicode isn't one single thing, for a start. Just check out www.unicode.org to see what I mean. Basically the idea is to represent the various characters in a consistent way by assigning a different number (a code point) to each one. But how that number is stored can vary somewhat. UTF-8 uses an encoding that is indeed identical to ASCII for characters <= 0x7F, however once you move to higher values you have heaps of options. Do you want 16-bit characters? 32-bit? Which order are the bytes in? Big-endian or little-endian?
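
As an illustration of "one number, several ways to store it", this Python sketch encodes a single code point under a few standard encodings. The choice of the euro sign is arbitrary.

```python
ch = "\u20ac"  # EURO SIGN, code point U+20AC

encodings = ["utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be"]

# Same character, different byte layouts (shown as space-separated hex).
layouts = {name: ch.encode(name).hex(" ") for name in encodings}

for name, hexbytes in layouts.items():
    print(f"{name:10} {hexbytes}")

# Below 0x80 the UTF-8 bytes are exactly the ASCII bytes.
ascii_same = "A".encode("utf-8") == "A".encode("ascii")
```

UTF-8 stores this character as three bytes (e2 82 ac), UTF-16 as two, and UTF-32 as four, with the 16-bit and 32-bit forms differing only in byte order.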

Under the Windows compiler, enabling the UNICODE define switches the representation of characters from char (8 bit) to wchar_t (16 bit). Straight away a single 16-bit unit can't hold Unicode characters > 0xFFFF. Also you can't just copy stuff from the MUD (8-bit characters) into the internal spaces (16-bit characters) without using a special conversion call.
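
The 16-bit limit can be made concrete with a short Python sketch. It shows the surrogate-pair arithmetic that UTF-16 uses for code points above 0xFFFF; the helper name `utf16_units` is hypothetical, and the sketch does not validate its input (for example, it does not reject the reserved range 0xD800 to 0xDFFF).

```python
import struct
from typing import List

def utf16_units(code_point: int) -> List[int]:
    """Return the 16-bit code unit(s) UTF-16 uses for one code point."""
    if code_point <= 0xFFFF:
        return [code_point]          # fits in a single 16-bit unit
    offset = code_point - 0x10000    # 20 bits remain to be stored
    high = 0xD800 + (offset >> 10)   # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits -> low surrogate
    return [high, low]

units = utf16_units(0x1F600)         # a code point outside the 16-bit range

# Cross-check against Python's own UTF-16 encoder.
encoded = list(struct.unpack("<2H", "\U0001F600".encode("utf-16-le")))
```

Code that treats each 16-bit unit as one whole character will therefore miscount or split such characters.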

It's a can of worms, one I don't propose to open in the near future.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Atltais   (8 posts)  Bio
Date Reply #63 on Mon 09 Jun 2008 07:10 AM (UTC)

Amended on Mon 09 Jun 2008 10:09 PM (UTC) by Atltais

Message
It's a whole range, yes, and UTF-8 is a superset of ASCII, which, as I understand it, is one of its biggest advantages. The larger problem, I suppose, is that (as previously stated) Windows uses UTF-16 internally (source: http://msdn.microsoft.com/en-us/library/ms776459(VS.85).aspx), which complicates matters somewhat and also gets you into the endianness issue. With UTF-8 you get 'pretty much' any character in regular use: the entirety of the BMP, and past that most fonts don't even have glyphs anyway, but that's getting wildly off topic. (Edit: plus, to my understanding, UTF-8 supports up to U+10FFFF anyway.)

All in all, I suppose it's a relatively minor issue (since those honestly needing client-side UTF-8 support can't be all that numerous) and development time may be better spent elsewhere.

Edit: that is to say, endianness doesn't matter in UTF-8 as it does in UTF-16/32, since UTF-8 is byte-oriented. One thing to note, though, is that in both UTF-8 and UTF-16 characters aren't fixed width (as in size). Therefore UTF-16 can handle codes above U+FFFF, and indeed so can UTF-8.

UTF-8 is as widely supported as it is simply because it's (more or less) backwards compatible with ASCII right out of the box, so it can take a standard ASCII string (if the characters are all <= 0x7F, that is) and be happy with it. In any case, it's quite an undertaking to convert a program as big as MUSHclient into a 100% UTF-8 program.

Posted by Fiendish   USA  (2,534 posts)  Bio   Global Moderator
Date Reply #64 on Sun 05 Jun 2011 06:10 PM (UTC)
Message
The first plugin shown on http://www.gammon.com.au/forum/bbshowpost.php?id=2681&page=4 currently has two entries for date_written, which will cause the plugin to fail to load.

https://github.com/fiendish/aardwolfclientpackage

Posted by Nick Gammon   Australia  (23,133 posts)  Bio   Forum Administrator
Date Reply #65 on Sun 05 Jun 2011 09:40 PM (UTC)
Message
Changed to date_modified, thanks.

- Nick Gammon

www.gammon.com.au, www.mushclient.com


Information and images on this site are licensed under the Creative Commons Attribution 3.0 Australia License unless stated otherwise.