
Gammon Forum


Chinese trigger not loaded

It is now over 60 days since the last post. This thread is closed.


Posted by Flow   (5 posts)  Bio
Date Wed 13 Jun 2012 04:10 AM (UTC)
Message
Hi everyone,

I am new to MUSHclient and have been investigating an issue with Chinese triggers that use regular expressions.

The MUD uses Big5, so I cannot check the UTF-8 box.
When a trigger contains certain Chinese characters (e.g. "架", "跋", "崙"), it is not loaded when I open the world.

It gives an error message saying "Failed: missing terminating ] for character class".

I ran those characters through an encoder, and all of them contain "%5B", which decodes to "[".

I think that's the cause of the problem.
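
The diagnosis can be checked outside the client. The following is a minimal Python sketch, not MUSHclient code: it encodes the three characters with Python's `big5` codec and tries to compile the resulting bytes with Python's byte-mode `re` module, which stands in for PCRE here. The helper name `compiles` is made up for the example, and Python's error text differs from the PCRE message quoted above.

```python
import re

# The three characters named in the post, encoded the way a Big5 MUD sends them.
for ch in "架跋崙":
    raw = ch.encode("big5")
    print(ch, raw.hex(" "), "contains '[':", b"[" in raw)


def compiles(pattern: bytes) -> bool:
    """Return True if `pattern` is accepted as a byte-oriented regex."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# The trailing 0x5B byte opens a character class that is never closed,
# so the raw Big5 bytes are rejected as a pattern.
print(compiles("架".encode("big5")))
```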

Is there a workaround that makes the regex treat the whole sentence as one literal string instead of checking it byte by byte?

Please help.

Thanks

Posted by Nick Gammon   Australia  (23,173 posts)  Bio   Forum Administrator
Date Reply #1 on Wed 13 Jun 2012 06:15 AM (UTC)
Message
What is Big5?

Quote:

I cannot check the utf-8 box.


Why not?

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Flow   (5 posts)  Bio
Date Reply #2 on Wed 13 Jun 2012 06:43 AM (UTC)
Message
Hi Nick,

Big5 is an encoding for Traditional Chinese.
Because the MUD uses Big5, the text is corrupted if I check the UTF-8 box.

The second byte of the Chinese character is interpreted as "[", which makes the trigger fail.

Thanks.


Posted by Nick Gammon   Australia  (23,173 posts)  Bio   Forum Administrator
Date Reply #3 on Wed 13 Jun 2012 07:47 AM (UTC)
Message
I see. Well, I suggest making a plugin that converts incoming packets from Big5 to UTF-8; then you can check the UTF-8 box and the trigger should work.
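
The conversion step itself might look roughly like the Python sketch below. It is an illustration only: a real MUSHclient plugin would be written in a script language such as Lua, and the class name `Big5ToUtf8` and method `feed` are invented here. It assumes each chunk is plain Big5 text, and uses an incremental decoder because a two-byte character can be split across two packets. Any raw telnet negotiation bytes in the data are not handled.

```python
import codecs


class Big5ToUtf8:
    """Re-encode a stream of Big5 byte chunks as UTF-8."""

    def __init__(self) -> None:
        # errors="replace" turns undecodable bytes into U+FFFD instead of raising.
        self._decoder = codecs.getincrementaldecoder("big5")(errors="replace")

    def feed(self, packet: bytes) -> bytes:
        # The incremental decoder holds back a trailing lead byte until the
        # next packet supplies the second byte of the character.
        return self._decoder.decode(packet).encode("utf-8")


conv = Big5ToUtf8()
raw = "書架".encode("big5")
# Split the data in the middle of the second character to mimic a packet boundary.
out = conv.feed(raw[:3]) + conv.feed(raw[3:])
print(out.decode("utf-8"))
```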

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Nick Gammon   Australia  (23,173 posts)  Bio   Forum Administrator
Date Reply #4 on Wed 13 Jun 2012 07:48 AM (UTC)
Message
I don't know enough about Big5 to be much more specific, but check out this:

http://www.gammon.com.au/scripts/doc.php?general=plugin_callbacks

In particular:


OnPluginPacketReceived

You should be able to do a simple Lua global replace that converts Big5 to UTF-8 using a lookup table.

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Flow   (5 posts)  Bio
Date Reply #5 on Wed 13 Jun 2012 07:52 AM (UTC)
Message
Nick,

Thank you very much.

I will try that out first.

By the way, is there any way to make PCRE work better with Chinese?

Posted by Nick Gammon   Australia  (23,173 posts)  Bio   Forum Administrator
Date Reply #6 on Wed 13 Jun 2012 07:53 AM (UTC)
Message
Sort of an example here:

http://www.gammon.com.au/forum/bbshowpost.php?bbsubject_id=8747

You basically want to match on "." (anything) and then look up each character in a table and convert it to UTF-8.
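
The match-and-look-up idea can be sketched as follows, in Python rather than Lua. The table `BIG5_TO_UTF8` is a tiny, hypothetical stand-in built from Python's own codec for three characters; a real table would need to cover the whole character set. The names `PAIR` and `convert` are likewise made up for this example.

```python
import re

# Hypothetical lookup table: Big5 byte pair -> UTF-8 bytes (three entries only).
BIG5_TO_UTF8 = {ch.encode("big5"): ch.encode("utf-8") for ch in "架跋崙"}

# A Big5 lead byte (0x81-0xFE) followed by a trail byte (0x40-0x7E or 0xA1-0xFE).
PAIR = re.compile(rb"[\x81-\xfe][\x40-\x7e\xa1-\xfe]")


def convert(data: bytes) -> bytes:
    """Replace each known Big5 pair with its UTF-8 form; leave the rest alone."""
    return PAIR.sub(lambda m: BIG5_TO_UTF8.get(m.group(0), m.group(0)), data)


print(convert(b"A" + "架".encode("big5") + b"!").decode("utf-8"))
```

Pairs that are missing from the table pass through unchanged, so gaps in the table show up as garbled characters rather than errors.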

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Nick Gammon   Australia  (23,173 posts)  Bio   Forum Administrator
Date Reply #7 on Wed 13 Jun 2012 07:55 AM (UTC)
Message
Flow said:

By the way, is there any way to make PCRE work better with Chinese?


Turn UTF-8 on; otherwise it can't know that the bytes don't have their usual meanings.

Although for triggers it *might* just work to put a backslash before it.

For example, instead of matching on 架, match on \架

- Nick Gammon

www.gammon.com.au, www.mushclient.com

Posted by Flow   (5 posts)  Bio
Date Reply #8 on Wed 13 Jun 2012 08:20 AM (UTC)
Message
Nick Gammon said:

For example, instead of matching on 架 match on \架


This does not work, because each Chinese character is two bytes.
The problem is that the second byte becomes a special character,
and there is no way to insert a \ between the two bytes.

I am looking for a way to group the characters and ignore any special characters inside the group.

There seems to be no such method.
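
At the byte level, \架 in Big5 is the three bytes 5C AC 5B, so the backslash escapes only the lead byte and the trailing "[" is still bare. One possible alternative, sketched here in Python, is to spell out every byte of the text as a `\xHH` escape. This assumes the regex engine accepts such escapes (PCRE does; whether it suits a given MUSHclient trigger is untested here). The helper name `big5_hex_pattern` is invented for the example.

```python
import re


def big5_hex_pattern(text: str) -> str:
    """Spell out every Big5 byte of `text` as a \\xHH escape."""
    return "".join(f"\\x{b:02X}" for b in text.encode("big5"))


# The backslash lands in front of the lead byte only; the "[" byte stays bare.
print(("\\" + "架").encode("big5"))

pattern = big5_hex_pattern("架")
print(pattern)

# Python's byte-mode `re` understands the same \xHH escapes, so it can stand in
# for a quick check against a Big5-encoded line.
line = "書架".encode("big5")
print(re.search(pattern.encode("ascii"), line) is not None)
```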

Posted by Flow   (5 posts)  Bio
Date Reply #9 on Wed 13 Jun 2012 09:45 AM (UTC)
Message
Finally found a way to solve this:
quote the text with \Q...\E.
This treats the enclosed characters as literals and ignores all regex syntax.
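
One caveat with PCRE's `\Q...\E` quoting is that a literal backslash followed by "E" inside the text ends the quoting early. In Big5 that can happen when a character whose second byte is 0x5C (許, for example) is followed by the letter E. The Python sketch below only builds the pattern bytes and does not run them through PCRE (Python's own `re` has no `\Q...\E`). The function name `pcre_quote` is made up for the example.

```python
def pcre_quote(raw: bytes) -> bytes:
    """Wrap raw bytes in \\Q...\\E so PCRE treats them literally."""
    # A literal "\E" inside the text would end the quoting early, so close the
    # quote, emit an escaped backslash plus "E", and reopen the quote.
    return b"\\Q" + raw.replace(b"\\E", b"\\E\\\\E\\Q") + b"\\E"


print(pcre_quote("架".encode("big5")))

# 許 ends in byte 0x5C (a backslash) in Big5, so "許E" contains the bytes "\E".
print(pcre_quote("許E".encode("big5")))
```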

Thanks all...


The dates and times for posts above are shown in Universal Co-ordinated Time (UTC).


Information and images on this site are licensed under the Creative Commons Attribution 3.0 Australia License unless stated otherwise.