I am trying to strip multibyte Unicode characters from clipboard data by looping through all characters with SubStr() and Asc() and removing those with values larger than 254. During testing (with the Unicode 64-bit version) I noticed that it doesn't recognise 3- or 4-byte characters. E.g. a string containing a single U+1F60A character gives a StrLen() of 2, and looping through it with Asc() gives two characters, Chr(55357) [D83D] and Chr(56842) [DE0A] (the UTF-16 representation of the character), when it should be one single Chr(128522) [1F60A].
However, for "❤", it works correctly. StrLen() returns 1 and Asc() returns 10084.
Is that a bug or a feature? How can I work around it so that I can parse character by character instead of getting partial UTF-16 code units in between?
4-byte Unicode support?
Re: 4-byte Unicode support?
Thanks. Using Ord() instead of Asc() did the trick, together with skipping the next character whenever Ord() returned a value > 65535.
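For anyone finding this later, here is a minimal sketch of that workaround, assuming AutoHotkey v1.1.21+ (Unicode build), where Ord() reads a surrogate pair as one code point. The function name StripWideChars is made up for illustration, and the 254 cutoff is simply the one from the question.

```autohotkey
; Sketch only: StripWideChars is a hypothetical helper name, not a built-in.
; Assumes AutoHotkey v1.1.21+ Unicode, where strings are UTF-16 internally.
StripWideChars(str) {
    out := ""
    i := 1
    len := StrLen(str)              ; counts UTF-16 code units, not code points
    while (i <= len) {
        ; Pass two code units so Ord() can see a complete surrogate pair.
        cp := Ord(SubStr(str, i, 2))
        if (cp > 0xFFFF) {
            i += 2                  ; supplementary character: skip both halves
        } else {
            if (cp <= 254)          ; cutoff taken from the question
                out .= SubStr(str, i, 1)
            i += 1
        }
    }
    return out
}
```

It could then be called as `Clipboard := StripWideChars(Clipboard)`. Stepping by two code units when Ord() reports a value above 0xFFFF is what keeps the loop from landing on the second half of a pair.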