Hi, noob here.
I'm editing subtitle (.srt) files, and they look a little cramped wherever double quotation marks sit right next to Chinese characters.
How can I use two StrReplace() calls
to replace [“][Chinese character] with [space][“][Chinese character]
and [Chinese character][”] with [Chinese character][”][space]?
I mean every instance at once.
Thanks!
StrReplace Chinese characters help
Re: StrReplace Chinese characters help
Code:
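haystack := "“你好”`r`n“祝你有个美好的一天”"
MsgBox A_Clipboard := editSubtitles(haystack)  ; show the result and copy it to the clipboard

editSubtitles(haystack) {
    newStr := haystack
    ; Add a space before an opening quote that is followed by a Han character.
    newStr := RegExReplace(newStr, "“(?=\p{Han})", " ${0}")
    ; Add a space after a closing quote that is preceded by a Han character.
    newStr := RegExReplace(newStr, "(?<=\p{Han})”", "${0} ")
    return newStr
}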
Last edited by Seven0528 on 02 Jun 2024, 18:37, edited 1 time in total.
- English is not my native language. Please forgive any awkward expressions.
Re: StrReplace Chinese characters help
@Seven0528
Thanks!
Are there any other ways to specify a Chinese character?
Also what's the difference between ${0} and $1 ?
And the difference between \p{Han} and ?=\p{Han} ?
Thanks!
Re: StrReplace Chinese characters help
Yes, there are other methods, but ultimately, that would involve specifying the Unicode range directly.
The most widely known range is called CJK Unified Ideographs, which spans from U+4E00 to U+9FFF.
Represented in regex, it would be something like [一-鿿].
There are also additional blocks, such as CJK Unified Ideographs Extension A, that hold further characters.
Also, considering supplementary characters like U+3007 (〇), pinpointing Chinese characters is quite a challenging task.
I once tried to achieve this by examining the entire Unicode chart, but it wasn't easy.
Moreover, Chinese characters are not only used in China but also in Japan and Korea, so specifying only those used in China would require a lot of effort.
Distinguishing between simplified and traditional characters adds even more complexity.
Personally, I think \p{Han} is sufficient. AHK regular expressions are implemented with PCRE, which supports it, so it's quite handy.
For more detailed information, please refer to the document below.
How to detect Chinese characters with punctuation in regex?
[〇一-鿿㐀-䶿豈-𠀀-𪛟𪜀-𫝀-丽-⼀-⿕⺀-⻳"#$%&'()*+,-/:;<=>@[\]^_`{|}~⦅⦆「」、 、〃〈〉《》「」『』【】〔〕〖〗〘〙〚〛〜〝〞〟〰〾〿–—‘’‛“”„‟…‧﹏﹑﹔·]*[!?。。][」﹂”』’》)]}〕〗〙〛〉】]*
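To make the difference between the explicit range and \p{Han} concrete, here is a minimal sketch, assuming AutoHotkey v2. The variable names (rangeOnly, hanScript, report) and the sample characters are only for illustration. It checks a few characters against both patterns; 〇 (U+3007) and 㐀 (U+3400, Extension A) fall outside U+4E00 to U+9FFF, so the range alone should miss them while \p{Han} should match.

```autohotkey
#Requires AutoHotkey v2.0

; Explicit block U+4E00..U+9FFF, equivalent to the literal class [一-鿿].
rangeOnly := "[\x{4E00}-\x{9FFF}]"
; Unicode script property, as used in the script earlier in this thread.
hanScript := "\p{Han}"

report := ""
for ch in ["一", "〇", "㐀", "A"] {
    ; RegExMatch returns the match position, or 0 when there is no match.
    inRange := RegExMatch(ch, rangeOnly) ? "yes" : "no"
    inHan := RegExMatch(ch, hanScript) ? "yes" : "no"
    report .= ch "  range: " inRange "  \p{Han}: " inHan "`n"
}
MsgBox report
```

If you need more than the basic block without \p{Han}, you would have to add each extra block or character to the class yourself.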
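The difference between ${1} and ${0} in the replacement text is that ${1} refers to the first subpattern, while ${0} refers to the entire match.
The presence or absence of curly braces is mostly a stylistic difference. However, when the number has two or more digits (10 and above), curly braces are necessary, as in ${10}.
A subpattern usually refers to a part of the pattern enclosed in parentheses.
(The parentheses used here are not subpatterns. (?=...) and (?<=...) are a different syntax called assertions: lookahead and lookbehind. They only check what comes after or before the current position without including it in the match, so ${0} here contains just the quotation mark. No subpatterns are used here.)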
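Here is a small sketch of what $0 and $1 actually contain, assuming AutoHotkey v2. The bracketed replacement text and the variable names are only there to make the pieces visible; they are not part of the subtitle fix.

```autohotkey
#Requires AutoHotkey v2.0

text := "“你好”"

; With a capturing subpattern, $0 is the whole match (quote + character)
; and $1 is only what the parentheses captured (the character).
withGroup := RegExReplace(text, "“(\p{Han})", "[$0|$1]")
; withGroup is now: [“你|你]好”

; With a lookahead assertion, the character is checked but not consumed,
; so $0 is only the quote and there is no $1 at all.
withAssert := RegExReplace(text, "“(?=\p{Han})", "[$0]")
; withAssert is now: [“]你好”

MsgBox withGroup "`n" withAssert
```

This is why the original script can simply write " ${0}": the match is only the quotation mark, so nothing else has to be put back.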
The reason for using assertions instead of directly specifying characters in this regular expression is efficiency.
Regular expressions typically evaluate patterns from left to right, sequentially checking for matches.
Code:
“祝你有个美好的一天”
If the pattern \p{Han}” were used instead, the engine would stop at every Chinese character and then check whether a ” follows it. For the line above, this would require 20 steps.
Code:
“发布这一世界人权宣言,作为所有人民和所有国家努力实现的共同标准,以期每一个人和社会机构经常铭念本宣言,努力通过教诲和教育促进对权利和自由的尊重,并通过国家的和国际的渐进措施,使这些权利和自由在各会员国本身人民及在其管辖下领土的人民中得到普遍和有效的承认和遵行”
For instance, in a longer text such as the one above, the pattern \p{Han}” would need 255 steps because it has to examine every Chinese character. On the other hand, (?<=\p{Han})” requires only 5 steps, because it only needs to find the character ” and then check whether there is a Chinese character before it.
Using appropriate assertions like this can significantly enhance the efficiency of regular expressions.
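As a rough sketch (AutoHotkey v2 assumed, variable names made up for the example), the two approaches discussed above produce the same text; the difference is only in how the engine gets there. The consuming version has to capture the character and write it back, while the lookbehind version matches just the quotation mark. This does not measure steps, it only shows that the results agree.

```autohotkey
#Requires AutoHotkey v2.0

text := "“祝你有个美好的一天”"

; Consuming version: the Han character is part of the match,
; so it must be captured and re-inserted with $1.
consuming := RegExReplace(text, "(\p{Han})”", "$1” ")

; Lookbehind version: only the closing quote is matched,
; so $0 is enough.
lookbehind := RegExReplace(text, "(?<=\p{Han})”", "$0 ")

MsgBox (consuming == lookbehind) ? "Same result" : "Different result"
```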
Re: StrReplace Chinese characters help
@Seven0528
Wow that's amazing man, thanks!