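Emoji are just Unicode characters; most of them sit outside the Basic Multilingual Plane and take four bytes in UTF-8.
So we can identify emoji by their Unicode code points: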
<U+1F300> - <U+1F5FF> # symbols & pictographs
<U+1F600> - <U+1F64F> # emoticons
<U+1F680> - <U+1F6FF> # transport & map symbols
<U+2600> - <U+2B55> # other
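The ranges above can be turned into a simple code-point check. This is a minimal sketch in Python; `EMOJI_RANGES` and `is_emoji` are hypothetical names introduced for illustration, and the check only covers the blocks listed here, not every emoji defined by Unicode.

```python
# Hypothetical helper: the ranges are the ones listed above.
EMOJI_RANGES = [
    (0x1F300, 0x1F5FF),  # symbols & pictographs
    (0x1F600, 0x1F64F),  # emoticons
    (0x1F680, 0x1F6FF),  # transport & map symbols
    (0x2600, 0x2B55),    # other
]

def is_emoji(ch):
    """Return True if the single character ch falls in one of the ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in EMOJI_RANGES)

print(is_emoji('🔐'))  # 🔐 is U+1F510, inside the first range
print(is_emoji('t'))   # plain ASCII letter, outside every range
```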
The goal is to match the text that sits between two emoji in a string.
For example:
🔐testtest🔐
Code:
import re
readline = ['🔐testtest🔐']
# Character class covering the emoji ranges listed above
emoji = '[\U0001F300-\U0001F64F\U0001F680-\U0001F6FF\u2600-\u2B55]'
# Lazily capture the text between two emoji
pat = re.compile(emoji + '(.*?)' + emoji)
for line in readline:
    print(pat.findall(line))
Result:
['testtest']