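In UTF-8 a common Chinese character occupies 3 bytes, and in GBK it occupies 2 bytes. ASCII only goes up to decimal 127 (hexadecimal 0x7F), so when a byte's value is 128 (0x80) or higher, it is treated as the first byte of a Chinese character. Count that character once, then skip its remaining bytes according to the encoding.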
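To make the byte layout concrete, here is a small sketch that prints the byte length of one Chinese character in each encoding. It assumes the file is saved as UTF-8 and that the mbstring extension is loaded; the variable names `$utf8` and `$gbk` are only illustrative.

```php
<?php
// Assumption: this source file is saved as UTF-8 and mbstring is available.
$utf8 = '程';
// mb_convert_encoding takes the target encoding first, then the source encoding.
$gbk = mb_convert_encoding($utf8, 'GBK', 'UTF-8');

echo strlen($utf8) . "\n";      // byte length in UTF-8: 3
echo strlen($gbk) . "\n";       // byte length in GBK: 2
echo ord($utf8[0]) . "\n";      // lead byte of the UTF-8 sequence: 231 (0xE7), which is >= 128
echo ord('A') . "\n";           // an ASCII character stays below 128: 65
```

Every byte of a multi-byte character is 128 or higher in UTF-8, which is why the counting loop has to skip the trailing bytes instead of testing them again.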
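function mbstrlen($str, $type = 'utf-8')
{
    $len = strlen($str); // length in bytes, not characters
    if ($len <= 0)
    {
        return 0;
    }
    $count = 0;
    // Extra bytes to skip after the lead byte; the loop's own $i++ consumes the lead byte.
    // UTF-8 Chinese character: 3 bytes, so skip 2. GBK: 2 bytes, so skip 1.
    $step = $type == 'utf-8' ? 2 : 1;
    for ($i = 0; $i < $len; $i++)
    {
        $count++;
        // $str[$i] replaces the $str{$i} syntax, which was removed in PHP 8.
        if (ord($str[$i]) >= 128)
        {
            $i += $step;
        }
    }
    return $count;
}
$str = '程序猿'; // this file is assumed to be saved as UTF-8
echo strlen($str) . "\n";
echo mbstrlen($str) . "\n";
// Convert from UTF-8 to GBK: target encoding first, source encoding second.
$str = mb_convert_encoding($str, 'gbk', 'utf-8');
echo mbstrlen($str, 'gbk') . "\n";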
[Running] php "/Users/why/Desktop/php/shmop.php"
9
3
3
[Done] exited with code=0 in 0.288 seconds
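The fixed-step approach above assumes every non-ASCII character has the same byte length, which holds for common Chinese characters but not for 2-byte characters such as é or 4-byte characters such as emoji in UTF-8. As a rough sketch, the sequence length can instead be read from each lead byte. The helper name `utf8CharCount` is hypothetical and not part of the original code, and the sketch assumes well-formed UTF-8 input; PHP's built-in `mb_strlen` (from the mbstring extension) is used as a cross-check.

```php
<?php
// Hypothetical helper: counts characters in a UTF-8 string by reading the
// sequence length from each lead byte. Assumes the input is well-formed UTF-8.
function utf8CharCount(string $str): int
{
    $len = strlen($str);
    $count = 0;
    $i = 0;
    while ($i < $len) {
        $byte = ord($str[$i]);
        if ($byte < 0x80) {
            $i += 1; // 0xxxxxxx: ASCII, 1 byte
        } elseif (($byte & 0xE0) === 0xC0) {
            $i += 2; // 110xxxxx: 2-byte sequence
        } elseif (($byte & 0xF0) === 0xE0) {
            $i += 3; // 1110xxxx: 3-byte sequence (most Chinese characters)
        } elseif (($byte & 0xF8) === 0xF0) {
            $i += 4; // 11110xxx: 4-byte sequence
        } else {
            $i += 1; // stray continuation or invalid byte: counted as one unit
        }
        $count++;
    }
    return $count;
}

// Three Chinese characters, a space, é (U+00E9), a space, and an emoji (U+1F600).
$str = "程序猿 \u{E9} \u{1F600}";
echo strlen($str) . "\n";             // 17 bytes
echo utf8CharCount($str) . "\n";      // 7 characters
echo mb_strlen($str, 'UTF-8') . "\n"; // 7, the built-in agrees
```

In practice `mb_strlen` is probably the simpler choice when mbstring is available; the manual loop is mainly useful for seeing how the lead byte encodes the length.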