Summary of Character Issues on the Web

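Computers represent every character in binary. 1 B (byte) = 8 bits, and each character is assigned a number (its code) stored in one byte. ASCII uses only 7 of the 8 bits, giving 2^7 = 128 codes; extended ASCII uses all 8 bits, giving 2^8 = 256 codes.

256 codes are far too few for Chinese, so Chinese characters must be represented with multiple bytes: GB2312 uses 2 bytes per Chinese character, and UTF-8 uses 3 bytes for most common Chinese characters (UTF-8 is variable-length, from 1 to 4 bytes per character).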
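As a quick check of these byte counts, here is a minimal Python sketch. The helper name `byte_length` and the sample characters are illustrative choices, not part of the original notes.

```python
def byte_length(ch: str, encoding: str) -> int:
    """Number of bytes `ch` occupies when encoded with `encoding`."""
    return len(ch.encode(encoding))

# Compare an ASCII letter with a Chinese character in both encodings.
for ch in ("A", "中"):
    for enc in ("gb2312", "utf-8"):
        print(repr(ch), enc, byte_length(ch, enc))
```

ASCII characters stay at one byte in both encodings, which is why ASCII-only text looks identical under GB2312 and UTF-8 while Chinese text does not.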


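1. ASCII characters are divided into printable and non-printable characters

Non-printable (control) characters: 0-31 and 127

The rest (32-126) are printable characters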
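The split can be expressed as a simple range check. This is a small Python sketch; the function name `is_printable_ascii` is a hypothetical helper introduced only for illustration.

```python
def is_printable_ascii(code: int) -> bool:
    """True for ASCII codes 32-126 (space through tilde)."""
    return 32 <= code <= 126

# Partition the 128 ASCII codes into the two groups.
control_codes = [c for c in range(128) if not is_printable_ascii(c)]
printable_codes = [c for c in range(128) if is_printable_ascii(c)]

print(len(control_codes), len(printable_codes))
```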

ASCII table

Bin Dec Hex Abbr/Char Description
00000000 0 00 NUL Null character
00000001 1 01 SOH Start of heading
00000010 2 02 STX Start of text
00000011 3 03 ETX End of text
00000100 4 04 EOT End of transmission
00000101 5 05 ENQ Enquiry
00000110 6 06 ACK Acknowledge
00000111 7 07 BEL Bell
00001000 8 08 BS Backspace
00001001 9 09 HT Horizontal tab
00001010 10 0A LF Line feed (NL, new line)
00001011 11 0B VT Vertical tab
00001100 12 0C FF Form feed (NP, new page)
00001101 13 0D CR Carriage return
00001110 14 0E SO Shift out
00001111 15 0F SI Shift in
00010000 16 10 DLE Data link escape
00010001 17 11 DC1 Device control 1
00010010 18 12 DC2 Device control 2
00010011 19 13 DC3 Device control 3
00010100 20 14 DC4 Device control 4
00010101 21 15 NAK Negative acknowledge
00010110 22 16 SYN Synchronous idle
00010111 23 17 ETB End of transmission block
00011000 24 18 CAN Cancel
00011001 25 19 EM End of medium
00011010 26 1A SUB Substitute
00011011 27 1B ESC Escape
00011100 28 1C FS File separator
00011101 29 1D GS Group separator
00011110 30 1E RS Record separator
00011111 31 1F US Unit separator
00100000 32 20 (space) Space
00100001 33 21 !  
00100010 34 22 "  
00100011 35 23 #  
00100100 36 24 $  
00100101 37 25 %  
00100110 38 26 &  
00100111 39 27 '  
00101000 40 28 (  
00101001 41 29 )  
00101010 42 2A *  
00101011 43 2B +  
00101100 44 2C ,  
00101101 45 2D -  
00101110 46 2E .  
00101111 47 2F /  
00110000 48 30 0  
00110001 49 31 1  
00110010 50 32 2  
00110011 51 33 3  
00110100 52 34 4  
00110101 53 35 5  
00110110 54 36 6  
00110111 55 37 7  
00111000 56 38 8  
00111001 57 39 9  
00111010 58 3A :  
00111011 59 3B ;  
00111100 60 3C <  
00111101 61 3D =  
00111110 62 3E >  
00111111 63 3F ?  
01000000 64 40 @  
01000001 65 41 A  
01000010 66 42 B  
01000011 67 43 C  
01000100 68 44 D  
01000101 69 45 E  
01000110 70 46 F  
01000111 71 47 G  
01001000 72 48 H  
01001001 73 49 I  
01001010 74 4A J  
01001011 75 4B K  
01001100 76 4C L  
01001101 77 4D M  
01001110 78 4E N  
01001111 79 4F O  
01010000 80 50 P  
01010001 81 51 Q  
01010010 82 52 R  
01010011 83 53 S  
01010100 84 54 T  
01010101 85 55 U  
01010110 86 56 V  
01010111 87 57 W  
01011000 88 58 X  
01011001 89 59 Y  
01011010 90 5A Z  
01011011 91 5B [  
01011100 92 5C \  
01011101 93 5D ]  
01011110 94 5E ^  
01011111 95 5F _  
01100000 96 60 `  
01100001 97 61 a  
01100010 98 62 b  
01100011 99 63 c  
01100100 100 64 d  
01100101 101 65 e  
01100110 102 66 f  
01100111 103 67 g  
01101000 104 68 h  
01101001 105 69 i  
01101010 106 6A j  
01101011 107 6B k  
01101100 108 6C l  
01101101 109 6D m  
01101110 110 6E n  
01101111 111 6F o  
01110000 112 70 p  
01110001 113 71 q  
01110010 114 72 r  
01110011 115 73 s  
01110100 116 74 t  
01110101 117 75 u  
01110110 118 76 v  
01110111 119 77 w  
01111000 120 78 x  
01111001 121 79 y  
01111010 122 7A z  
01111011 123 7B {  
01111100 124 7C |  
01111101 125 7D }  
01111110 126 7E ~  
01111111 127 7F DEL Delete

9    Tab  \t

10  Line feed (newline)  \n

13  Carriage return  \r
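The code points behind these escape sequences are easy to confirm. A short Python check follows; the dictionary name `ESCAPES` is just an illustrative label.

```python
# Map each escape sequence to its ASCII code point.
ESCAPES = {
    "\\t": ord("\t"),  # horizontal tab
    "\\n": ord("\n"),  # line feed
    "\\r": ord("\r"),  # carriage return
}

for name, code in ESCAPES.items():
    print(f"{name} = {code} (0x{code:02X})")
```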


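2. URL encoding by the browser

The browser encodes the key/value pairs of a form before sending them, and a PHP script decodes them automatically on receipt.

Encoding rule: % followed by the two-digit hexadecimal value of each byte of the character, so a multi-byte character becomes several %XX groups.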
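To make the rule concrete, here is a hedged sketch using Python's standard `urllib.parse` module. It assumes UTF-8 as the character encoding, and the sample values are invented for illustration. It shows the encoding and decoding steps, not PHP's own implementation.

```python
from urllib.parse import parse_qs, quote, unquote, urlencode

# A single character: each UTF-8 byte becomes one %XX group.
encoded_char = quote("中")
print(encoded_char)  # %E4%B8%AD

# A form-style key/value pair: spaces become '+', '&' becomes %26.
query = urlencode({"q": "a b&c"})
print(query)  # q=a+b%26c

# Decoding reverses both steps, much as the server side does on receipt.
decoded_char = unquote(encoded_char)
decoded_form = parse_qs(query)
print(decoded_char, decoded_form)
```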


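Below is a reference table of HTML entities.

The PHP function html_entity_decode converts HTML entities into their corresponding characters.

An HTML entity is written as &entity_name; or &#number; (the number can be decimal, or hexadecimal with an x prefix).

One pitfall of html_entity_decode is that it does not recognize entities written without the trailing semicolon, even though browsers can still recognize such semicolon-less forms.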
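The difference between lenient and strict decoding can be sketched in Python. The standard `html.unescape` follows HTML5 parsing rules and accepts some semicolon-less references, much like a browser. `strict_unescape` is a hypothetical helper written here to imitate a decoder that insists on the semicolon; it is not PHP's actual implementation.

```python
import html
import re

# Matches only references that end with ';' (named, decimal, or hex).
_STRICT_REF = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")

def strict_unescape(text: str) -> str:
    """Decode only entities terminated by a semicolon (illustrative helper)."""
    return _STRICT_REF.sub(lambda m: html.unescape(m.group(0)), text)

sample = "&lt;b&gt; &amp &#65"
lenient = html.unescape(sample)   # decodes '&amp' and '&#65' as well
strict = strict_unescape(sample)  # leaves '&amp' and '&#65' untouched
print(lenient)
print(strict)
```

When filtering user input, the lenient result is the safer one to inspect, since it is closer to what a browser will actually render.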

Columns: character, decimal character reference, entity name (--- if none), description
--- &#00; --- Unused
--- &#01; --- Unused
--- &#02; --- Unused
--- &#03; --- Unused
--- &#04; --- Unused
--- &#05; --- Unused
--- &#06; --- Unused
--- &#07; --- Unused
--- &#08; --- Unused
--- &#09; --- Horizontal tab
--- &#10; --- Line feed
--- &#11; --- Unused
--- &#12; --- Unused
--- &#13; --- Carriage return
--- &#14; --- Unused
--- &#15; --- Unused
--- &#16; --- Unused
--- &#17; --- Unused
--- &#18; --- Unused
--- &#19; --- Unused
--- &#20; --- Unused
--- &#21; --- Unused
--- &#22; --- Unused
--- &#23; --- Unused
--- &#24; --- Unused
--- &#25; --- Unused
--- &#26; --- Unused
--- &#27; --- Unused
--- &#28; --- Unused
--- &#29; --- Unused
--- &#30; --- Unused
--- &#31; --- Unused
  &#32; --- Space
! &#33; --- Exclamation mark
" &#34; &quot; Quotation mark
# &#35; --- Number sign
$ &#36; --- Dollar sign
% &#37; --- Percent sign
& &#38; &amp; Ampersand
' &#39; --- Apostrophe
( &#40; --- Left parenthesis
) &#41; --- Right parenthesis
* &#42; --- Asterisk
+ &#43; --- Plus sign
, &#44; --- Comma
- &#45; --- Hyphen
. &#46; --- Period (full stop)
/ &#47; --- Solidus (slash)
0 &#48; --- Digit 0
1 &#49; --- Digit 1
2 &#50; --- Digit 2
3 &#51; --- Digit 3
4 &#52; --- Digit 4
5 &#53; --- Digit 5
6 &#54; --- Digit 6
7 &#55; --- Digit 7
8 &#56; --- Digit 8
9 &#57; --- Digit 9
: &#58; --- Colon
; &#59; --- Semicolon
< &#60; &lt; Less than
= &#61; --- Equals sign
> &#62; &gt; Greater than
? &#63; --- Question mark
@ &#64; --- Commercial at
A &#65; --- Capital A
B &#66; --- Capital B
C &#67; --- Capital C
D &#68; --- Capital D
E &#69; --- Capital E
F &#70; --- Capital F
G &#71; --- Capital G
H &#72; --- Capital H
I &#73; --- Capital I
J &#74; --- Capital J
K &#75; --- Capital K
L &#76; --- Capital L
M &#77; --- Capital M
N &#78; --- Capital N
O &#79; --- Capital O
P &#80; --- Capital P
Q &#81; --- Capital Q
R &#82; --- Capital R
S &#83; --- Capital S
T &#84; --- Capital T
U &#85; --- Capital U
V &#86; --- Capital V
W &#87; --- Capital W
X &#88; --- Capital X
Y &#89; --- Capital Y
Z &#90; --- Capital Z
[ &#91; --- Left square bracket
\ &#92; --- Reverse solidus (backslash)
] &#93; --- Right square bracket
^ &#94; --- Caret
_ &#95; --- Horizontal bar (underscore)
` &#96; --- Grave accent
a &#97; --- Small a
b &#98; --- Small b
c &#99; --- Small c
d &#100; --- Small d
e &#101; --- Small e
f &#102; --- Small f
g &#103; --- Small g
h &#104; --- Small h
i &#105; --- Small i
j &#106; --- Small j
k &#107; --- Small k
l &#108; --- Small l
m &#109; --- Small m
n &#110; --- Small n
o &#111; --- Small o
p &#112; --- Small p
q &#113; --- Small q
r &#114; --- Small r
s &#115; --- Small s
t &#116; --- Small t
u &#117; --- Small u
v &#118; --- Small v
w &#119; --- Small w
x &#120; --- Small x
y &#121; --- Small y
z &#122; --- Small z
{ &#123; --- Left curly brace
| &#124; --- Vertical bar
} &#125; --- Right curly brace
~ &#126; --- Tilde
--- &#127; --- Unused
  &#160; &nbsp; Non-breaking space
¡ &#161; &iexcl; Inverted exclamation
¢ &#162; &cent; Cent sign
£ &#163; &pound; Pound sterling
¤ &#164; &curren; General currency sign
¥ &#165; &yen; Yen sign
¦ &#166; &brvbar; or &brkbar; Broken vertical bar
§ &#167; &sect; Section sign
¨ &#168; &uml; or &die; Umlaut
© &#169; &copy; Copyright
ª &#170; &ordf; Feminine ordinal
« &#171; &laquo; Left angle quote, guillemet left
¬ &#172; &not; Not sign
  &#173; &shy; Soft hyphen
® &#174; &reg; Registered trademark
¯ &#175; &macr; or &hibar; Macron accent
° &#176; &deg; Degree sign
± &#177; &plusmn; Plus or minus
² &#178; &sup2; Superscript two
³ &#179; &sup3; Superscript three
´ &#180; &acute; Acute accent
µ &#181; &micro; Micro sign
¶ &#182; &para; Paragraph sign
· &#183; &middot; Middle dot
¸ &#184; &cedil; Cedilla
¹ &#185; &sup1; Superscript one
º &#186; &ordm; Masculine ordinal
» &#187; &raquo; Right angle quote, guillemet right
¼ &#188; &frac14; Fraction one-fourth
½ &#189; &frac12; Fraction one-half
¾ &#190; &frac34; Fraction three-fourths
¿ &#191; &iquest; Inverted question mark
À &#192; &Agrave; Capital A, grave accent
Á &#193; &Aacute; Capital A, acute accent
Â &#194; &Acirc; Capital A, circumflex
Ã &#195; &Atilde; Capital A, tilde
Ä &#196; &Auml; Capital A, diaeresis / umlaut
Å &#197; &Aring; Capital A, ring
Æ &#198; &AElig; Capital AE ligature
Ç &#199; &Ccedil; Capital C, cedilla
È &#200; &Egrave; Capital E, grave accent
É &#201; &Eacute; Capital E, acute accent
Ê &#202; &Ecirc; Capital E, circumflex
Ë &#203; &Euml; Capital E, diaeresis / umlaut
Ì &#204; &Igrave; Capital I, grave accent
Í &#205; &Iacute; Capital I, acute accent
Î &#206; &Icirc; Capital I, circumflex
Ï &#207; &Iuml; Capital I, diaeresis / umlaut
Ð &#208; &ETH; Capital Eth, Icelandic
Ñ &#209; &Ntilde; Capital N, tilde
Ò &#210; &Ograve; Capital O, grave accent
Ó &#211; &Oacute; Capital O, acute accent
Ô &#212; &Ocirc; Capital O, circumflex
Õ &#213; &Otilde; Capital O, tilde
Ö &#214; &Ouml; Capital O, diaeresis / umlaut
× &#215; &times; Multiply sign
Ø &#216; &Oslash; Capital O, slash
Ù &#217; &Ugrave; Capital U, grave accent
Ú &#218; &Uacute; Capital U, acute accent
Û &#219; &Ucirc; Capital U, circumflex
Ü &#220; &Uuml; Capital U, diaeresis / umlaut
Ý &#221; &Yacute; Capital Y, acute accent
Þ &#222; &THORN; Capital Thorn, Icelandic
ß &#223; &szlig; Small sharp s, German sz
à &#224; &agrave; Small a, grave accent
á &#225; &aacute; Small a, acute accent
â &#226; &acirc; Small a, circumflex
ã &#227; &atilde; Small a, tilde
ä &#228; &auml; Small a, diaeresis / umlaut
å &#229; &aring; Small a, ring
æ &#230; &aelig; Small ae ligature
ç &#231; &ccedil; Small c, cedilla
è &#232; &egrave; Small e, grave accent
é &#233; &eacute; Small e, acute accent
ê &#234; &ecirc; Small e, circumflex
ë &#235; &euml; Small e, diaeresis / umlaut
ì &#236; &igrave; Small i, grave accent
í &#237; &iacute; Small i, acute accent
î &#238; &icirc; Small i, circumflex
ï &#239; &iuml; Small i, diaeresis / umlaut
ð &#240; &eth; Small eth, Icelandic
ñ &#241; &ntilde; Small n, tilde
ò &#242; &ograve; Small o, grave accent
ó &#243; &oacute; Small o, acute accent
ô &#244; &ocirc; Small o, circumflex
õ &#245; &otilde; Small o, tilde
ö &#246; &ouml; Small o, diaeresis / umlaut
÷ &#247; &divide; Division sign
ø &#248; &oslash; Small o, slash
ù &#249; &ugrave; Small u, grave accent
ú &#250; &uacute; Small u, acute accent
û &#251; &ucirc; Small u, circumflex
ü &#252; &uuml; Small u, diaeresis / umlaut
ý &#253; &yacute; Small y, acute accent
þ &#254; &thorn; Small thorn, Icelandic
ÿ &#255; &yuml; Small y, umlaut

