Troubleshooting notes: a + in a URL parameter value arrives as a space in the Gin framework
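Problem description:
A few days ago I ran into a problem while using the gin framework: whenever a URL query parameter value contained a +, the backend received a space in its place.
Investigation showed that if the client sends a literal + without URL-encoding the parameter, Go's net/url package decodes it as a space. Only when the + is URL-encoded as %2b does gin parse it back into a +.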
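Source code
Take a look at net/url/url.go. When it processes query parameters, it assumes that what it receives has already been URL-encoded.
Right below the QueryUnescape function is PathUnescape. Its doc comment explains: PathUnescape is identical to QueryUnescape except that it does not unescape '+' to ' ' (space).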
// QueryUnescape does the inverse transformation of QueryEscape,
// converting each 3-byte encoded substring of the form "%AB" into the
// hex-decoded byte 0xAB.
// It returns an error if any % is not followed by two hexadecimal
// digits.
func QueryUnescape(s string) (string, error) {
	return unescape(s, encodeQueryComponent)
}
// PathUnescape does the inverse transformation of PathEscape,
// converting each 3-byte encoded substring of the form "%AB" into the
// hex-decoded byte 0xAB. It returns an error if any % is not followed
// by two hexadecimal digits.
//
// PathUnescape is identical to QueryUnescape except that it does not
// unescape '+' to ' ' (space).
func PathUnescape(s string) (string, error) {
	return unescape(s, encodePathSegment)
}
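To see the difference between the two exported functions side by side, here is a small sketch. decodeBoth is a name invented for this example; the two url calls are the standard library functions quoted above.

```go
package main

import (
	"fmt"
	"net/url"
)

// decodeBoth (hypothetical helper) runs the same input through both exported
// functions so the difference in '+' handling is visible.
func decodeBoth(s string) (query, path string, err error) {
	if query, err = url.QueryUnescape(s); err != nil {
		return "", "", err
	}
	if path, err = url.PathUnescape(s); err != nil {
		return "", "", err
	}
	return query, path, nil
}

func main() {
	q, p, err := decodeBoth("a+b%2Bc")
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Printf("QueryUnescape: %q\n", q) // "a b+c"
	fmt.Printf("PathUnescape:  %q\n", p) // "a+b+c"
}
```

Both functions decode %2B to +, but only QueryUnescape turns the literal + into a space.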
The function to focus on is unescape:
// unescape unescapes a string; the mode specifies
// which section of the URL string is being unescaped.
func unescape(s string, mode encoding) (string, error) {
	// Count %, check that they're well-formed.
	n := 0
	hasPlus := false
	for i := 0; i < len(s); {
		switch s[i] {
		case '%':
			n++
			if i+2 >= len(s) || !ishex(s[i+1]) || !ishex(s[i+2]) {
				s = s[i:]
				if len(s) > 3 {
					s = s[:3]
				}
				return "", EscapeError(s)
			}
			// Per https://tools.ietf.org/html/rfc3986#page-21
			// in the host component %-encoding can only be used
			// for non-ASCII bytes.
			// But https://tools.ietf.org/html/rfc6874#section-2
			// introduces %25 being allowed to escape a percent sign
			// in IPv6 scoped-address literals. Yay.
			if mode == encodeHost && unhex(s[i+1]) < 8 && s[i:i+3] != "%25" {
				return "", EscapeError(s[i : i+3])
			}
			if mode == encodeZone {
				// RFC 6874 says basically "anything goes" for zone identifiers
				// and that even non-ASCII can be redundantly escaped,
				// but it seems prudent to restrict %-escaped bytes here to those
				// that are valid host name bytes in their unescaped form.
				// That is, you can use escaping in the zone identifier but not
				// to introduce bytes you couldn't just write directly.
				// But Windows puts spaces here! Yay.
				v := unhex(s[i+1])<<4 | unhex(s[i+2])
				if s[i:i+3] != "%25" && v != ' ' && shouldEscape(v, encodeHost) {
					return "", EscapeError(s[i : i+3])
				}
			}
			i += 3
		case '+':
			hasPlus = mode == encodeQueryComponent
			i++
		default:
			if (mode == encodeHost || mode == encodeZone) && s[i] < 0x80 && shouldEscape(s[i], mode) {
				return "", InvalidHostError(s[i : i+1])
			}
			i++
		}
	}

	if n == 0 && !hasPlus {
		return s, nil
	}

	var t strings.Builder
	t.Grow(len(s) - 2*n)
	for i := 0; i < len(s); i++ {
		switch s[i] {
		case '%':
			t.WriteByte(unhex(s[i+1])<<4 | unhex(s[i+2]))
			i += 2
		case '+':
			if mode == encodeQueryComponent {
				t.WriteByte(' ')
			} else {
				t.WriteByte('+')
			}
		default:
			t.WriteByte(s[i])
		}
	}
	return t.String(), nil
}
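During the first pass over the string, whenever a + is encountered, hasPlus is set to true if the mode is encodeQueryComponent, flagging that the string contains a plus sign that needs decoding. The only purpose of this branch is to detect the +: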
case '+':
	hasPlus = mode == encodeQueryComponent
	i++
If the string contains no % escapes and no plus sign was flagged, the function returns the input unchanged here (url.go line 249):
if n == 0 && !hasPlus {
	return s, nil
}
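If a plus sign was found, execution continues. In the second pass, which converts the URL encoding back into raw bytes, each + is checked against the mode (line 259) and, when the mode is encodeQueryComponent,
replaced with a space: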
var t strings.Builder
t.Grow(len(s) - 2*n)
for i := 0; i < len(s); i++ {
	switch s[i] {
	case '%':
		t.WriteByte(unhex(s[i+1])<<4 | unhex(s[i+2]))
		i += 2
	case '+':
		if mode == encodeQueryComponent {
			t.WriteByte(' ') // this is where '+' becomes a space
		} else {
			t.WriteByte('+')
		}
	default:
		t.WriteByte(s[i])
	}
}
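Workaround:
On the backend, one option is to replace every + in the URL query with %2b before it is parsed, since a correctly URL-encoded plus sign is %2b. Note that this also changes any + a client sent deliberately to mean a space (as HTML form encoding does), so it only suits APIs where a + is always meant literally.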
c.Request.URL.RawQuery = strings.ReplaceAll(c.Request.URL.RawQuery, "+", "%2b")