This article analyzes the GLIBC source code to explain how strtok and strtok_r are implemented and what to watch out for when using them.
Function Overview
#include <string.h>
char *strtok(char *str, const char *delim);
char *strtok_r(char *str, const char *delim, char **saveptr);
Description
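strtok splits str into a sequence of tokens, treating every character contained in delim as a separator. If str is NULL, the call resumes from a static pointer saved inside the function, which points to the byte just after the position where the previous call split the string. strtok_r works the same way, but makes that internally saved pointer explicit: the caller passes it in through saveptr, and it serves as the starting position of the split when str is NULL.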
Parameters
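- str: the source string to be split
- delim: the set of delimiter characters
- saveptr: a pointer to a char * variable that stores the tokenizing context between calls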
Return Value
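- If str contains no delimiter characters, nothing is split: the whole string is returned as the only token, so the return value points to the start of the source string and printing it shows the entire string.
- If a token is extracted, the return value points to the token's starting position inside the source string. The character right after the token was a delimiter before the call and has been overwritten with '\0', so the token can be printed on its own.
- Once no further tokens remain, NULL is returned. The same happens on the very first call if the string is empty or consists only of delimiters.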
Example
#include <stdio.h>
#include <string.h>

int main(void) {
    char str[] = "hello,world";   /* writable array: strtok modifies it */
    char *token = strtok(str, ",");
    while (token != NULL) {
        printf("%s\n", token);
        token = strtok(NULL, ",");
    }
    return 0;
}
Usage Notes
strtok does not create new strings; it modifies the source string in place, so the source string changes
char str[] = "hello,world";
printf("str before strtok: %s\n", str);
char *token = strtok(str, ",");
printf("str after strtok: %s\n", str);
$ str before strtok: hello,world
$ str after strtok: hello
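As the experiment above shows, the value of str changed after strtok ran on it: everything after the delimiter disappeared. In fact, strtok finds the first occurrence of a delimiter character (here the ',' just before world) and overwrites it with '\0', so printing str stops at that byte.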
The first argument must not be a string literal
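Because strtok modifies the source string, its first argument must not be a string literal. Modifying a string literal is undefined behavior in C; since literals are usually stored in read-only memory, the program typically crashes, for example with a segmentation fault.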
To keep extracting tokens from the same string after the first call, pass NULL as the first argument in every subsequent call
char str[] = "hello,world";
char *token = strtok(str, ",");
while (token != NULL) {
    printf("%s\n", token);
    token = strtok(NULL, ",");
}
$ hello
$ world
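During the first extraction, strtok saves a pointer to the character after the delimiter, that is, the position of 'w'. Subsequent calls pass NULL as the first argument, and strtok resumes splitting from the position it saved implicitly in the previous call.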
The second argument is a set of delimiters, so multiple delimiter characters are supported
char str[] = "hello,world";
char *token = strtok(str, ",l");
printf("%s\n", token);
$ he
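As shown above, strtok does not match the second argument as a complete string. Instead, any single character contained in the delimiter set acts as a separator, so here the first 'l' already ends the token.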
If the first character is a delimiter, it is ignored
char str[] = ",hello,world";
char *token = strtok(str, ",");
printf("%s\n", token);
$ hello
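As shown above, when the string starts with a delimiter, strtok does not return an empty token; it simply skips that character. In the implementation (see the source code below), strspn is called to skip all leading delimiter characters before the end of the token is searched for.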
strtok is not reentrant; strtok_r is more flexible and safer
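strtok uses a static variable internally: a static pointer, invisible to the caller, stores the starting position for the next call. As a result, two tokenizing loops cannot be interleaved, and calls from multiple threads interfere with each other. strtok_r instead exposes that implicitly saved pointer as a parameter that the caller passes in, stores, and can even modify, which makes the function more flexible and safer. On Windows, a corresponding safe tokenizer is also available: strtok_s.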
Source Code
strtok.c:
/* Copyright (C) 1991-2018 Free Software Foundation, Inc.
This file is part of the GNU C Library.
The GNU C Library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.
The GNU C Library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public
License along with the GNU C Library; if not, see
<http://www.gnu.org/licenses/>. */
#include <string.h>
/* Parse S into tokens separated by characters in DELIM.
If S is NULL, the last string strtok() was called with is
used. For example:
char s[] = "-abc-=-def";
x = strtok(s, "-"); // x = "abc"
x = strtok(NULL, "-="); // x = "def"
x = strtok(NULL, "="); // x = NULL
// s = "abc\0=-def\0"
*/
char *
strtok (char *s, const char *delim)
{
  static char *olds;
  return __strtok_r (s, delim, &olds);
}
strtok_r.c:
/* Reentrant string tokenizer. Generic version.
Copyright (C) 1991-2018 Free Software Foundation, Inc.
This file is part of the GNU C Library.
The GNU C Library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.
The GNU C Library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public
License along with the GNU C Library; if not, see
<http://www.gnu.org/licenses/>. */
#ifdef HAVE_CONFIG_H
# include <config.h>
#endif
#include <string.h>
#ifndef _LIBC
/* Get specification. */
# include "strtok_r.h"
# define __strtok_r strtok_r
#endif
/* Parse S into tokens separated by characters in DELIM.
If S is NULL, the saved pointer in SAVE_PTR is used as
the next starting point. For example:
char s[] = "-abc-=-def";
char *sp;
x = strtok_r(s, "-", &sp); // x = "abc", sp = "=-def"
x = strtok_r(NULL, "-=", &sp); // x = "def", sp = NULL
x = strtok_r(NULL, "=", &sp); // x = NULL
// s = "abc\0-def\0"
*/
char *
__strtok_r (char *s, const char *delim, char **save_ptr)
{
  char *end;

  if (s == NULL)
    s = *save_ptr;

  if (*s == '\0')
    {
      *save_ptr = s;
      return NULL;
    }

  /* Scan leading delimiters. */
  s += strspn (s, delim);
  if (*s == '\0')
    {
      *save_ptr = s;
      return NULL;
    }

  /* Find the end of the token. */
  end = s + strcspn (s, delim);
  if (*end == '\0')
    {
      *save_ptr = end;
      return s;
    }

  /* Terminate the token and make *SAVE_PTR point past it. */
  *end = '\0';
  *save_ptr = end + 1;
  return s;
}
#ifdef weak_alias
libc_hidden_def (__strtok_r)
weak_alias (__strtok_r, strtok_r)
#endif
This article is also published on the WeChat official account; search for "AnSwEr不是答案" on WeChat to subscribe.
- GitHub:AnSwErYWJ
- Blog:http://www.answerywj.com
- Email:[email protected]
- Weibo:@AnSwEr不是答案