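A Deep Dive into the strtok Family of Functions

By reading the source code, this article examines how glibc implements strtok and strtok_r and what to watch out for when using them.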


Function overview

#include <string.h>

char *strtok(char *str, const char *delim);
char *strtok_r(char *str, const char *delim, char **saveptr);

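Description

  • strtok splits str into a sequence of tokens, treating every character contained in delim as a delimiter. If str is NULL, the function starts from a static pointer it saved internally, which points to the byte just after the position where the previous call stopped.
  • strtok_r does the same job as strtok, but makes the internally saved pointer explicit: the caller passes it in through saveptr, and the position stored in saveptr is used as the starting point when str is NULL.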

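Parameters

  • str: the source string to split
  • delim: the set of delimiter characters
  • saveptr: a pointer to a char * variable that holds the tokenizing context between calls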

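Return value

  • If the string contains no delimiter, the whole string is returned as a single token: the return value points to the start of the source string, so printing it prints the entire string.
  • If a token is found, the return value points to the token, that is, to its starting position inside the source string. The delimiter that followed the token is overwritten with '\0', which is why the token can be printed as an ordinary C string.
  • If no token remains (all tokens have already been returned, or the string is empty or consists only of delimiters), NULL is returned.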

Example

#include <stdio.h>
#include <string.h>

int main(void) {
  char str[] = "hello,world";  /* modifiable array; the compiler adds the terminating '\0' */
  char *token = strtok(str, ",");

  while (token != NULL) {
    printf("%s\n", token);
    token = strtok(NULL, ",");
  }
   
  return 0;
}

Usage notes

No new string is created: strtok modifies the source string in place

char str[] = "hello,world";
printf("str before strtok: %s\n", str);
char *token = strtok(str, ",");
printf("str after strtok: %s\n", str);
$ str before strtok: hello,world
$ str after strtok: hello

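As the experiment shows, str changes after strtok has been applied to it: the content after the delimiter seems to have disappeared. In fact that content is still in the buffer. strtok located the first occurrence of the delimiter (the , before world) and overwrote it with '\0', so printing str now stops there.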

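The first argument must not be a string literal

Because strtok modifies the source string, the first argument must not be a string literal. Modifying a string literal is undefined behavior in C; in practice the program usually crashes (for example with a segmentation fault), because literals are typically stored in read-only memory.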

To keep extracting tokens from the same string after the first call, pass NULL as the first argument in the following calls

char str[] = "hello,world";
char *token = strtok(str, ",");   
while (token != NULL) {
    printf("%s\n", token);
    token = strtok(NULL, ",");
}
$ hello
$ world

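During the first call, strtok stores a pointer to the byte after the delimiter, that is, the position of 'w'. Later calls pass NULL as the first argument, and strtok continues splitting from the position it saved implicitly in the previous call.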

The second argument is a set of delimiters, so several delimiter characters are supported

char str[] = "hello,world";
char *token = strtok(str, ",l");
printf("%s\n", token);
$ he

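As shown above, strtok does not match the second argument as a whole string. Every character in the delimiter set is a delimiter on its own, so the token ends at the first character that belongs to the set (here the first l).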

Leading delimiters are skipped

char str[] = ",hello,world";
char *token = strtok(str, ",");
printf("%s\n", token);
$ hello

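As shown above, a delimiter at the start of the string does not produce an empty token. Before looking for a token, the implementation advances past every leading delimiter character with strspn, so the first token begins at the first non-delimiter character.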

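strtok is not reentrant; strtok_r is more flexible and safer

strtok uses a static variable internally: a static pointer stores the starting position for the next call and is invisible to the caller. strtok_r instead takes that pointer as a parameter, so the caller passes it in, keeps it, and can even modify it, which makes the function more flexible and safer. Windows also provides a safe tokenizing function, strtok_s.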

Source code

strtok.c:

/* Copyright (C) 1991-2018 Free Software Foundation, Inc.
   This file is part of the GNU C Library.

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Lesser General Public
   License as published by the Free Software Foundation; either
   version 2.1 of the License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Lesser General Public License for more details.

   You should have received a copy of the GNU Lesser General Public
   License along with the GNU C Library; if not, see
   <http://www.gnu.org/licenses/>.  */

#include <string.h>


/* Parse S into tokens separated by characters in DELIM.
   If S is NULL, the last string strtok() was called with is
   used.  For example:
	char s[] = "-abc-=-def";
	x = strtok(s, "-");		// x = "abc"
	x = strtok(NULL, "-=");		// x = "def"
	x = strtok(NULL, "=");		// x = NULL
		// s = "abc\0=-def\0"
*/
char *
strtok (char *s, const char *delim)
{
  static char *olds;
  return __strtok_r (s, delim, &olds);
}

strtok_r.c:

/* Reentrant string tokenizer.  Generic version.
   Copyright (C) 1991-2018 Free Software Foundation, Inc.
   This file is part of the GNU C Library.

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Lesser General Public
   License as published by the Free Software Foundation; either
   version 2.1 of the License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Lesser General Public License for more details.

   You should have received a copy of the GNU Lesser General Public
   License along with the GNU C Library; if not, see
   <http://www.gnu.org/licenses/>.  */

#ifdef HAVE_CONFIG_H
# include <config.h>
#endif

#include <string.h>

#ifndef _LIBC
/* Get specification.  */
# include "strtok_r.h"
# define __strtok_r strtok_r
#endif

/* Parse S into tokens separated by characters in DELIM.
   If S is NULL, the saved pointer in SAVE_PTR is used as
   the next starting point.  For example:
	char s[] = "-abc-=-def";
	char *sp;
	x = strtok_r(s, "-", &sp);	// x = "abc", sp = "=-def"
	x = strtok_r(NULL, "-=", &sp);	// x = "def", sp = NULL
	x = strtok_r(NULL, "=", &sp);	// x = NULL
		// s = "abc\0-def\0"
*/
char *
__strtok_r (char *s, const char *delim, char **save_ptr)
{
  char *end;

  if (s == NULL)
    s = *save_ptr;

  if (*s == '\0')
    {
      *save_ptr = s;
      return NULL;
    }

  /* Scan leading delimiters.  */
  s += strspn (s, delim);
  if (*s == '\0')
    {
      *save_ptr = s;
      return NULL;
    }

  /* Find the end of the token.  */
  end = s + strcspn (s, delim);
  if (*end == '\0')
    {
      *save_ptr = end;
      return s;
    }

  /* Terminate the token and make *SAVE_PTR point past it.  */
  *end = '\0';
  *save_ptr = end + 1;
  return s;
}
#ifdef weak_alias
libc_hidden_def (__strtok_r)
weak_alias (__strtok_r, strtok_r)
#endif
