How is strtok() implemented?

Strictly speaking, this is a question about ANSI C in general, not about any particular compiler.

Everyone knows that strtok() can take a string apart piece by piece according to a given set of delimiters. Typical usage looks like this:
char input[] = "abc,defgh,ij,klm";
char *p;
p = strtok(input, ",");
if (p) printf("%s\n", p); // prints "abc"

p = strtok(NULL, ",");
if (p) printf("%s\n", p); // prints "defgh"

p = strtok(NULL, ",");
if (p) printf("%s\n", p); // prints "ij"

p = strtok(NULL, ",");
if (p) printf("%s\n", p); // prints "klm"

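The first call to strtok() is easy to understand: the function overwrites the comma after abc with a NUL character ('\0', not the NULL pointer), and the return value p points to &input[0], so "abc" is printed.
The later calls look odd: the string argument is NULL!
Why is it used this way? How does strtok remember that the previous call's target string was input?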
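To make the in-place modification visible, here is a small sketch. The helper name show_buffer_after_strtok and the choice of '|' as a stand-in for the NUL byte are my own, purely for illustration. It tokenizes the buffer on "," and then copies the raw bytes out, replacing every NUL that strtok wrote with '|'.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of any library): run strtok over buf,
 * then render the first len bytes of buf into out, writing '|' wherever
 * the buffer now holds a NUL byte. out must have room for len + 1 bytes.
 */
void show_buffer_after_strtok(char *buf, size_t len, char *out)
{
    char *p;
    size_t i;

    /* Walk through every token; each call may overwrite one delimiter. */
    for (p = strtok(buf, ","); p != NULL; p = strtok(NULL, ","))
        ;

    for (i = 0; i < len; i++)
        out[i] = (buf[i] == '\0') ? '|' : buf[i];
    out[len] = '\0';
}
```

Called on the array "abc,defgh,ij,klm" with len set to 16, out should end up as "abc|defgh|ij|klm": every comma has been overwritten, which is also why strtok must never be given a string literal.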

Tracking down the source code:
>man 3 strtok
... ...
LIBRARY
Standard C Library (libc, -lc)
... ...

>strings /usr/lib/libc.a | grep strtok
strtok_r
__strtok_r
... ...
$FreeBSD: src/lib/libc/string/strtok.c,v 1.9 2002/09/07 02:53:19 tjr Exp $
... ...

>cat /usr/src/lib/libc/string/strtok.c
/*-
* Copyright (c) 1998 Softweyr LLC. All rights reserved.
*
* strtok_r, from Berkeley strtok
* Oct 13, 1998 by Wes Peters <[email protected]>
*
* Copyright (c) 1988, 1993
* The Regents of the University of California. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
* 1. Redistributions of source code must retain the above copyright
* notices, this list of conditions and the following disclaimer.
* 2. Redistributions in binary form must reproduce the above copyright
* notices, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* 3. All advertising materials mentioning features or use of this software
* must display the following acknowledgement:
* This product includes software developed by Softweyr LLC, the
* University of California, Berkeley, and its contributors.
* 4. Neither the name of the University nor the names of its contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY SOFTWEYR LLC, THE REGENTS AND CONTRIBUTORS
* ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
* PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL SOFTWEYR LLC, THE
* REGENTS, OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
* SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
* TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
* PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
* LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
* NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
* SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/

#if defined(LIBC_SCCS) && !defined(lint)
static char sccsid[] = "@(#)strtok.c 8.1 (Berkeley) 6/4/93";
#endif /* LIBC_SCCS and not lint */
#include <sys/cdefs.h>
__FBSDID("$FreeBSD: src/lib/libc/string/strtok.c,v 1.9 2002/09/07 02:53:19 tjr Exp $");

#include <stddef.h>
#ifdef DEBUG_STRTOK
#include <stdio.h>
#endif
#include <string.h>

char *__strtok_r(char *, const char *, char **);

__weak_reference(__strtok_r, strtok_r);

char *
__strtok_r(char *s, const char *delim, char **last)
{
	char *spanp, *tok;
	int c, sc;

	if (s == NULL && (s = *last) == NULL)
		return (NULL);

	/*
	 * Skip (span) leading delimiters (s += strspn(s, delim), sort of).
	 */
cont:
	c = *s++;
	for (spanp = (char *)delim; (sc = *spanp++) != 0;) {
		if (c == sc)
			goto cont;
	}

	if (c == 0) {		/* no non-delimiter characters */
		*last = NULL;
		return (NULL);
	}
	tok = s - 1;

	/*
	 * Scan token (scan for delimiters: s += strcspn(s, delim), sort of).
	 * Note that delim must have one NUL; we stop if we see that, too.
	 */
	for (;;) {
		c = *s++;
		spanp = (char *)delim;
		do {
			if ((sc = *spanp++) == c) {
				if (c == 0)
					s = NULL;
				else
					s[-1] = '\0';
				*last = s;
				return (tok);
			}
		} while (sc != 0);
	}
	/* NOTREACHED */
}

char *
strtok(char *s, const char *delim)
{
	static char *last; /* a static variable: it keeps the position between calls */

	return (__strtok_r(s, delim, &last));
}

#ifdef DEBUG_STRTOK
/*
* Test the tokenizer.
*/
int
main(void)
{
	char blah[80], test[80];
	char *brkb, *brkt, *phrase, *sep, *word;

	sep = "\\/:;=-";
	phrase = "foo";

	printf("String tokenizer test:\n");
	strcpy(test, "This;is.a:test:of=the/string\\tokenizer-function.");
	for (word = strtok(test, sep); word; word = strtok(NULL, sep))
		printf("Next word is \"%s\".\n", word);
	strcpy(test, "This;is.a:test:of=the/string\\tokenizer-function.");

	for (word = strtok_r(test, sep, &brkt); word;
	    word = strtok_r(NULL, sep, &brkt)) {
		strcpy(blah, "blah:blat:blab:blag");

		for (phrase = strtok_r(blah, sep, &brkb); phrase;
		    phrase = strtok_r(NULL, sep, &brkb))
			printf("So far we're at %s:%s\n", word, phrase);
	}

	return (0);
}

#endif /* DEBUG_STRTOK */
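So the answer to the question is in the listing: strtok is a thin wrapper that keeps a static pointer and hands its address to the re-entrant worker. The comments in the source hint that the two scanning loops are "sort of" strspn and strcspn, so the same idea can be sketched more compactly with those two standard functions. The names my_strtok_r and my_strtok are my own and this is only an illustration, not the FreeBSD code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical re-entrant tokenizer with the same contract as strtok_r. */
char *my_strtok_r(char *s, const char *delim, char **last)
{
    char *tok;

    /* NULL means "continue where the previous call stopped". */
    if (s == NULL && (s = *last) == NULL)
        return NULL;

    s += strspn(s, delim);      /* skip leading delimiters */
    if (*s == '\0') {           /* nothing but delimiters left */
        *last = NULL;
        return NULL;
    }

    tok = s;
    s += strcspn(s, delim);     /* advance to the end of the token */
    if (*s == '\0') {
        *last = NULL;           /* token runs to the end of the string */
    } else {
        *s = '\0';              /* overwrite the delimiter in place */
        *last = s + 1;          /* remember where to resume */
    }
    return tok;
}

/* Non-re-entrant wrapper: the static pointer is the "memory". */
char *my_strtok(char *s, const char *delim)
{
    static char *last;

    return my_strtok_r(s, delim, &last);
}
```

Because last is a single static variable, every caller in the program shares it. That is exactly what makes the NULL argument work, and also what makes plain strtok unsafe for nested or multi-threaded use.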


>cat /usr/include/string.h | grep strtok
char *strtok(char * __restrict, const char * __restrict);
char *strtok_r(char *, const char *, char **);

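Note:
restrict tells the compiler that, inside this function, the data these two pointers refer to is accessed only through those pointers, so the two never alias each other. That leaves the compiler free to optimize, which lets C approach the numerical performance of Fortran.
restrict is available only from C99 onward.

GCC supports C99, but C99 is not its default standard; to use C99 syntax, add -std=c99 to the compiler flags. (I did that and still got error: invalid use of `restrict', with gcc version 3.4.2 [FreeBSD] 20040728.)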
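As a small illustration of what restrict promises, here is a sketch of a function whose two pointer parameters are declared restrict. The function add_into is hypothetical and only shows the syntax; it assumes a compiler in C99 mode or later.

```c
#include <stddef.h>

/*
 * Hypothetical example: dst[i] += src[i] for i in [0, n).
 * The restrict qualifiers are a promise by the caller that dst and src
 * do not overlap, so the compiler need not reload src[i] after each
 * store to dst[i]. Passing overlapping arrays is undefined behavior.
 */
void add_into(size_t n, double *restrict dst, const double *restrict src)
{
    size_t i;

    for (i = 0; i < n; i++)
        dst[i] += src[i];
}
```

The qualifier changes nothing about the result for valid calls; it only removes an aliasing assumption the compiler would otherwise have to make.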

In my opinion, it is best to use strtok_r rather than strtok.
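A nested tokenization shows why. The sketch below uses two hypothetical helpers of my own, count_fields_strtok and count_fields_strtok_r. Both split a text into records on ';' and each record into fields on '=', then return the total number of fields. It assumes a POSIX system, since strtok_r comes from POSIX rather than ISO C.

```c
#define _POSIX_C_SOURCE 200809L /* ask for the strtok_r declaration */
#include <stddef.h>
#include <string.h>

/* Broken variant: the inner loop overwrites strtok's single saved position. */
int count_fields_strtok(char *text)
{
    char *rec, *field;
    int n = 0;

    for (rec = strtok(text, ";"); rec != NULL; rec = strtok(NULL, ";"))
        for (field = strtok(rec, "="); field != NULL;
             field = strtok(NULL, "="))
            n++;
    return n;
}

/* Working variant: each loop keeps its own saved position. */
int count_fields_strtok_r(char *text)
{
    char *rec, *field, *rec_state, *field_state;
    int n = 0;

    for (rec = strtok_r(text, ";", &rec_state); rec != NULL;
         rec = strtok_r(NULL, ";", &rec_state))
        for (field = strtok_r(rec, "=", &field_state); field != NULL;
             field = strtok_r(NULL, "=", &field_state))
            n++;
    return n;
}
```

Given a writable copy of "a=1;b=2;c=3", the strtok_r version counts all 6 fields. The strtok version stops after the first record and counts only 2, because once the inner loop has exhausted "a=1" the shared state says there is nothing left to scan.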
