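Converting a String to an Integer, and a String Matching Problem

This article is reposted from the blog of CSDN author v_JULY_v.

Address:

http://blog.csdn.net/v_july_v/article/details/9024123

Reading notes: my own first take on these problems was far too naive...

Chapters 30-31: Converting a String to an Integer, and a String Matching Problem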

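Preface

I had long wanted to write about neural network algorithms and the EM algorithm, but both need large uninterrupted blocks of time. I work on weekdays, and on weekends I go to a Peking University classroom to study (in passing, here in chronological order are the books from the past half year that I found worthwhile: 《數理統計學簡史》《微積分概念發展史》《微積分的歷程：從牛頓到勒貝格》《數學恩仇錄》《數學與知識的探求》《古今數學思想》《素數之戀》), so I never found the time.

Recently, however, I have been in charge of an online programming challenge platform, http://hero.pongo.cn/ (hero for short; loosely, a Chinese counterpart to TopCoder that is still being improved. It differs from a typical OJ in that an OJ mainly gives ACM contestants a place to practise, whereas hero focuses on serving corporate recruitment interviews). I posted several programming interview questions there. Some look simple, but as soon as people start coding, many problems are exposed on hero. So I will resume this Art of Programming series, starting from hero's challenge problems.

Besides, a friend told me a few days ago that the thirty to forty new graduates he knows who were hired by 360 this year, himself included, had nearly all read my blog while job hunting, which gave me even more motivation to continue the series.

OK, this article covers two problems:

Chapter 30: converting a string to an integer;
Chapter 31: a string matching problem.

As always, if you find any problems, please point them out. Thank you.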

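Chapter 30: Converting a String to an Integer

First, the problem:

Given a string representing an integer, convert it to an integer and output it. For example, for the input string "345", output the integer 345.
Implement the function StrToInt to perform this conversion; the library function atoi may not be used.

Let us analyse it step by step until we arrive at a first correct version of the code:

1. This problem really tests string-to-integer conversion; in other words, you are asked to implement atoi yourself. How do we correctly convert a string that represents an integer? Take "345" as an example:

When we scan the first character '3', we know it is the first digit, so we get the number 3.
When we scan the second character '4', we already have a 3 in front of it, so we append the digit 4; the earlier 3 now stands for 30, giving 3*10+4=34.
When we scan the character '5', we already have 34 in front of it; the 34 now stands for 340, and adding the 5 gives the final number 34*10+5=345.

So the idea is: for each character scanned, multiply the number obtained so far by 10 and add the digit that the current character represents.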
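
The multiply-by-10-and-add step can be sketched in a few lines of C. This is only an illustration of the core loop: the name digits_to_int is hypothetical, and the sketch assumes the input holds only digits and fits in an int (no sign, whitespace or overflow handling yet).

```c
/* Hypothetical helper illustrating the core loop only.
 * Assumes: digits only, value fits in int. */
int digits_to_int(const char *s) {
    int num = 0;
    for (; *s >= '0' && *s <= '9'; ++s) {
        /* shift the digits seen so far one decimal place left,
         * then add the value of the current character */
        num = num * 10 + (*s - '0');
    }
    return num;
}
```

For "345" the accumulator takes the values 3, 34 and 345 in turn, exactly as in the walk-through above.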

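2. With the idea in place, some details need attention. As zhedahht puts it:

"An integer may contain more than just digits: it may start with '+' or '-' to indicate its sign. So the first character of the string needs special handling. If it is '+', nothing needs to be done; if it is '-', the integer is negative, and at the end we must negate the value we obtained.
Next we try to handle invalid input. Since the input is a pointer, the first thing to do before using it is to check whether it is null. Dereferencing a null pointer will inevitably crash the program.
In addition, the input string may contain characters that are not digits. Whenever we meet such an invalid character, there is no need to continue converting.
The last issue to consider is overflow. Because the number is supplied as a string, the input may describe a very large number that, once converted, exceeds the largest representable integer and overflows."

For example, have you considered inputs such as those shown in the picture on the left? Their correct outputs are shown in the picture on the right (assuming a 32-bit system and a compiler of VS2008 or later):

3. Soon enough, you might write the following code: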

//copyright@zhedahht 2007
#include <cstddef>   // NULL
#include <limits>    // std::numeric_limits
enum Status {kValid = 0, kInvalid};
int g_nStatus = kValid;
// Convert a string into an integer
int StrToInt(const char* str)
{
    g_nStatus = kInvalid;
    long long num = 0;
    if(str != NULL)
    {
        const char* digit = str;
        // the first char in the string maybe '+' or '-'
        bool minus = false;
        if(*digit == '+')
            digit ++;
        else if(*digit == '-')
        {
            digit ++;
            minus = true;
        }
        // the remaining chars in the string
        while(*digit != '\0')
        {
            if(*digit >= '0' && *digit <= '9')
            {
                num = num * 10 + (*digit - '0');
                // overflow
                if(num > std::numeric_limits<int>::max())
                {
                    num = 0;
                    break;
                }
                digit ++;
            }
            // if the char is not a digit, invalid input
            else
            {
                num = 0;
                break;
            }
        }
        if(*digit == '\0')
        {
            g_nStatus = kValid;
            if(minus)
                num = 0 - num;
        }
    }
    return static_cast<int>(num);
}

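Run the program above and you will find that it produces wrong results for the inputs marked with red crosses in the figure below:

Two problems:

When the input string contains a non-digit character, for example "1a", the program simply returns 0 (the correct result is 1):
// if the char is not a digit, invalid input
else
{
    num = 0;
    break;
}
Overflow is handled incorrectly: the program returns 0 instead of clamping to the largest or smallest int, and it rejects the valid input "-2147483648" because the magnitude is compared against the int maximum before the sign is applied.

4. Adjust the code slightly, as follows: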

//copyright@SP_daiyq 2013/5/29
#include <ctype.h>   // isspace
#include <limits.h>  // INT_MAX, INT_MIN
int StrToInt(const char* str)
{
    int res = 0;        // result
    int i = 0;          // index of str
    int signal = '+';   // sign: '+' or '-'
    int cur;            // current digit
    if (!str)
        return 0;
    // skip whitespace
    while (isspace(str[i]))
        i++;
    // skip sign
    if (str[i] == '+' || str[i] == '-')
    {
        signal = str[i];
        i++;
    }
    // get result
    while (str[i] >= '0' && str[i] <= '9')
    {
        cur = str[i] - '0';
        // check for overflow
        if ( (signal == '+') && (cur > INT_MAX - res*10) )
        {
            res = INT_MAX;
            break;
        }
        else if ( (signal == '-') && (cur -1 > INT_MAX - res*10) )
        {
            res = INT_MIN;
            break;
        }
        res = res * 10 + cur;
        i++;
    }
    return (signal == '-') ? -res : res;
}

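You will find that the first problem described above is now solved:

Even so, the code above still has a problem. It shows up with the following test data:

String to convert      Actual output    Expected output
" 10522545459"         1932610867       2147483647
" +10523538441s"       1933603849       2147483647
" +10432359437"        1842424845       2147483647

What goes wrong? Take the string " 10522545459": the correct result is 2147483647, but the program actually returns 1932610867. Clearly the program does not properly solve the second problem above, overflow.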
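
The wrong outputs in the table are not random. The check in the code computes res*10 in an int, and that product has already overflowed by the time it is compared. Signed overflow is undefined behaviour in C, but on typical two's-complement machines the value wraps modulo 2^32, which is what the table shows. The sketch below reproduces the wrapped values with well-defined unsigned arithmetic; wrap_to_32_bits is a hypothetical name used only for this illustration.

```c
#include <stdint.h>

/* Hypothetical helper: keeps only the low 32 bits of value,
 * i.e. value modulo 2^32. Unsigned conversion is well defined in C. */
uint32_t wrap_to_32_bits(unsigned long long value) {
    return (uint32_t)value;
}
```

For instance, 10522545459 - 2 * 4294967296 = 1932610867, the first "actual output" in the table.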

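5. As noted, the program above does not handle overflow properly: because the number arrives as a string, the input may describe a value that exceeds the largest representable integer once converted. So how should the code be written?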

//copyright@淹死鯊魚ronkins 2013/5/29
//Challenge problem: http://hero.pongo.cn/Question/Details?ID=47&ExamID=45
#include <ctype.h>   // isspace, isdigit
#include <limits.h>  // INT_MAX, INT_MIN
int atoi(const char* str)
{
    long long res = 0;
    int sign = 1;
    while(isspace(*str))++str;
    if('+' == *str){
        ++str;
    }else if('-' == *str){
        sign = -1;
        ++str;
    }
    for(; isdigit(*str); ++str){
        res *= 10;
        if(sign > 0)
            res += (*str - '0');
        else
            res -= (*str - '0');
        if(res >= INT_MAX)return INT_MAX;
        else if(res <= INT_MIN)return INT_MIN;
    }
    return res;
}

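The code above appears to handle overflow, but it really relies on a shortcut: the accumulator res is declared as long long, as shown here:

long long res = 0;

So strictly speaking, we still have not written correct, well-formed code.

6. So how should the overflow problem really be solved? The convention adopted for this problem is that a value above the int range is clamped to maxint, 2147483647, and a value below it is clamped to minint, -2147483648 (the C standard itself leaves the behaviour of atoi undefined when the result cannot be represented). Let us first look at how Microsoft implements atoi: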

//atol function
//Copyright (c) 1989-1997, Microsoft Corporation. All rights reserved.
long __cdecl atol(
    const char *nptr
    )
{
    int c;      /* current char */
    long total; /* current total */
    int sign;   /* if '-', then negative, otherwise positive */
    /* skip whitespace */
    while ( isspace((int)(unsigned char)*nptr) )
        ++nptr;
    c = (int)(unsigned char)*nptr++;
    sign = c;   /* save sign indication */
    if (c == '-' || c == '+')
        c = (int)(unsigned char)*nptr++;    /* skip sign */
    total = 0;
    while (isdigit(c)) {
        total = 10 * total + (c - '0');     /* accumulate digit */
        c = (int)(unsigned char)*nptr++;    /* get next char */
    }
    if (sign == '-')
        return -total;
    else
        return total;   /* return result, negated if necessary */
}

Here, simplified implementations of the isspace and isdigit functions look like this:

int isspace(int x)
{
    if(x==' '||x=='\t'||x=='\n'||x=='\f'||x=='\v'||x=='\r')
        return 1;
    else
        return 0;
}
int isdigit(int x)
{
    if(x<='9'&&x>='0')
        return 1;
    else
        return 0;
}

atoi then calls the atol function above, as follows:

//atoi calls the atol above
int __cdecl atoi(
    const char *nptr
    )
{
    //Overflow is not detected. Because of this, we can just use
    return (int)atol(nptr);
}

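Unfortunately, this standard atoi code still accumulates and returns a long:

long total; /* current total */
if (sign == '-')
    return -total;
else
    return total;   /* return result, negated if necessary */

Moreover, total is declared as long and multiplied by 10 below, and total*10 can easily overflow:

long total; /* current total */
total = 10 * total + (c - '0');     /* accumulate digit */

7. Let us keep looking. Next, we examine how the Linux kernel converts a string to an integer.

The Linux kernel provides the following functions:

simple_strtol, which converts a string to a signed long;
simple_strtoll, which converts a string to a signed long long;
simple_strtoul, which converts a string to an unsigned long;
simple_strtoull, which converts a string to an unsigned long long.

The relevant source code and analysis follow.

First, an atoi-style conversion goes through simple_strtol below: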

//linux/lib/vsprintf.c
//Copyright (C) 1991, 1992 Linus Torvalds
//simple_strtol - convert a string to a signed long
long simple_strtol(const char *cp, char **endp, unsigned int base)
{
    if (*cp == '-')
        return -simple_strtoul(cp + 1, endp, base);
    return simple_strtoul(cp, endp, base);
}
EXPORT_SYMBOL(simple_strtol);

Then simple_strtol above calls simple_strtoul below:

//simple_strtoul - convert a string to an unsigned long
unsigned long simple_strtoul(const char *cp, char **endp, unsigned int base)
{
    return simple_strtoull(cp, endp, base);
}
EXPORT_SYMBOL(simple_strtoul);

Next, simple_strtoul above calls simple_strtoull; its signed counterpart simple_strtoll, shown here, wraps simple_strtoull in the same way:

//simple_strtoll - convert a string to a signed long long
long long simple_strtoll(const char *cp, char **endp, unsigned int base)
{
    if (*cp == '-')
        return -simple_strtoull(cp + 1, endp, base);
    return simple_strtoull(cp, endp, base);
}
EXPORT_SYMBOL(simple_strtoll);

Finally, simple_strtoull calls _parse_integer_fixup_radix and _parse_integer to do the actual work:

//simple_strtoull - convert a string to an unsigned long long
unsigned long long simple_strtoull(const char *cp, char **endp, unsigned int base)
{
    unsigned long long result;
    unsigned int rv;
    cp = _parse_integer_fixup_radix(cp, &base);
    rv = _parse_integer(cp, base, &result);
    /* FIXME */
    cp += (rv & ~KSTRTOX_OVERFLOW);
    if (endp)
        *endp = (char *)cp;
    return result;
}
EXPORT_SYMBOL(simple_strtoull);

Now for the main event. Let us look at the two functions used by simple_strtoull above, _parse_integer_fixup_radix and _parse_integer. As 鯊魚 puts it:

"The real processing logic is mainly in _parse_integer. Its handling of overflow is quite elegant,
while _parse_integer_fixup_radix is used to determine the radix automatically from the string."

First, the _parse_integer function:

//lib/kstrtox.c, line 39
//Convert non-negative integer string representation in explicitly given radix to an integer.
//Return number of characters consumed maybe or-ed with overflow bit.
//If overflow occurs, result integer (incorrect) is still returned.
unsigned int _parse_integer(const char *s, unsigned int base, unsigned long long *p)
{
    unsigned long long res;
    unsigned int rv;
    int overflow;
    res = 0;
    rv = 0;
    overflow = 0;
    while (*s) {
        unsigned int val;
        if ('0' <= *s && *s <= '9')
            val = *s - '0';
        else if ('a' <= _tolower(*s) && _tolower(*s) <= 'f')
            val = _tolower(*s) - 'a' + 10;
        else
            break;
        if (val >= base)
            break;
        /*
         * Check for overflow only if we are within range of
         * it in the max base we support (16)
         */
        if (unlikely(res & (~0ull << 60))) {
            if (res > div_u64(ULLONG_MAX - val, base))
                overflow = 1;
        }
        res = res * base + val;
        rv++;
        s++;
    }
    *p = res;
    if (overflow)
        rv |= KSTRTOX_OVERFLOW;
    return rv;
}

Two small details deserve explanation:

The code above uses unlikely; likely and unlikely appear frequently in Linux kernel source:
if(likely(value)){
    // if (likely(value)) is logically equivalent to if (value)
}
else{
}
likely hints that value is more likely to be true, and unlikely hints that it is more likely to be false. With branch profiling enabled, the two macros are defined as follows (in ordinary builds they expand to __builtin_expect(!!(x), 1) and __builtin_expect(!!(x), 0)):
//include/linux/compiler.h
# ifndef likely
# define likely(x) (__builtin_constant_p(x) ? !!(x) : __branch_check__(x, 1))
# endif
# ifndef unlikely
# define unlikely(x) (__builtin_constant_p(x) ? !!(x) : __branch_check__(x, 0))
# endif
Here is the code of div_u64 and the div_u64_rem helper it calls:
//include/linux/math64.h
//div_u64_rem
static inline u64 div_u64_rem(u64 dividend, u32 divisor, u32 *remainder)
{
    *remainder = dividend % divisor;
    return dividend / divisor;
}
//div_u64
static inline u64 div_u64(u64 dividend, u32 divisor)
{
    u32 remainder;
    return div_u64_rem(dividend, divisor, &remainder);
}
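
The division-based test in _parse_integer (compare res against (MAX - val) / base before multiplying) carries over to the original StrToInt problem. The following is a minimal sketch under stated assumptions, not the kernel's code or any library's implementation: the name str_to_int_clamped is hypothetical, it uses only int-sized arithmetic, it assumes a two's-complement int, and it follows the clamping convention used in this chapter.

```c
#include <ctype.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical sketch: clamps to INT_MAX / INT_MIN on overflow and
 * stops at the first non-digit, so "1a" yields 1. */
int str_to_int_clamped(const char *s) {
    if (s == NULL)
        return 0;
    while (isspace((unsigned char)*s))
        ++s;
    int negative = 0;
    if (*s == '+' || *s == '-') {
        negative = (*s == '-');
        ++s;
    }
    /* Largest allowed magnitude: INT_MAX, or INT_MAX + 1 for negatives. */
    unsigned int limit = negative ? (unsigned int)INT_MAX + 1u
                                  : (unsigned int)INT_MAX;
    unsigned int mag = 0;
    for (; isdigit((unsigned char)*s); ++s) {
        unsigned int d = (unsigned int)(*s - '0');
        /* mag * 10 + d > limit  <=>  mag > (limit - d) / 10,
         * checked BEFORE multiplying so nothing ever overflows. */
        if (mag > (limit - d) / 10u)
            return negative ? INT_MIN : INT_MAX;
        mag = mag * 10u + d;
    }
    if (negative)
        return mag == limit ? INT_MIN : -(int)mag;
    return (int)mag;
}
```

Because the comparison happens before the multiplication, no wider type such as long long is needed, which was the shortcoming of the earlier attempts.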

Finally, the _parse_integer_fixup_radix function:

//lib/kstrtox.c, line 23
const char *_parse_integer_fixup_radix(const char *s, unsigned int *base)
{
    if (*base == 0) {
        if (s[0] == '0') {
            if (_tolower(s[1]) == 'x' && isxdigit(s[2]))
                *base = 16;
            else
                *base = 8;
        } else
            *base = 10;
    }
    if (*base == 16 && s[0] == '0' && _tolower(s[1]) == 'x')
        s += 2;
    return s;
}
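
The same radix auto-detection is available in user space: the standard C function strtol infers the base from the prefix when base 0 is passed. The wrapper name parse_auto_base below is hypothetical and exists only to make the example testable.

```c
#include <stdlib.h>

/* Hypothetical wrapper: base 0 asks strtol to infer the radix.
 * "0x"/"0X" prefix -> hexadecimal, leading "0" -> octal, else decimal. */
long parse_auto_base(const char *s) {
    return strtol(s, NULL, 0);
}
```

With this, "0x1A" parses as 26, "017" as 15 and "42" as 42.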

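OK, at this point the string-to-integer problem can be considered solved. If the interviewer goes on to ask how to convert an integer to a string, what would you do? This is left for the reader to think about; you are also welcome to show your code in the comments on this article or on hero.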
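
As one possible starting point for that exercise, here is a minimal sketch. The name int_to_str and its buffer contract are assumptions made for this illustration, and it assumes int is at most 32 bits wide. The only subtle point is the most negative int, whose magnitude does not fit in an int, so the digits are extracted from an unsigned copy.

```c
/* Hypothetical sketch: writes the decimal form of value into buf and
 * returns buf. buf must hold at least 12 chars (sign, 10 digits, NUL). */
char *int_to_str(int value, char *buf) {
    char tmp[10];   /* digits in reverse order */
    int n = 0;
    /* Negate in unsigned arithmetic so the most negative int is safe. */
    unsigned int mag = value < 0 ? 0u - (unsigned int)value
                                 : (unsigned int)value;
    do {
        tmp[n++] = (char)('0' + mag % 10u);
        mag /= 10u;
    } while (mag != 0);
    char *out = buf;
    if (value < 0)
        *out++ = '-';
    while (n > 0)
        *out++ = tmp[--n];   /* emit most significant digit first */
    *out = '\0';
    return buf;
}
```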

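Chapter 31: A String Matching Problem

String matching: given a string, match it against a specified rule and store the matches in the output array. Separate multiple matches with a space; no space is needed after the last one.

Requirements:

The rule may contain the wildcards ? and *, where ? matches any single character and * matches any number (>=0) of characters.
The rule must match the longest possible substring; for example, a*d matches abbdd rather than abbd.
Input that has already been matched is not matched again; matching of further substrings resumes after the end of the current match.

Implement the function: char* my_find(char input[], char rule[])

Examples

input: abcadefg
rule: a?c
output: abc

input: newsadfanewfdadsf
rule: new
output: new new

input: breakfastfood
rule: f*d
output: fastfood

Notes:

Implement my_find yourself, do not mix any output into my_find, and do not use the C or C++ libraries or Java's String object;
Pay attention to the time and space complexity of the code, and to its readability and conciseness;
For input=aaa and rule=aa, returning the single result aa is sufficient.

1. Unlike the Chapter 30 problem, which mainly tests thoroughness and attention to detail, this one is more about programming technique. Without further ado, here is the code: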

#include <stdlib.h>  // malloc, free

int str_len(char *a) { // length of a string
    if (a == 0) {
        return 0;
    }
    char *t = a;
    for (;*t;++t)
        ;
    return (int) (t - a);
}
void str_copy(char *a,const char *b,int len) { // copy len chars of b into a, then terminate a
    for (;len > 0; --len, ++b,++a) {
        *a = *b;
    }
    *a = 0;
}
char *str_join(char *a,const char *b,int lenb) { // join a, a space and b; the first string is freed
    char *t;
    if (a == 0) {
        t = (char *) malloc(sizeof(char) * (lenb + 1));
        str_copy(t, b, lenb);
        return t;
    }
    else {
        int lena = str_len(a);
        t = (char *) malloc(sizeof(char) * (lena + lenb + 2));
        str_copy(t, a, lena);
        *(t + lena) = ' ';
        str_copy(t + lena + 1, b, lenb);
        free(a);
        return t;
    }
}
int canMatch(char *input, char *rule) { // returns the longest match length; -1 means no match
    if (*rule == 0) { // reached the end of rule
        return 0;
    }
    int r = -1 ,may;
    if (*rule == '*') {
        r = canMatch(input, rule + 1); // '*' matches zero characters
        if (*input) {
            may = canMatch(input + 1, rule); // '*' matches one or more characters
            if ((may >= 0) && (++may > r)) {
                r = may;
            }
        }
    }
    if (*input == 0) { // reached the end of input
        return r;
    }
    if ((*rule == '?') || (*rule == *input)) {
        may = canMatch(input + 1, rule + 1);
        if ((may >= 0) && (++may > r)) {
            r = may;
        }
    }
    return r;
}
char * my_find(char input[], char rule[]) {
    int len = str_len(input);
    // match[i]: the longest match starting at input[i]; -1 if there is none
    int *match = (int *) malloc(sizeof(int) * len);
    int i,max_pos = - 1;
    char *output = 0;
    for (i = 0; i < len; ++i) {
        match[i] = canMatch(input + i, rule);
        if ((max_pos < 0) || (match[i] > match[max_pos])) {
            max_pos = i;
        }
    }
    if ((max_pos < 0) || (match[max_pos] <= 0)) { // no match
        free(match); // release the table on this early exit as well
        output = (char *) malloc(sizeof(char));
        *output = 0; // \0
        return output;
    }
    for (i = 0; i < len;) {
        if (match[i] == match[max_pos]) { // found a longest match
            output = str_join(output, input + i, match[i]);
            i += match[i]; // skip past it so matches do not overlap
        }
        else {
            ++i;
        }
    }
    free(match);
    return output;
}
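
The recursive canMatch above can revisit the same (input position, rule position) pair many times when the rule contains several '*' characters. Since the problem asks for attention to time complexity, here is a tabulated sketch of the same recurrence that fills each pair once, in O(n*m) time. The name longest_match is hypothetical, and unlike the submission above this sketch uses strlen and malloc from the C library for brevity.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: length of the longest prefix of input that the
 * whole rule matches, or -1 if no prefix matches (same contract as
 * canMatch). f[i][j] is the answer for input + i and rule + j. */
int longest_match(const char *input, const char *rule) {
    size_t n = strlen(input), m = strlen(rule);
    int *f = (int *)malloc((n + 1) * (m + 1) * sizeof *f);
    if (f == NULL)
        return -1;
#define F(i, j) f[(i) * (m + 1) + (j)]
    for (size_t i = n + 1; i-- > 0;) {          /* i = n down to 0 */
        F(i, m) = 0;                            /* empty rule matches "" */
        for (size_t j = m; j-- > 0;) {          /* j = m-1 down to 0 */
            int best = -1;
            if (rule[j] == '*') {
                best = F(i, j + 1);             /* '*' matches nothing */
                if (i < n && F(i + 1, j) >= 0 && F(i + 1, j) + 1 > best)
                    best = F(i + 1, j) + 1;     /* '*' absorbs input[i] */
            } else if (i < n && (rule[j] == '?' || rule[j] == input[i])) {
                if (F(i + 1, j + 1) >= 0)
                    best = F(i + 1, j + 1) + 1; /* one char consumed */
            }
            F(i, j) = best;
        }
    }
    int result = F(0, 0);
#undef F
    free(f);
    return result;
}
```

Calling it once per start position, as my_find does with canMatch, gives the same match table without the exponential worst case.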

2. The problem can also be solved by writing down the DP recurrence directly, as in the following code:

// The result is allocated with new[]; the caller releases it with delete[].
char* my_find(char input[], char rule[])
{
    int len1,len2;
    for(len1 = 0;input[len1];len1++);
    for(len2 = 0;rule[len2];len2++);
    // dp[i][j]: length of the longest match of rule[0..j) that ends exactly
    // at input[i-1] (it may start anywhere); -1 means there is no such match
    int **dp = new int *[len1+1];
    for (int i = 0;i<=len1;i++)
    {
        dp[i] = new int[len2+1];
    }
    dp[0][0] = 0;
    // a run of leading '*' can match the empty string
    for(int j = 1;j<=len2;j++)dp[0][j] = (rule[j-1]=='*') ? dp[0][j-1] : -1;
    for(int i = 1;i<=len1;i++)dp[i][0] = 0;
    for (int i = 1;i<=len1;i++)
    {
        for (int j = 1;j<=len2;j++)
        {
            if(rule[j-1]=='*'){
                dp[i][j] = dp[i][j-1];          // '*' matches zero characters
                if (dp[i-1][j]!=-1 && dp[i][j]<dp[i-1][j]+1)
                {
                    dp[i][j] = dp[i-1][j]+1;    // '*' absorbs input[i-1]
                }
            }else if (rule[j-1]=='?' || input[i-1]==rule[j-1])
            {
                dp[i][j] = (dp[i-1][j-1]!=-1) ? dp[i-1][j-1]+1 : -1;
            }
            else
            {
                dp[i][j] = -1;
            }
        }
    }
    int m = -1; // longest match length
    for(int i = 1;i<=len1;i++)
        if (dp[i][len2]>m) m = dp[i][len2];
    // non-overlapping matches use at most len1 chars plus at most len1-1
    // separating spaces, so 2*len1+1 bytes always suffice
    char *returnans = new char[2*len1+1];
    int count = 0;
    int last_end = 0; // end of the previously emitted match
    if (m > 0)
    {
        for (int i = 1;i<=len1;i++)
        {
            // skip ends that are not longest matches or that overlap the last one
            if (dp[i][len2]!=m || i-m<last_end) continue;
            if (count>0) returnans[count++] = ' ';
            for (int k = i-m;k<i;k++) returnans[count++] = input[k];
            last_end = i;
        }
    }
    returnans[count] = '\0';
    for (int i = 0;i<=len1;i++) delete[] dp[i];
    delete[] dp;
    return returnans;
}

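You are welcome to show your code in the comments on this article or on hero.

References and recommended reading

http://zhedahht.blog.163.com/blog/static/25411174200731139971/
http://hero.pongo.cn/ (most of the code in this article comes from submissions by participants on hero; you are welcome to take the challenge too)
Full statement of the string-to-integer problem: http://hero.pongo.cn/Question/Details?ID=47&ExamID=45
Full statement of the string matching problem: http://hero.pongo.cn/Question/Details?ID=28&ExamID=28
Overview of the string-to-integer conversion functions in Linux 3.8.4: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/tree/lib/vsprintf.c?id=refs/tags/v3.9.4
On likely and unlikely in Linux: http://blog.21ic.com/user1/5593/archives/2010/68193.html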
