Counting English words with Perl (contractions and hyphenated compounds are not handled)

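English words are separated by spaces, so a Perl regular expression can match them directly. In Perl the counts fit naturally into a hash; in C++ an STL map would do the same job. I copied a passage from the Intermediate Perl tutorial as sample text, and here is a simple Perl implementation: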


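use strict;
use warnings;

# Slurp everything after __DATA__ into a single string.
my $data = do { local $/; <DATA> };
my %words;

# Each run of \w characters counts as one word; matching is case-sensitive,
# so "The" and "the" are tallied separately.
while ($data =~ /\b(\w+)\b/g)
{
    # A missing entry is undef, which ++ treats as 0, so no exists check is needed.
    $words{$1}++;
}

# Print the counts in the default string sort order (uppercase before lowercase).
for my $key (sort keys %words)
{
    print "$key => $words{$key}\n";
}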


__DATA__
Each new release of the debugger works slightly differently than previous releases, so
our screen might not look exactly like what we show here. Also, if we get stuck at any
time, we can type h for help, or look at perldebug.
The debugger shows each line of code before it executes it. That means that, at this
point, we’re about to invoke the autovivification, and we’ve got our keys established.
The s command single-steps the program, while the x command dumps a list of values
in a nice format. We can see that $source, $destination, and $bytes are correct, and
now it’s time to update the data:


Output:

Also => 1
Each => 1
That => 1
The => 2
We => 1
a => 2
about => 1
and => 3
any => 1
are => 1
at => 3
autovivification => 1
before => 1
bytes => 1
can => 2
code => 1
command => 2
correct => 1
data => 1
debugger => 2
destination => 1
differently => 1
dumps => 1
each => 1
established => 1
exactly => 1
executes => 1
for => 1
format => 1
get => 1
got => 1
h => 1
help => 1
here => 1
if => 1
in => 1
invoke => 1
it => 3
keys => 1
like => 1
line => 1
list => 1
look => 2
means => 1
might => 1
new => 1
nice => 1
not => 1
now => 1
of => 3
or => 1
our => 2
perldebug => 1
point => 1
previous => 1
program => 1
re => 1
release => 1
releases => 1
s => 2
screen => 1
see => 1
show => 1
shows => 1
single => 1
slightly => 1
so => 1
source => 1
steps => 1
stuck => 1
than => 1
that => 2
the => 5
this => 1
time => 2
to => 2
type => 1
update => 1
values => 1
ve => 1
we => 5
what => 1
while => 1
works => 1
x => 1



