Counting English Words in Perl (contractions and hyphenated words are not handled)

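English words are separated by spaces, so a Perl regular expression can match them directly, and a hash is a natural place to keep the counts; in C++, an STL map could play the same role. I copied a passage from the Intermediate Perl book as sample text. Here is a simple Perl implementation: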


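use strict;
use warnings;

# Slurp the whole __DATA__ section into a single string.
my $data = do { local $/; <DATA> };

# Count every run of word characters. ++ creates a missing key
# automatically, so no exists() check is needed.
my %words;
while ($data =~ /\b(\w+)\b/g) {
    $words{$1}++;
}

# Print the counts in ASCII order (uppercase sorts before lowercase).
for my $key (sort keys %words) {
    print "$key => $words{$key}\n";
}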


__DATA__
Each new release of the debugger works slightly differently than previous releases, so
our screen might not look exactly like what we show here. Also, if we get stuck at any
time, we can type h for help, or look at perldebug.
The debugger shows each line of code before it executes it. That means that, at this
point, we’re about to invoke the autovivification, and we’ve got our keys established.
The s command single-steps the program, while the x command dumps a list of values
in a nice format. We can see that $source, $destination, and $bytes are correct, and
now it’s time to update the data:


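Output (the contractions we’re, we’ve, and it’s are split at the apostrophe, which is why re, ve, and s show up as separate words):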

Also => 1
Each => 1
That => 1
The => 2
We => 1
a => 2
about => 1
and => 3
any => 1
are => 1
at => 3
autovivification => 1
before => 1
bytes => 1
can => 2
code => 1
command => 2
correct => 1
data => 1
debugger => 2
destination => 1
differently => 1
dumps => 1
each => 1
established => 1
exactly => 1
executes => 1
for => 1
format => 1
get => 1
got => 1
h => 1
help => 1
here => 1
if => 1
in => 1
invoke => 1
it => 3
keys => 1
like => 1
line => 1
list => 1
look => 2
means => 1
might => 1
new => 1
nice => 1
not => 1
now => 1
of => 3
or => 1
our => 2
perldebug => 1
point => 1
previous => 1
program => 1
re => 1
release => 1
releases => 1
s => 2
screen => 1
see => 1
show => 1
shows => 1
single => 1
slightly => 1
so => 1
source => 1
steps => 1
stuck => 1
than => 1
that => 2
the => 5
this => 1
time => 2
to => 2
type => 1
update => 1
values => 1
ve => 1
we => 5
what => 1
while => 1
works => 1
x => 1
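
The introduction mentions that C++ could use an STL map for the same job. The following is only a minimal sketch of that idea, not a reference implementation: the function name `count_words` is made up for illustration, and it assumes ASCII word characters in the default "C" locale, so it splits text at the same places as Perl's `\w+` does in the script above (including at the apostrophe in contractions).

```cpp
#include <cctype>
#include <map>
#include <string>

// Hypothetical helper: counts maximal runs of [A-Za-z0-9_],
// which is roughly what \b(\w+)\b matches on ASCII text.
std::map<std::string, int> count_words(const std::string& text) {
    std::map<std::string, int> counts;
    std::string word;
    for (unsigned char ch : text) {
        if (std::isalnum(ch) || ch == '_') {
            word += static_cast<char>(ch);   // still inside a word
        } else if (!word.empty()) {
            ++counts[word];                  // operator[] inserts 0 for a new key
            word.clear();
        }
    }
    if (!word.empty()) {
        ++counts[word];                      // flush a word that ends the text
    }
    return counts;
}
```

Because `std::map` keeps its keys ordered, iterating over the result and printing each `key => value` pair should give the same ASCII-sorted listing as `sort keys %words` in the Perl version.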



