How to convert "%xx" characters in HTML using Perl

I just submitted a question on StackOverflow.

I wanted to extract content from a web page that contains many characters represented in the percent-encoded form "%xx". Since I used the Perl module LWP to fetch the page, I naturally tried to handle these characters with a Perl regex, as below.

my $html = "%20%26%40 ";
$html =~ s#%([0-9a-f]+)#\x{\1}#ig;
print "$html\n";

But the above code doesn't work; it outputs nothing but "00". I'm stuck now... Any hint would be appreciated.



Some people replied very quickly. Below are their answers. 

The URI::Escape module already provides functions for this. You don't need to mess with regular expressions.

use URI::Escape;
my $html = "%20%26%40 ";
my $decoded = uri_unescape($html);   # turns each %xx escape back into its character
print "$decoded\n";

See this page for more

Funny and ugly code :

my $html = "%20%26%40 ";
$html =~ s#%([0-9a-f]{2})#"chr(0x$1)"#igee;
print "$html\n";

Edit: (I'm obliged to say) this code may be cute, but do not use it in production! (There are many cases where it does not work.)
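
For comparison, here is a minimal sketch of a regex-only version that avoids the double eval. It is not taken from the discussion: the helper name percent_decode is hypothetical, and the sketch assumes every escape is exactly two hex digits. The attempt in the question most likely fails because \x{...} is interpreted when the replacement string is parsed, before $1 holds anything; the /e modifier instead runs the replacement as code for each match.

```perl
use strict;
use warnings;

# Decode percent-encoded escapes such as "%20" using only core Perl.
# percent_decode is a hypothetical helper name used for illustration.
sub percent_decode {
    my ($text) = @_;
    # Match "%" followed by exactly two hex digits. With /e the replacement
    # is evaluated as code: hex() converts the digits to a number and
    # chr() converts that number to the corresponding character.
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

my $html = "%20%26%40 ";
print percent_decode($html), "\n";   # prints " &@ " (space, ampersand, at sign, space)
```

If the page encodes non-ASCII characters as UTF-8 (for example "%E4%B8%AD"), the result of this sketch is a byte string; passing it through decode('UTF-8', ...) from the core Encode module should turn it into characters.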


You can read the whole discussion here: http://stackoverflow.com/questions/12144401/how-can-convert-character-xx-in-html-using-perl

I should say StackOverflow is indeed a great place for technical people:-)

