Issues with a parser script: running Perl and WWW::Mechanize - Joomla! Forum - community, help and support
Good evening, dear friends here at the lounge!
Well, this is not one of the usual Joomla questions. Today I have an interesting one.
I have trouble with a Perl script that turns out not to be 100% optimal. I am trying to find a better solution, either in Perl or in Ruby. If you have ideas for reworking the Perl script, I would be glad too.
The question: is there a way to specify the Net::Telnet timeout with WWW::Mechanize::Firefox?
At the moment my internet connection (a quite fast DSL one) is slow, and with $mech->get() I get this error:
[php]
command timed-out at /usr/local/share/perl/5.12.3/MozRepl/Client.pm line 186[/php]
I saw this one:
[php]$mech->repl->repl->timeout(100000);
[/php]
Unfortunately it does not work: Can't locate object method "timeout" via package "MozRepl"
The documentation says it should be:
[php]$mech->repl->repl->setup_client( { extra_client_args => { timeout => 1 +80 } } );
[/php]
The problem: I have a list of 2500 websites and need to grab a thumbnail screenshot (!) of each of them. How can I do that?
I am trying to process the sites with Perl, using the Mechanize modules.
Note: I need the results as thumbnails with a maximum of 240 pixels in the longer dimension.
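The 240-pixel rule boils down to scaling both sides by max / longer side. Below is a minimal pure-Perl sketch of that arithmetic. The helper name `thumb_size` is hypothetical, and not upscaling small images is my own assumption. With the Imager module you already have, `$img->scale(xpixels => 240, ypixels => 240, type => 'min')` should produce the same fit in one call, although it will also enlarge images smaller than 240 pixels.

```perl
use strict;
use warnings;

# Hypothetical helper: scale ($width, $height) so that the longer side is at
# most $max pixels (240 by default), keeping the aspect ratio. Images that
# already fit are returned unchanged (assumption: never upscale).
sub thumb_size {
    my ($width, $height, $max) = @_;
    $max //= 240;
    my $long = $width > $height ? $width : $height;
    return ($width, $height) if $long <= $max;
    # Round to the nearest pixel, but never go below 1 pixel.
    my $tw = int($width  * $max / $long + 0.5) || 1;
    my $th = int($height * $max / $long + 0.5) || 1;
    return ($tw, $th);
}

my ($w, $h) = thumb_size(1024, 768);
print "${w}x$h\n";    # prints 240x180
```

Computing the size separately makes it easy to log or check it before handing the values to Imager.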
At the moment I have a solution that is slow and does not produce thumbnails.
How can I make the script run faster, with less overhead, and spit out thumbnails?
My prerequisites: the MozRepl add-on (addon/mozrepl/), the module WWW::Mechanize::Firefox and the module Imager.
Here is the source ... first, a snippet (example) of the sites I have in the URL list.
urls.txt (the list of sources in a file):
www.google.com
www.cnn.com
www.msnbc.com
news.bbc.co.uk
www.bing.com
www.yahoo.com
... and so on, and so forth.
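With 2500 lines, a stray blank line or Windows line endings ("\r\n") in urls.txt would end up in `$mech->get()` and in the file names. The following is a small sketch that reads the list defensively. The helper name `read_url_list` is hypothetical. It only trims whitespace and skips empty lines and does not check that a line is a valid URL.

```perl
use strict;
use warnings;

# Hypothetical helper: read one URL per line from an open filehandle,
# trimming surrounding whitespace (including a trailing "\r") and
# skipping empty lines.
sub read_url_list {
    my ($fh) = @_;
    my @urls;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;
        next if $line eq '';
        push @urls, $line;
    }
    return @urls;
}

# Demo with an in-memory file instead of urls.txt.
my $sample = "www.google.com\n\nwww.cnn.com\r\n  news.bbc.co.uk  \n";
open(my $fh, '<', \$sample) or die "Cannot open sample: $!";
print join(',', read_url_list($fh)), "\n";    # prints www.google.com,www.cnn.com,news.bbc.co.uk
```

In the script, `my @list = read_url_list($input);` could fill the `@list` that the eval-based loop works through.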
What I have tried already is this:
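[php]
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize::Firefox;    # module names are case-sensitive

my $mech = WWW::Mechanize::Firefox->new();

open(my $input, '<', 'urls.txt') or die "Cannot open urls.txt: $!";
while (my $url = <$input>) {
    chomp $url;
    next unless length $url;    # skip empty lines
    print "$url\n";
    $mech->get($url);
    my $png = $mech->content_as_png();

    # www.google.com -> google.com.png
    (my $name = $url) =~ s/^www\.//;
    $name .= '.png';

    open(my $output, '>', $name) or die "Cannot write $name: $!";
    binmode $output;            # PNG data is binary
    print {$output} $png;
    close $output;
    sleep 5;
}
close $input;[/php]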
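The script names each file after the raw line from urls.txt. That works for entries like www.google.com, but an entry with a scheme or a path (for example http://www.cnn.com/world) would contain "/" and make `open` fail. The following is a sketch of a more robust naming rule. The helper name `png_name_for` and the choice to replace unsafe characters with "_" are assumptions, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a URL from urls.txt into a PNG file name,
# e.g. "www.google.com" -> "google.com.png".
sub png_name_for {
    my ($url) = @_;
    (my $name = $url) =~ s{^[a-z][a-z0-9+.-]*://}{}i;  # drop a scheme such as http://
    $name =~ s/^www\.//;                               # same rule as the original script
    $name =~ s{/+$}{};                                 # drop trailing slashes
    $name =~ s{[^A-Za-z0-9._-]}{_}g;                   # replace characters unsafe in file names
    return "$name.png";
}

print png_name_for('http://www.cnn.com/'), "\n";    # prints cnn.com.png
```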
Well, this does not take care of the size. See the output on the command line:
[php]linux-vi17:/home/martin/perl # perl mecha_test_1.pl
www.google.com
www.cnn.com
www.msnbc.com
command timed-out at /usr/lib/perl5/site_perl/5.12.3/MozRepl/Client.pm line 186
linux-vi17:/home/martin/perl #
[/php]
Question: how can I extend the solution so that it does not stop on a time-out? Note again: I need the results as thumbnails with a maximum of 240 pixels in the longer dimension. As a prerequisite, I have already installed the module Imager.
How can I make the script run faster, with less overhead, while spitting out thumbnails?
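The script stops because `$mech->get()` dies with "command timed-out", and nothing catches that die. Wrapping each page in `eval` keeps the loop going. A `SIGALRM` time limit can additionally cap how long a single page may take. The following is a minimal sketch of that combination. The helper name `run_with_time_limit` is hypothetical, and I have not verified that a blocked MozRepl/Net::Telnet read is really interrupted by the alarm. Treat that part as an assumption to test.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code with a hard limit of $seconds. Instead of
# dying, return (1, $result) on success or (0, $error) when $code dies
# (e.g. "command timed-out") or runs longer than the limit.
sub run_with_time_limit {
    my ($seconds, $code) = @_;
    my $result;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "time limit of ${seconds}s exceeded\n" };
        alarm $seconds;
        $result = $code->();
        alarm 0;
        1;
    };
    my $error = $@;
    alarm 0;    # clear a pending alarm if $code died before reaching "alarm 0"
    return $ok ? (1, $result) : (0, $error || "unknown error\n");
}
```

In the loop this could look like `my ($ok, $png_or_error) = run_with_time_limit(120, sub { $mech->get($url); $mech->content_as_png() });`, followed by skipping (or re-queueing) the URL when `$ok` is false.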
I tried out this one:
[php]$mech->repl->repl->setup_client( { extra_client_args => { timeout => 5*60 } } );
[/php]
I put the links into @list and use eval:
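[php]
# @list holds the URLs; $mech is created as in the first script
while (@list) {
    # take from the front, so links pushed to the end are retried later
    # (pop would take a failed link straight back off the end)
    my $link = shift @list;
    print "trying $link\n";
    my $ok = eval {
        $mech->get($link);
        sleep 5;
        my $png = $mech->content_as_png();
        (my $name = $link) =~ s/^www\.//;    # name the file after $link, not $_
        $name .= '.png';
        open(my $output, '>', $name) or die "Cannot write $name: $!";
        binmode $output;
        print {$output} $png;
        close $output;
        1;
    };    # eval BLOCK needs this semicolon
    if (!$ok) {
        chomp(my $err = $@);
        print "link: $link failed: $err\n";
        push @list, $link;    # put it at the end of the list
        next;
    }
    print "$link done!\n";
}
[/php]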
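One remaining risk in this loop is that a site which always times out is pushed back forever, so the run never ends. The following sketch keeps the same "retry at the end of the queue" idea but counts attempts per link and gives up after a fixed number. The helper name `process_with_retries` and the limit are assumptions. `$fetch` stands for the `eval` body above (get, screenshot, write file) and must die on failure.

```perl
use strict;
use warnings;

# Hypothetical helper: process @$links, retrying each failed link at most
# $max_tries times in total. Returns (\@done, \@failed).
sub process_with_retries {
    my ($links, $fetch, $max_tries) = @_;
    my @queue = map { [ $_, 0 ] } @$links;    # [link, attempts so far]
    my (@done, @failed);
    while (my $item = shift @queue) {
        my ($link, $tries) = @$item;
        $tries++;
        if (eval { $fetch->($link); 1 }) {
            push @done, $link;
        }
        elsif ($tries < $max_tries) {
            warn "link: $link failed (attempt $tries): $@";
            push @queue, [ $link, $tries ];    # retry later, at the end of the queue
        }
        else {
            warn "link: $link gave up after $tries attempts\n";
            push @failed, $link;
        }
    }
    return (\@done, \@failed);
}
```

Writing the returned `@failed` list to a file would allow a second, slower pass over just the problem sites.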
Question: is there a Ruby, Python or PHP solution that runs more efficiently, or can you suggest a Perl solution that is more stable?
I look forward to hearing from you.
Thanks in advance!
Well, dear Joomla buddies, I think there are many, many programmers here.
Do you have an idea?
Have a great day!
Greetings, unleash