Feb 17, 2012
Slides prepared for a "Mongueurs of Paris" talk
Parallel::ForkManager is a simple parallel-processing fork manager
Author: dLux (Szabó, Balázs)
I will show a few use cases that can be useful in everyday work.
Transform a simple process into a multi-process with Parallel::ForkManager
This is your code:
for my $job (@jobs) {
    compute_this_job($job);
}
This is the same loop with Parallel::ForkManager:
use Parallel::ForkManager;
my $MAX_PROCESS = 4; # number of CPUs, for example
my $pfm = Parallel::ForkManager->new($MAX_PROCESS);
for my $job (@jobs) {
    $pfm->start and next;   # parent: go on to the next job
    compute_this_job($job); # child: do the work
    $pfm->finish;           # child: exit
}
$pfm->wait_all_children;
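The "start and next" idiom can look odd at first. As a rough illustration of the control flow only (this is not the module's actual implementation), here is a sketch with plain fork and waitpid. The names reap_one, %running, @exit_codes and $max are invented for this example, and the exit code stands in for real work.

```perl
use strict;
use warnings;

my @jobs = (1 .. 6);
my $max  = 2;        # hypothetical concurrency limit
my %running;         # pid => job, for children still alive
my @exit_codes;      # exit codes collected by the parent

for my $job (@jobs) {
    # Block until a slot is free, like start() when the limit is reached.
    reap_one() while keys(%running) >= $max;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid) {                  # parent: start() returns the child PID (true)
        $running{$pid} = $job;
        next;                    # ... so "and next" moves to the next job
    }
    # child: start() returns 0, so the loop body runs here
    exit($job % 2);              # stands in for finish($exit_code)
}
reap_one() while %running;       # like wait_all_children

# Wait for any one child and record its exit code.
sub reap_one {
    my $pid = waitpid(-1, 0);
    if ($pid <= 0) {             # no child left to wait for
        %running = ();
        return;
    }
    delete $running{$pid};
    push @exit_codes, $? >> 8;
}

print join(',', sort @exit_codes), "\n";
```

The parent never runs the loop body itself: it only forks, tracks PIDs and reaps children, which is the bookkeeping Parallel::ForkManager takes care of.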
Compute your jobs and return the results to the parent for further processing
run_on_finish method:
use Carp; # for croak
my $pfm = p_fork; # returns a Parallel::ForkManager instance
my $result = {};
$pfm->run_on_finish(sub {
    my ($pid, $exit, $id, $exit_signal, $core_dump, $data) = @_;
    croak "Failed to process one job, stopping here!"
        if $exit || $exit_signal;
    $result->{$id} = $data; # $data is the reference passed to finish
});
for my $job (@jobs) {
    $pfm->start($job->{id}) and next;
    my $job_result = compute_this_job($job);
    # a job is expected to return a hash reference
    my $job_error = ref $job_result eq 'HASH' ? 0 : 1;
    $pfm->finish($job_error, $job_result);
}
$pfm->wait_all_children;
do_more_stuff_with($result);
An issue with DBI and how to handle it
What kind of problem can I have with DBI and fork?
How can I solve this?
Let's see an example (the children don't need a connection):
# returns a Parallel::ForkManager instance
my ($pfm, $MAX_PROCESS) = p_fork;
my $result = {};
# here the run_on_finish stuff
my $conn = get_my_dbi_connection();
# prepare returns a statement handle; execute and fetch on that handle
my $sth = $conn->prepare("SQL to fetch stuff");
$sth->execute;
while (my $job = $sth->fetchrow_hashref) {
    $pfm->start($job->{id}) and next;
    my $job_result = compute_this_job($job);
    my $job_error = ref $job_result eq 'HASH' ? 0 : 1;
    $pfm->finish($job_error, $job_result);
}
$pfm->wait_all_children;
$sth->finish;
do_more_stuff_with($result);
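You will lose the parent's connection: each child inherits the DBI handle, and when the child exits the handle is destroyed and the shared connection is closed.
Let's see how to properly discard the parent's connection in the child: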
#...
$pfm->start($job->{id}) and next;
if ($MAX_PROCESS) { # only if we really forked
    $conn->{InactiveDestroy} = 1;
    $conn = undef;
}
# The child has no use for the connection.
# Dropping the handle runs DBI's destructor, but InactiveDestroy
# (designed for fork) makes it leave the connection intact.
If the child needs a connection of its own, open a new one:
#...
$pfm->start($job->{id}) and next;
if ($MAX_PROCESS) { # only if we really forked
    $conn->{InactiveDestroy} = 1;
    $conn = get_my_new_dbi_connection();
}
# DBI's destructor runs for the previous handle,
# but the parent's connection stays intact
Compute MAX_PROCESS properly
Why?
Let's see how to do this
The p_fork method:
use strict; use warnings;
use Parallel::ForkManager;
use Sys::Info; use Sys::Statistics::Linux::MemStats;
use List::Util qw(min);
use 5.010; # for the //= operator
sub p_fork {
    # minimum memory required by one process, in KB
    my ($min_mem) = @_;
    $min_mem //= 1024 ** 2; # default: 1 GB, expressed in KB
    # get the number of CPUs on the machine
    my $cpu_info = Sys::Info->new;
    my $cpu = $cpu_info->device('CPU');
    my $MAX_PROCESSES_FOR_CPU = $cpu->count || 1;
    # get the really free memory, in KB
    my $freemem = Sys::Statistics::Linux::MemStats->new->get->{realfree};
    # how many processes of $min_mem fit in the free memory
    my $MAX_PROCESSES_FOR_MEM = int($freemem / $min_mem);
    # take the minimum of the CPU and memory slots;
    # 0 means no fork, because there is not enough memory
    my $MAX_PROCESSES =
        min($MAX_PROCESSES_FOR_CPU, $MAX_PROCESSES_FOR_MEM);
    # return the manager, ready to use
    my $pm = Parallel::ForkManager->new($MAX_PROCESSES);
    return wantarray ? ($pm, $MAX_PROCESSES) : $pm;
}
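The slot arithmetic inside p_fork can be checked in isolation. This is a minimal sketch, assuming a hypothetical helper max_processes (not part of the slides) that receives the CPU count, the free memory and the per-process memory as plain numbers in KB instead of querying Sys::Info and Sys::Statistics::Linux::MemStats.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper: same arithmetic as p_fork, without system probing.
sub max_processes {
    my ($cpu_count, $free_kb, $min_kb) = @_;
    my $for_cpu = $cpu_count || 1;          # at least one CPU slot
    my $for_mem = int($free_kb / $min_kb);  # processes that fit in free memory
    return min($for_cpu, $for_mem);         # 0 means: do not fork
}

# 8 CPUs, 3 GB free, 1 GB per process: memory is the limit
print max_processes(8, 3 * 1024**2, 1024**2), "\n";   # 3
# 2 CPUs, 16 GB free, 1 GB per process: CPU is the limit
print max_processes(2, 16 * 1024**2, 1024**2), "\n";  # 2
# 4 CPUs, 512 MB free, 1 GB per process: nothing fits
print max_processes(4, 512 * 1024, 1024**2), "\n";    # 0
```

A result of 0 is still usable: Parallel::ForkManager->new(0) does not fork and runs every job in the parent process, which is why the DBI snippets test $MAX_PROCESS before discarding the connection.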
Don't fork for every job if your jobs are too short; process them in batches:
use Carp; use List::Util qw(max min);
# returns a Parallel::ForkManager instance
my ($pfm, $MAX_PROCESS) = p_fork;
my $result = {};
# here the run_on_finish stuff
my $conn = get_my_dbi_connection();
my $sth = $conn->prepare("SQL to fetch stuff"); $sth->execute;
# fetch every row as a hash reference
my @jobs = @{ $sth->fetchall_arrayref({}) }; $sth->finish;
# at least 50 jobs per batch; "|| 1" avoids dividing by zero when not forking
my $step = max(50, int(@jobs / ($MAX_PROCESS || 1)) + 1);
for (my $job_start = 0; $job_start < @jobs; $job_start += $step) {
    $pfm->start($job_start) and next; # the batch is identified by its first index
    if ($MAX_PROCESS) { # only if we really forked
        $conn->{InactiveDestroy} = 1;
        $conn = undef;
    }
    my $job_result = {};
    # the batch ends just before the next one starts
    my $job_end = min($#jobs, $job_start + $step - 1);
    for (my $job_current = $job_start;
         $job_current <= $job_end;
         $job_current++) {
        my $job = $jobs[$job_current];
        $job_result->{$job->{id}} = compute_this_job($job)
            or croak "Error!";
    }
    $pfm->finish(0, $job_result);
}
$pfm->wait_all_children;
do_more_stuff_with($result);
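To see which batches a given step produces, the index arithmetic can be tried on its own. This is a minimal sketch, assuming a hypothetical helper batch_ranges (not part of the slides) that returns non-overlapping [first, last] index pairs, with each range ending just before the next one starts.

```perl
use strict;
use warnings;
use List::Util qw(max min);

# Hypothetical helper: split job indexes 0 .. $job_count-1 into ranges.
sub batch_ranges {
    my ($job_count, $workers, $min_batch) = @_;
    # "|| 1" avoids a division by zero when no worker is available
    my $step = max($min_batch, int($job_count / ($workers || 1)) + 1);
    my @ranges;
    for (my $start = 0; $start < $job_count; $start += $step) {
        push @ranges, [ $start, min($job_count - 1, $start + $step - 1) ];
    }
    return @ranges;
}

# 230 jobs, 4 workers, at least 50 jobs per batch
print join(' ', map { "$_->[0]-$_->[1]" } batch_ranges(230, 4, 50)), "\n";
# prints: 0-57 58-115 116-173 174-229

# 120 jobs, 4 workers: the 50-job minimum wins, so only 3 batches
print join(' ', map { "$_->[0]-$_->[1]" } batch_ranges(120, 4, 50)), "\n";
# prints: 0-49 50-99 100-119
```

With few jobs the minimum batch size wins and fewer processes are forked than $MAX_PROCESS allows, which is the point of the slide: the fork overhead is paid once per batch, not once per job.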