Parallel::ForkManager

celogeek Feb 17, 2012

Slides made for "Mongueurs of Paris" show

Contents

Presentation

What is Parallel::ForkManager?

It's a simple parallel processing fork manager
dLux (Szabó, Balázs)

What can you do with Parallel::ForkManager?

  • Process a large number of simple jobs faster
  • Use your server's resources to the fullest
  • Reduce your processing time

How to do that efficiently?

I will show you several use cases that can be useful in everyday work.

Case 1: Transform a simple process into multi-process

Case 1

Transform a simple process into a multi-process with Parallel::ForkManager

This is your code:

for my $job(@jobs) {
    compute_this_job($job);
}

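This is the same loop with Parallel::ForkManager:

use Parallel::ForkManager;
my $MAX_PROCESS = 4; # the number of your CPUs, for example
my $pfm = Parallel::ForkManager->new($MAX_PROCESS);
for my $job (@jobs) {
    # start forks: the parent gets the child PID (true) and moves on
    # to the next job, the child gets 0 and does the work
    $pfm->start and next;
    compute_this_job($job);
    $pfm->finish; # the child exits here
}
$pfm->wait_all_children; # the parent waits for the remaining children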
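
To see what the module saves you from writing, here is a minimal sketch of the same throttling done by hand with fork and wait. It is not how Parallel::ForkManager is implemented; run_throttled and its arguments are hypothetical names used only for this illustration.

```perl
use strict;
use warnings;

# Run $work->($job) for each job in its own child process, keeping at most
# $max children alive at once. Returns the number of children that failed.
sub run_throttled {
    my ($max, $work, @jobs) = @_;
    die "max must be at least 1" if $max < 1;
    my ($running, $failed) = (0, 0);

    # Block until one child exits and record whether it failed.
    my $reap_one = sub {
        my $pid = wait;
        if ($pid == -1) {        # no child left to wait for
            $running = 0;
            return;
        }
        $running--;
        $failed++ if $? != 0;
    };

    for my $job (@jobs) {
        $reap_one->() while $running >= $max;   # throttle
        my $pid = fork;
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {         # child: do the work, then leave
            $work->($job);
            exit 0;
        }
        $running++;              # parent: go on with the next job
    }
    $reap_one->() while $running > 0;   # same role as wait_all_children
    return $failed;
}

# Jobs with an odd number exit with an error, the others succeed.
my $failed = run_throttled(2, sub { my ($n) = @_; exit 1 if $n % 2 }, 1 .. 4);
```

With the module, all of this bookkeeping shrinks to the start, finish and wait_all_children calls.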

Simple? Yes. Let's do a bit more.

Case 2: Get back result from your fork

Case 2

Compute your job and return the result to the parent for further processing

run_on_finish method:

use Carp;
my $pfm = p_fork; #return a Parallel::ForkManager instance
my $result = {};
$pfm->run_on_finish(sub {
    my ($pid, $exit, $id, $exit_signal, $core_dump, $data) = @_;
    croak "Failed to process on one job, stop here !"
        if $exit || $exit_signal;
    $result->{$id} = $data; #$data is the reference given to finish
});
for my $job (@jobs) {
    $pfm->start($job->{id}) and next;
    my $job_result = compute_this_job($job);
    #a job succeeds when it returns a hash ref
    my $job_error = ref $job_result eq 'HASH' ? 0 : 1;
    $pfm->finish($job_error, $job_result);
}
$pfm->wait_all_children;
do_more_stuff_with($result);
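
A child cannot simply hand a Perl data structure back to its parent, because they are separate processes: the data has to be serialized and sent across. The sketch below shows that idea with a pipe and Storable from the core distribution. It only illustrates the principle and is not the module's own code; run_in_child is a hypothetical helper.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

# Run $code->() in a forked child and return its result to the parent.
# The result must be a reference, because Storable serializes references.
sub run_in_child {
    my ($code) = @_;
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {                       # child: compute and send
        close $reader;
        binmode $writer;
        print {$writer} freeze($code->());
        close $writer;
        exit 0;
    }
    close $writer;                         # parent: receive and decode
    binmode $reader;
    my $frozen = do { local $/; <$reader> };
    close $reader;
    waitpid($pid, 0);
    die "child failed" if $? != 0;
    return thaw($frozen);
}

my $data = run_in_child(sub {
    return { id => 42, squares => [ map { $_ ** 2 } 1 .. 3 ] };
});
```

This is also why the second argument of finish has to be a reference rather than a plain scalar.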

Great! Now I want a database connection!

Case 3: Handle DBI connections

Case 3

The issue with DBI and how to handle it

The parent loses its connection if the children use it:

If a forked child uses the parent's connection,
your SQL server will close the parent's connection.

What can I do if I need a connection in both the parent and the children?

Solution:

In the child, drop the parent's connection right after the fork.

Then connect the child with a fresh connection.

Forgetting to drop the connection disconnects the parent:

If you forget to drop the parent's connection in the children, Perl's automatic destruction calls the DBI destructor when the child exits, which closes the parent's connection.

Solution:

In the child, drop the parent's connection right after the fork.
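
The destructor behaviour described above can be reproduced without any database. In this sketch FakeHandle is a hypothetical stand-in for a database handle, not part of DBI: its destructor "disconnects" by writing to a pipe, and its inactive flag plays the role that InactiveDestroy plays for a real DBI handle.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a database handle. Its destructor "disconnects"
# by writing to a pipe, unless the inactive flag is set.
package FakeHandle;

sub new {
    my ($class, $out) = @_;
    return bless { out => $out, inactive => 0 }, $class;
}

sub DESTROY {
    my ($self) = @_;
    return if $self->{inactive};
    syswrite($self->{out}, "disconnect\n");
}

package main;

# Fork a child that inherits a copy of the handle and drops it.
# Returns whatever the destructor wrote while running in the child.
sub destructor_output_in_child {
    my ($mark_inactive) = @_;
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $handle = FakeHandle->new($writer);

    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {                                # child
        $handle->{inactive} = 1 if $mark_inactive;
        undef $handle;              # the destructor runs in the child
        exit 0;
    }

    $handle->{inactive} = 1;        # parent: silence its own copy for this demo
    undef $handle;
    close $writer;                  # so the read below sees end-of-file
    my $output = do { local $/; <$reader> };
    waitpid($pid, 0);
    return defined $output ? $output : '';
}

my $without_flag = destructor_output_in_child(0);   # the child "disconnects"
my $with_flag    = destructor_output_in_child(1);   # the child stays silent
```

Without the flag, the child's copy of the handle runs the disconnect code even though the connection belongs to the parent; with the flag set, dropping the copy has no side effect.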

Let's see an example (the children don't need a connection):

my ($pfm, $MAX_PROCESS) = p_fork; #return a Parallel::ForkManager instance
my $result = {};
#here the run_on_finish stuff
my $conn = get_my_dbi_connection();
my $sth = $conn->prepare("SQL to fetch stuff");
$sth->execute;
while (my $job = $sth->fetchrow_hashref) {
    $pfm->start($job->{id}) and next;
    #the child still holds a copy of $conn here
    my $job_result = compute_this_job($job);
    my $job_error = ref $job_result eq 'HASH' ? 0 : 1;
    $pfm->finish($job_error, $job_result);
}
$pfm->wait_all_children;
$sth->finish;
do_more_stuff_with($result);

You will lose your parent connection.

Let's see how to drop the parent connection properly:

#...
$pfm->start($job->{id}) and next;
if ($MAX_PROCESS) { #only if we really forked
    $conn->{InactiveDestroy} = 1;
    $conn = undef;
}
#the child does not need the connection,
#so the DBI destructor is called here;
#InactiveDestroy is made for forks
#and leaves the parent connection intact

If you need a connection, just do this:

#...
$pfm->start($job->{id}) and next;
if ($MAX_PROCESS) { #if we have fork
    $conn->{InactiveDestroy} = 1;
    $conn = get_my_new_dbi_connection();
}
#the DBI destructor is called for the previous connection

Great! Now, how do I manage my resources properly?

Case 4: Compute MAX_PROCESS to use resources efficiently

Case 4

Why and how to compute MAX_PROCESS properly?

Why?

You want to use all your CPUs.

You want to avoid swapping, because it can slow down your processes or lead to an out-of-memory condition.

How?

use strict; use warnings;
use 5.010; # for the // operator
use List::Util qw(min);
use Parallel::ForkManager;
use Sys::Info; use Sys::Statistics::Linux::MemStats;
sub p_fork {
    # minimum memory required by one process, in KB
    my ($min_mem) = @_;
    $min_mem //= 1024 ** 2; # default: 1 GB, expressed in KB
    # get the number of CPUs on the machine
    my $cpu_info = Sys::Info->new;
    my $cpu = $cpu_info->device('CPU');
    my $MAX_PROCESSES_FOR_CPU = $cpu->count || 1;
    # get the really free memory in KB
    my $freemem = Sys::Statistics::Linux::MemStats->new->get->{realfree};
    # one process per $min_mem of free memory
    my $MAX_PROCESSES_FOR_MEM = int($freemem / $min_mem);
    # take the min between the CPU and memory limits;
    # 0 means no fork because there is not enough memory
    my $MAX_PROCESSES =
          min($MAX_PROCESSES_FOR_CPU, $MAX_PROCESSES_FOR_MEM);
    # return the manager, ready to use
    my $pm = Parallel::ForkManager->new($MAX_PROCESSES);
    return wantarray ? ($pm, $MAX_PROCESSES) : $pm;
}
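
To make the arithmetic of p_fork concrete, here is the same computation as a pure function, with the CPU count and the free memory passed in instead of probed from the system. max_processes is a hypothetical helper and the figures are made-up inputs, not measurements.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical pure helper: the same arithmetic as p_fork, but with the
# CPU count and the free memory (in KB) given as arguments.
sub max_processes {
    my ($cpu_count, $free_kb, $min_kb_per_process) = @_;
    my $for_cpu = $cpu_count || 1;
    my $for_mem = int($free_kb / $min_kb_per_process);
    return min($for_cpu, $for_mem);
}

my $gb = 1024 ** 2;   # 1 GB expressed in KB

print max_processes(8, 3.5 * $gb, $gb), "\n";   # 3: memory is the limit
print max_processes(2, 16 * $gb, $gb), "\n";    # 2: the CPUs are the limit
print max_processes(8, 0.5 * $gb, $gb), "\n";   # 0: not enough memory, no fork
```

A result of 0 is worth handling on purpose: Parallel::ForkManager then runs every job in the parent without forking.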

Warning!

!!! WARNING !!!

Don't fork if your jobs are too short:

Forking costs time and memory.

If a job takes less than a second, group many jobs together and fork once per group.

Let's see an example:

use Carp;
use List::Util qw(min max);
#return a Parallel::ForkManager instance
my ($pfm, $MAX_PROCESS) = p_fork;
my $result = {};
#here the run_on_finish stuff
my $conn = get_my_dbi_connection();
my $sth = $conn->prepare("SQL to fetch stuff"); $sth->execute;
#fetch every row as a hash ref
my @jobs = @{ $sth->fetchall_arrayref({}) }; $sth->finish;
#min 50 jobs per pack; "|| 1" avoids a division by zero without fork
my $step = max(50, int(@jobs / ($MAX_PROCESS || 1)) + 1);
for (my $job_start = 0; $job_start < @jobs; $job_start += $step) {
    #a pack is identified by the index of its first job
    $pfm->start($job_start) and next;
    if ($MAX_PROCESS) { #if we have fork
        $conn->{InactiveDestroy} = 1;
        $conn = undef;
    }
    my $job_result = {};
    #last index of this pack, without overlapping the next one
    my $job_end = min($#jobs, $job_start + $step - 1);
    for my $job_current ($job_start .. $job_end) {
        my $job = $jobs[$job_current];
        $job_result->{$job->{id}} = compute_this_job($job)
            or croak "Error !";
    }
    $pfm->finish(0, $job_result);
}
$pfm->wait_all_children;
do_more_stuff_with($result);
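
The pack boundaries are the part of this loop that is easiest to get wrong, so it helps to check them in isolation. A minimal sketch, assuming the same rule as above (at least 50 jobs per pack, otherwise about one pack per process); pack_ranges is a hypothetical helper.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical helper: return one [first_index, last_index] pair per pack.
sub pack_ranges {
    my ($job_count, $max_process, $min_per_pack) = @_;
    # "|| 1" keeps the division valid when no fork is possible
    my $step = max($min_per_pack, int($job_count / ($max_process || 1)) + 1);
    my @ranges;
    for (my $start = 0; $start < $job_count; $start += $step) {
        push @ranges, [ $start, min($job_count - 1, $start + $step - 1) ];
    }
    return @ranges;
}

# 230 jobs on 4 processes: int(230 / 4) + 1 = 58 jobs per pack
my @ranges = pack_ranges(230, 4, 50);
my $summary = join ' ', map { "$_->[0]-$_->[1]" } @ranges;
print "$summary\n";   # 0-57 58-115 116-173 174-229
```

Every index appears in exactly one pack, and the last pack simply stops at the final job.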

Thank you