Message passing is one way of distributing work to multiple processes that run independently and concurrently, either on a single computer or on a cluster of computers. The processes, which are designated to start up in the software and are run by the operating system of the computer, serve as the processing units. This type of parallel programming has been used for quite some time, and the software libraries that make it available follow a standard called the Message Passing Interface, or MPI.
One feature of MPI programming is that a single program designates what all the various processes will do: the same program runs on every process. Each process has a number, or rank, and the value of the rank is used in the code to determine what each process will do. In the following code, the process numbered 0 does some additional work that the other processes do not do. This is very common in message passing solutions: process 0 is often referred to as the master, and the other processes as the workers. In the code below, look for three places where a block of code starts with this line:
if (rank == MASTER) {
This is where the master does special work that only needs to be done once, by one process. In this case, that work is initializing the arrays at the beginning of the computation, checking the result for accuracy after the main computation of vector addition is completed, and freeing the memory for the arrays at the end.
The MPI syntax in this code takes some getting used to, but you should see the pattern of how the data decomposition occurs for a process running this code:

1. Initialize the MPI environment.
2. Determine the total number of processes running in parallel.
3. Determine the rank of the current process.
4. Designate process rank 0 as the root, or master, which will initialize the full arrays.
5. Allocate the smaller subsections of the larger arrays on each process.
6. Scatter the data in the full arrays from the master out to all the processes.
7. Compute the addition of the elements in each process's subsection of the array.
8. Gather the result subsections back to the master.
9. Terminate the MPI environment and processes.
The following code contains comments with these numbered steps where they occur. This is the file VectorAdd/MPI/VA-MPI-simple.c in the compressed tar file of examples that accompanies this reading.
/*
* Prerequisites:
* This code runs using an MPI library, either OpenMPI or MPICH2.
* These libraries can be installed in either a cluster of computers
* or a multicore machine.
*
* How to compile:
* mpicc -o vec-add VA-MPI-simple.c
*
* How to execute:
* mpirun -np 2 ./vec-add
*
* Note that this executes the code on 2 processes, using the -np command line flag.
* See ideas for further exploration of MPI using this code at the end of this file.
*/
#include "mpi.h" // must have a system with an MPI library
#include <stdio.h> //printf
#include <stdlib.h> //malloc
/*
* Definitions
*/
#define MASTER 0 //One process will take care of initialization
#define ARRAY_SIZE 8 //Size of arrays that will be added together.
/*
* In MPI programs, the main function for the program is run on every
* process that gets initialized when you start up this code using mpirun.
*/
int main (int argc, char *argv[])
{
// elements of arrays a and b will be added
// and placed in array c
int * a;
int * b;
int * c;
int total_proc; // total number of processes
int rank; // rank of each process
int n_per_proc; // elements per process
int n = ARRAY_SIZE; // number of array elements
int i; // loop index
MPI_Status status; // not used in this arguably poor example
// that is devoid of error checking.
// 1. Initialization of MPI environment
MPI_Init (&argc, &argv);
MPI_Comm_size (MPI_COMM_WORLD, &total_proc);
// 2. Now you know the total number of processes running in parallel
MPI_Comm_rank (MPI_COMM_WORLD,&rank);
// 3. Now you know the rank of the current process
// Smaller arrays that will be held on each separate process
int * ap;
int * bp;
int * cp;
// 4. We choose process rank 0 to be the root, or master,
// which will be used to initialize the full arrays.
if (rank == MASTER) {
a = (int *) malloc(sizeof(int)*n);
b = (int *) malloc(sizeof(int)*n);
c = (int *) malloc(sizeof(int)*n);
// initialize arrays a and b with consecutive integer values
// as a simple example
for(i=0;i<n;i++)
a[i] = i;
for(i=0;i<n;i++)
b[i] = i;
}
// All processes take part in the calculations concurrently
// determine how many elements each process will work on
n_per_proc = n/total_proc;
/////// NOTE:
// In this simple version, the number of processes needs to
// divide evenly into the number of elements in the array
///////////
// 5. Initialize my smaller subsections of the larger array
ap = (int *) malloc(sizeof(int)*n_per_proc);
bp = (int *) malloc(sizeof(int)*n_per_proc);
cp = (int *) malloc(sizeof(int)*n_per_proc);
// 6.
//scattering array a from MASTER node out to the other nodes
MPI_Scatter(a, n_per_proc, MPI_INT, ap, n_per_proc, MPI_INT, MASTER, MPI_COMM_WORLD);
//scattering array b from MASTER node out to the other nodes
MPI_Scatter(b, n_per_proc, MPI_INT, bp, n_per_proc, MPI_INT, MASTER, MPI_COMM_WORLD);
// 7. Compute the addition of elements in my subsection of the array
for(i=0;i<n_per_proc;i++)
cp[i] = ap[i]+bp[i];
// 8. MASTER node gathering array c from the workers
MPI_Gather(cp, n_per_proc, MPI_INT, c, n_per_proc, MPI_INT, MASTER, MPI_COMM_WORLD);
/////////////////////// all concurrent processes are finished once they all communicate
/////////////////////// data back to the master via the gather function.
// Master process gets to here only when it has been able to gather from all processes
if (rank == MASTER) {
// sanity check the result (a test we would eventually leave out)
int good = 1;
for(i=0;i<n;i++) {
//printf ("%d ", c[i]);
if (c[i] != a[i] + b[i]) {
printf("problem at index %d\n", i);
good = 0;
break;
}
}
if (good) {
printf ("Values correct!\n");
}
}
// clean up memory
if (rank == MASTER) {
free(a); free(b); free(c);
}
free(ap); free(bp); free(cp);
// 9. Terminate MPI Environment and Processes
MPI_Finalize();
return 0;
}