To begin, let me demonstrate what the 'firstprivate' and 'lastprivate' clauses do.
#include <stdio.h>
#include <omp.h>

extern int n; // defined in another source file (n is 10 in the output below)

void demo_firstprivate(void) {
    int i, indx, TID;
    int a[n];
    for (i = 0; i < n; i++)
        a[i] = -i - 1;
    indx = 4;
    int n1 = 1;
    // Each thread gets its own copy of indx, initialized to 4.
    // Thread t writes a[4 + t], so this needs 4 + (number of threads) <= n.
    #pragma omp parallel default(none) firstprivate(indx) private(i, TID) shared(n1, a)
    {
        TID = omp_get_thread_num();
        indx += n1 * TID; // changes this thread's private copy only
        for (i = indx; i < indx + n1; i++)
            a[i] = TID + 1;
    } // end of parallel region
    printf("After the parallel region:\n");
    for (i = 0; i < n; i++)
        printf("a[%d] = %d\n", i, a[i]);
}
With two threads, the output is:
Demo 'firstprivate' clause begin..
After the parallel region:
a[0] = -1
a[1] = -2
a[2] = -3
a[3] = -4
a[4] = 1
a[5] = 2
a[6] = -7
a[7] = -8
a[8] = -9
a[9] = -10
Demo 'firstprivate' clause end..
There are two things you need to realize when using firstprivate:
(1) Each thread's private copy is initialized once, when the thread enters the region, with the value the original variable had just before the region.
(2) In C++, each private copy of a firstprivate object is constructed by calling its copy constructor with the original object (the master thread's copy) as its argument.
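Point (1) can be checked directly. The following is only a minimal sketch, not part of the demo above: the function name threads_seeing_initial_value and the increment of 100 are made up for illustration. Every thread asserts that its private copy of indx starts at the original value, then changes that copy.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns how many threads ran the check.
   Each thread verifies that its firstprivate copy of indx starts
   out equal to the value indx had before the region. */
int threads_seeing_initial_value(int start) {
    int indx = start;
    int seen = 0;
    #pragma omp parallel firstprivate(indx) reduction(+:seen)
    {
        assert(indx == start); /* private copy is initialized from the original */
        indx += 100;           /* with OpenMP enabled, only this thread's copy changes */
        seen += 1;             /* per-thread counter, summed by the reduction */
    }
    return seen;
}
```

Built with OpenMP support (for example gcc -fopenmp), the return value is the number of threads in the team. Built without it, the pragma is ignored and the function returns 1.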
The following code demonstrates lastprivate:
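#include <stdio.h>
#include <omp.h>

extern int n; // defined in another source file

void demo_lastprivate(void) {
    int a = 0;
    int i;
    // After the loop, 'a' holds the value assigned in the sequentially
    // last iteration (i == n - 1), whichever thread executed it.
    #pragma omp parallel for private(i) lastprivate(a)
    for (i = 0; i < n; i++) {
        a = i + 1;
        printf("Thread:%d got value: %d, iteration: %d\n", omp_get_thread_num(), a, i);
    }
    // End of parallel region
    printf("value of 'lastprivate' variable 'a' is %d\n", a);
}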
With two threads and n = 10, the output is:
Demo 'lastprivate' clause begin..
Thread:1 got value: 6, iteration: 5
Thread:0 got value: 1, iteration: 0
Thread:1 got value: 7, iteration: 6
Thread:0 got value: 2, iteration: 1
Thread:1 got value: 8, iteration: 7
Thread:0 got value: 3, iteration: 2
Thread:1 got value: 9, iteration: 8
Thread:1 got value: 10, iteration: 9
Thread:0 got value: 4, iteration: 3
Thread:0 got value: 5, iteration: 4
value of 'lastprivate' variable 'a' is 10
Demo 'lastprivate' clause end..
The caveats in using the lastprivate clause are:
(1) If the lastprivate variable is an array or structure and only some elements or fields are assigned in the last iteration, then after the parallel execution the elements or fields that were not assigned in the final iteration are undefined.
(2) In C++, the original object is updated by invoking its copy assignment operator, with the private copy from the sequentially last iteration as the argument.
That is, the copy constructor (for firstprivate) and the copy assignment operator (for lastprivate) must be publicly accessible, otherwise you'll find yourself in quite a fix: you will get a compiler error, and depending on your compiler (e.g. Xcode, VS, gcc) you might be able to figure out from the message why the error is there in the first place.
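One way to stay clear of caveat (1) is to assign every field of the lastprivate variable in every iteration. Here is a minimal sketch of that idea. The Sample type and the last_sample function are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical structure used only for this illustration. */
typedef struct {
    int index;
    int square;
} Sample;

/* Returns the Sample produced by the sequentially last iteration.
   Both fields are assigned in every iteration, so no field is left
   undefined after the loop. */
Sample last_sample(int n) {
    assert(n > 0); /* with zero iterations there is no last value to copy out */
    Sample s = { -1, -1 };
    int i;
    #pragma omp parallel for lastprivate(s)
    for (i = 0; i < n; i++) {
        s.index = i;
        s.square = i * i;
    }
    return s;
}
```

For example, last_sample(10) yields index 9 and square 81, the same result a serial loop would leave behind.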
The last thing I wanted to share is that under OpenMP, heap storage is shared by all threads in the program. Making a pointer variable private gives each thread its own copy of the pointer, not of the memory it points to. So you need to be careful with memory allocation and deallocation, otherwise you will get runtime errors like
Non-aligned pointer being freed ... or double free
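A minimal sketch of one safe pattern, assuming a made-up function name sum_squares_heap: allocate once before the parallel loop, let the threads write to disjoint elements, and free once after the loop. The pointer is firstprivate here, so each thread has its own copy of the pointer, but all copies refer to the same heap block.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: fills a heap buffer with i*i in parallel and
   returns the sum, or -1 if the allocation fails. */
long sum_squares_heap(int n) {
    assert(n > 0);
    long *buf = malloc((size_t)n * sizeof *buf); /* allocated once, by one thread */
    if (buf == NULL)
        return -1;
    long total = 0;
    int i;
    /* Each thread gets a private copy of the pointer buf, but every copy
       points at the same shared heap block. */
    #pragma omp parallel for firstprivate(buf)
    for (i = 0; i < n; i++)
        buf[i] = (long)i * i; /* disjoint elements, so no data race */
    for (i = 0; i < n; i++)
        total += buf[i];
    free(buf); /* freed exactly once, outside the parallel loop */
    return total;
}
```

Calling free(buf) inside the parallel loop instead would have every thread free the same block, which is the kind of mistake that produces a double free.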