Proposed HPF Extension for Reduction Operations in Independent Loops 

Rob Schreiber and Chuck Koelbel, and Joel Saltz 

September 8, 1995


1. Rationale

It is often the case that a data parallel computation cannot be expressed 
in HPF as an INDEPENDENT loop because several of the loop iterations 
update one or more variables. In such cases, parallelism is possible 
because the order of updates is immaterial to the final result. This is 
most often the case with accumulations, such as the following loop. 

DO I = 1, 1000000000
X = X + COMPLICATED_FCN(I)
END DO

This loop can run in parallel as long as its iterations make their 
modifications to the shared variable (X) in an atomic manner. Alternately, 
the loop can be run in parallel by making updates to temporary copies of 
X, with a (short) final phase to merge these changes. In either case, the 
computation is conceptually parallel but not allowed by the original 
definition of INDEPENDENT.

When programming this kind of computation to run in parallel in HPF 
Version 1, the programmer must store update information in a temporary 
array whose size is equal to the number of loop iterations, and then use a 
reduction intrinsic or XXX_SCATTER library function after the loop. The 
problems with this approach are twofold: the temporary array may be 
excessively large; and the reduction operation has to be intrinsic.

For these reasons, we have added a reduction feature to HPF that allows 
loops that update shared variables (shared means that they are updated by 
more than one loop iteration) to be labeled as independent. Interestingly, 
with this extension it is feasible to write general purpose _SCATTER and 
reduction functions, that can work with user-defined operators.

2. Technical Proposal

2.1 Add a REDUCE Directive

H499 reduce-directive	is !HPF$ REDUCE

Constraint: The first non-comment, non-blank line following a 
reduce-directive an assignment-stmt of a special form, called a reduction 
statement. 

2.2 Reduction Statments

H498 reduction-stmt	is assignment-stmt

A reduction statement is any statement immediately preceded by a reduce 
directive. 

Constraint: The form of a reduction-stmt must be 

reduction-variable = reduction-variable reduction-op expr 

or
reduction-variable = &
reduction-function(reduction-variable, expr) 

Constraint: The two reduction-variable references that occur in a 
reduction statement must be lexically identical. 

H498.1 reduction-variable is variable

H498.2 reduction-op	is	intrinsic-op
or defined-binary-op

H498.3 reduction-function is function-name 

Constraint: A reduction-function must be PURE. 

Constraint: A reduction-op may not be a rel-op; and it may not be power-op 
(i.e. **). 

Constraint: A reduction-stmt must occur in the range of an INDEPENDENT DO 
loop in which the reduction variable does not have the NEW attribute. 
Addition constraints on the location of a reduction statement are covered 
in Section 2.4. 

The operator or function that follows the reduction variable reference 
after the = sign is the reduction operator. 

Any variable that appears on the left hand side of a reduction statement 
is said to be a reduction variable. 

2.3 Allow a reduction-clause in the independent directive: 

H413 independent-directive is INDEPENDENT [, new-clause ] [, 
reduction-list] 

H413.1 reduction-clause	is
REDUCTION(reduction-variable-list; COMBINE = combining-op; IDENTITY = 
value) 

H413.2 combining-op	is	intrinsic-op
or defined-binary-op
or reduction-function

Constraint: The combining operator must be a binary operator or a function 
of two arguments, defined for two objects of the type of each of the the 
reduction variables, and producing an object of that same type (it may not 
be a rel-op); it may not be the power operator **. The only intrinsic 
operators and functions that are allowed are commutative; they are listed 
below (in Section 3).

Constraint: The value in the identity clause must be an expression 
conformable with each of the reduction variables, and for which the 
assignment var = value is well-defined for each of the reduction 
variables; or it may the name of a PURE function of one argument that can 
accept every member of the reduction variable list as an argument and that 
produceds a result conformable with its argument and for which the 
assignment var = value(var) is well-defined.

The reduction clause is mandatory for user-defined reduction operators; it 
is optional for intrinsic operators applied to a reduction variable of 
intrinsic type. 

In either case, it must appear in the correct INDEPENDENT directive. This 
is the INDEPENDENT directive that labels the outermost DO construct in 
which the
reduction variable does not occur in a NEW clause, and which contains no 
INDEPENDENT directive in which the reduction variable occurs in a NEW 
clause. See Section 2.4.

Examples:

There may be more than one reduction clause in the INDEPENDENT directive: 

!HPF$ INDEPENDENT, REDUCTION(X, COMBINE=+, IDENTITY=0), & !HPF$ & 
REDUCTION(Y, COMBINE=MAX, IDENTITY= (-HUGE(Y))) DO I = 1, N
!HPF$ REDUCE
X = X + A(I)
!HPF$ REDUCE
Y = MAX(Y,A(I))
END DO

The reduction variable may be of derived type, and the reduction operator 
may be an overloaded operator:

interface operator (+)
type (bignum) function rank0_big_plus(x,y) type (bignum) x,y
end function rank0_big_plus
end interface

interface zero
real function rank0_real_zero(x)
real x
end function rank0_real_zero

real function rank1_real_zero(x)
real x(:), rank1_real_zero(size(x))
end function rank1_real_zero

double precision function rank0_dbl_zero(x) double precision x
end function rank0_dbl_zero

type (bignum) function rank0_big_zero(x) type (bignum) x
end function rank0_big_zero
end interface

real x0, x1(10)
double precision d
type (bignum) b

!HPF$ INDEPENDENT, REDUCTION(x0, x1, d, b; combine = +; identity = zero) 
do ...
...
!hpf$ reduce
b = b + <bignum-expr>
...
!hpf$ reduce
b = b - <bignum-expr>
enddo


2.4 Restrictions on the Lexical Location of Reduction Statements, and on 
the Use of Reduction Variables.

In this discussion, we use the term *range* (of a DO construct) and 
*active* (an active DO construct) in their technical sense as defined by 
Fortran 90. The range of a DO construct consists of the statements 
lexically bounded by the DO statement and the terminal statement of the 
construct. Thus, a subprogram invoked from inside a DO loop is not in its 
range, although the CALL statement or function reference that invokes it 
is. On the other hand, the DO construct is active from the time the DO 
statement is encountered until the time that the matching execution of the 
terminal statement is executed, or the program stops, or control is 
transferred out of the construct in one of the other ways allowed for 
INDEPENDENT loops (see page 302 of the Handbook and Section 4.4 of the HPF 
Standard.) 

When a reduction operation is executed, some nest of DO constructs will be 
active; this is guaranteed by the third constraint in Section 2.2. The 
reduction variable is said to be *protected* in some, and possibly all of 
the active nest; the rule is this: it is protected in the outermost DO 
construct that is INDEPENDENT, has no NEW clause that names the reduction 
variable, and does not *contain* an independent
directive in which the reduction variable occurs in a NEW clause (see 
Example 2, below). It is protected in all of this DO construct, including 
any other control structures nested within it. 

HPF requires that any reduction statment occur in the range of the DO 
construct in which its variable is protected: it must be lexically 
contained in the body of the outermost loop in which it is protected. 

A protected reduction variable may only occur in reduction statments, and 
only in references to the left of the assignment operator and the matching 
reference to the right of the assignment operator. Outside of the loop in 
which it is protected, however, it is treated as an ordinary variable.

2.4.1 Example loop nests.

When loops are nested, a reduction variable may be protected in an 
independent outer loop even though the reduction operations in which it 
occurs are nested inside an inner loop. Moreover, the inner loop and any 
intervening loops may or may not be independent. 

! Nested Loop Example 1. Inner loop is sequential 

x = 10
outer: do while (x < 1 000 000)
!hpf$ independent
middle: do i = 1, n
inner: do j = 1, m
!hpf$ reduce
x = x + j
!	illegal to refer to x here, except in a reduction statement
enddo inner
!	illegal to refer to x here, except in a reduction statement
enddo middle
print *, x
enddo outer

Since the variable x occurs in a reduction statement in the body on the 
independent
middle loop, it is a protected reduction variable throughout that loop, 
including inside the inner loop. The outermost loop is not INDEPENDENT, 
and so x is not protected in that part of its range not included in the 
middle loop. 

This remains true if the inner loop is independent. On the other hand, a 
variable that occurs in a new clause may not be a reduction variable in 
the body of a loop, although it may be in the body of a contained loop:

! Nested Loop Example 2. Outer loop new clause. 

!hpf$ independent
outer: do k = 1, 100
!hpf$ independent, new x
middle: do i = 1, n
x = 10
!hpf$ independent
inner: do j = 1, m
!hpf$ reduce
x = x + j**2
! illegal to refer to x here, except in a reduction statement enddo inner
y(i) = x
enddo middle
enddo outer

Here, x is a protected reduction variable only in the inner loop. 


The reduction statment may not occur in a subprogram called from the body 
of the independent loop. To get this effect, return the updating value 
from the subprogram and perform the update with a reduction statement in 
the lexical loop body.

program example
integer x, xinc, i

x = 10
!hpf$ independent, new(xinc), reduction(x, combine=+, identity = 0) do i = 
1, n
call sub(i, xinc)
!hpf$ reduce
x = x + xinc
enddo

subroutine sub(i, xinc)
integer i, xinc, local_b
local_b = (i-1)**2
xinc = i * local_b
end subroutine sub

2.5 Relaxation of the HPF Standard's requirement for INDEPENDENT loops. 

In Section 4.4, the first proviso on what may not occur in the loop is 
modified to: 

* Any two operations that assign to the same object interfere with each 
other,
(begin added text)
unless the object is assigned a new value in a reduction statement. (end 
added text)

Thus, the user's assertion implied by INDEPENDENT about the behavior of 
the program is weakened, and no longer applies to reduction variables that 
are protected in the DO construct. The value of such a variable on exit 
from the loop is processor dependent, although it is constrained by the 
model implementation mechanism given in Section 3. 


2.6 Reduction Operators May Be Mixed.

In most cases, only one operator will be used in the reduction statements 
(if there are more than one) that update a given reduction variable. It is 
sensible, however, to use different operators, such as + and -, together 
on the same reduction variable: mathematically, subtraction is just 
addition of the additive inverse. The language makes no restriction on the 
mix of operators the programmer chooses to use.

3. Implementation and Semantics

Following an independent loop, the value of any reduction variable that 
was protected in the loop is not completely defined by HPF. One possible 
value is that which would have been computed by sequential execution of 
the loop, but others are also possible.

While HPF does not explicitly define
the set of possible values, it does give a precise implementation 
mechanism that serves to define this set. 

Since no reference to a protected reduction variable can occur outside of 
a reduction statement in the loop body, it is not necessary to define the 
values that these variables may have inside the loop. 

In this discussion, the term processor means a single physical processor 
OR a group of physical processors that together sequentially execute some 
or all of the iterations of an independent loop. 

The value on exit from the loop is to be computed as follows. Each 
processor makes a local copy of the reduction variable, and initializes it 
to the identity value specified in the reduction-clause if any, or the the 
identity element for the intrinsic reduction operator if there is no 
reduction-clause.
The identity elements for the intrinsic operators are as follows: 

Operator	Identity Element
-------	----------------
+	0
-	0
*	1
/	1
.and.	.true.
.or.	.false.
.eqv.	.true.
.neqv.	.false.

//	''

The following intrinsics may be used as reduction-functions, with the 
indicated identity elements:


Operator	Identity Element
-------	----------------
IAND(I,J)	(all ones)
IOR(I,J)	0
IEOR(I,J)	0
MIN(X,Y)	-HUGE(X) where x has the same type and kind type
parameter as the reduction variable
MAX(X,Y)	+HUGE(X) where x has the same type and kind type
parameter as the reduction variable
MATMUL(A,B)	The identity matrix of the same shape, type, and kind type
parameter as A, provided the shape of A is square, i.e.
RANK(A) is 2 and SIZE(A,1) is the same as SIZE(A,2). As a
consequence of the form A = MATMUL(A, B) of the reduction
operation, it must also be true that
SIZE(A,2) = SIZE(B,1) = SIZE(B,2).


Each processor performs a subset of the loop iterations; when it 
encounters a reduction statement, it updates its
own copy of the reduction variable. A processor is free to perform its 
loop iterations in any order; furthermore, it may start an iteration, 
suspend work on it, do some or all of the work of other iterations, and 
resume work on the suspended iteration. However, any update of a local 
reduction variable occurs through the execution of a reduction statement, 
and reduction statements *are* executed atomically.

The final value of the reduction variable is computed by combining the 
local values with the value of the global reduction variable on entry to 
the loop, using the combining operator. The ordering of this reduction is 
language processor dependent, just as it is for the intrinsic reduction 
functions (SUM, etc.) 

3.1 Loop nests; protection of reduction variables 

When loops are nested, a reduction variable may be protected in an 
independent outer loop even though the reduction operations in which it 
occurs are nested inside an inner loop. Moreover, the inner loop and any 
intervening loops may or may not be independent. The general rule is that
it is protected while is the *outermost* independent loop in which it does 
not appear in a NEW clause is active.
It is therefore protected throughout all iterations of any independent 
loop in which it occurs in a reduction statment, unless it appears in a 
NEW clause in the independent directive that applies to the loop. 

For example:

! Nested Loop Example 1. Inner loop is sequential 

x = 10
!hpf$ independent
do i = 1, n
do j = 1, m
!hpf$ reduce
x = x + j
!	illegal to refer to x here, except in a reduction statement
enddo
!	illegal to refer to x here, except in a reduction statement
enddo
print *, x

Since the variable x occurs in a reduction statement in the body on the 
independent outer loop, it is a protected reduction variable throughout 
that loop, including inside the inner loop.

This remains true if the inner loop is independent. On the other hand, a 
variable that occurs in a new clause may not be a reduction variable in 
the body of a loop, although it may be in the body of a contained loop:

! Nested Loop Example 2. Outer loop new clause. 

!hpf$ independent, new x
do i = 1, n
x = 10
!hpf$ independent
do j = 1, m
!hpf$ reduce
x = x + j**2
!	illegal to refer to x here, except in a reduction statement
enddo
y(i) = x
enddo
! In contrast to the example above, the value of x is not defined after 
the loop, ! although it can be redefined here

Here, x is a protected reduction variable only in the inner loop. 

The reduction statment may not occur in a subprogram called from the body 
of the independent loop. To get this effect, return the updating value 
from the subprogram and perform the update with a reduction statement in 
the lexical loop body.

program example
integer x, xinc, i

x = 10
!hpf$ independent, new(xinc), reduction(x, combine=+, identity = 0) do i = 
1, n
call sub(i, xinc)
!hpf$ reduce
x = x + xinc
enddo

subroutine sub(i, xinc)
integer i, xinc, local_b
! Since local_b is automatic it is effectively NEW 
local_b = (i-1)**2
xinc = i * local_b
end subroutine sub

4. Examples.

For intrinsic operators and scalars, the reduction-clause is unnecessary. 

!hpf$ independent
do i = 1, n
...
!hpf$ reduce
x = x + <expr1>
...
!hpf$ reduce
x = x - <expr2>
enddo

The reference may be to a scalar array element: 
!hpf$ independent
do i = 1, n
!hpf$ reduce
x(indx(i)) = x(indx(i)) + <expr>
enddo


It may be to a section of an array:

!hpf$ independent
do i = 1, n
!hpf$ reduce
x(i: i+4) = x(i: i+4) + <expr> ! as many as 5 updates to elements of x 
enddo

Note that the compiler, if it distributes iterations of this loop in a 
block-wise manner, will not need to make a private copy of the entire 
array x. 

Consider the loop:

!hpf$ independent
do i = 1, n
!hpf$ reduce
x(ix(i)) = <expr>
enddo

This is syntactically disallowed. The effect that is apparently wanted 
here, namely to store in x(k) some unspecified one of the subset of 
instances of <expr> for which ix(i) == k (a copy_scatter) can be achieved 
by using the second form of reduction statement.

In the following example the user decides to accumulate sparse local 
updates to a distributed dense matrix.

type real_sparse_matrix
integer nrows, ncols, nnz ! shape, number of nonzeros real, pointer :: 
vals(:)
end type real_sparse_matrix

interface operator (+)
function full_plus_sparse(x,y)
real x(:,:)
type (real_sparse_matrix) :: y
end function full_plus_sparse

function sparse_plus_sparse(x,y)
type (real_sparse_matrix) :: x, y
end function sparse_plus_sparse
end interface

function spzeros(nr, nc) result(res)
type (real_sparse_matrix) res
integer nr, nc
res%nrows = nr
res%ncols = nc
res%nnz = 0
allocate(res%vals(0))
end function spzeros

function elt(i, j, x) result(res)
type (real_sparse_matrix) res
integer i,j
real x
res%nrows = nr
res%ncols = nc
res%nnz = 1
allocate(res%vals(1))
res%vals(1) = x
end function elt

real gs(n,n)
!hpf$ distribute (block, block) :: gs
type (real_sparse_matrix) :: s

s = spzeros(n,n)
!hpf$ independent, reduction(s; combine = +, identity = spzeros(n,n)) do i 
= 1, num_updates
...
!hpf$ reduce
s = s + elt(row, col, val) ! adds in elt at position (row, col) enddo
gs = gs + s	! full_plus_sparse


--------------------------------------------------------------------------------
[A response from David Presberg]

Quick reaction concerning the simplest reduction cases after having
only read about 1/6th of the text:

 - it is good that the "reduction-clause" is optional for the simple
cases: "... intrinsic operators applied to a reduction variable of
intrinsic type."  BUT that leaves the status of reductions formed from
(Fortran) intrinsic functions in doubt as read with the mandate for
"reduction-clause" for "...user-defined reduction operators". I.e., no
mention of optional/mandatory for "reduction-function"-using forms. Note
that your first example in section 2.3 uses MAX(...) --- clearly not a
"user-defined reduction operator" nor an "intrinsic operator".

  [<Formalistic-aside> The proper Fortran standardeze phrase
  would be '"intrinsic-operator" of an "intrinsic binary operation"'.
  There is English pseudo-syntax in F90 Section 7.1.2 that covers that
  topic.  It is there because the actual syntax doesn't slice the topic in
  a convenient way to allow talking collectively about all binary
  operators. </Formalistic-aside>]

 - having seen the !HPF$ REDUCTION early in the text, I wondered
immediately about the need for the "reduction-clause" when I
encountered it.  Not having read further ahead (yet) I don't know if
_both_ are required...  But I note that the simplest cases (e.g., that
intrinsic operator case and the one that is the easy generalization
from "x = max( x, expr )" ) are very easily discovered by any compiler
with the technology for doing the rest of HPF.  (I cite an early
1970's automatic parallelizing (extended) Fortran 77 compiler for a
SIMD machine -- David L. and Carl O. stop grimacing.)  Thus for that
simple case, neither the (statement form) directive nor the clause
ought to be required.

 - there is a serious flaw in the design of the "reduction-clause":
the "; IDENTITY = value" phrase immediately leads to some (potential)
semantic difference in the HPF as processed by an HPF system and the
(serial) Fortran as processed by a non-HPF Fortran system.  

==> Please drop that initialization phrase in favor of:
    
    (a) tabulating in the HPF standard the "zero"-element-value for
    each of the intrinsic operators and suitable binary intrinsic
    functions, and
    
    (b) thinking a bit harder about how to convey that same idea for
    "defined-binary-op"s or "reduction-function"s (particularly
    user-defined functions).  Perhaps you'll have to settle on e.g. "0
    of the correct data type" for these cases.

(Is there text later in the discussion about an initial value of the
reduction variable prior to the INDEPENDENT loop with the reduction?
Slap me down if I didn't read far enough before reacting.)


Have fun next week.

-- Pres


--------------------------------------------------------------------------------
[And replies^2 from Rob Schreiber]


        Major items:

         - I am still not comfortable with the " ; IDENTITY = value" clause.
        What if the user codes:

               program AbsurdInitialization
               x=0
        !HPF$ INDEPENDENT, REDUCTION( x; COMBINE = +; IDENTITY = 2. )
               do i=1,100
        !HPF$    REDUCE
                 x = x + log(i)
               enddo
               print *,x
               end

        Should they be allowed to state an absurd "zero" value for an
        intrinsic operation?  
Absurd, hardly.  The semantics are well defined.   It prints a
processor dependent approiximation to 2*number_of_processors() +
sigma(i=1 to 100) log(i)

I have changed it, however, to forbid combine and identity phrases 
(clauses?) in the optional reduction clause for an intrinsic operators.  
Maybe we should just bar the whole REDUCTION clause on the independent 
directive for intrinsic reductions.  Should we define the term "intrinsic 
reduction" in the text?

        (I wonder if the keyword would be more intuitive
        as "ZERO" rather than "IDENTITY".)  
As in, like, 
!HPF$ INDEPENDENT, REDUCTION(X, COMBINE = *, ZERO = 1)
The mind boggles.

        If each processor initializes its
        local value of x to 2., it would seem that the HPF interpretation of
        this code (if run on more than one processor) would give wildly
        different results than a sequential F90 interpretation.
Yes, a very BAD processor dependent approximation to the sequential result.
This is disconcerting, but I dont see any way around this for user defined
operators, except to REQUIRE that the user create a value "foo" for
his operator bar(x,y) that has the property that bar(foo,y) = y for all y.
Should we.

        I agree that you need a hook for the "zero" value of a user-defined
        operator or reduction function, but I think you also need either:

         (a) constraint language about the defined state, _and_ _defining_
        _value_ of the reduction variable prior to entry to the loop nest with
        the reduction, or
Do you mean what I said above about the "identity element" property of the
reduction operator and the identity value?

         (b) prohibit use of the clause for the trivial cases you have
        enumerated in section 3.
Those are the intrinsic cases, where we could certainly make this prohibition.

         - Again, I think you've surrounded the syntax with so many lexical
        restrictions that any undergraduate senior-level compiler course
        project could find and deal with the reductions with absolutely _no_
        directives!!  Could you at least relax the form of the reduction
        statement to include:

          var = expr op var

        and maybe even:

          var = expr1 op var op expr2   ! with constraint that the 2 ops are indentical

No problem here semantically, I guess;   not my call.


         - It would seem that one cannot:

         ...
        !HPF$ REDUCE
                  var = var .op. f(x)
         ...
        !HPF$ REDUCE
                  var = var .op. f(x)
         ...

        Why not?
Oh?  Where is this rule?


         - I think the constraints on types of lists of variables and on other
        things that should agree in type, etc., are a little slack.
Got to show me a bug, however, before I can fix them.


        Bottom line:  For user-defined operators and reduction functions, and
        defined-type reduction variables, the clause on the INDEPENDENT and
        the REDUCTION directive are (probably) necessary.  I think you really
        _only_ have to define *protected* for the other cases without any
        clauses and directives, and then you can do the relaxation of
        constraints on the body of INDEPENDENT for the simple situations and
        not have to require any extra stuff for those simple cases.

Can you be a little clearer?   If i understand you, in the case of
reductions with intrinsic operators, you suggest no reduction clause on
the loop and no reduce directive on the operation.   I like this idea;
however, we have, in that case, to prohibit mixed operators, so that
the combine operator and identity value are implied by the (one class
of) reduction operator/function that occurs in the loop.

Rob