Proposed HPF Extension for Reduction Operations in Independent Loops Rob Schreiber and Chuck Koelbel, and Joel Saltz September 8, 1995 1. Rationale It is often the case that a data parallel computation cannot be expressed in HPF as an INDEPENDENT loop because several of the loop iterations update one or more variables. In such cases, parallelism is possible because the order of updates is immaterial to the final result. This is most often the case with accumulations, such as the following loop. DO I = 1, 1000000000 X = X + COMPLICATED_FCN(I) END DO This loop can run in parallel as long as its iterations make their modifications to the shared variable (X) in an atomic manner. Alternately, the loop can be run in parallel by making updates to temporary copies of X, with a (short) final phase to merge these changes. In either case, the computation is conceptually parallel but not allowed by the original definition of INDEPENDENT. When programming this kind of computation to run in parallel in HPF Version 1, the programmer must store update information in a temporary array whose size is equal to the number of loop iterations, and then use a reduction intrinsic or XXX_SCATTER library function after the loop. The problems with this approach are twofold: the temporary array may be excessively large; and the reduction operation has to be intrinsic. For these reasons, we have added a reduction feature to HPF that allows loops that update shared variables (shared means that they are updated by more than one loop iteration) to be labeled as independent. Interestingly, with this extension it is feasible to write general purpose _SCATTER and reduction functions, that can work with user-defined operators. 2. Technical Proposal 2.1 Add a REDUCE Directive H499 reduce-directive is !HPF$ REDUCE Constraint: The first non-comment, non-blank line following a reduce-directive an assignment-stmt of a special form, called a reduction statement. 2.2 Reduction Statments H498 reduction-stmt is assignment-stmt A reduction statement is any statement immediately preceded by a reduce directive. Constraint: The form of a reduction-stmt must be reduction-variable = reduction-variable reduction-op expr or reduction-variable = & reduction-function(reduction-variable, expr) Constraint: The two reduction-variable references that occur in a reduction statement must be lexically identical. H498.1 reduction-variable is variable H498.2 reduction-op is intrinsic-op or defined-binary-op H498.3 reduction-function is function-name Constraint: A reduction-function must be PURE. Constraint: A reduction-op may not be a rel-op; and it may not be power-op (i.e. **). Constraint: A reduction-stmt must occur in the range of an INDEPENDENT DO loop in which the reduction variable does not have the NEW attribute. Addition constraints on the location of a reduction statement are covered in Section 2.4. The operator or function that follows the reduction variable reference after the = sign is the reduction operator. Any variable that appears on the left hand side of a reduction statement is said to be a reduction variable. 2.3 Allow a reduction-clause in the independent directive: H413 independent-directive is INDEPENDENT [, new-clause ] [, reduction-list] H413.1 reduction-clause is REDUCTION(reduction-variable-list; COMBINE = combining-op; IDENTITY = value) H413.2 combining-op is intrinsic-op or defined-binary-op or reduction-function Constraint: The combining operator must be a binary operator or a function of two arguments, defined for two objects of the type of each of the the reduction variables, and producing an object of that same type (it may not be a rel-op); it may not be the power operator **. The only intrinsic operators and functions that are allowed are commutative; they are listed below (in Section 3). Constraint: The value in the identity clause must be an expression conformable with each of the reduction variables, and for which the assignment var = value is well-defined for each of the reduction variables; or it may the name of a PURE function of one argument that can accept every member of the reduction variable list as an argument and that produceds a result conformable with its argument and for which the assignment var = value(var) is well-defined. The reduction clause is mandatory for user-defined reduction operators; it is optional for intrinsic operators applied to a reduction variable of intrinsic type. In either case, it must appear in the correct INDEPENDENT directive. This is the INDEPENDENT directive that labels the outermost DO construct in which the reduction variable does not occur in a NEW clause, and which contains no INDEPENDENT directive in which the reduction variable occurs in a NEW clause. See Section 2.4. Examples: There may be more than one reduction clause in the INDEPENDENT directive: !HPF$ INDEPENDENT, REDUCTION(X, COMBINE=+, IDENTITY=0), & !HPF$ & REDUCTION(Y, COMBINE=MAX, IDENTITY= (-HUGE(Y))) DO I = 1, N !HPF$ REDUCE X = X + A(I) !HPF$ REDUCE Y = MAX(Y,A(I)) END DO The reduction variable may be of derived type, and the reduction operator may be an overloaded operator: interface operator (+) type (bignum) function rank0_big_plus(x,y) type (bignum) x,y end function rank0_big_plus end interface interface zero real function rank0_real_zero(x) real x end function rank0_real_zero real function rank1_real_zero(x) real x(:), rank1_real_zero(size(x)) end function rank1_real_zero double precision function rank0_dbl_zero(x) double precision x end function rank0_dbl_zero type (bignum) function rank0_big_zero(x) type (bignum) x end function rank0_big_zero end interface real x0, x1(10) double precision d type (bignum) b !HPF$ INDEPENDENT, REDUCTION(x0, x1, d, b; combine = +; identity = zero) do ... ... !hpf$ reduce b = b + ... !hpf$ reduce b = b - enddo 2.4 Restrictions on the Lexical Location of Reduction Statements, and on the Use of Reduction Variables. In this discussion, we use the term *range* (of a DO construct) and *active* (an active DO construct) in their technical sense as defined by Fortran 90. The range of a DO construct consists of the statements lexically bounded by the DO statement and the terminal statement of the construct. Thus, a subprogram invoked from inside a DO loop is not in its range, although the CALL statement or function reference that invokes it is. On the other hand, the DO construct is active from the time the DO statement is encountered until the time that the matching execution of the terminal statement is executed, or the program stops, or control is transferred out of the construct in one of the other ways allowed for INDEPENDENT loops (see page 302 of the Handbook and Section 4.4 of the HPF Standard.) When a reduction operation is executed, some nest of DO constructs will be active; this is guaranteed by the third constraint in Section 2.2. The reduction variable is said to be *protected* in some, and possibly all of the active nest; the rule is this: it is protected in the outermost DO construct that is INDEPENDENT, has no NEW clause that names the reduction variable, and does not *contain* an independent directive in which the reduction variable occurs in a NEW clause (see Example 2, below). It is protected in all of this DO construct, including any other control structures nested within it. HPF requires that any reduction statment occur in the range of the DO construct in which its variable is protected: it must be lexically contained in the body of the outermost loop in which it is protected. A protected reduction variable may only occur in reduction statments, and only in references to the left of the assignment operator and the matching reference to the right of the assignment operator. Outside of the loop in which it is protected, however, it is treated as an ordinary variable. 2.4.1 Example loop nests. When loops are nested, a reduction variable may be protected in an independent outer loop even though the reduction operations in which it occurs are nested inside an inner loop. Moreover, the inner loop and any intervening loops may or may not be independent. ! Nested Loop Example 1. Inner loop is sequential x = 10 outer: do while (x < 1 000 000) !hpf$ independent middle: do i = 1, n inner: do j = 1, m !hpf$ reduce x = x + j ! illegal to refer to x here, except in a reduction statement enddo inner ! illegal to refer to x here, except in a reduction statement enddo middle print *, x enddo outer Since the variable x occurs in a reduction statement in the body on the independent middle loop, it is a protected reduction variable throughout that loop, including inside the inner loop. The outermost loop is not INDEPENDENT, and so x is not protected in that part of its range not included in the middle loop. This remains true if the inner loop is independent. On the other hand, a variable that occurs in a new clause may not be a reduction variable in the body of a loop, although it may be in the body of a contained loop: ! Nested Loop Example 2. Outer loop new clause. !hpf$ independent outer: do k = 1, 100 !hpf$ independent, new x middle: do i = 1, n x = 10 !hpf$ independent inner: do j = 1, m !hpf$ reduce x = x + j**2 ! illegal to refer to x here, except in a reduction statement enddo inner y(i) = x enddo middle enddo outer Here, x is a protected reduction variable only in the inner loop. The reduction statment may not occur in a subprogram called from the body of the independent loop. To get this effect, return the updating value from the subprogram and perform the update with a reduction statement in the lexical loop body. program example integer x, xinc, i x = 10 !hpf$ independent, new(xinc), reduction(x, combine=+, identity = 0) do i = 1, n call sub(i, xinc) !hpf$ reduce x = x + xinc enddo subroutine sub(i, xinc) integer i, xinc, local_b local_b = (i-1)**2 xinc = i * local_b end subroutine sub 2.5 Relaxation of the HPF Standard's requirement for INDEPENDENT loops. In Section 4.4, the first proviso on what may not occur in the loop is modified to: * Any two operations that assign to the same object interfere with each other, (begin added text) unless the object is assigned a new value in a reduction statement. (end added text) Thus, the user's assertion implied by INDEPENDENT about the behavior of the program is weakened, and no longer applies to reduction variables that are protected in the DO construct. The value of such a variable on exit from the loop is processor dependent, although it is constrained by the model implementation mechanism given in Section 3. 2.6 Reduction Operators May Be Mixed. In most cases, only one operator will be used in the reduction statements (if there are more than one) that update a given reduction variable. It is sensible, however, to use different operators, such as + and -, together on the same reduction variable: mathematically, subtraction is just addition of the additive inverse. The language makes no restriction on the mix of operators the programmer chooses to use. 3. Implementation and Semantics Following an independent loop, the value of any reduction variable that was protected in the loop is not completely defined by HPF. One possible value is that which would have been computed by sequential execution of the loop, but others are also possible. While HPF does not explicitly define the set of possible values, it does give a precise implementation mechanism that serves to define this set. Since no reference to a protected reduction variable can occur outside of a reduction statement in the loop body, it is not necessary to define the values that these variables may have inside the loop. In this discussion, the term processor means a single physical processor OR a group of physical processors that together sequentially execute some or all of the iterations of an independent loop. The value on exit from the loop is to be computed as follows. Each processor makes a local copy of the reduction variable, and initializes it to the identity value specified in the reduction-clause if any, or the the identity element for the intrinsic reduction operator if there is no reduction-clause. The identity elements for the intrinsic operators are as follows: Operator Identity Element ------- ---------------- + 0 - 0 * 1 / 1 .and. .true. .or. .false. .eqv. .true. .neqv. .false. // '' The following intrinsics may be used as reduction-functions, with the indicated identity elements: Operator Identity Element ------- ---------------- IAND(I,J) (all ones) IOR(I,J) 0 IEOR(I,J) 0 MIN(X,Y) -HUGE(X) where x has the same type and kind type parameter as the reduction variable MAX(X,Y) +HUGE(X) where x has the same type and kind type parameter as the reduction variable MATMUL(A,B) The identity matrix of the same shape, type, and kind type parameter as A, provided the shape of A is square, i.e. RANK(A) is 2 and SIZE(A,1) is the same as SIZE(A,2). As a consequence of the form A = MATMUL(A, B) of the reduction operation, it must also be true that SIZE(A,2) = SIZE(B,1) = SIZE(B,2). Each processor performs a subset of the loop iterations; when it encounters a reduction statement, it updates its own copy of the reduction variable. A processor is free to perform its loop iterations in any order; furthermore, it may start an iteration, suspend work on it, do some or all of the work of other iterations, and resume work on the suspended iteration. However, any update of a local reduction variable occurs through the execution of a reduction statement, and reduction statements *are* executed atomically. The final value of the reduction variable is computed by combining the local values with the value of the global reduction variable on entry to the loop, using the combining operator. The ordering of this reduction is language processor dependent, just as it is for the intrinsic reduction functions (SUM, etc.) 3.1 Loop nests; protection of reduction variables When loops are nested, a reduction variable may be protected in an independent outer loop even though the reduction operations in which it occurs are nested inside an inner loop. Moreover, the inner loop and any intervening loops may or may not be independent. The general rule is that it is protected while is the *outermost* independent loop in which it does not appear in a NEW clause is active. It is therefore protected throughout all iterations of any independent loop in which it occurs in a reduction statment, unless it appears in a NEW clause in the independent directive that applies to the loop. For example: ! Nested Loop Example 1. Inner loop is sequential x = 10 !hpf$ independent do i = 1, n do j = 1, m !hpf$ reduce x = x + j ! illegal to refer to x here, except in a reduction statement enddo ! illegal to refer to x here, except in a reduction statement enddo print *, x Since the variable x occurs in a reduction statement in the body on the independent outer loop, it is a protected reduction variable throughout that loop, including inside the inner loop. This remains true if the inner loop is independent. On the other hand, a variable that occurs in a new clause may not be a reduction variable in the body of a loop, although it may be in the body of a contained loop: ! Nested Loop Example 2. Outer loop new clause. !hpf$ independent, new x do i = 1, n x = 10 !hpf$ independent do j = 1, m !hpf$ reduce x = x + j**2 ! illegal to refer to x here, except in a reduction statement enddo y(i) = x enddo ! In contrast to the example above, the value of x is not defined after the loop, ! although it can be redefined here Here, x is a protected reduction variable only in the inner loop. The reduction statment may not occur in a subprogram called from the body of the independent loop. To get this effect, return the updating value from the subprogram and perform the update with a reduction statement in the lexical loop body. program example integer x, xinc, i x = 10 !hpf$ independent, new(xinc), reduction(x, combine=+, identity = 0) do i = 1, n call sub(i, xinc) !hpf$ reduce x = x + xinc enddo subroutine sub(i, xinc) integer i, xinc, local_b ! Since local_b is automatic it is effectively NEW local_b = (i-1)**2 xinc = i * local_b end subroutine sub 4. Examples. For intrinsic operators and scalars, the reduction-clause is unnecessary. !hpf$ independent do i = 1, n ... !hpf$ reduce x = x + ... !hpf$ reduce x = x - enddo The reference may be to a scalar array element: !hpf$ independent do i = 1, n !hpf$ reduce x(indx(i)) = x(indx(i)) + enddo It may be to a section of an array: !hpf$ independent do i = 1, n !hpf$ reduce x(i: i+4) = x(i: i+4) + ! as many as 5 updates to elements of x enddo Note that the compiler, if it distributes iterations of this loop in a block-wise manner, will not need to make a private copy of the entire array x. Consider the loop: !hpf$ independent do i = 1, n !hpf$ reduce x(ix(i)) = enddo This is syntactically disallowed. The effect that is apparently wanted here, namely to store in x(k) some unspecified one of the subset of instances of for which ix(i) == k (a copy_scatter) can be achieved by using the second form of reduction statement. In the following example the user decides to accumulate sparse local updates to a distributed dense matrix. type real_sparse_matrix integer nrows, ncols, nnz ! shape, number of nonzeros real, pointer :: vals(:) end type real_sparse_matrix interface operator (+) function full_plus_sparse(x,y) real x(:,:) type (real_sparse_matrix) :: y end function full_plus_sparse function sparse_plus_sparse(x,y) type (real_sparse_matrix) :: x, y end function sparse_plus_sparse end interface function spzeros(nr, nc) result(res) type (real_sparse_matrix) res integer nr, nc res%nrows = nr res%ncols = nc res%nnz = 0 allocate(res%vals(0)) end function spzeros function elt(i, j, x) result(res) type (real_sparse_matrix) res integer i,j real x res%nrows = nr res%ncols = nc res%nnz = 1 allocate(res%vals(1)) res%vals(1) = x end function elt real gs(n,n) !hpf$ distribute (block, block) :: gs type (real_sparse_matrix) :: s s = spzeros(n,n) !hpf$ independent, reduction(s; combine = +, identity = spzeros(n,n)) do i = 1, num_updates ... !hpf$ reduce s = s + elt(row, col, val) ! adds in elt at position (row, col) enddo gs = gs + s ! full_plus_sparse -------------------------------------------------------------------------------- [A response from David Presberg] Quick reaction concerning the simplest reduction cases after having only read about 1/6th of the text: - it is good that the "reduction-clause" is optional for the simple cases: "... intrinsic operators applied to a reduction variable of intrinsic type." BUT that leaves the status of reductions formed from (Fortran) intrinsic functions in doubt as read with the mandate for "reduction-clause" for "...user-defined reduction operators". I.e., no mention of optional/mandatory for "reduction-function"-using forms. Note that your first example in section 2.3 uses MAX(...) --- clearly not a "user-defined reduction operator" nor an "intrinsic operator". [ The proper Fortran standardeze phrase would be '"intrinsic-operator" of an "intrinsic binary operation"'. There is English pseudo-syntax in F90 Section 7.1.2 that covers that topic. It is there because the actual syntax doesn't slice the topic in a convenient way to allow talking collectively about all binary operators. ] - having seen the !HPF$ REDUCTION early in the text, I wondered immediately about the need for the "reduction-clause" when I encountered it. Not having read further ahead (yet) I don't know if _both_ are required... But I note that the simplest cases (e.g., that intrinsic operator case and the one that is the easy generalization from "x = max( x, expr )" ) are very easily discovered by any compiler with the technology for doing the rest of HPF. (I cite an early 1970's automatic parallelizing (extended) Fortran 77 compiler for a SIMD machine -- David L. and Carl O. stop grimacing.) Thus for that simple case, neither the (statement form) directive nor the clause ought to be required. - there is a serious flaw in the design of the "reduction-clause": the "; IDENTITY = value" phrase immediately leads to some (potential) semantic difference in the HPF as processed by an HPF system and the (serial) Fortran as processed by a non-HPF Fortran system. ==> Please drop that initialization phrase in favor of: (a) tabulating in the HPF standard the "zero"-element-value for each of the intrinsic operators and suitable binary intrinsic functions, and (b) thinking a bit harder about how to convey that same idea for "defined-binary-op"s or "reduction-function"s (particularly user-defined functions). Perhaps you'll have to settle on e.g. "0 of the correct data type" for these cases. (Is there text later in the discussion about an initial value of the reduction variable prior to the INDEPENDENT loop with the reduction? Slap me down if I didn't read far enough before reacting.) Have fun next week. -- Pres -------------------------------------------------------------------------------- [And replies^2 from Rob Schreiber] Major items: - I am still not comfortable with the " ; IDENTITY = value" clause. What if the user codes: program AbsurdInitialization x=0 !HPF$ INDEPENDENT, REDUCTION( x; COMBINE = +; IDENTITY = 2. ) do i=1,100 !HPF$ REDUCE x = x + log(i) enddo print *,x end Should they be allowed to state an absurd "zero" value for an intrinsic operation? Absurd, hardly. The semantics are well defined. It prints a processor dependent approiximation to 2*number_of_processors() + sigma(i=1 to 100) log(i) I have changed it, however, to forbid combine and identity phrases (clauses?) in the optional reduction clause for an intrinsic operators. Maybe we should just bar the whole REDUCTION clause on the independent directive for intrinsic reductions. Should we define the term "intrinsic reduction" in the text? (I wonder if the keyword would be more intuitive as "ZERO" rather than "IDENTITY".) As in, like, !HPF$ INDEPENDENT, REDUCTION(X, COMBINE = *, ZERO = 1) The mind boggles. If each processor initializes its local value of x to 2., it would seem that the HPF interpretation of this code (if run on more than one processor) would give wildly different results than a sequential F90 interpretation. Yes, a very BAD processor dependent approximation to the sequential result. This is disconcerting, but I dont see any way around this for user defined operators, except to REQUIRE that the user create a value "foo" for his operator bar(x,y) that has the property that bar(foo,y) = y for all y. Should we. I agree that you need a hook for the "zero" value of a user-defined operator or reduction function, but I think you also need either: (a) constraint language about the defined state, _and_ _defining_ _value_ of the reduction variable prior to entry to the loop nest with the reduction, or Do you mean what I said above about the "identity element" property of the reduction operator and the identity value? (b) prohibit use of the clause for the trivial cases you have enumerated in section 3. Those are the intrinsic cases, where we could certainly make this prohibition. - Again, I think you've surrounded the syntax with so many lexical restrictions that any undergraduate senior-level compiler course project could find and deal with the reductions with absolutely _no_ directives!! Could you at least relax the form of the reduction statement to include: var = expr op var and maybe even: var = expr1 op var op expr2 ! with constraint that the 2 ops are indentical No problem here semantically, I guess; not my call. - It would seem that one cannot: ... !HPF$ REDUCE var = var .op. f(x) ... !HPF$ REDUCE var = var .op. f(x) ... Why not? Oh? Where is this rule? - I think the constraints on types of lists of variables and on other things that should agree in type, etc., are a little slack. Got to show me a bug, however, before I can fix them. Bottom line: For user-defined operators and reduction functions, and defined-type reduction variables, the clause on the INDEPENDENT and the REDUCTION directive are (probably) necessary. I think you really _only_ have to define *protected* for the other cases without any clauses and directives, and then you can do the relaxation of constraints on the body of INDEPENDENT for the simple situations and not have to require any extra stuff for those simple cases. Can you be a little clearer? If i understand you, in the case of reductions with intrinsic operators, you suggest no reduction clause on the loop and no reduce directive on the operation. I like this idea; however, we have, in that case, to prohibit mixed operators, so that the combine operator and identity value are implied by the (one class of) reduction operator/function that occurs in the loop. Rob