From chk@erato.cs.rice.edu  Mon Mar 16 13:08:02 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA13226); Mon, 16 Mar 92 13:08:02 CST
Received: from localhost.cs.rice.edu by erato.cs.rice.edu (AA13052); Mon, 16 Mar 92 13:08:00 CST
Message-Id: <9203161908.AA13052@erato.cs.rice.edu>
To: hpff-forall@erato.cs.rice.edu
Word-Of-The-Day: roustabout : (n) 1. a dock worker  2. a laborer in an oil
	field or refinery  3. a circus worker with a variety of jobs
Subject: Welcome to Working Group 4!
Date: Mon, 16 Mar 92 13:07:59 -0600
From: chk@erato.cs.rice.edu


If you are receiving this message, then you are on the mailing list
for working group 4.  This group is charged with discussing the FORALL
statement, local subroutines, and related issues in High Performance
Fortran.

						Chuck

From chk@erato.cs.rice.edu  Fri Mar 20 11:05:58 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA26964); Fri, 20 Mar 92 11:05:58 CST
Received: from localhost.cs.rice.edu by erato.cs.rice.edu (AA15777); Fri, 20 Mar 92 11:05:55 CST
Message-Id: <9203201705.AA15777@erato.cs.rice.edu>
To: hpff-forall@erato.cs.rice.edu, wu@cs.buffalo.edu
Word-Of-The-Day: campestral : (adj) of or relating to fields or open country
Subject: Working Group 4 - Issues for discussion
Date: Fri, 20 Mar 92 11:05:54 -0600
From: chk@erato.cs.rice.edu


Hello, group -

The following is my first cut at defining the issues for group 4.
Unfortunately, my original notes from the meeting have disappeared.
Please add to this list if you see obvious omissions.  In short, I'm
looking for volunteers to write up position papers outlining the
issues.  These should give us a good basis to work from at the next
meeting and into the future.


Our Charter:  

To examine issues relating to FORALL loops and local subroutines in
HPF.  In particular, the following issues were identified in the
general HPFF meeting:
	Should HPF have local blocks & subroutines?
	Should HPF have control parallelism?
	Should HPF interface with PCF parallel sections, and if so,
	    how?
	Should HPF have local data (i.e. private to a processor)?
	Should HPF have a multistatement FORALL, and if so what should
	    its semantics be?
	Should HPF have explicit processor control, and if so how
	    should this be done?
	Should HPF have SCAN intrinsics?

In addition, the following issues emerged in the working group
meeting:
	Should FORALL have a new semantics (ala CM Fortran FORALL), or
	    should it be an assertion (ala Cray directives)?  Or
	    perhaps both should be supported?
	What is the relationship between FORALLs and PCF DOALL loops?
	What is the relationship between local subroutines (local
	    sections) and PCF PARALLEL SECTIONs?
	Are local sections actually portable to SIMD and MIMD
	    machines?  If not, are there restrictions that will make
	    them portable?
	What is the relationship between local sections and the open
	    interface to other paradigms?


Getting the discussion rolling:

I think the right way to start meaningful electronic discussion is for
someone to volunteer to write a white paper on one or more of the
issues.  (Writing strawman proposals is a variation of this; I prefer
white papers since they tend to make the options explicit.)  Below are
some possible constellations of issues that could be discussed.  I'll
put something together for the first one; will others volunteer for
other tasks?

Paper 1:  Advanced FORALL loops
	Which style of parallel loop do users want?
	    A command: "Do this in parallel, whether safe or not!"
	    A new statement: "Do this in parallel, using XXXX
		semantics to resolve races!"
	    An unchecked assertion: "This is safe to do in parallel,
		trust me!"
	    A checked assertion: "I think this is safe to do in
		parallel, but check it anyway."
	Which of the above can HPF efficiently support on all
	    machines?
	What semantics do users want in a multi-statement FORALL?
	    SIMD semantics (synchronize at every statement)
	    Copy-in, copy-out (Fortran D)
	    Undefined in case of races
	    Partially defined (guarantee serializability, for example)
	Which of the above can HPF efficiently support on all
	    machines?

Paper 2:  Local Sections and Subroutines
	Actually, I think Guy's original proposal was very good.
	Maybe he could just post it, or the latest version if he has
	    made some modifications.

Paper 3:  Relation to PCF
	How do FORALL and DOALL compare?
	How do local subroutines and PARALLEL SECTIONs compare?
	Are the PCF extensions portable to distributed memory and SIMD
	    machines?
	Should HPF adopt any of the PCF extensions?  If so, which ones?
	Should HPF explicitly reject any of the PCF extensions?  If
	    so, which ones?

Paper 4:  Processor Control
	(This one probably can't be done in much detail until Group 2
	    presents their model for data distribution.)
	What do users want in the way of process control?
	    ON clauses for loops
	    SPMD paradigm
	    Inquiry functions for "current processor id" and similar
		information
	    Explicit synchronization
	    Explicit communication
	    Explicit multi-threading on a single processor
	Which of the above can HPF efficiently support on all
	    machines?
	Can processor control be better accomplished by open
	    interfaces to other paradigms?  If so, how can we define
	    the interface so that it is portable?

Paper 5:  Odds and Ends
	Do users want data private to a processor?
	Do users want SCAN intrinsics?
	Do users want parallel I/O?  If so, what do they mean by that?
	Do users want system inquiry functions?  If so, which ones?
	For each "yes" answer to the above, can HPF provide efficient,
	    portable support for that feature?


Optional: Let me know if you're interested in doing some writing.
Mandatory: If you write up a position paper, post it to this list for
discussion.

Hope to hear from you all soon!

						Chuck

From gls@think.com  Fri Mar 27 12:14:23 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA03551); Fri, 27 Mar 92 12:14:23 CST
Received: from mail.think.com by erato.cs.rice.edu (AA18912); Fri, 27 Mar 92 12:14:19 CST
Return-Path: <gls@Think.COM>
Received: from Strident.Think.COM by mail.think.com; Fri, 27 Mar 92 13:14:07 -0500
From: Guy Steele <gls@think.com>
Received: by strident.think.com (4.1/Think-1.0C)
	id AA24385; Fri, 27 Mar 92 13:14:05 EST
Date: Fri, 27 Mar 92 13:14:05 EST
Message-Id: <9203271814.AA24385@strident.think.com>
To: hpff-forall@erato.cs.rice.edu
In-Reply-To: chk@cs.rice.edu's message of Fri, 20 Mar 92 11:05:54 -0600 <9203201705.AA15777@erato.cs.rice.edu>
Subject: Working Group 4: Revised proposal for local subroutines


Attached is a revised proposal for local subroutines.  It specifies
the behavior of local subroutines in the face of distributions other
than BLOCK; addresses the treatment of replicated distributions; and
briefly discusses interactions with Fortran 90 pointers and modules.
--Guy

----------------------------------------------------------------
Proposal for local program units in HPF

Guy L. Steele Jr.
Thinking Machines Corporation
Version of 25 March 1992

The basic idea is to add a new kind of program unit, called a LOCAL
program unit.  If an HPF program is running on multiple processors,
then invocation of a LOCAL program unit causes one copy of the
program unit to be executed within each processor.  The code of a LOCAL
program unit can access any data within the processor using Fortran
array-reference syntax; it can also use array notation, but only to
operate on data within the processor.  To access data outside the
processor requires either preparatory communication before running
the LOCAL code, or the use of communications primitives such as a
message-passing library.  (The nature of such a message-passing
library is outside the scope of this proposal.  Furthermore, local
subroutines have utility even in the absence of interprocessor
communication within local code.)

A LOCAL program unit is not permitted to invoke a non-LOCAL program
unit.  (One consequence of this rule is that if the main program is
LOCAL, then the entire executable program must be LOCAL.  This might
come in handy in some cases or for some architectures!)  A LOCAL
subroutine or function may be invoked from within LOCAL code, with
everything behaving as if it were ordinary Fortran code running on a
single processor.

A transition from global execution to local execution occurs whenever
a LOCAL subroutine is invoked from non-LOCAL (that is, "global") HPF
code.  When this occurs, all global arrays accessible to the
subroutine (notably arrays passed as arguments) are logically carved
up into pieces; the copy of the local subroutine executing on a
particular physical processor sees an array containing just those
elements of the global array that are mapped to that physical
processor.

In the previous draft of this proposal, it was assumed that all
distributions were BLOCK.  This version assumes only that array axes
are mapped independently to axes of a rectangular processor grid,
each array axis to at most one processor axis (no "skew"
distributions) and no two array axes to the same processor axis.  But
the mapping of an array axis to a processor axis may be any mapping
whatsoever: block, cyclic, block-cyclic, reversed, vector-indexed, or
whatever.  This restriction suffices to ensure that each physical
processor contains a subset of array elements that can be locally
arranged in a rectangular configuration.  (Of course, to compute the
global indices of an element given its local indices, or vice versa,
may be quite a tangled computation--but it will be possible.)


Now for the details:

Any program unit in an HPF program may have the keyword LOCAL at the
start of its first statement (in the case of a main program, the
PROGRAM statement):

      LOCAL SUBROUTINE PAUL_SIMON(A, B, C)
      LOCAL INTEGER FUNCTION BOY_GEORGE(X)
      LOCAL PROGRAM NELLIE_MELBA

The intent is that such a program unit can be compiled in a
special way.  In a machine (such as the CM-5) that has a central
control processor directing the actions of the mass of individual
parallel processors, *all* the code of a LOCAL program unit runs on
the individual processors; operations that might normally be carried
out in the control processor are all compiled for the individual
processors.  Local subprograms may use array notation.  If the
individual parallel processors have vector hardware, for example,
then one would expect that such per-processor array code would be
compiled into per-processor vector operations, etc.

A local program unit is executed on a processor and executes
asynchronously and independently of all other processors.  With the
exception of returning from a local subroutine to the global caller
that initiated local execution (see below), there is no implicit
synchronization of processors.  So a local program unit may use any
control structure whatsoever.

A local program unit can call a local subprogram (subroutine or
function).  This behaves as an ordinary Fortran subprogram call.

One may also call a local subprogram from a global program unit; this
is how local execution is initiated in the first place.  (If the main
program is LOCAL, then it is as if the many local copies of the main
program had been initiated by an unseen global caller.)  When local
execution is initiated, one subprogram call occurs within each
physical processor.  The following special rules apply:

(1) Scalar parameters are broadcast, so that each processor sees a
copy of the scalar parameter.  However, we preserve the model that
allows parameter passing to be either copy-in/copy-out or by
reference, which has the following consequences.  If more than one
processor assigns to that parameter, then the value seen by the
caller is undefined.  If exactly one processor assigns to that
parameter, then the caller will see that value in the actual
argument, but if any other processor examines the parameter, the
value seen will be undefined.  So it is best to code a local
subroutine in such a way that each scalar parameter is either (a) treated
as read-only, or (b) accessed by exactly one processor.

(2) Array parameters are broken up into subgrids, and the subprogram
on a given processor receives as its argument an array (of the same
rank as the argument) representing the subgrid of elements residing
on that processor.  So if you have a 50x50 global array on a
16-processor machine, using a 4x4 processor layout with BLOCK
distribution on each axis, then each axis of length 50 will be
divided into three blocks of 13 and a final block of 11:

	processor 1:  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13
	processor 2: 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26
	processor 3: 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39
	processor 4: 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50

So most of the processors will receive a 13x13 array as argument;
three will see 13x11 arrays; three will see 11x13 arrays; and one
will see an 11x11 array.  With a CYCLIC distribution on each axis,
then each axis of length 50 would be distributed so that the first
two processors along the axis would receive 13 elements and the last
two processors would receive 12 elements:

	processor 1:  1,  5,  9, 13, 17, 21, 25, 29, 33, 37, 41, 45, 49
	processor 2:  2,  6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50
	processor 3:  3,  7, 11, 15, 19, 23, 27, 31, 35, 39, 43, 47
	processor 4:  4,  8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48

In this case four processors would receive 13x13 arrays; four
processors would receive 13x12 arrays; four processors would receive
12x13 arrays; and four processors would receive 12x12 arrays.

(3) As a local subprogram on a processor returns to a global caller,
the processor performs a synchronization function.  The result is
that the global caller proceeds only when *all* processors have
completed execution of the local subprogram.  (Following the
synchronization function, arrays with replicated distributions
must be brought up to date--see below.)

(4) In the case of a local function (as opposed to a subroutine), the
result of the call is a one-dimensional array of the results
(assuming the result of the function is nominally scalar).  The
length of this array is equal to the number of physical processors;
the array is distributed one element per processor according to a
BLOCK distribution of the single axis.
In the case of an array-valued local function, the ranks and shapes
of the arrays must be identical, and an extra dimension is appended.
So if you have 128 processors and each returns a 4x2 array, the
result is 4x2x128 and its distribution will be (*,*,BLOCK).
(It is expected that in many cases a call to a local function from
global code will be surrounded by a call to a reduction intrinsic
over the processor axis.)
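
The subgrid sizes quoted under rule (2) can be checked with a small
Python sketch.  It is an illustration only, not part of the proposal:
the helper names block_owned and cyclic_owned are hypothetical, and the
BLOCK block size is assumed to be ceiling(n/p), which is what the
50-element example implies.

```python
def block_owned(n, p):
    """1-based global indices owned by each of p processors under BLOCK."""
    size = -(-n // p)  # ceiling(n / p): assumed block size
    return [list(range(k * size + 1, min((k + 1) * size, n) + 1))
            for k in range(p)]


def cyclic_owned(n, p):
    """1-based global indices owned by each of p processors under CYCLIC."""
    return [list(range(k + 1, n + 1, p)) for k in range(p)]


print([len(x) for x in block_owned(50, 4)])   # [13, 13, 13, 11]
print([len(x) for x in cyclic_owned(50, 4)])  # [13, 13, 12, 12]
```

Taking the product of the per-axis sizes over a 4x4 grid gives the
13x13, 13x11, ... counts listed above.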

Within a local subprogram, all the usual array inquiry intrinsics may
be used.  When applied to subgrids resulting from the breakup of a
global parameter, such intrinsics are naturally regarded as applying
to the local subgrid, because that is the array actually visible to
the local routine.  For example:

!GLOBAL CODE
      REAL A(50,50)
      CALL CINDI_LAUPER(A)

      ...

      LOCAL SUBROUTINE CINDI_LAUPER(X)
      REAL, ARRAY(:,:):: X
      N1 = SIZE(X,1)
      L1 = LBOUND(X,1)
      U1 = UBOUND(X,1)
      N2 = SIZE(X,2)
      L2 = LBOUND(X,2)
      U2 = UBOUND(X,2)
      ...

Assume 16 processors again, 4x4, with a CYCLIC distribution on the
first axis and a BLOCK distribution on the second axis.  (This is
expressed under the Thinking Machines directives proposal as

CHPF$PROCESSORS P(4,4)
CHPF$DECOMPOSITION D(50, 50)
CHPF$DISTRIBUTE D(CYCLIC,BLOCK) ONTO P
CHPF$ALIGN A(:,:) WITH D(:,:)

These are given here merely for concreteness.)

Then here are the values to which each processor would set N1, L1,
U1, N2, L2, and U2:

Key:	proc
	[N1, L1, U1]
	[N2, L2, U2]

	(1,1)		(1,2)		(1,3)		(1,4)
	[13,1,13]	[13,1,13]	[13,1,13]	[13,1,13]
	[13,1,13]	[13,1,13]	[13,1,13]	[11,1,11]

	(2,1)		(2,2)		(2,3)		(2,4)
	[13,1,13]	[13,1,13]	[13,1,13]	[13,1,13]
	[13,1,13]	[13,1,13]	[13,1,13]	[11,1,11]

	(3,1)		(3,2)		(3,3)		(3,4)
	[12,1,12]	[12,1,12]	[12,1,12]	[12,1,12]
	[13,1,13]	[13,1,13]	[13,1,13]	[11,1,11]

	(4,1)		(4,2)		(4,3)		(4,4)
	[12,1,12]	[12,1,12]	[12,1,12]	[12,1,12]
	[13,1,13]	[13,1,13]	[13,1,13]	[11,1,11]


[A piece of rationale:

There is a question as to what L and U should be: should they be
equal to the global array indices, so that a processor might see
indices from 1:13, or 14:26, or 27:39, or 40:50?  Or should they be
normalized to start at 1, so that each processor sees a range of 1:13
or 1:11?  The former is in some ways more elegant, but the latter may
be prone to fewer errors because old Fortran programmers tend to
assume *all* arrays start at 1.

If we assume that the call to the local subroutine on processor (2,3),
for example, should behave as an ordinary Fortran 90 subroutine
call whose argument is an array section representing the subgrid local
to that processor:

      CALL CINDI_LAUPER(A(14:26,27:39))   ! on processor (2,3)

then the Fortran 90 specification for assumed-shape arrays settles
the matter: Lj should be 1, and Uj should be Lj-1+SIZE(X,j).

End of rationale.]

Sometimes, however, it is useful to know the global dimension
information for an array.  A separate series of intrinsics,
GLOBAL_SIZE, GLOBAL_LBOUND, and GLOBAL_UBOUND may be applied to
arrays to determine their global extents.  In the CINDI_LAUPER
example above, GLOBAL_SIZE(X,1) would return 50.

The following intrinsics may also be desirable:

MY_PROCESSOR_INDEX(array, dim) returns the index of the processor
executing the call along the specified axis of the processors
arrangement onto which the given array was distributed.

PROCESSOR_SIZE(array, dim) returns the length of the specified axis
of the processors arrangement onto which the given array was
distributed.

GLOBAL_LOC(array, idx1, ..., idxn) takes an array of rank n and n
integer local indices; it returns an array of rank 1 and size n
containing the corresponding global indices of the indicated array
element.

PROCESSOR_LOC(array, idx1, ..., idxn) takes an array of rank n and n
integer global indices; it returns an array of rank 1 and size n
containing the indices of the processor containing the indicated
array element.

GLOBAL_SHAPE and PROCESSOR_SHAPE are defined in terms of GLOBAL_SIZE
and PROCESSOR_SIZE in the same manner that SHAPE is defined in terms
of SIZE.  So in the CINDI_LAUPER example,

      GLOBAL_SHAPE(X) is (/ 50, 50 /)      on every processor
      PROCESSOR_SHAPE(X) is (/ 4, 4 /)     on every processor
      SHAPE(X) is (/ 13, 13 /)	     in 6 of the processors,
		  (/ 13, 11 /)	     in 2 of the processors,
		  (/ 12, 13 /)	     in 6 of the processors, and
		  (/ 12, 11 /)	     in 2 of the processors.
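
As a rough illustration of the index arithmetic that intrinsics such as
GLOBAL_LOC and PROCESSOR_LOC would have to perform, here is a Python
sketch for a single axis.  It is not part of the proposal: the function
names global_index and owner_and_local are hypothetical, all indices
are 1-based, and only BLOCK (block size ceiling(n/p)) and CYCLIC are
modelled.

```python
def global_index(local, proc, n, p, dist):
    """Global index of 1-based local index `local` on 1-based processor `proc`."""
    if dist == "BLOCK":
        size = -(-n // p)  # ceiling(n / p): assumed block size
        return (proc - 1) * size + local
    if dist == "CYCLIC":
        return (local - 1) * p + proc
    raise ValueError(dist)


def owner_and_local(g, n, p, dist):
    """(processor, local index), both 1-based, holding 1-based global index g."""
    if dist == "BLOCK":
        size = -(-n // p)
        return (g - 1) // size + 1, (g - 1) % size + 1
    if dist == "CYCLIC":
        return (g - 1) % p + 1, (g - 1) // p + 1
    raise ValueError(dist)


# Element X(2,1) as seen on processor (3,4) with (CYCLIC,BLOCK) on a 50x50 array:
print(global_index(2, 3, 50, 4, "CYCLIC"),
      global_index(1, 4, 50, 4, "BLOCK"))   # 7 40
```

Because the axes are mapped independently, a rank-n version would
simply apply the per-axis rule to each subscript in turn.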

If a local program unit calls a local subprogram, a locally allocated
array may be passed as a parameter; this is okay.
The GLOBAL_ and PROCESSOR_ intrinsics may be applied to an array that
was allocated locally.  The GLOBAL values are the same as the
non-GLOBAL values, and all elements turn out to be on a single
processor.  The same is true of an array that is not distributed (for
example, allocated in the control processor, if there is one).

      LOCAL SUBROUTINE KINKS()
      REAL, ARRAY(40,40):: YOU_REALLY_GOT_ME
      CALL CINDI_LAUPER(YOU_REALLY_GOT_ME)

Then:

GLOBAL_SIZE(YOU_REALLY_GOT_ME, 1)   = SIZE(YOU_REALLY_GOT_ME, 1)   = 40
GLOBAL_LBOUND(YOU_REALLY_GOT_ME, 1) = LBOUND(YOU_REALLY_GOT_ME, 1) = 1
GLOBAL_UBOUND(YOU_REALLY_GOT_ME, 1) = UBOUND(YOU_REALLY_GOT_ME, 1) = 40
GLOBAL_INDEX(YOU_REALLY_GOT_ME, 1)  = INDEX(YOU_REALLY_GOT_ME, 1)  = 1

You can't access array elements outside the local subgrid using
array notation.  It is necessary to use a communications library
to accomplish this.  The specification of a complete communications
and synchronization library is beyond the scope of this proposal,
but some simple examples might include:

      CALL ARRAY_FETCH(dest, global_source, i1, ..., in)
      CALL ARRAY_STORE(source, global_dest, i1, ..., in)

These transfer a scalar (integer, logical, real, complex) from
source to dest.  The global operand must be the name of an array, and
the following arguments are integers that are used as indexes into
the global array of which the named array is presumably a subgrid.
(If it's a local array, it still works, but it obviously doesn't have
to go outside the executing processor!  It's probably a silly thing
to do, though, unless you're trying to use a general-purpose routine
on a local array.)  The intended CM-5 implementation is through the
use of messages that generate interrupts at the destination
processor; the interrupt routine services read and write requests.

Extended Examples

Here is some code that implements bitwise-AND reduction
on a one-dimensional array of integers:

      INTEGER FUNCTION BITAND(X)
      INTEGER, ARRAY(:):: X
      INTEGER, ARRAY(PROCESSOR_SIZE(X)):: PARTIAL_RESULTS
C  GET EACH PROCESSOR TO COMBINE ITS OWN VALUES AND DELIVER ONE RESULT EACH.
      CALL BITAND_WITHIN_PROCESSOR(X, PARTIAL_RESULTS)
C  NOW USE A BINARY TREE TO COMBINE ONE VALUE FROM EACH PROCESSOR.
C  CAREFUL FOOTWORK MAKES SURE THAT ANY SIZE WORKS, NOT JUST POWER OF 2.
      K = PROCESSOR_SIZE(X)
      DO WHILE (K .GT. 1)
	PARTIAL_RESULTS(1:K/2) = IAND(PARTIAL_RESULTS(1:K/2),         &
				      PARTIAL_RESULTS((K+1)/2+1:K))
	K = (K+1)/2
      END DO
      BITAND = PARTIAL_RESULTS(1)
      END

      LOCAL SUBROUTINE BITAND_WITHIN_PROCESSOR(X, PARTIAL_RESULTS)
      INTEGER, ARRAY(:):: X
      INTEGER, ARRAY(1):: PARTIAL_RESULTS
      INTEGER P
C  WITHIN A PROCESSOR, JUST DO A SERIAL LOOP TO AND THINGS UP
C  (THIS CODE ASSUMES NO VECTOR PROCESSING IS AVAILABLE).
      P = -1
      DO J = LBOUND(X,1),UBOUND(X,1)   ! LBOUND(X,1) will in fact be 1
	P = IAND(P, X(J))
      END DO
      PARTIAL_RESULTS(1) = P
      END
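
The "careful footwork" in the combining loop can be checked with a
Python model of the same halving scheme.  This is a sketch under
stated assumptions, not the proposal's code: tree_and is a hypothetical
name, the list is 0-based, and at least one partial result is assumed.
Each pass pairs the first K/2 elements with the last K/2 elements and
leaves the middle element alone when K is odd.

```python
from functools import reduce


def tree_and(partial):
    """Bitwise-AND all entries of a non-empty list, halving the active range each pass."""
    vals = list(partial)
    k = len(vals)
    while k > 1:
        half = k // 2
        upper = (k + 1) // 2  # 0-based start of the upper half; skips the middle when k is odd
        for i in range(half):
            vals[i] &= vals[upper + i]
        k = (k + 1) // 2
    return vals[0]


values = [0b1110, 0b0111, 0b1101]
print(tree_and(values))                                    # 4
print(tree_and(values) == reduce(lambda x, y: x & y, values))  # True
```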

In a global program unit, COMMON means global common.  The keyword
COMMON may optionally be preceded by the word GLOBAL.

In a LOCAL program unit, COMMON means local common.  Global COMMON may
be specified by prefixing the word COMMON with the word GLOBAL.
The keyword LOCAL may also appear before the word COMMON.


      SUBROUTINE QUESTION_MARK
      COMMON /FOO/ TEARS(96)          !COULD ALSO HAVE SAID "GLOBAL COMMON"
      CALL THE_MYSTERIANS
      END QUESTION_MARK

      LOCAL SUBROUTINE THE_MYSTERIANS
      GLOBAL COMMON /FOO/ TEARS(96)   !EACH PROCESSOR HAS CEILING(96/N) ELEMENTS
      COMMON /BAR/ CRY(4)             !EACH PROCESSOR HAS 4 ELEMENTS
      LOCAL COMMON /BAZ/ TEARDROPS(10000000)  !EACH PROCESSOR HAS 1E7 ELEMENTS
					      ! (TOO MANY TEARDROPS :-)
      ...
      !EACH PROCESSOR ASSIGNS ONLY TO ITS OWN SUBSET OF THE TEARS.
      FORALL (J=1:UBOUND(TEARS,1)) TEARS(J) = CRY(1+MOD(J,4))
      ...
      END THE_MYSTERIANS


Fortran 90 Pointers

Just as we prohibit the use of array subscript syntax from requiring
interprocessor communication in LOCAL code, so we must place
restrictions on the use of pointers.

We distinguish between global pointers and local pointers; wherever
the POINTER attribute may appear, the attribute GLOBAL POINTER or
LOCAL POINTER may potentially appear instead.  (The attribute POINTER
always means GLOBAL POINTER.)

It is forbidden for global code to dereference a local pointer or for
local code to dereference a global pointer.  

Global code may not perform pointer assignment involving local
pointers.  Local code may perform pointer assignment involving local
pointers, including assigning a local pointer to a global pointer or
a global pointer to a local pointer (implementations may wish to
insert a run-time error check in the latter case to ensure that the
global pointer refers to an object that is in fact local to the
executing processor).

[Open question: should there similarly be a distinction between
LOCAL TARGET and GLOBAL TARGET?]

A communications library may provide for interprocessor data transfer
using Fortran 90 pointers.  For example, such a library might include
routines like:

      CALL POINTER_FETCH(dest, global_source)
      CALL POINTER_STORE(source, global_dest)

where the "global" arguments are global pointers.


Fortran 90 Modules

It should work out to declare a Fortran 90 module LOCAL.  I haven't
yet looked at all the details of this, but it seems clear that any
subprogram in a LOCAL module should be treated as LOCAL.


Updating Replicated Arrays

If an array has a replicated distribution, then we must specify what
happens when such an array is updated by local code.  We will specify
that for each element of such an array, from one synchronization
point to the next, no more than one processor may update it, and if a
processor does update it then no other processor should read it.
(In the absence of a communications library providing additional
synchronization functionality, the only synchronization point occurs
at the transition from local execution to global execution when all
instances of a local subprogram return to their global caller.)  All
copies of a replicated element should behave as if brought up to date
no later than the next synchronization point.
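
One way to read the rule above is as a check over the accesses made
between two synchronization points.  The Python sketch below is only
an illustration of that reading; the function name violations and the
(processor, element, kind) access-log format are hypothetical and
appear nowhere in the proposal.

```python
def violations(accesses):
    """Return elements whose accesses break the replicated-array rule.

    accesses: iterable of (processor, element, kind) with kind "read" or
    "write", all taken from one interval between synchronization points.
    """
    writers, readers = {}, {}
    for proc, elem, kind in accesses:
        table = writers if kind == "write" else readers
        table.setdefault(elem, set()).add(proc)
    bad = []
    for elem, w in writers.items():
        other_readers = readers.get(elem, set()) - w
        # Broken if two processors update it, or one updates and another reads.
        if len(w) > 1 or other_readers:
            bad.append(elem)
    return sorted(bad)


log = [(1, "R(3)", "write"), (1, "R(3)", "read"),   # fine: same processor
       (1, "R(5)", "write"), (2, "R(5)", "read"),   # writer plus foreign reader
       (1, "R(7)", "write"), (2, "R(7)", "write")]  # two writers
print(violations(log))  # ['R(5)', 'R(7)']
```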


The LOCALLY Statement

locally-statement	  is  LOCALLY
			        declarations ???
			        block
			      END LOCALLY

This statement behaves as if its contents were the body of a local
subroutine invoked at that point, with all visible objects that
it uses passed as parameters of the same name.

You may not use LOCALLY within local code (there is no point),
even though you may call a local subroutine from local code.

So the above example could be written:


      INTEGER FUNCTION BITAND(X)
      INTEGER, ARRAY(:):: X
      INTEGER, ARRAY(PROCESSOR_SIZE(X)):: PARTIAL_RESULTS
C  GET EACH PROCESSOR TO COMBINE ITS OWN VALUES AND DELIVER ONE RESULT EACH.
      LOCALLY
	INTEGER P
C  WITHIN A PROCESSOR, JUST DO A SERIAL LOOP TO AND THINGS UP
C  (THIS CODE ASSUMES NO VECTOR PROCESSING IS AVAILABLE).
	P = -1
	DO J = LBOUND(X,1),UBOUND(X,1)
	  P = IAND(P, X(J))
	END DO
	PARTIAL_RESULTS(1) = P
      END LOCALLY
C  NOW USE A BINARY TREE TO COMBINE ONE VALUE FROM EACH PROCESSOR.
C  CAREFUL FOOTWORK MAKES SURE THAT ANY SIZE WORKS, NOT JUST POWER OF 2.
      K = PROCESSOR_SIZE(X)
      DO WHILE (K .GT. 1)
	PARTIAL_RESULTS(1:K/2) = IAND(PARTIAL_RESULTS(1:K/2),         &
				      PARTIAL_RESULTS((K+1)/2+1:K))
	K = (K+1)/2
      END DO
      BITAND = PARTIAL_RESULTS(1)
      END


From gls@think.com  Thu Apr  2 14:44:46 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA07605); Thu, 2 Apr 92 14:44:46 CST
Received: from mail.think.com by erato.cs.rice.edu (AA21648); Thu, 2 Apr 92 14:44:43 CST
Return-Path: <gls@Think.COM>
Received: from Strident.Think.COM by mail.think.com; Thu, 2 Apr 92 15:44:41 -0500
From: Guy Steele <gls@think.com>
Received: by strident.think.com (4.1/Think-1.0C)
	id AA03310; Thu, 2 Apr 92 15:44:41 EST
Date: Thu, 2 Apr 92 15:44:41 EST
Message-Id: <9204022044.AA03310@strident.think.com>
To: chk@cs.rice.edu
Cc: hpff-forall@erato.cs.rice.edu, gls@think.com
In-Reply-To: chk@cs.rice.edu's message of Fri, 20 Mar 92 11:05:54 -0600 <9203201705.AA15777@erato.cs.rice.edu>
Subject: A proposal for FORALL loops


Proposal for loops in HPF

Guy L. Steele Jr.
Thinking Machines Corporation
Version of 2 April 1992

This proposal recommends:

[1] Synchronized (SIMD-style) FORALL statement
    [a] Single-assignment, as in Connection Machine Fortran
    [b] Block FORALL
    [c] Allow statements other than assignments in body
        [i] IF and WHERE
        [ii] FORALL
        [iii] ALLOCATE and DEALLOCATE
        [iv] Other?

[2] Directive for independent execution of iterations
----------------------------------------------------------------

[1] Synchronized (SIMD-style) FORALL statement

[a] Single-assignment, as in Connection Machine Fortran

      FORALL (v1=l1:u1:s1, v2=l2:u2:s2, ..., vn=ln:un:sn [, mask] ) a(e1,...,em) = rhs

    The FORALL statement declares v1,v2,...,vn to be integer
    variables locally to the FORALL statement.  Associated
    with each is a subscript-triplet.  There is also an optional
    logical mask expression that may depend on the variables
    v1,v2,...,vn. The single statement controlled must be an
    assignment with an array reference on the left-hand-side.
    Each of the variables v1,v2...,vn must appear within the
    subscript expression(s) on the left-hand-side.  (This is
    a syntactic consequence of the semantic rule that no two
    execution instances of the body may assign to the same
    array element.)  The assignment statement is executed
    once for every combination of values for the variables
    for which the mask expression is true.  (If there is no
    mask expression, it is assumed always to be true.)
    The various instances of the assignment statement are
    executed in such a way that it appears that the right-hand
    sides of all instances, and also the subscript expressions
    on the left-hand-sides of all instances, and also the mask
    expression, are evaluated before any assignment is performed.
    So a FORALL statement is roughly equivalent to:

      templ1 = l1
      tempu1 = u1
      temps1 = s1
      DO v1=templ1,tempu1,temps1
	templ2(v1) = l2
	tempu2(v1) = u2
	temps2(v1) = s2
	DO v2=templ2(v1),tempu2(v1),temps2(v1)
	  ...
	    templn(v1,v2,...,v<n-1>) = ln
	    tempun(v1,v2,...,v<n-1>) = un
	    tempsn(v1,v2,...,v<n-1>) = sn
	    DO vn=templn(v1,v2,...,v<n-1>),tempun(v1,v2,...,v<n-1>),tempsn(v1,v2,...,v<n-1>)
	    tempmask(v1,v2,...,vn) = mask
	    tempe1(v1,v2,...,vn) = e1
	    ...
	    tempem(v1,v2,...,vn) = em
	    temprhs(v1,v2,...,vn) = rhs
	    END DO
	  ...
	END DO
      END DO
      DO v1=templ1,tempu1,temps1
	DO v2=templ2(v1),tempu2(v1),temps2(v1)
          ...
	    DO vn=templn(v1,v2,...,v<n-1>),tempun(v1,v2,...,v<n-1>),tempsn(v1,v2,...,v<n-1>)
	      IF (tempmask(v1,v2,...,vn)) THEN
		a(tempe1(v1,v2,...,vn),...,tempem(v1,v2,...,vn)) = temprhs(v1,v2,...,vn)
	      END IF
            END DO
          ...
        END DO
      END DO

Of course, there are other ways to implement it as well.
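
The difference from an ordinary DO loop shows up as soon as the
right-hand side reads elements that other instances assign.  The
Python sketch below models a one-index FORALL with the same two phases
as the expansion above; it is an illustration only, forall_assign is a
hypothetical helper, and the "array" is a dict keyed by 1-based index.

```python
def forall_assign(a, indices, lhs_index, rhs, mask=lambda i: True):
    """Model FORALL (i=indices, mask(i)) a(lhs_index(i)) = rhs(a, i)."""
    # Phase 1: evaluate mask, subscripts and right-hand sides for every instance.
    pending = [(lhs_index(i), rhs(a, i)) for i in indices if mask(i)]
    # Phase 2: only now perform the assignments.
    for target, value in pending:
        a[target] = value
    return a


# FORALL (I=2:5) A(I) = A(I-1)
a = {i: i for i in range(1, 6)}
forall_assign(a, range(2, 6), lambda i: i, lambda arr, i: arr[i - 1])
print([a[i] for i in range(1, 6)])  # [1, 1, 2, 3, 4]

# DO I=2,5 ; A(I) = A(I-1) ; END DO  (each iteration sees earlier stores)
b = {i: i for i in range(1, 6)}
for i in range(2, 6):
    b[i] = b[i - 1]
print([b[i] for i in range(1, 6)])  # [1, 1, 1, 1, 1]
```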


[b] Block FORALL

      FORALL (... e1 ... e2 ...)
	s1
	s2
	...
	sn
      END FORALL

means exactly the same as

      temp1 = e1
      temp2 = e2
      FORALL (... temp1 ... temp2 ...) s1
      FORALL (... temp1 ... temp2 ...) s2
      ...
      FORALL (... temp1 ... temp2 ...) sn

That is, a block FORALL means exactly the same as replicating
the FORALL header in front of each array assignment statement
in the block, except that any expressions in the FORALL header
are evaluated only once, rather than being re-evaluated before
each of the statements in the body.

Thus one may wish to think of a block FORALL as synchronizing twice
per contained statement: once after handling the rhs and other
expressions but before performing assignments, and once after all
assignments have been performed but before commencing the next
statement.  (In practice, appropriate flow analysis often permits
the compiler to eliminate unnecessary synchronizations.)
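
A two-statement block makes the "synchronize twice per statement"
picture concrete.  The following Python sketch is an illustration
under stated assumptions: block_forall is a hypothetical helper, each
statement is given as an (evaluate, store) pair of functions, and the
arrays are lists with element 0 unused so that subscripts match the
Fortran ones.

```python
def block_forall(indices, statements):
    """Run each statement as its own single-statement FORALL over one index set."""
    idx = list(indices)  # header expressions are evaluated once
    for evaluate, store in statements:
        values = [evaluate(i) for i in idx]  # all right-hand sides first
        for i, v in zip(idx, values):        # then all stores, before the next statement
            store(i, v)


a = [0, 1, 2, 3, 4, 5]  # A(1:5); a[0] unused
b = [0, 0, 0, 0, 0, 0]  # B(1:5); b[0] unused


def store_a(i, v):
    a[i] = v


def store_b(i, v):
    b[i] = v


# FORALL (I=2:4)
#   A(I) = A(I-1) + A(I+1)
#   B(I) = A(I)
# END FORALL
block_forall(range(2, 5), [
    (lambda i: a[i - 1] + a[i + 1], store_a),
    (lambda i: a[i], store_b),
])
print(a[1:])  # [1, 4, 6, 8, 5]
print(b[1:])  # [0, 4, 6, 8, 0]
```

The second statement sees every value stored by the first, which is
what replicating the FORALL header in front of each statement implies.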

[c] Allow statements other than assignments in body

[i] IF and WHERE

Assume that a block IF is first reduced to single IF statements
by introducing temporary variables and replicating the IF header:

      IF (m) THEN
        s1
        ...
        sm
      ELSE
        t1
        ...
        tn
      END IF

becomes

      tempm = m
      IF (tempm) s1
      ...
      IF (tempm) sm
      IF (.NOT. tempm) t1
      ...
      IF (.NOT. tempm) tn

Then we simply define

      FORALL (v1=l1:u1:s1,...,vn=ln:un:sn,fmask) IF (imask) s

to mean

      FORALL (v1=l1:u1:s1,...,vn=ln:un:sn,fmask .AND. imask) s

WHERE can be treated similarly, though the details are a bit
more complicated.

The motivation here is to make it easier to transform DO loops
into FORALL statements and vice versa, despite the fact that they
may contain conditional statements.


[ii] FORALL

      FORALL (va1=...,...,van=...,maska) FORALL (vb1=...,...,vbn=...,maskb) s

means

      FORALL (va1=...,...,van=...,vb1=...,...,vbn=...,maska.AND.maskb) s

assuming there is no duplication of the variable names (a compiler
should logically rename the variables if there is a duplication).

The motivation here is to make it easier to transform DO loops
with DO loops (that happen not to be closely nested) into FORALL
blocks within FORALL blocks, and vice versa.
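
The equivalence can be sketched as below.  The sketch assumes
Fortran 95 syntax, in which the outer FORALL must be written in
block form to contain another FORALL; the table names T1 and T2,
the masks, and the size N are illustrative only.

```fortran
PROGRAM NESTED_FORALL_SKETCH
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 4
  INTEGER :: T1(N,N), T2(N,N), I, J

  T1 = 0
  T2 = 0

  ! Nested form: one FORALL as the body of another, each with a mask.
  FORALL (I = 1:N, MOD(I,2) == 1)
    FORALL (J = 1:N, J >= I) T1(I,J) = 10*I + J
  END FORALL

  ! Merged form: index lists concatenated, masks joined by .AND.
  FORALL (I = 1:N, J = 1:N, MOD(I,2) == 1 .AND. J >= I) T2(I,J) = 10*I + J

  ! Both forms must assign the same elements the same values.
  IF (ANY(T1 /= T2)) STOP 1
END PROGRAM NESTED_FORALL_SKETCH
```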


[iii] ALLOCATE and DEALLOCATE

In the presence of Fortran 90 pointers and derived types,
it would be perfectly sensible to say:

      FORALL (I=1:100) ALLOCATE(FOO(I)%SUBARRAY(F(I)))

thereby constructing a ragged array.


[iv] Other?


[2] Directive for independent execution of iterations

Let there be a directive

!HPF$INDEPENDENT

that can precede either a DO loop or a FORALL statement.
It asserts to the compiler that the iterations of the loop
may be executed independently--that is, in any order, or
interleaved, or concurrently--without changing the semantics
of the program.  (The compiler is justified in producing
a warning if it can prove otherwise.)

!HPF$INDEPENDENT
      DO I=1,100
        A(I)=B(P(I))   !I happen to know that P is a permutation
      END DO

!HPF$INDEPENDENT
      FORALL (I=1:100) A(I)=A(F(I))
!I happen to know that F(I) > 100, so synchronization is not
!needed to delay assignments until every rhs has been computed.

One may apply this directive to a nest of multiple loops
by listing all the loop variables of the loops in question;
the loops must be contiguous with the directive and in the
same order that the variables are listed:

!HPF$INDEPENDENT (I1,I2,I3)
      DO I1 = ...
        DO I2 = ...
          DO I3 = ...
            DO I4 = ...    !The inner two loops are *not* independent!
              DO I5 = ...
                ...
              END DO
            END DO
          END DO
        END DO
      END DO

In the case of a FORALL, any of the variables may be mentioned:

!HPF$INDEPENDENT (I1,I3)
      FORALL(I1=...,I2=...,I3=...) ...

This means that for any given values for I1 and I3,
all the right-hand sides for all values of I2 must
be computed before any assignments are done for that
specific pair of (I1,I3) values; but assignments for
one pair of (I1,I3) values need not wait for rhs
evaluation for a different pair of (I1,I3) values.

These directives are purely advisory and a compiler is free
to ignore them if it cannot make use of the information.

This directive is of course similar to the DOSHARED directive
of Cray MPP Fortran.  A different name is offered here to avoid
even the hint of commitment to execution by a shared memory machine.
Also, the "mechanism" syntax is omitted here, though we might want
to adopt it as further advice to the compiler about appropriate
implementation strategies, if we can agree on a desirable set
of options.

From gls@think.com  Wed Apr  8 16:37:53 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA12400); Wed, 8 Apr 92 16:37:53 CDT
Received: from mail.think.com by erato.cs.rice.edu (AA00425); Wed, 8 Apr 92 16:37:50 CDT
Return-Path: <gls@Think.COM>
Received: from Strident.Think.COM by mail.think.com; Wed, 8 Apr 92 17:37:07 -0400
From: Guy Steele <gls@think.com>
Received: by strident.think.com (4.1/Think-1.0C)
	id AA27312; Wed, 8 Apr 92 17:37:04 EDT
Date: Wed, 8 Apr 92 17:37:04 EDT
Message-Id: <9204082137.AA27312@strident.think.com>
To: hpff-forall@erato.cs.rice.edu
Cc: gls@think.com
Subject: Provocative question


It's been 12 days since I sent out

    Proposal for local program units in HPF

and 6 days since I sent out

    Proposal for loops in HPF

and no one has said "boo" since.  Am I to understand
from the deafening silence that everyone is perfectly
happy with these proposals, and we should just adopt
them and go on to the next thing??  :-)

--Guy Steele

From wu@cs.buffalo.edu  Sun Apr 12 10:51:55 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA10540); Sun, 12 Apr 92 10:51:55 CDT
Received: from ruby.cs.Buffalo.EDU by erato.cs.rice.edu (AA01490); Sun, 12 Apr 92 10:51:50 CDT
Received: by ruby.cs.Buffalo.EDU (4.1/1.01)
	id AA19244; Sun, 12 Apr 92 11:51:40 EDT
Date: Sun, 12 Apr 92 11:51:40 EDT
From: wu@cs.buffalo.edu (Min-You Wu)
Message-Id: <9204121551.AA19244@ruby.cs.Buffalo.EDU>
To: chk@cs.rice.edu, gls@think.com
Subject: Re:  A proposal for FORALL loops
Cc: hpff-forall@erato.cs.rice.edu


My concerns about Guy Steele's forall proposal are its expressive
power and its implementation efficiency.
I believe that the SIMD-style forall gives a clear semantics; however, if
implemented directly, it will result in a large amount of overhead.
This tightly synchronous forall introduces too many extra synchronizations.
Users may need flexible control of synchronization, instead of having
two synchronizations, one for rhs and one for lhs, in each statement.

One may suggest a smart compiler that can recognize the dependency between
statements and eliminate unnecessary synchronizations.
It might be extremely difficult to identify all dependencies.
In the case that the compiler cannot determine whether there is a
dependency, it must assume a synchronization for safety.
On efficiency, I am concerned about the cost of executing programs
as well as the cost of compiling them.
When there are many unnecessary synchronizations, as mentioned above,
communication may incur high overhead.
Also, identifying unnecessary synchronizations increases compilation cost.

One question: are the conditional branches executed in parallel or sequentially?
In the SIMD style, the else part must be executed after the if-then part.
Then how can we take advantage of MIMD?

Min-You

From chk@erato.cs.rice.edu  Mon Apr 13 13:00:22 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA01132); Mon, 13 Apr 92 13:00:22 CDT
Received: from localhost.cs.rice.edu by erato.cs.rice.edu (AA02222); Mon, 13 Apr 92 13:00:20 CDT
Message-Id: <9204131800.AA02222@erato.cs.rice.edu>
To: hpff-forall@erato.cs.rice.edu
Word-Of-The-Day: solon : (n) a member of a legislative body
Subject: Proposed HPF feature
Date: Mon, 13 Apr 92 13:00:18 -0500
From: chk@erato.cs.rice.edu


I should be sending out a discussion of the various flavors of FORALL
loops later today.  In the meantime, here's something forwarded from
Mary Zosel.  I think it bears directly on whether we should think
about user assertions instead of / in addition to parallel loops.

>> Date:    Thu, 02 Apr 92 09:43:03 PST
>> To:      chk@rice.edu, ken@rice.edu
>> From:    zosel@phoenix.ocf.llnl.gov (Mary E Zosel)
>> Subject: hpf wish
>> 
>> 
>> Ken and Chuck
>> The following HPF "wish" was given to me yesterday.  I don't know that
>> it fits cleanly into any of our subgroups.  I'm forwarding it to you -
>> and maybe it is something that I can have 5 minutes at the next HPF
>> meeting to propose.  {It may open up the bucket of worms to ask if
>> we want to standardize directives about vectorization in general.}
>>    -mary-
>> 
>> ----
>> Statements of the following form are very common ...
>> 
>> f(ix) = f(ix) + delta
>> 
>> where ix is a non-unique index vector (actually usually ix(j)).
>>       f and delta are vectors (or arrays) of length less
>>       than ix.
>> 
>> This is used, for example to accumulate information into a node from
>> the surrounding zones.
>> 
>> Currently some compilers incorrectly vectorize this (unsafe without
>> special attention).
>> 
>> Other systems provide some ugly subroutine to call which does the
>> correct thing efficiently.
>> 
>> 
>> The spokesman for a primary part of our user community here at LLNL
>> would like to propose that HPF help address this problem.
>> 
>> Specifically, he would like a directive to specify to compilers that
>> ix is non-unique.  Then compilers could recognize this array syntax
>> statement should really be turned into the ugly subroutine call, while
>> leaving the code that the user writes to be clean array statements which
>> are portable between systems.
>> 
>> (If the user has to call the ugly subroutine, it often isn't portable -
>> and it messes up the readability of the code.)
>> 

From loveman@ftn90.enet.dec.com  Mon Apr 13 14:21:25 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA03256); Mon, 13 Apr 92 14:21:25 CDT
Received: from enet-gw.pa.dec.com by erato.cs.rice.edu (AA02261); Mon, 13 Apr 92 14:21:20 CDT
Received: by enet-gw.pa.dec.com; id AA05730; Mon, 13 Apr 92 12:20:57 -0700
Message-Id: <9204131920.AA05730@enet-gw.pa.dec.com>
Received: from ftn90.enet; by decwrl.enet; Mon, 13 Apr 92 12:21:00 PDT
Date: Mon, 13 Apr 92 12:21:00 PDT
From: David Loveman <loveman@ftn90.enet.dec.com>
To: hpff-forall@erato.cs.rice.edu
Cc: loveman@ftn90.enet.dec.com, zosel@phoenix.ocf.llnl.gov
Apparently-To: hpff-forall@erato.cs.rice.edu, zosel@phoenix.ocf.llnl.gov
Subject: Re: Proposed HPF feature


With respect to:

>> f(ix) = f(ix) + delta
>> 
>> where ix is a non-unique index vector (actually usually ix(j)).
>>       f and delta are vectors (or arrays) of length less
>>       than ix.

I feel that the user's wish is backwards.  The assertion one might want
to make is that the values in ix *are* unique.  In the absence of this
information, the compiler should *not* vectorize the code, but rather,
if it is clever enough, recognize the special case and call the ugly
subroutine for the user.

VAST provides a directive

CVD$ PERMUTATION(integer_array)

to assert that integer_array has no repeated values.  Thus, in this
case, the user wants to say

CVD$ NOPERMUTATION(ix)

[although VAST, in fact, does not allow "NO" to prefix "PERMUTATION"]

"PERMUTATION" is the wrong term, since it is not asserting that
integer_array is a permutation, just that it has no repeated values. 
This suggests a better term might be something like "UNIQUE_VALUES" and
"NO_UNIQUE_VALUES".

-David

From meltzer@tamarack.cray.com  Mon Apr 13 14:46:34 1992
Received: from timbuk.cray.com by cs.rice.edu (AA03885); Mon, 13 Apr 92 14:46:34 CDT
Received: from willow14.cray.com by timbuk.cray.com (4.1/CRI-MX 1.6ad)
	id AA07423; Mon, 13 Apr 92 14:46:33 CDT
Received: by willow14.cray.com
	id AA09025; 4.1/CRI-5.6; Mon, 13 Apr 92 14:46:32 CDT
Date: Mon, 13 Apr 92 14:46:32 CDT
From: meltzer@tamarack.cray.com (Andy Meltzer)
Message-Id: <9204131946.AA09025@willow14.cray.com>
To: hpff-forall@cs.rice.edu
Subject: Re: Proposed HPF feature

> >> ----
> >> Statements of the following form are very common ...
> >> 
> >> f(ix) = f(ix) + delta
> >> 
> >> where ix is a non-unique index vector (actually usually ix(j)).
> >>       f and delta are vectors (or arrays) of length less
> >>       than ix.
> >> 
> >> Specifically, he would like a directive to specify to compilers that
> >> ix is non-unique.  Then compilers could recognize this array syntax
> >> statement should really be turned into the ugly subroutine call, while
> >> leaving the code that the user writes to be clean array statements which
> >> are portable between systems.
> >> 
> >> (If the user has to call the ugly subroutine, it often isn't portable -
> >> and it messes up the readability of the code.)
> >> 
> 

The current Cray Programming Model has the directive:

	CDIR$ ATOMIC UPDATE
	      f(ix) = f(ix) + delta

The purpose of the directive is to indicate to the compiler that each array
element in the statement which follows must be updated atomically, and may
be updated more than one time.  The compiler is free to do this in any 
way it chooses, including calling an ugly subroutine of its own or even
sequentializing.

I'd be in favor of an approach more along this line.


							Andy Meltzer


From gls@think.com  Mon Apr 13 15:24:16 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA04740); Mon, 13 Apr 92 15:24:16 CDT
Received: from mail.think.com (Mail1.Think.COM) by erato.cs.rice.edu (AA02290); Mon, 13 Apr 92 15:24:13 CDT
Return-Path: <gls@Think.COM>
Received: from Strident.Think.COM by mail.think.com; Mon, 13 Apr 92 16:24:05 -0400
From: Guy Steele <gls@think.com>
Received: by strident.think.com (4.1/Think-1.0C)
	id AA08599; Mon, 13 Apr 92 16:24:04 EDT
Date: Mon, 13 Apr 92 16:24:04 EDT
Message-Id: <9204132024.AA08599@strident.think.com>
To: loveman@ftn90.enet.dec.com
Cc: hpff-forall@erato.cs.rice.edu, loveman@ftn90.enet.dec.com,
        zosel@phoenix.ocf.llnl.gov
In-Reply-To: David Loveman's message of Mon, 13 Apr 92 12:21:00 PDT <9204131920.AA05730@enet-gw.pa.dec.com>
Subject: Proposed HPF feature

   Date: Mon, 13 Apr 92 12:21:00 PDT
   From: David Loveman <loveman@ftn90.enet.dec.com>
   Apparently-To: hpff-forall@erato.cs.rice.edu, zosel@phoenix.ocf.llnl.gov


   With respect to:

   >> f(ix) = f(ix) + delta
   >> 
   >> where ix is a non-unique index vector (actually usually ix(j)).
   >>       f and delta are vectors (or arrays) of length less
   >>       than ix.

   I feel that the user's wish is backwards.  The assertion one might want
   to make is that the values in ix *are* unique.  In the absence of this
   information, the compiler should *not* vectorize the code, but rather,
   if it is clever enough, recognize the special case and call the ugly
   subroutine for the user.

   VAST provides a directive

   CVD$ PERMUTATION(integer_array)

   to assert that integer_array has no repeated values.  Thus, in this
   case, the user wants to say

   CVD$ NOPERMUTATION(ix)

   [although VAST, in fact, does not allow "NO" to prefix "PERMUTATION"]

   "PERMUTATION" is the wrong term, since it is not asserting that
   integer_array is a permutation, just that it has no repeated values. 
   This suggests a better term might be something like "UNIQUE_VALUES" and
   "NO_UNIQUE_VALUES"

Actually, "DISTINCT_VALUES" and "POSSIBLY_REPEATED_VALUES"
("INDISTINCT_VALUES"???) would be more accurate.  I know
that computer scientists, especially, tend to abuse the word
"UNIQUE" in this way, but I would rather avoid it.

--Guy


From "ptrpan::stpierre"@tle.enet.dec.com  Tue Apr 14 08:53:39 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA17866); Tue, 14 Apr 92 08:53:39 CDT
Received: from crl.dec.com by erato.cs.rice.edu (AA02725); Tue, 14 Apr 92 08:53:37 CDT
Received: by crl.dec.com; id AA23273; Tue, 14 Apr 92 09:53:36 -0400
Received: by easynet.crl.dec.com; id AA23715; Tue, 14 Apr 92 09:51:52 -0400
Message-Id: <9204141351.AA23715@easynet.crl.dec.com>
Received: from tle.enet; by crl.enet; Tue, 14 Apr 92 09:51:53 EDT
Date: Tue, 14 Apr 92 09:51:53 EDT
From: Paul St. Pierre <"ptrpan::stpierre"@tle.enet.dec.com>
To: crl::"hpff-forall@erato.cs.rice.edu"@tle.enet.dec.com
Apparently-To: hpff-forall@erato.cs.rice.edu
Subject: Re: Proposed HPF feature

   >> f(ix) = f(ix) + delta
   >> 
   >> where ix is a non-unique index vector (actually usually ix(j)).
   >>       f and delta are vectors (or arrays) of length less
   >>       than ix.

I think this example raises a subtle issue for HPFF.  

Note that the user wants to legalize a particular behavior for a
non-standard-conforming program.  (Many-one array sections must not
appear on the left side, as per 6.2.2.3.2, and the right side is
always evaluated completely before assigning (7.5.1.5).)
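
As a minimal sketch of the difference (not part of the original
wish; the program name, array sizes, and values are made up for
illustration), the following program computes the update both
ways.  DELTA is given the same length as IX so that the shapes
conform, which is an assumption.  The stores for the array-order
version are done in a loop, since a many-one section on the left
side cannot be written as a conforming array assignment.

```fortran
PROGRAM ACCUMULATE_SKETCH
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 3, M = 2
  INTEGER :: IX(N), J
  REAL :: DELTA(N), F_SEQ(M), F_ARR(M), RHS(N)

  IX = (/ 1, 1, 2 /)      ! index vector with a repeated value
  DELTA = 1.0
  F_SEQ = 0.0
  F_ARR = 0.0

  ! What the user intends: every contribution is accumulated.
  DO J = 1, N
    F_SEQ(IX(J)) = F_SEQ(IX(J)) + DELTA(J)
  END DO

  ! Array-assignment order: the whole right side is evaluated first,
  ! then stored, so repeated indices overwrite rather than accumulate.
  RHS = F_ARR(IX) + DELTA
  DO J = 1, N
    F_ARR(IX(J)) = RHS(J)
  END DO

  ! Element 1 receives two contributions only in the sequential form.
  IF (ANY(F_SEQ /= (/ 2.0, 1.0 /))) STOP 1
  IF (ANY(F_ARR /= (/ 1.0, 1.0 /))) STOP 2
END PROGRAM ACCUMULATE_SKETCH
```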

The output of a program relying on this behavior will likely be
different for processors that support the HPF directive (however it's
spelled) and processors that don't.  This seems to me contrary to the
spirit of directives, which shouldn't change the semantics of a
program, just its performance.  (I can't recall any other HPFF
proposed directives that have this property, but I could be wrong.)

You may want to discuss whether HPF should be in the business of
legalizing incorrect programs with directives, thereby encouraging
non-portable behavior.

If you decide you do want to do that, I would suggest that the
directive be phrased so that the F90 standard-conforming behavior is
assumed, and the directive provides information overriding that.

--paul


From chk@erato.cs.rice.edu  Tue Apr 14 10:46:10 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA20826); Tue, 14 Apr 92 10:46:10 CDT
Received: from localhost.cs.rice.edu by erato.cs.rice.edu (AA02805); Tue, 14 Apr 92 10:46:07 CDT
Message-Id: <9204141546.AA02805@erato.cs.rice.edu>
To: hpff-forall@erato.cs.rice.edu
Word-Of-The-Day: savant : (n) a person with detailed knowledge in some
	specialized field
Subject: More discussion fodder
Date: Tue, 14 Apr 92 10:46:06 -0500
From: chk@erato.cs.rice.edu


Here's my attempt at bringing out the issues in choosing a FORALL
semantics.  No recommendations yet.  Surveys of potential users and
implementors to follow - hopefully those views won't be too
divergent...

                Parallel Loops Position Paper
               Chuck Koelbel, Rice University
			   
The purpose of this paper is to identify the major issues in
defining a parallel loop for High Performance Fortran. Here,
we are only considering the possible semantics of single-
and multi-statement parallel loops with no explicit
synchronization or processor control. Those features will be
discussed after the more basic issues in this document are
settled.

The remainder of this document is organized as follows.
First, a discussion of the underlying tradeoffs in defining
parallel loops is given. Using this discussion, four styles
of parallel loops are listed which may be acceptable to both
users and implementors. This is followed by a discussion of
the possible semantics of multi-statement loops. A survey of
user views on these features will be added in later
versions of this document.

1.   Tradeoffs in Parallel Loops

1.1. Assertions vs. Commands

"Parallel loops" are used in two distinct ways in current
programming languages and compilers.

Some systems treat a parallel loop as an assertion that the
iterations of the loop may safely be executed in any order
(particularly in parallel). The compiler may then take
advantage of this additional information in optimizing the
program. Cray compiler directives fall into this class. This
mode of use can enable large amounts of optimization; on the
other hand, since such assertions are often unchecked they
can create subtle bugs if they are false.

Other systems treat parallel loops as commands to execute
the iterations of the loop in parallel. If there are data
dependences in the loop that make this unsafe, they are
either given an exact semantics that can be executed in
parallel (as in the CM Fortran FORALL loop) or explicitly
left undefined (as in the PCF DOALL). The advantage of such
loops is that parallel execution is ensured; a disadvantage
is that finding a portable parallel semantics may be
difficult.

Because there is substantial existing practice with both
styles of loops, the choice of which style to use in HPFF is
not clear. A substantial part of the user poll below is
devoted to this topic. It is possible, and indeed likely,
that HPFF will want to support both loop styles.

1.2. Expressiveness vs. Implementation

As with many areas of language design, there is a tradeoff
between generality of expression and feasibility of
implementation.

At one end of the spectrum are parallel loops with precise
semantics for very general loop bodies. These constructs
have the advantage that the meaning of the statement is
always well-defined. The disadvantage is that compiler
implementation is very difficult because of the generality
needed; this is particularly a difficulty when portability
is a requirement.

Near the middle of the spectrum are parallel loops with
exact semantics for a more restricted class of loop bodies.
For example, the CM Fortran FORALL is well-defined for most
assignments, but disallows subroutine calls. An important
distinction that can be drawn is when the restrictions can
be checked: at compile time, at run time, or not at all.
Continuing with the CM Fortran example, the absence of CALL
statements can be checked by the compiler. The advantage of
these constructs is that the right set of restrictions can
provide a convenient interface for the programmer and a
manageable task for the compiler writer. The disadvantage is
that finding the right trade-off between these options may
be difficult, particularly when considering portability.

At the far end of the spectrum are semantics which provide
an exact meaning for a restricted set of cases, and
explicitly leave the results in other cases undefined. For
example, the results of a CM Fortran loop with an indirect
assignment are well-defined if and only if the indirection
vector is one-to-one. As above, a distinction can be drawn
according to when the restrictions can be checked. In the
indirection example, the restriction can only be checked
while the program is running. The advantages and
disadvantages of this approach are much the same as the last
approach.

It is most likely that HPFF will adopt some sort of
compromise position. The important variables to be decided
are where the dividing lines are drawn between what is
allowed and what is not.

1.3. Loop Semantics

Orthogonal to the above question of expressibility is the
question of what semantics a parallel loop should use in
cases where it is defined. At the very restrictive end of
the spectrum, this is often not an issue because few
difficult cases remain to be defined. Most other positions,
however, admit a variety of reasonable interpretations for
the parallel loop semantics. For example, consider the case
where multiple iterations assign to the same array element.
Some choices for this case are: generate a run-time error,
use the value from the lexically last iteration to make the
assignment, and select a value arbitrarily from among the
iterations making the assignment. Questions such as this are
perhaps most prominent in multiple-statement parallel loops.
A later section will examine possible semantics for those
loops in detail.

2.   Styles of Parallel Loops

2.1. Assertions

A popular form of parallel loops is the simple assertion
that it is safe to execute all iterations in parallel.
"Safe" may be defined differently for different purposes.
The most common meaning is that no data item is assigned in
one iteration and used in another, that is, there is no loop-
carried data dependence. Another possible meaning is that
the computed results will be equivalent regardless of the
order of execution (for example, because the operations in
the loop are associative and commutative). Yet another is
that any result that could be produced by reordering loop
iterations is acceptable to the result's consumer (for
example, a search loop for any instance of a value).

Assertions are popular because, as simple statements of
fact, they are more portable than new statement semantics.
They may also be used for different purposes by different
machines; a compiler for a parallel machine may use a "no
dependence" directive for parallelization, while a
vectorizing compiler could produce vector instructions and a
scalar compiler might use it to optimize cache behavior. A
disadvantage of assertions is that programmers may not know
what assertion will produce good results. For example, the
assertion that a loop does not carry any data dependences is
irrelevant if poor performance is caused by excessive
communication.

An important aspect of assertions is how (or even if) they
can be checked. Assertions that can be efficiently checked
are advantageous because they are safer; their disadvantage
is that the checking itself may overcome any advantage from
using the assertion.

2.1.1.    Checked Assertions
  
In theory, assertions could be checked at compile time or at
run time. We discuss both possibilities here. In practice, a
system will usually provide a means to turn assertion
testing off. We are not concerned with that matter here; we
are interested in classifying the types of assertions that
can be checked.

Compile-time checking is seldom used, because if the
compiler can do the checking then the assertion does not
provide new information. It has some uses for checking the
correctness of code written for other compilers, however. An
example of this might be an assertion that a loop does not
carry data dependences; a powerful compiler might attempt to
check such assertions, producing a warning if it found a
provable error.

Run-time checking is the most popular option. In this case,
the compiler inserts code to test the truth of an assertion.
Depending on the system, failure of the test either causes
an error or branches to less optimized code. Some conditions
may be testable, but only at a prohibitive cost (for
example, checking if an array is a permutation). Examples of
assertions that can be checked at run time are statements
regarding the values of variables.
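
As a minimal sketch of such a run-time test, the hypothetical
function IS_PERMUTATION below checks whether an index array holds
each of 1..N exactly once. The test itself is simple to state;
its cost is an extra pass over the index array and a scratch
array of the same size before every guarded loop, and on a
distributed-memory machine it would presumably also require
communication. All names are illustrative assumptions.

```fortran
PROGRAM PERMUTATION_CHECK_SKETCH
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 5
  INTEGER :: P(N), Q(N)

  P = (/ 3, 1, 5, 2, 4 /)   ! a permutation of 1..N
  Q = (/ 3, 1, 3, 2, 4 /)   ! 3 is repeated, 5 is missing

  IF (.NOT. IS_PERMUTATION(P)) STOP 1
  IF (IS_PERMUTATION(Q)) STOP 2

CONTAINS

  ! True if V holds each of 1..SIZE(V) exactly once.
  LOGICAL FUNCTION IS_PERMUTATION(V)
    INTEGER, INTENT(IN) :: V(:)
    LOGICAL :: SEEN(SIZE(V))
    INTEGER :: K
    SEEN = .FALSE.
    IS_PERMUTATION = .FALSE.
    DO K = 1, SIZE(V)
      IF (V(K) < 1 .OR. V(K) > SIZE(V)) RETURN   ! out of range
      IF (SEEN(V(K))) RETURN                     ! repeated value
      SEEN(V(K)) = .TRUE.
    END DO
    IS_PERMUTATION = .TRUE.
  END FUNCTION IS_PERMUTATION
END PROGRAM PERMUTATION_CHECK_SKETCH
```

A compiler could branch to the sequential version of the loop when
the test fails instead of stopping.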

2.1.2.    Unchecked Assertions

Some assertions are simply undecidable, and thus cannot be
checked in general (although methods for special cases may
be useful). In these cases, the compiler has little choice
but to believe the programmer and act accordingly. An
example of this type of assertion would be the claim that
any solution found by an (indeterminate) search could be
used.

As mentioned above, programmers may want the ability to turn
off assertion checking, even when the checking is feasible.
This is certainly an easy feature to build into the
compiler, but it may encourage sloppy coding.

2.2. Commands

The other major form of parallel loops is the command form.
In this style, a parallel loop is a special statement that
is guaranteed to execute in parallel, just as an IF
statement is guaranteed to perform a test and branch. There
are a number of possible semantics for the complex cases
that can arise in these loops; we will discuss these in the
next section. Here, we concentrate on a different aspect of
parallel command loops, the expressiveness of the loops.
Regardless of the exact semantics of the parallel loop,
there is a range of choices available for what constructs
are allowed in them. For the purposes of this discussion, we
group this range into two divisions, strict and nonstrict
commands.

Commands are a popular form because they give the programmer
fairly direct control over what will be executed in
parallel. This control can be improved even more by
additional constructs like the ON clause. The disadvantages
of this approach are possible implementation complexity,
when the detailed semantics do not match the actual machine
well, and possible programmer confusion stemming from the
difference between parallel and sequential semantics.

2.2.1.    Strict Commands

Strict commands limit the possible bodies of parallel loops
to operations that are unambiguous and relatively easily
implemented. Loops which do not conform to these
restrictions are not legal. For example, CM Fortran FORALL
loops cannot contain subroutine calls. These restrictions,
like the assertions above, may be checked at compile time or
run time, or may be unchecked. Earlier checking generally
allows more efficient safe code to be generated. Care must
be taken when defining the set of restrictions to retain
enough expressiveness and to allow efficient implementation.

2.2.2.    Nonstrict Commands

Nonstrict commands allow more freedom in the bodies of
parallel loops, at the price of more complex semantics or
implementations. For example, the Myrias PARDO statement
allowed arbitrary statements within the loop body, governed
by a nondeterministic merging semantics. Another example is
the Fortran D FORALL, which provides a deterministic merging
semantics in the same situation that may be very difficult
to implement.

3.   Semantics of Parallel Loops

3.1. Basic Semantic Principles

This section will describe semantics for nonstrict command
parallel loops. These are the style of loop requiring the
most detailed semantics; strict loops can avoid many of
these complexities by simply disallowing any constructs that
might cause problems in the loop bodies. This section also
assumes multi-statement loops, again because it is the most
general case. We begin by describing the cases that must be
resolved for a full semantics. We will then describe several
methods of resolving these conflicts which are already in
use in other languages.

3.1.1.    Loop-carried Data Dependences

A loop-carried data dependence exists when one iteration
assigns to a memory location that is read by another
iteration. This situation is also called a read-write race
condition in the literature. Sequential semantics in this
situation require that the iterations involved be executed
serially. Parallel semantics can be obtained by leaving the
results of the read undefined, or by forcing the read value
to come from a safe (uncorrupted) copy.

3.1.2.    Loop-carried Output Dependences

This case is similar to the last, except that both
iterations are writing to the same memory location. It is
also referred to as a write-write race condition. Sequential
semantics require that the iterations execute in order.
Parallel semantics can be obtained by leaving the final
value of the location undefined, or by postulating a
parallel merging rule.

A special case that involves loop-carried data and output
dependences is accumulation operations. If the reads and
writes involved in the dependences are performing a series
of commutative and associative operations, such as summing
the elements of an array, then special parallel methods can
be applied. Allowing this type of parallelism may, however,
require a special semantics (possibly in conjunction with
new syntax).

3.1.3.    Other Problems

Multi-statement loops have the semantic problems described
above, but it may also be desirable to have different
resolution rules depending on whether the dependence
involves the same statement in both iterations or different
statements. Similarly, when nested statements are allowed,
it is not always clear what the semantics should be. There
are also interesting issues involving loop-independent
dependences (similar to loop-carried dependences, but
involving statements in the same loop iteration) in multi-
statement loops.

3.2. SIMD Semantics

Perhaps the simplest parallel semantics is SIMD semantics.
The basic rule of this semantics is that all values on the
right-hand side of an assignment are evaluated before any
values are stored into the left-hand side locations.
Statements in multi-statement loops are usually considered
separately, so values stored in one statement are used by
the next one (regardless of the iterations doing the reading
and storing of a particular location). Similarly, statements
within nested constructs are considered separately, with
some iterations masked out during branches they would not
follow. Reductions and output dependences are typically not
allowed, although there is no obvious difficulty with adding
them as special SIMD merge operations.

In terms of the model described above, SIMD semantics can be
described as
     1. Loop-carried data dependences within the same
     statement are resolved by performing all read
     operations first.
     2. Loop-carried output dependences within the same
     statement are left undefined.
     3. All other dependences are resolved by executing
     the statements sequentially, using the above
     rules.
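
Rule 1 can be seen in a minimal sketch that runs the same
single-statement body under sequential and SIMD semantics. The
sketch assumes a compiler with a FORALL statement that evaluates
every right-hand side before storing (as the Fortran 95 FORALL
does); array names and values are illustrative.

```fortran
PROGRAM SIMD_RULE_SKETCH
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 5
  INTEGER :: A_SEQ(N), A_SIMD(N), I

  A_SEQ = (/ 1, 2, 3, 4, 5 /)
  A_SIMD = A_SEQ

  ! Sequential semantics: iteration I reads what iteration I-1 stored.
  DO I = 2, N
    A_SEQ(I) = A_SEQ(I-1)
  END DO

  ! SIMD semantics: every right-hand side is read before any store.
  FORALL (I = 2:N) A_SIMD(I) = A_SIMD(I-1)

  ! The DO loop propagates A(1); the FORALL shifts the old values.
  IF (ANY(A_SEQ /= (/ 1, 1, 1, 1, 1 /))) STOP 1
  IF (ANY(A_SIMD /= (/ 1, 1, 2, 3, 4 /))) STOP 2
END PROGRAM SIMD_RULE_SKETCH
```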

3.3. Copy-in Copy-out Semantics

A semantics similar to the above is Copy-in, Copy-out
semantics. Here, all iterations operate on a conceptually
separate copy of the original memory. Values written by one
iteration are only visible to its own copy of the data
space; thus, the new values will be used by later statements
in the same iteration but not by statements in other
iterations. A deterministic merge operation combines the
values written by individual iterations at the end of the
loop. This merge performs accumulations on explicit REDUCE
operators, and chooses the lexically last iteration as the
controlling value in any other case.

In terms of the above model, Copy-in, Copy-out semantics can
be described as
     1. Loop-carried data dependences are resolved by
     always using the value on entry to the loop or a
     value assigned in the same iteration as the read.
     2. Loop-carried output dependences are resolved by
     a deterministic merge.
     3. Loop-independent dependences are resolved by
     sequential semantics within an iteration.
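
A minimal Python sketch of these rules follows. The names PrivateCopy and
copy_in_copy_out, and the loop body, are hypothetical and chosen only for
illustration. The sketch implements only the "last iteration wins" part of
the merge; accumulation on explicit REDUCE operators is left out.

```python
class PrivateCopy(dict):
    """A private copy of memory that remembers which locations were written."""

    def __init__(self, original):
        super().__init__(original)
        self.written = set()

    def __setitem__(self, key, value):
        self.written.add(key)
        super().__setitem__(key, value)


def copy_in_copy_out(memory, iterations, body):
    """Run body(i, local) once per iteration on its own copy of memory.

    Writes are merged after the loop; when several iterations write the
    same location, the last iteration in `iterations` order wins.
    """
    merged = dict(memory)
    for i in iterations:
        local = PrivateCopy(memory)      # copy-in: values on loop entry
        body(i, local)
        for key in local.written:        # copy-out: deterministic merge
            merged[key] = local[key]
    return merged


def body(i, m):
    m["a", i] = m["a", i - 1] + 1    # reads the value on entry to the loop
    m["b", i] = m["a", i] * 2        # reads this iteration's own write
    m["last"] = i                    # written by every iteration


memory = {("a", 0): 0, ("a", 1): 10, ("a", 2): 20,
          ("b", 1): 0, ("b", 2): 0, "last": None}
result = copy_in_copy_out(memory, [1, 2], body)

print(result["a", 2])   # 11: iteration 2 saw a(1) = 10, not iteration 1's write
print(result["b", 1])   # 2: iteration 1 saw its own new a(1)
print(result["last"])   # 2: the last iteration wins
```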

3.4. Dataflow Semantics

A more radical semantics is dataflow semantics. In this
case, the key rule is the single-assignment rule -- any
value will be assigned once and only once. In parallel
loops, this means that all right-hand side references take
their values from those in force before the loop began,
regardless of any assignments made within the loop itself.
Some provision, such as a new syntax, is usually provided
for reductions. Thus,
     1. Data dependences, whether loop-carried or loop-
     independent, are resolved by always using the
     value on entry to the loop.
     2. Output dependences of any kind are not allowed.
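
As a rough sketch of these two rules in Python (the function dataflow_loop
and the loop body are hypothetical, and reductions are not modeled): every
read goes to a read-only view of the memory on loop entry, and a second
assignment to the same location is rejected.

```python
from types import MappingProxyType


def dataflow_loop(memory, iterations, body):
    """Run a parallel loop under single-assignment rules.

    body(i, old) returns a dict {location: value}. All reads go to `old`,
    a read-only view of memory on loop entry. Assigning a location in
    more than one iteration raises ValueError.
    """
    old = dict(memory)
    view = MappingProxyType(old)
    new = {}
    for i in iterations:
        for key, value in body(i, view).items():
            if key in new:
                raise ValueError(f"location {key!r} assigned more than once")
            new[key] = value
    return {**old, **new}


def body(i, old):
    return {
        ("a", i): old["a", i - 1] + 1,
        ("b", i): old["a", i] * 2,   # still the entry value of a(i)
    }


memory = {("a", 0): 0, ("a", 1): 10, ("a", 2): 20, ("b", 1): 0, ("b", 2): 0}
result = dataflow_loop(memory, [1, 2], body)

print(result["a", 1])   # 1
print(result["b", 1])   # 20: b(1) used the old a(1), not the new one
```

Compared with copy-in, copy-out semantics, the only difference visible here
is that a read of a location assigned earlier in the same iteration still
returns the value from loop entry.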

3.5. Undefined or Partially Defined

The final possible semantics is to leave the results of loop-
carried dependences only partially defined. The PCF DOALL,
for example, does not allow the programmer to make any
assumptions about values involved in loop-carried
dependences unless explicit synchronization is used.
Projects in distributed systems often guarantee some well-
defined style of serializability. This means that the
results are equivalent to some valid sequential ordering of
the memory accesses, but there may be many possible orders
to choose from. In general, these semantics provide a set of
possible resolutions to loop-carried dependences, but may
not provide a unique answer.

From halstead@crl.dec.com  Tue Apr 21 10:52:40 1992
Received: from crl.dec.com by cs.rice.edu (AA06348); Tue, 21 Apr 92 10:52:40 CDT
Received: by crl.dec.com; id AA27987; Tue, 21 Apr 92 11:52:38 -0400
Received: by easynet.crl.dec.com; id AA08308; Tue, 21 Apr 92 11:50:54 -0400
Message-Id: <9204211552.AA18846@seine.crl.dec.com>
To: Guy Steele <gls@think.com>
Cc: halstead@crl.dec.com, hpff-forall@cs.rice.edu
In-Reply-To: your message of Thu, 2 Apr 92 15:44:41 EST <9204022044.AA03310@strident.think.com>
Subject: re: A proposal for FORALL loops
Date: Tue, 21 Apr 92 11:52:37 -0400
From: halstead@crl.dec.com
X-Mts: smtp

Guy,

I finally got around to studying your FORALL proposal, and I have a
few questions about it:

  Proposal for loops in HPF

  Guy L. Steele Jr.
  Thinking Machines Corporation
  Version of 2 April 1992

  . . .

  [1] Synchronized (SIMD-style) FORALL statement

  [a] Single-assignment, as in Connection Machine Fortran

	FORALL (v1=l1:u1:s1, v2=l2:u2:s2, ..., vn=ln:un:sn [, mask] ) a(e1,...,em) = rhs

You don't specify any restrictions on the form or content of rhs.  Was
it your intention to leave it unconstrained?  Or did you just omit
that part of the specification?  (As you know, lots of FORALL
specifications restrict the RHS rather severely, and if you don't
restrict it at all, you may have some interesting semantic and
implementation problems with things like calling user-defined
functions in the rhs.)

Similarly, you don't specify any restrictions on the form or content
of e1,...,em, except to state that every iteration variable must be
used at least once.  Can I call user-defined functions from here?

  . . .

  [iii] ALLOCATE and DEALLOCATE

  In the presence of Fortran 90 pointers and derived types,
  it would be perfectly sensible to say:

	FORALL (I=1:100) ALLOCATE(FOO(I)%SUBARRAY(F(I)))

  thereby constructing a ragged array.

Your use of a tentative word ("would") makes me wonder if you're
proposing this seriously, or just as an interesting idea to
consider...

Aside from these two questions, it seems like a good proposal,
although it goes a lot further than our proposal did.  (We were
intentionally restrained in our FORALL proposal, in the hope that at
least the minimal functionality would be adopted quickly, rather than
losing the whole ball of wax in a protracted debate about possible
extensions.  But I think we would have been pretty happy to go along
with a proposal like this one.)				-Bert

From gls@think.com  Tue Apr 21 11:10:13 1992
Received: from mail.think.com (Mail1.Think.COM) by cs.rice.edu (AA06980); Tue, 21 Apr 92 11:10:13 CDT
Return-Path: <gls@Think.COM>
Received: from Strident.Think.COM by mail.think.com; Tue, 21 Apr 92 12:10:10 -0400
From: Guy Steele <gls@think.com>
Received: by strident.think.com (4.1/Think-1.0C)
	id AA06473; Tue, 21 Apr 92 12:10:10 EDT
Date: Tue, 21 Apr 92 12:10:10 EDT
Message-Id: <9204211610.AA06473@strident.think.com>
To: halstead@crl.dec.com
Cc: gls@think.com, halstead@crl.dec.com, hpff-forall@cs.rice.edu
In-Reply-To: halstead@crl.dec.com's message of Tue, 21 Apr 92 11:52:37 -0400 <9204211552.AA18846@seine.crl.dec.com>
Subject: A proposal for FORALL loops

   Date: Tue, 21 Apr 92 11:52:37 -0400
   From: halstead@crl.dec.com

   Guy,

   I finally got around to studying your FORALL proposal, and I have a
   few questions about it:

     Proposal for loops in HPF

     Guy L. Steele Jr.
     Thinking Machines Corporation
     Version of 2 April 1992

     . . .

     [1] Synchronized (SIMD-style) FORALL statement

     [a] Single-assignment, as in Connection Machine Fortran

	   FORALL (v1=l1:u1:s1, v2=l2:u2:s2, ..., vn=ln:un:sn [, mask] ) a(e1,...,em) = rhs

   You don't specify any restrictions on the form or content of rhs.  Was
   it your intention to leave it unconstrained?  Or did you just omit
   that part of the specification?  (As you know, lots of FORALL
   specifications restrict the RHS rather severely, and if you don't
   restrict it at all, you may have some interesting semantic and
   implementation problems with things like calling user-defined
   functions in the rhs.)

   Similarly, you don't specify any restrictions on the form or content
   of e1,...,em, except to state that every iteration variable must be
   used at least once.  Can I call user-defined functions from here?

Yes and yes, though if things get too weird, the CM Fortran
compiler just punts, implements it as two serial loops using
an array temporary, and issues a warning.  As time goes on,
we have increased the set of cases it can parallelize.


     . . .

     [iii] ALLOCATE and DEALLOCATE

     In the presence of Fortran 90 pointers and derived types,
     it would be perfectly sensible to say:

	   FORALL (I=1:100) ALLOCATE(FOO(I)%SUBARRAY(F(I)))

     thereby constructing a ragged array.

   Your use of a tentative word ("would") makes me wonder if you're
   proposing this seriously, or just as an interesting idea to
   consider...

I just used "would" so as not to prejudge the question of whether
Fortran 90 pointers and derived types would be in HPF.

   Aside from these two questions, it seems like a good proposal,
   although it goes a lot further than our proposal did.  (We were
   intentionally restrained in our FORALL proposal, in the hope that at
   least the minimal functionality would be adopted quickly, rather than
   losing the whole ball of wax in a protracted debate about possible
   extensions.  But I think we would have been pretty happy to go along
   with a proposal like this one.)				-Bert

Thanks.
--Guy


From halstead@crl.dec.com  Tue Apr 21 12:59:29 1992
Received: from crl.dec.com by cs.rice.edu (AA09870); Tue, 21 Apr 92 12:59:29 CDT
Received: by crl.dec.com; id AA05101; Tue, 21 Apr 92 13:59:22 -0400
Received: by easynet.crl.dec.com; id AA09500; Tue, 21 Apr 92 13:57:37 -0400
Message-Id: <9204211759.AA19937@seine.crl.dec.com>
To: Guy Steele <gls@think.com>
Cc: halstead@crl.dec.com, hpff-forall@cs.rice.edu
In-Reply-To: your message of Fri, 27 Mar 92 13:14:05 EST <9203271814.AA24385@strident.think.com>
Subject: re: Working Group 4: Revised proposal for local subroutines
Date: Tue, 21 Apr 92 13:59:20 -0400
From: halstead@crl.dec.com
X-Mts: smtp

Guy, I finally read over your March 25 proposal for local subroutines.
I still haven't heard much comment on it, even after your plea early
this month, but I find a fair amount of fodder for thought and
controversy in it.

Your proposal clearly responds to an expressed need in the user
community, but I dare say that by the time there would be convergence
on the subject of local subroutines and local program execution
blocks, we would have done as much new language design as for all of
the rest of HPF put together.  This makes me think it might be
valuable to find ways to "factor out" the question of local
subroutines and local execution so that it could proceed in parallel
with the rest of the HPF specification but not be a source of delay.

So I've been thinking about how we could respond to the need with the
smallest amount of detail work, and here are some thoughts:

 * A low-tech way to get much of the value of local subroutines is to
   provide a foreign-function-call interface to another
   (thread-oriented) language.  HPF implementations will probably need
   to have such interfaces anyhow, so this shouldn't be extra work.
   But if we have such an interface, then instead of doing the
   language design work required to be able to write thread-oriented
   execution in HPF itself, we could just let that code be written in
   another (presumably already existing) language.

   Since (by this logic) we should already be considering
   foreign-function-call interfaces from HPF to thread-oriented
   languages, perhaps we should start by just considering that aspect
   of the problem, and once that seems to be under control, we can
   tackle the task of how to do thread-oriented programming in HPF
   itself, if we still want to.  A lot of the issues in your proposal
   (notably the question of defining the local views of global data)
   still arise in defining a foreign-function-call interface, but
   other issues (such as defining the thread-oriented synchronization
   primitives) do not, since they are the property of the language in
   which the foreign functions are written.

   So, as a concrete proposal for how to proceed with this issue, I
   propose that we leave aside the question of what constructs are
   used within the thread-oriented parts of the program and just try
   to define how the global HPF data is made accessible to the local
   procedures through the foreign-function-call interface.

 * If we do indeed specify HPF as "core HPF" plus "options," then I
   think local subroutines should go in as "options," since they may
   be difficult to implement on some architectures.  On the other
   hand, the foreign-function-call design may be something that wants
   to be part of the core.

Those are my two cents.					-Bert

From wu@cs.buffalo.edu  Mon May  4 06:34:56 1992
Received: from ruby.cs.Buffalo.EDU by cs.rice.edu (AA16163); Mon, 4 May 92 06:34:56 CDT
Received: by ruby.cs.Buffalo.EDU (4.1/1.01)
	id AA27040; Mon, 4 May 92 07:34:56 EDT
Date: Mon, 4 May 92 07:34:56 EDT
From: wu@cs.buffalo.edu (Min-You Wu)
Message-Id: <9205041134.AA27040@ruby.cs.Buffalo.EDU>
To: hpff-forall@cs.rice.edu
Subject: A proposal for FORALL
Cc: wu@cs.buffalo.edu


Proposal for FORALL in HPF

Min-You Wu
SUNY at Buffalo
wu@cs.buffalo.edu
May 1992

We propose a block forall with the directives for independent 
execution of statements.  This proposal is based on Guy Steele's 
proposal and we will use his definitions without repeating.

The block forall is in the form of

      forall (...)
        a block of statements
      end forall

where the block can consist of a restricted class of statements
and the following INDEPENDENT directives:

!HPF$INDEPENDENT
!HPF$ENDINDEPENDENT

The two directives must be used as a pair.  A sub-block of statements
enclosed by the two directives is called an asynchronized
sub-block.  Other statements are in synchronized sub-blocks.
The synchronized sub-block is the same as Guy Steele's synchronized 
forall statement, and the asynchronized sub-block is the same as the 
forall with the INDEPENDENT directives.  Thus, the block forall

      forall (e)
        b1
!HPF$INDEPENDENT
        b2
!HPF$ENDINDEPENDENT
        b3
      end forall

means exactly the same as

      forall (e)
        b1
      end forall
!HPF$INDEPENDENT
      forall (e)
        b2
      end forall
      forall (e)
        b3
      end forall

An asynchronized sub-block directly followed by another asynchronized 
sub-block means there is a synchronization between the two sub-blocks:

      forall (e)
!HPF$INDEPENDENT
        b1
!HPF$ENDINDEPENDENT
!HPF$INDEPENDENT
        b2
!HPF$ENDINDEPENDENT
      end forall

is the same as

!HPF$INDEPENDENT
      forall (e)
        b1
      end forall
!HPF$INDEPENDENT
      forall (e)
        b2
      end forall

Statements in a synchronized sub-block are tightly synchronized.
Statements in an asynchronized sub-block are completely independent.
It is the user's responsibility to ensure there is no dependency
between instances in an asynchronized sub-block.

There is no restriction on the type of statements in an asynchronized 
sub-block.  That is, forall statements, do loops, while loops, 
where-elsewhere statements, if-then-else statements, case statements, 
subroutine and function calls, etc. can appear.  
On the other hand, the statements that can appear in a synchronized 
sub-block are restricted.  The degree of restrictions will be 
determined later.  At least the following statements are allowed:
assignment statements, forall statements, do loops, where statements 
(not where-elsewhere), if statements (not if-then-else), reduction 
statements (see below), and some intrinsic functions.


Reduction

A reduction statement is synchronous and cannot appear in an
asynchronized sub-block unless it is the last statement in the block.

We propose to use special operators for the reduction operations.
As an example, the following forall statement provides a sum 
reduction over a(i) and assigns the result to a scalar variable,
with `+=' as a sum reduction operator:

      forall (i=1:N) 
        x = (+= a(i))
      end forall

Whenever it is possible that a memory location will be multiply 
assigned, the problem can be resolved by using a reduction operator, 
or its value will be arbitrary.  Here is an example:

      forall (i=1:N)
        b(i/c) = (+= a(i))
      end forall

We list some possible reduction operators as follows:

+=      Sum of values  
*=      Product of values 
&=      Logical AND 
|=      Logical OR 
^=      Logical XOR
<?=     Minimum of values 
>?=     Maximum of values 

Using the reduction operators, we can also provide scan functions.

      forall (i=1:N) 
        b(i) = (+= a(1:i)) 
      end forall
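
The intended meaning of the `+=' examples above can be sketched in Python.
This is one possible reading, not a definition: the helper forall_sum_reduce
and the sample data are hypothetical, and i/c is taken to be Fortran integer
division on positive values.

```python
from collections import defaultdict
from itertools import accumulate


def forall_sum_reduce(indices, target, value):
    """Sum value(i) into location target(i) for every i.

    Locations hit by several iterations accumulate their contributions
    instead of being multiply assigned.
    """
    result = defaultdict(int)
    for i in indices:
        result[target(i)] += value(i)
    return dict(result)


N = 6
a = {i: 10 * i for i in range(1, N + 1)}    # a(1:6) = 10, 20, ..., 60
c = 2

# x = (+= a(i)): every iteration contributes to the same scalar
total = forall_sum_reduce(range(1, N + 1), lambda i: "x", lambda i: a[i])["x"]

# b(i/c) = (+= a(i)): iterations with equal i/c share a location
b = forall_sum_reduce(range(1, N + 1), lambda i: i // c, lambda i: a[i])

# b(i) = (+= a(1:i)): a running sum (scan)
prefix = list(accumulate(a[i] for i in range(1, N + 1)))

print(total)    # 210
print(b)        # {0: 10, 1: 50, 2: 90, 3: 60}
print(prefix)   # [10, 30, 60, 100, 150, 210]
```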

Using reduction operators rather than reduction intrinsics allows reduction
operations in the forall body without exiting from the forall.  See 
the following example:

(1) With intrinsic function:

      forall (i=1:N) 
!HPF$INDEPENDENT
        a(i) = i
!HPF$ENDINDEPENDENT
      end forall
      b = SUM(a)
      forall (i=1:N) 
!HPF$INDEPENDENT
        c(i) = a(i) + b
!HPF$ENDINDEPENDENT
      end forall

(2) With reduction operator:

      forall (i=1:N) 
!HPF$INDEPENDENT
        a(i) = i
        b = (+= a(i)) 
!HPF$ENDINDEPENDENT
!HPF$INDEPENDENT
        c(i) = a(i) + b
!HPF$ENDINDEPENDENT
      end forall


Notes:

With the reduction defined above, only limited operators can be defined.  
We do not have `COUNT', `MAXLOC', `MINLOC', etc.  Another solution is 
using some reduction functions similar to intrinsic functions.  
For example:

      forall (i=1:N) 
        x = SUM(a(i))
      end forall

However, users may confuse this with the existing intrinsic function SUM.


Rationale:

1. A forall with a single asynchronized sub-block is the same as a 
doindependent (or doall, or doeach, etc.), as shown below:

      forall (e)
!HPF$INDEPENDENT
        b1
!HPF$ENDINDEPENDENT
      end forall

A forall without any INDEPENDENT directive is the same as a tightly 
synchronized forall.  In this way, we need to define only one type 
of parallel construct.  Furthermore, combining asynchronized and
synchronized foralls, we have a loosely synchronized forall which
is more flexible.

2. With INDEPENDENT directives, the user can indicate which block
need not be synchronized.  One may suggest a smart compiler
that can recognize the dependency and eliminate unnecessary 
synchronizations automatically.  However, it might be extremely 
difficult or impossible in some cases to identify all dependencies.  
When the compiler cannot determine whether there is a dependency, 
it must assume so and use a synchronization for safety, which 
results in unnecessary synchronizations and consequently, high 
communication overhead.


From gmap11@f1ibmsv2.gmd.de  Thu May  7 08:17:41 1992
Received: from gmdzi.gmd.de by cs.rice.edu (AA05320); Thu, 7 May 92 08:17:41 CDT
Received: from f1ibmsv2.gmd.de (f1ibmsv2) by gmdzi.gmd.de with SMTP id AA24915
  (5.65c/IDA-1.4.4 for <hpff-forall@cs.rice.edu>); Thu, 7 May 1992 15:17:42 +0200
Received: by f1ibmsv2.gmd.de id AA28421; Thu, 7 May 92 15:16:49 GMT
Date: Thu, 7 May 92 15:16:49 GMT
From: gmap11@f1ibmsv2.gmd.de (C.A. Thole)
Message-Id: <9205071516.AA28421@f1ibmsv2.gmd.de>
To: hpff-forall@cs.rice.edu

This paper is an extended version of my statement at the April HPFF
working meeting in Dallas.
It is also a response to Chuck Koelbel's contribution about different
kinds of DO parallelism.


	          A vote for explicit MIMD support
         Clemens-August Thole, GMD I1.T, D-5205 St. Augustin
                        gmap11@gmdzi.gmd.de
 
                      Version of May 08, 1992


1.  MIMD parallelism is important

Although the CM-2, the AMT DAP and the MasPar machines have proven their
usefulness for many applications, it is important to exploit MIMD parallelism
for many other applications. In his introduction of loosely synchronous
applications, Fox [1] gives many examples of data parallel applications that
are not of SIMD type. Just for reference I would like to introduce three
examples from different areas:

1.1  Finite Element computations

The setup of element matrices is one of the four computation steps in
the treatment of finite element problems. (The f(ix) = f(ix) + delta example
corresponds to the assembly phase of such programs.) All the element
matrices can be computed in parallel, but the code for an element depends
on its type. A typical finite-element program supports many different types
(about 100?), and several of them are usually used in the solution of one
specific problem.

For each element type, a different Fortran subroutine is used to set up
the element matrices.

Example for basic code structure:

	do 10 i=1, number_of_elements
           goto (101, 102, ............, 199), type_of_element(i)
           ...... error condition for illegal type ......

101        call sub_101( element_data(i))
           goto 10

102        call sub_102( element_data(i))
           goto 10

           .....

199        call sub_199( element_data(i))
           goto 10
10      continue
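
The same structure can be sketched in Python to show why the iterations are
independent even though each one may run different code. Everything here is
hypothetical: the routines sub_101 and sub_102 only stand in for real element
routines, and the thread pool merely expresses that the calls may proceed in
any order; it is not a claim about performance.

```python
from concurrent.futures import ThreadPoolExecutor


def sub_101(data):
    # Hypothetical stand-in for the routine of element type 101
    return ("type 101", sum(data))


def sub_102(data):
    # Hypothetical stand-in for the routine of element type 102
    return ("type 102", max(data))


# Plays the role of the computed GOTO: element type -> routine
ELEMENT_ROUTINES = {101: sub_101, 102: sub_102}


def setup_element(element_type, data):
    """Set up one element; the code that runs depends on the element type."""
    try:
        routine = ELEMENT_ROUTINES[element_type]
    except KeyError:
        raise ValueError(f"illegal element type {element_type}") from None
    return routine(data)


def setup_all(types, data):
    """Treat all elements independently; results keep the element order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(setup_element, types, data))


type_of_element = [101, 102, 101]
element_data = [[1.0, 2.0], [3.0, 5.0], [4.0, 4.0]]
matrices = setup_all(type_of_element, element_data)

print(matrices)
# [('type 101', 3.0), ('type 102', 5.0), ('type 101', 8.0)]
```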


1.2 CFD computations

During the integration of a flow around the body of an airplane, all the
grid points can be treated in parallel. The actual code for various
grid points is different due to the treatment of:

    -  different types of boundary conditions
    -  use of special wall functions near the boundary
    -  treatment of shocks
    -  treatment of areas with sub-sonic or super-sonic flow 
    -  use of different basic models in different areas 
       (Potential Equation, Euler Equations, Navier Equations in different
        Areas)

A simple example is the treatment of an O-net around a wing with
special treatment of the friction near the wing:

      grid_data (0:L+1, 0:M+1, 0:N+1)

C.....boundary conditions for the far field and for the symmetry plane

      call outer_bc1 ( grid_data) updates grid_data( 1:L, 1:M, N+1)
      call outer_bc2 ( grid_data) updates grid_data(   0, 1:M, 1:N)
      call outer_bc3 ( grid_data) updates grid_data( L+1, 1:M, 1:N)

C.....boundary conditions for the wing surface and the contact surface
C.....at the side of the wing
      call plane_bc1 ( grid_data) updates grid_data( 1:L1, 1:M, 0)
      call plane_bc2 ( grid_data) updates grid_data( L1+1:L, 1:M, 0)

C.....boundary conditions for the cut after the wing
      call cut_bc    ( grid_data) updates grid_data( 1:L,   0, 1:N) and
                                          grid_data( 1:L, M+1, 1:N)

C.....treatment of inner point except the area near the wing
      call inner     ( grid_data) updates grid_data( 1:L, 6:M, 1:N)
                             different code is used depending on the 
                             size of the local Reynolds-number

C.....treatment of the area near the wing with special wall functions
      call wall_func ( grid_data) updates grid_data( 1:L, 1:5, 1:N)


1.3   ab initio computation in chemistry

This computation requires the evaluation of integrals for each pair of
functions in the set of basis functions. For bundles of these integrals,
it is first determined whether the values contribute significantly
to the result. If not, the evaluation of these integrals is skipped.

If the integrals for a bundle have to be evaluated, special cubature procedures
are used, which depend on the type of basis function currently in use and
the required accuracy.


2. Different Modes of Computations for these examples


2.1  SIMD - mode

TMC has shown in their examples that each of these cases can be done on an
SIMD architecture with some overhead. Different types of boundary conditions,
for example, can be treated as special cases of one generalized boundary
condition.

In the end, each of the remaining cases of computation (for example
the generalized boundary condition or the treatment of the wall functions)
has to be evaluated for every grid point and not only for a small subset.
In the CFD case, for example, the treatment of the wall functions or of certain
boundary conditions involves many more operations per grid point than the
treatment of an inner point. A performance penalty is the result.


2.2 MIMD - non parallel mode

If on an MIMD architecture the compiler does not recognize that the different
cases of updates can be treated in parallel, the Finite Element example would
be executed sequentially. For the CFD example each subroutine would be
executed one after the other. In the latter case the execution will be faster
than in the SIMD mode, because computations will be executed only for the
grid points involved: because each node will contain inner points and boundary
points, boundary conditions will only be done for a fraction of its grid points.

On highly parallel architectures, however, the ratio of "irregular" to
"regular" grid points will become very unfavorable for a subset of nodes,
which results in load-balancing problems.

In this mode, the load-balancing problem cannot be solved by overlapping the
treatment of "regular" and various kinds of "irregular" points.


2.3 MIMD - parallel mode

The MIMD - parallel mode allows the overlapping of the treatment of different 
kinds of data elements like grid points at boundaries, the wall function area 
and the regular interior of the computational domain.


3. Conclusion

1. The discussions above indicate that a parallel treatment of data elements
   of large data objects but with varying functionals is common in different
   scientific areas.

2. The treatment of different subsets of the data objects is quite often
   encapsulated in subroutines or each alternative requires large parts of code. 
   It cannot be expected to be detected automatically in most cases. 

3. Comparing assertions and commands for the expression of MIMD parallelism,
   the command-type approach has the risk of large overhead for copying and
   merging data objects. The compiler technology necessary to avoid this
   overhead is the same as that needed for the detection of parallelism. Passing
   array sections to subroutines and specifying the intent of use in
   subroutine interface blocks helps, but the CFD example shows that the
   compiler needs very good technology in order to determine that the
   different subsets of the data objects are disjoint from each other.

   The assertion type of expressing parallelism is therefore the much
   faster way to guarantee high performance.

4. Some kind of parallel sections should be supported. 

5. It should be possible to nest parallelism. 

6. An ON clause using sections should be allowed to describe which subset
   of processors executes an instance of a parallel construct.

7. A basic question is whether all data used in a parallel instance, and
   therefore possibly inside a subroutine, must be local to the set of
   processors assigned to execute the parallel instance or whether this
   restriction can be avoided.

   This question implies a decision about whether communication
   between processors is assumed to be able to generate interrupts at the
   destination processor. (See also Guy Steele's remark on the implementation of
   CALL ARRAY_FETCH in his Proposal for local program units in HPF, version of
   March 25, 1992.)

References:

[1]	Fox: Parallel problem architectures and their implication for
	portable parallel software systems. CRPC-TR91120. February 1991.
	(and further references in that paper)

From loveman@ftn90.enet.dec.com  Thu May 21 11:54:48 1992
Received: from enet-gw.pa.dec.com by cs.rice.edu (AA00819); Thu, 21 May 92 11:54:48 CDT
Received: by enet-gw.pa.dec.com; id AA05335; Thu, 21 May 92 09:54:46 -0700
Message-Id: <9205211654.AA05335@enet-gw.pa.dec.com>
Received: from ftn90.enet; by decwrl.enet; Thu, 21 May 92 09:54:47 PDT
Date: Thu, 21 May 92 09:54:47 PDT
From: David Loveman <loveman@ftn90.enet.dec.com>
To: hpff-forall@cs.rice.edu
Cc: loveman@ftn90.enet.dec.com, nelson@ftn90.enet.dec.com, roger_s@pa.dec.com,
        halstead@crl.enet.dec.com
Apparently-To: hpff-forall@cs.rice.edu
Subject: an implementor's questions about FORALL


How many times do the mask and the rhs in a FORALL get evaluated?


1) In a FORALL with a mask, how many times does the rhs get evaluated?

a.  as many times as the same FORALL without a mask would cause it to
be evaluated?  This is what the "roughly equivalent to" definition in
the current proposal says.

b.  as many TRUEs as there are in the result of evaluating the mask?
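
The difference between readings (a) and (b) only becomes observable when the
rhs has side effects (or can fail where the mask is false). A small Python
sketch, with hypothetical helper names and a deliberately side-effecting rhs,
makes the two counts concrete:

```python
def forall_rhs_everywhere(indices, mask, rhs):
    """Reading (a): evaluate rhs for every index, store only where mask is true."""
    values = {i: rhs(i) for i in indices}
    return {i: values[i] for i in indices if mask(i)}


def forall_rhs_where_true(indices, mask, rhs):
    """Reading (b): evaluate rhs only for indices where mask is true."""
    active = [i for i in indices if mask(i)]
    return {i: rhs(i) for i in active}


calls = []


def noisy_square(i):
    calls.append(i)      # side effect: record every evaluation
    return i * i


is_even = lambda i: i % 2 == 0

stored_a = forall_rhs_everywhere(range(1, 7), is_even, noisy_square)
count_a = len(calls)
calls.clear()

stored_b = forall_rhs_where_true(range(1, 7), is_even, noisy_square)
count_b = len(calls)

print(stored_a == stored_b)   # True: the stored values agree
print(count_a, count_b)       # 6 3: the number of rhs evaluations does not
```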


2) Is 
     FORALL(. . . . ., mask). . . .
equivalent to
     IF ANY(mask) THEN
          FORALL(. . . . .,mask). . . . .
     ENDIF
?
Even with side effects?


3) Is
     FORALL(. . . . .,mask). . . . .
equivalent to
     FORALL(. . . . .)WHERE(mask). . . . .
assuming we allow WHERE in FORALL?


Comment:  There are currently three definitions of FORALL
     Appendix F of S8.104 (Fortran 8x)
     MasPar (DECmpp) Fortran
     TMC Fortran
I believe we should give a precise definition of FORALL, based on the
UNION (INTERSECTION?) of these three existing definitions + whatever
else we want to add in HPF.  This will take some language lawyering,
but will be worth it.

Let's not forget that we need efficient implementation on scalar
machines as well as on parallel machines.


From chk@erato.cs.rice.edu  Thu May 21 12:23:18 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA01374); Thu, 21 May 92 12:23:18 CDT
Received: from localhost.cs.rice.edu by erato.cs.rice.edu (AA14501); Thu, 21 May 92 12:23:14 CDT
Message-Id: <9205211723.AA14501@erato.cs.rice.edu>
To: David Loveman <loveman@ftn90.enet.dec.com>
Cc: hpff-forall@cs.rice.edu, nelson@ftn90.enet.dec.com, roger_s@pa.dec.com,
        halstead@crl.enet.dec.com
Subject: Re: an implementor's questions about FORALL 
In-Reply-To: Your message of Thu, 21 May 92 09:54:47 -0700.
             <9205211654.AA05335@enet-gw.pa.dec.com> 
Date: Thu, 21 May 92 12:23:12 -0500
From: chk@erato.cs.rice.edu


I'm far behind in my HPFF reading and writing (yes, the April meeting
minutes WILL be coming out soon!), so I won't comment on your
questions without some further thought.  I'd like to ask for
clarification on your comment, however:

> 
> Comment:  There are currently three definitions of FORALL
>      Appendix F of S8.104 (Fortran 8x)
>      MasPar (DECmpp) Fortran
>      TMC Fortran
> I believe we should give a precise definition of FORALL, based on the
> UNION (INTERSECTION?) of these three existing definitions + whatever
> else we want to add in HPF.  This will take some language lawyering,
> but will be worth it.
> 

Could someone explain the differences between the three FORALL
definitions?  I thought that the MasPar/DECmpp and TMC definitions
were equivalent.  In particular, 
	1. Is the difference just in what restrictions are placed on
	the FORALL body in each case?
	2. Or are there examples where the same body gives different
	results under different definitions?
Note that taking the union of the definitions is (maybe) practical
under the first condition, but impossible under the second.

	Chuck

From loveman@ftn90.enet.dec.com  Fri May 22 06:48:25 1992
Received: from enet-gw.pa.dec.com by cs.rice.edu (AA07054); Fri, 22 May 92 06:48:25 CDT
Received: by enet-gw.pa.dec.com; id AA16805; Fri, 22 May 92 04:48:23 -0700
Message-Id: <9205221148.AA16805@enet-gw.pa.dec.com>
Received: from ftn90.enet; by decwrl.enet; Fri, 22 May 92 04:48:23 PDT
Date: Fri, 22 May 92 04:48:23 PDT
From: David Loveman <loveman@ftn90.enet.dec.com>
To: chk@cs.rice.edu
Cc: hpff-forall@cs.rice.edu, nelson@ftn90.enet.dec.com, roger_s@pa.dec.com,
        halstead@crl.enet.dec.com, loveman@ftn90.enet.dec.com
Apparently-To: chk@cs.rice.edu, hpff-forall@cs.rice.edu
Subject: Re: an implementor's questions about FORALL


>> 
>> Comment:  There are currently three definitions of FORALL
>>      Appendix F of S8.104 (Fortran 8x)
>>      MasPar (DECmpp) Fortran
>>      TMC Fortran
>> I believe we should give a precise definition of FORALL, based on the
>> UNION (INTERSECTION?) of these three existing definitions + whatever
>> else we want to add in HPF.  This will take some language lawyering,
>> but will be worth it.
>> 
>
>Could someone explain the differences between the three FORALL
>definitions?  I thought that the MasPar/DECmpp and TMC definitions
>were equivalent.  In particular, 
>	1. Is the difference just in what restrictions are placed on
>	the FORALL body in each case?
>	2. Or are there examples where the same body gives different
>	results under different definitions?
>Note that taking the union of the definitions is (maybe) practical
>under the first condition, but impossible under the second.

COMPASS implemented FORALL in its compiler technology based on the
definition in Appendix F of S8.104 (Fortran 8x).  This COMPASS compiler
technology is the base technology for the TMC, MasPar, and DECmpp
Fortran compilers.  As a result, the definitions are "the same" except
for some wording differences, and they meant the same since the base
implementations were the same.

These compilers have been evolving on their own for a while, so there
may be some slight divergence but, I think, the three companies (TMC,
MasPar, and Digital) all "know what they mean" when they say "FORALL." 
The major evolution efforts have been to handle progressively more
FORALL cases in parallel.  

My concern is that, assuming HPF is successful, there will be other
compilers developed that will contain FORALL.  The developers of those
compilers will base their definition on the HPF definition and may not
"know what we mean" when we say FORALL.  Thus it is in all our
interests to get the HPF stand-alone definition of FORALL accurate,
especially since the S8.104 definition is in a somewhat obscure place.

I hereby volunteer (fool that I am) to try such a draft.

-David

From meltzer@tamarack.cray.com  Fri May 29 14:00:39 1992
Received: from timbuk.cray.com by cs.rice.edu (AA08279); Fri, 29 May 92 14:00:39 CDT
Received: from willow14.cray.com by timbuk.cray.com (4.1/CRI-MX 1.6af)
	id AA03728; Fri, 29 May 92 14:00:39 CDT
Received: by willow14.cray.com
	id AA19402; 4.1/CRI-5.6; Fri, 29 May 92 14:00:38 CDT
Date: Fri, 29 May 92 14:00:38 CDT
From: meltzer@tamarack.cray.com (Andy Meltzer)
Message-Id: <9205291900.AA19402@willow14.cray.com>
To: hpff-distribute@cs.rice.edu, hpff-forall@cs.rice.edu
Subject: Re:  Draft Proposal on Sequence and Storage Association

>	Modification:   Original ------- R. Swift: March, 1992.
>			Version 1.0 ---  R. Swift: April 10, 1992
>			Version 1.1 ---  R. Schreiber, R. Swift: May 8, 1992


I'm not sure exactly what is different here; it seems that it has just
been cleaned up some.  Are there any substantial changes?

>	The compiler will insure that sequence and storage association
>	operates correctly for sequential arrays and COMMON blocks by
>	limiting the types of implicit distributions that are employed
>	for them.

This is a very limiting way to ensure that sequence and storage association
operate correctly.  It is mandating an implementation.  Can't we just say
that "the compiler will ensure that sequence and storage association 
operate correctly" without telling the compiler how to do so?

My proposal (sent out about a month ago) details a way to do this.  I again
want to point out that where we need not mandate limitations to the user
community, we should not do so.  The conservative approach is not to 
restrict these behaviors, it is to allow them, but to warn the user that
(like many other directives) some sequence and storage association 
distribution directives might be ignored by some compilers.  A compiler
might also warn that potential non-high-performance behavior might result.

Furthermore, by the definitions put forward here, any compiler which 
decides to extend the availability of sequence and storage association
is in violation of the specification, rather than extending it.

Again, please consider my proposed extension to this draft.


						Andy Meltzer


From joelw@mozart.convex.com  Fri May 29 16:36:47 1992
Received: from convex.convex.com by cs.rice.edu (AA13684); Fri, 29 May 92 16:36:47 CDT
Received: from mozart.convex.com by convex.convex.com (5.64/1.35)
	id AA04368; Fri, 29 May 92 16:36:32 -0500
Received: by mozart.convex.com (5.64/1.28)
	id AA25262; Fri, 29 May 92 16:36:30 -0500
From: joelw@mozart.convex.com (Joel Williamson)
Message-Id: <9205292136.AA25262@mozart.convex.com>
Subject: Re:  Draft Proposal on Sequence and Storage Association
To: meltzer@tamarack.cray.com (Andy Meltzer)
Date: Fri, 29 May 92 16:36:30 CDT
Cc: hpff-distribute@cs.rice.edu, hpff-forall@cs.rice.edu
In-Reply-To: <9205291900.AA19402@willow14.cray.com>; from "Andy Meltzer" at May 29, 92 2:00 pm
X-Mailer: ELM [version 2.3 PL11]

Andy Meltzer writes:

	...stuff deleted...
> 
> My proposal (sent out about a month ago) details a way to do this.  I again
> want to point out that where we need not mandate limitations to the user
> community, we should not do so.  


> The conservative approach is not to 
> restrict these behaviors, it is to allow them, 


> but to warn the user that
> (like many other directives) some sequence and storage association 
> distribution directives might be ignored by some compilers.  A compiler
> might also warn that potential non-high-performance behavior might result.
> 
> Furthermore, by the definitions put forward here, any compiler which 
> decides to extend the availability of sequence and storage association
> is in violation of the specification, rather than extending it.
> 
> Again, please consider my proposed extension to this draft.
> 
> 
> 
> 
> 						Andy Meltzer
> 

I completely agree with Andy.

Joel Williamson
> 
> 
> 


From loveman@ftn90.enet.dec.com  Tue Jun  2 16:08:57 1992
Received: from enet-gw.pa.dec.com by cs.rice.edu (AA17415); Tue, 2 Jun 92 16:08:57 CDT
Received: by enet-gw.pa.dec.com; id AA24718; Tue, 2 Jun 92 14:07:41 -0700
Message-Id: <9206022107.AA24718@enet-gw.pa.dec.com>
Received: from ftn90.enet; by decwrl.enet; Tue, 2 Jun 92 14:07:41 PDT
Date: Tue, 2 Jun 92 14:07:41 PDT
From: David Loveman <loveman@ftn90.enet.dec.com>
To: hpff-forall@cs.rice.edu
Cc: loveman@ftn90.enet.dec.com
Apparently-To: hpff-forall@cs.rice.edu
Subject: FORALL details and proposal


Following is a commentary on FORALL,  (for reference) the Fortran 8x
definition of FORALL, some notes on the Thinking Machines, MasPar, and
DECmpp definitions of FORALL, and an attempt to augment the April 2
FORALL proposal with some more detail, especially for the simple case. 
Note that the "scalarized" definition here is semantically different,
and I think more accurate, than the one given in the April 2 proposal,
and needs to be talked about.

Sorry about the LaTeX.  I intend to put it into HPF Canonical LaTeX
Form (HCLF) once it is defined.  I hope it is readable.  (It will LaTeX
as is, with no header files.)

-David

-------------------------------------------------------------------------------

%forall.tex

\documentstyle[11pt]{article}
\pagestyle{plain}
\pagenumbering{arabic}
\marginparwidth 0pt
\oddsidemargin  .25in
\evensidemargin  .25in
\marginparsep 0pt
\topmargin   -.5in
\textwidth 6in
\textheight  9.0in


\title{Analysis of FORALL Definitions and Proposal}
\author{David Loveman}
%\date{   }


\begin{document}

\maketitle

\section{Overview}

There are currently three definitions of FORALL:  Appendix F of S8.104
(Fortran 8x), MasPar (DECmpp) Fortran, and Thinking Machines Fortran. 
In addition, there is an existing HPFF proposal for FORALL that differs
in some details from the existing definitions.  This document reviews
the existing definitions of the FORALL statement as well as the current
HPFF proposal for FORALL, discusses the differences between them, and
provides a detailed revised HPFF FORALL proposal based on the S8.104
definition and the current HPFF proposal.

COMPASS originally implemented FORALL in its compiler technology based
on the definition in Appendix F of S8.104 (Fortran 8x).  This COMPASS
compiler technology is the base technology for the Thinking Machines,
MasPar, and DECmpp Fortran compilers.  As a result, the definitions of
FORALL are ``the same'' except for some wording differences, and they
mean the same since the base implementations were the same.

These compilers have been evolving on their own for a while, so there
may be some slight divergence but  the three companies all ``know what
they mean'' when they say FORALL. The major evolution efforts have been
to handle progressively more FORALL cases in parallel.  

This proposal anticipates that, assuming HPF is successful, there will
be other compilers developed that will contain FORALL.  The developers
of those compilers will base their definition on the HPF definition and
may not ``know what we mean'' when we say FORALL.  Thus it is in all
our interests to get the HPF stand-alone definition of FORALL accurate,
especially since the S8.104 definition is in a somewhat obscure place.
In addition, there is a requirement to allow efficient implementations
on scalar machines as well as on parallel machines.


\section{Definition of FORALL from X3J3/S8 Version 104 - April 1987}

{\bf F.2.3 Element Array Assignment - FORALL.}  The element array
assignment statement is used to specify an array assignment in terms of
array elements or array sections.  The element array assignment may be
masked with a scalar logical or bit expression.  Rule R223 for {\it
action-stmt} is extended to include the {\it forall-stmt} and appears
as RF40 (F4.3.2).\\

\noindent
{\bf F.2.3.1 General Form of Element Array Assignment.}

\begin{verbatim}
RF27 forall-stmt          is FORALL (forall-triplet-spec-list [ ,scalar-mask-expr ])
                                  forall-assignment

RF28 forall-triplet-spec  is subscript-name = subscript : subscript [ : stride]
\end{verbatim}

\noindent
Constraint:  {\it subscript-name} must be a {\it scalar-name} of type integer.\\

\noindent
Constraint:  A {\it subscript} or a {\it stride} in a {\it
forall-triplet-spec} must not contain a reference to any {\it
subscript-name} in the {\it forall-triplet-spec-list}.

\begin{verbatim}
RF29 forall-assignment    is array-element = expr
                          or array-section = expr
\end{verbatim}

\noindent
Constraint:  The {\it array-section} or {\it array-element} in a {\it
forall-assignment} must reference all of the {\it forall-triplet-spec
subscript-names}.

For each subscript name in the {\it forall-assignment}, the set of
permitted values is determined on entry to the statement and is
\[  m1 + (k-1) * m3, \mbox{ where } k = 1, 2, \ldots, \mbox{INT}((m2 - m1 + m3) / m3)  \]

\noindent
and where {\it m1}, {\it m2}, and {\it m3} are the values of the first
subscript, the second subscript, and the stride respectively in the
{\it forall-triplet-spec}.  If {\it stride} is missing, it is as if it
were present with a value of the integer 1.  The expression {\it
stride} must not have the value 0.  If for some subscript name 
$\mbox{INT}((m2 - m1 + m3) / m3) \leq 0$, the {\it forall-assignment} is not executed.\\

Examples of element array assignments are:

\begin{verbatim}
FORALL (I=1:N, J=1:N) H(I,J) = 1.0 / REAL(I + J - 1)

FORALL (I=1:N, J=1:N, A(I,J) .NE. 0.0) B(I,J) = 1.0 / A(I,J)
\end{verbatim} 

{\bf F.2.3.2 Interpretation of Element Array Assignments.}  Execution
of an element array assignment consists of the evaluation in any order
of the subscript and stride expressions in the {\it
forall-triplet-spec-list}, the evaluation of the scalar mask
expression, and the evaluation of the expr in the {\it
forall-assignment} for all valid combinations of subscript names for
which the scalar mask expression is true, followed by the assignment of
these values to the corresponding elements of the array being assigned
to.  If the scalar mask expression is omitted, it is as if it were
present with the value true.  If the scalar mask expression is of type
BIT, an expression with value B'1' is treated as true and an expression
value B'0' is treated as false.

The {\it forall-assignment} must not cause any element of the array
being assigned to be assigned a value more than once.  The scope of the
subscript name is the FORALL statement itself. A function reference
appearing in any expression in the {\it forall-assignment} must not
redefine any subscript name.


\section{DECmpp Parallel Fortran Version 1.0 (= MasPar MPF Version 1.2)}

\begin{itemize}

\item An upper bound subscript in a forall-triplet-spec may be omitted;
no semantics are given for this case.

\item The scalar-mask-expr must be of type logical (i.e. not type bit,
which is also a removed extension) and can reference the subscript-names.

\item Conflict:  The text says ``A [forall-stmt] can be specified in
terms of array elements or array sections . . .'' but then goes on to
say that ``[forall-asmt] is $<$array-element = expr$>$''

\item The forall-assignment must not contain a character expression.

\item Text says that subscripts and strides are evaluated first, but
does not say ``in any order.''

\item There is no precise statement of the set of permitted values of
the subscripts as determined on entry.

\item There is no explicit statement corresponding to the last
paragraph:  ``The forall-assignment must not cause any element of the
array being assigned to be assigned a value more than once.  The scope
of the subscript name is the FORALL statement itself. A function
reference appearing in any expression in the forall-assignment must not
redefine any subscript name.''

\item There is an explicit statement describing the cases for which
parallel code is generated:  ``Arrays indexed by FORALL subscripts must
use all the FORALL subscripts exactly once, in the order in which they
appear in the FORALL header.  These subscripts must be bare - that is,
not in expressions - and they cannot appear as section bounds or
strides.  There can be additional scalar subscripts or sections not
involving any FORALL subscript-name(s); any sections must follow the
FORALL subscript-name(s).  There cannot be any transformational
intrinsics nor any user-written function calls, only scalar intrinsics
(not involving any FORALL subscripts) or elemental intrinsics
(involving FORALL subscripts).''

\end{itemize}

\section{CM Fortran Reference Manual Version 1.0, February 1991}

\begin{itemize}

\item The text is very close to that in S8.104.

\item No bit expressions in scalar-mask-expr.

\end{itemize}


\section{Current HPFF Proposal - Synchronized (SIMD-style) FORALL statement}

\subsection{Single Assignment}

\begin{verbatim}
      FORALL (v1=l1:u1:s1, v2=l2:u2:s2, ..., vn=ln:un:sn [, mask] ) &
           a(e1,...,em) = rhs
\end{verbatim}

    The FORALL statement declares v1,v2,...,vn to be integer
    variables locally to the FORALL statement.  Associated
    with each is a subscript-triplet.  There is also an optional
    logical mask expression that may depend on the variables
    v1,v2,...,vn. The single statement controlled must be an
    assignment with an array reference on the left-hand-side.
    Each of the variables v1,v2...,vn must appear within the
    subscript expression(s) on the left-hand-side.  (This is
    a syntactic consequence of the semantic rule that no two
    execution instances of the body may assign to the same
    array element.)  The assignment statement is executed
    once for every combination of values for the variables
    for which the mask expression is true.  (If there is no
    mask expression, it is assumed always to be true.)
    The various instances of the assignment statement are
    executed in such a way that it appears that the right-hand
    sides of all instances, and also the subscript expressions
    on the left-hand-sides of all instances, and also the mask
    expression, are evaluated before any assignment is performed.
    So a FORALL statement is roughly equivalent to:

\begin{verbatim}
      templ1 = l1
      tempu1 = u1
      temps1 = s1
      DO v1=templ1,tempu1,temps1
        templ2(v1) = l2
        tempu2(v1) = u2
        temps2(v1) = s2
        DO v2=templ2(v1),tempu2(v1),temps2(v1)
          ...
            templn(v1,v2,...,v<n-1>) = ln
            tempun(v1,v2,...,v<n-1>) = un
            tempsn(v1,v2,...,v<n-1>) = sn
            DO vn=templn(v1,v2,...,v<n-1>),tempun(v1,v2,...,v<n-1>),  &
                 tempsn(v1,v2,...,v<n-1>)
              tempmask(v1,v2,...,vn) = mask
              tempe1(v1,v2,...,vn) = e1
              ...
              tempem(v1,v2,...,vn) = em
              temprhs(v1,v2,...,vn) = rhs
            END DO
          ...
        END DO
      END DO
      DO v1=templ1,tempu1,temps1
        DO v2=templ2(v1),tempu2(v1),temps2(v1)
          ...
            DO vn=templn(v1,v2,...,v<n-1>),tempun(v1,v2,...,v<n-1>),  &
                 tempsn(v1,v2,...,v<n-1>)
              IF (tempmask(v1,v2,...,vn)) THEN
                a(tempe1(v1,v2,...,vn),...,tempem(v1,v2,...,vn))   &
                      = temprhs(v1,v2,...,vn)
              END IF
            END DO
          ...
        END DO
      END DO
\end{verbatim}

Of course, there are other ways to implement it as well.


\subsection{Block FORALL}

\begin{verbatim}
      FORALL (... e1 ... e2 ...)
        s1
        s2
        ...
        sn
      END FORALL
\end{verbatim}

means exactly the same as

\begin{verbatim}
      temp1 = e1
      temp2 = e2
      FORALL (... temp1 ... temp2 ...) s1
      FORALL (... temp1 ... temp2 ...) s2
      ...
      FORALL (... temp1 ... temp2 ...) sn
\end{verbatim}

That is, a block FORALL means exactly the same as replicating
the FORALL header in front of each array assignment statement
in the block, except that any expressions in the FORALL header
are evaluated only once, rather than being re-evaluated before
each of the statements in the body.

Thus one may wish to think of a block FORALL as synchronizing twice
per contained statement: once after handling the rhs and other
expressions but before performing assignments, and once after all
assignments have been performed but before commencing the next
statement.  (In practice, appropriate flow analysis often permits
the compiler to eliminate unnecessary synchronizations.)

\subsection{Allow Statements Other Than Assignments in Body}

\subsubsection{IF and WHERE}

Assume that a block IF is first reduced to single IF statements
by introducing temporary variables and replicating the IF header:

\begin{verbatim}
      IF (m) THEN
        s1
        ...
        sm
      ELSE
        t1
        ...
        tn
      END IF
\end{verbatim}

\noindent
becomes

\begin{verbatim}
      tempm = m
      IF (tempm) s1
      ...
      IF (tempm) sm
      IF (.NOT. tempm) t1
      ...
      IF (.NOT. tempm) tn
\end{verbatim}

Then we simply define

\begin{verbatim}
      FORALL (v1=l1:u1:s1,...,vn=ln:un:sn,fmask) IF (imask) s
\end{verbatim}

to mean

\begin{verbatim}
      FORALL (v1=l1:u1:s1,...,vn=ln:un:sn,fmask .AND. imask) s
\end{verbatim}

WHERE can be treated similarly, though the details are a bit
more complicated.

The motivation here is to make it easier to transform DO loops
into FORALL statements and vice versa, despite the fact that they
may contain conditional statements.


\subsubsection{FORALL}

\begin{verbatim}
      FORALL (va1=...,...,van=...,maska)  &
           FORALL (vb1=...,...,vbn=...,maskb) s
\end{verbatim}

\noindent
means

\begin{verbatim}
      FORALL (va1=...,...,van=...,vb1=...,...,vbn=...,maska.AND.maskb) s
\end{verbatim}

\noindent
assuming there is no duplication of the variable names (a compiler
should logically rename the variables if there is a duplication).

The motivation here is to make it easier to transform DO loops
that contain other DO loops (which happen not to be tightly nested)
into FORALL blocks within FORALL blocks, and vice versa.


\section{Comments on the Above}

\begin{itemize}

\item I believe that we should base the HPF FORALL on the original
definition from S8.104.

\item Fortran 90 does not have a BIT data type.  The definition of
FORALL must change accordingly.

\item Digital, MasPar, and Thinking Machines seem to agree that
character expressions should be disallowed.

\item Omissions in the Digital and MasPar descriptions appear to be
document wording problems, rather than intention or implementation
problems.  Thinking Machines documents do not have this problem because
they copied text directly from S8.104.

\item Naturally I am favorably disposed to the approach in the current
proposal of definition by means of source-to-source transformation to
simpler forms.  Unfortunately, the definition in the current proposal
is not quite correct with regard to questions such as ``How many times
does the rhs get evaluated.''  The source-to-source transformational
definitions should be correct scalarizations of the language
constructs, usable as such for a (naive) workstation implementation.

\item Our definitions should contain at least two parts:  a formal
proposal part defining, in the Fortran 90 specification style, what the
HPF language features are;  and a consequences part providing
discussion, rationale, intended usage, and (possibly) non-obvious
consequences of the formal proposal. 

\end{itemize}


\section{A Revised Proposal for FORALL}

\subsection{Element Array Assignment - FORALL}  The element array
assignment statement is used to specify an array assignment in terms of
array elements or array sections.  The element array assignment may be
masked with a scalar logical expression.  Rule R215 for {\it
executable-construct} is extended to include the {\it forall-stmt}.

\subsubsection{General Form of Element Array Assignment}

\begin{verbatim}
forall-stmt          is FORALL (forall-triplet-spec-list [ ,scalar-mask-expr ])
                          forall-assignment

forall-triplet-spec  is subscript-name = subscript : subscript [ : stride]
\end{verbatim}

\noindent
Constraint:  {\it subscript-name} must be a {\it scalar-name} of type integer.\\

\noindent
Constraint:  A {\it subscript} or a {\it stride} in a {\it
forall-triplet-spec} must not contain a reference to any {\it
subscript-name} in the {\it forall-triplet-spec-list}.

\begin{verbatim}
forall-assignment    is array-element = expr
                     or array-section = expr
\end{verbatim}

\noindent
Constraint:  The {\it array-section} or {\it array-element} in a {\it
forall-assignment} must reference all of the {\it forall-triplet-spec
subscript-names}.

For each subscript name in the {\it forall-assignment}, the set of
permitted values is determined on entry to the statement and is
\[  m1 + (k-1) * m3, \mbox{ where } k = 1, 2, \ldots, \mbox{INT}((m2 - m1 + m3) / m3)  \]

\noindent
and where {\it m1}, {\it m2}, and {\it m3} are the values of the first
subscript, the second subscript, and the stride respectively in the
{\it forall-triplet-spec}.  If {\it stride} is missing, it is as if it
were present with a value of the integer 1.  The expression {\it
stride} must not have the value 0.  If for some subscript name 
$\mbox{INT}((m2 - m1 + m3) / m3) \leq 0$, the {\it forall-assignment} is not executed.\\
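
As a worked illustration of this rule (the particular triplets are chosen
here for illustration only and do not come from S8.104), consider three
cases:

\begin{itemize}

\item {\tt I = 1:10:3} gives $m1 = 1$, $m2 = 10$, $m3 = 3$, so the count
is $\mbox{INT}((10 - 1 + 3) / 3) = \mbox{INT}(4) = 4$ and the permitted
values are 1, 4, 7, 10.

\item {\tt I = 10:1:-2} gives $\mbox{INT}((1 - 10 - 2) / (-2)) =
\mbox{INT}(5.5) = 5$ and the permitted values are 10, 8, 6, 4, 2.

\item {\tt I = 5:1} (stride defaulting to 1) gives
$\mbox{INT}((1 - 5 + 1) / 1) = -3 \leq 0$, so the {\it
forall-assignment} is not executed.

\end{itemize}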

Examples of element array assignments are:

\begin{verbatim}
FORALL (I=1:N, J=1:N) H(I,J) = 1.0 / REAL(I + J - 1)

FORALL (I=1:N, J=1:N, A(I,J) .NE. 0.0) B(I,J) = 1.0 / A(I,J)
\end{verbatim} 

\subsubsection{Interpretation of Element Array Assignments}  

Execution of an element array assignment consists of the evaluation in
any order of the subscript and stride expressions in the {\it
forall-triplet-spec-list}, the evaluation of the scalar mask
expression, and the evaluation of the expr in the {\it
forall-assignment} for all valid combinations of subscript names for
which the scalar mask expression is true, followed by the assignment of
these values to the corresponding elements of the array being assigned
to.  If the scalar mask expression is omitted, it is as if it were
present with the value true.

The {\it forall-assignment} must not cause any element of the array
being assigned to be assigned a value more than once.  The scope of the
subscript name is the FORALL statement itself. A function reference
appearing in any expression in the {\it forall-assignment} must not
redefine any subscript name.

\subsubsection{Scalarization of the FORALL Statement}

A {\it forall-stmt} of the general form:

\begin{verbatim}
FORALL (v1=l1:u1:s1, v2=l2:u2:s2, ..., vn=ln:un:sn [, mask] ) &
      a(e1,...,em) = rhs
\end{verbatim}

\noindent
is equivalent to the following standard Fortran 90 code:

\begin{verbatim}
!evaluate subscript and stride expressions in any order
templ1 = l1
tempu1 = u1
temps1 = s1
templ2 = l2
tempu2 = u2
temps2 = s2
  ...
templn = ln
tempun = un
tempsn = sn

!then evaluate the scalar mask expression
DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        tempmask(v1,v2,...,vn) = mask
      END DO
    ...
  END DO
END DO

!then evaluate the expr in the forall-assignment 
!(and lhs subscripts) 
!for all valid combinations of subscript names 
!for which the scalar mask expression is true
DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        IF (tempmask(v1,v2,...,vn)) THEN
          !in any order
          tempe1(v1,v2,...,vn) = e1
            ...
          tempem(v1,v2,...,vn) = em
          temprhs(v1,v2,...,vn) = rhs
        END IF
      END DO
    ...
  END DO
END DO

!then perform the assignment of these values to 
!the corresponding elements of the array being assigned to
DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        IF (tempmask(v1,v2,...,vn)) THEN
          a(tempe1(v1,v2,...,vn),...,tempem(v1,v2,...,vn)) = temprhs(v1,v2,...,vn)
        END IF
      END DO
    ...
  END DO
END DO
\end{verbatim}
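
As a concrete illustration, the following is a minimal sketch of this
scalarization applied to the single statement
{\tt FORALL (I=2:N) A(I) = A(I-1)}.  The program name, the array names,
and the value {\tt N = 5} are assumptions made for this example only; the
mask is omitted and is therefore treated as true.  Because FORALL is not
part of Fortran 90, only the scalarized form appears in the program.  A
naive in-place DO loop is included for contrast, since it propagates
{\tt A(1)} through the whole array.

\begin{verbatim}
PROGRAM FORALL_SKETCH
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 5
  INTEGER :: A(N), C(N), TEMPE1(2:N), TEMPRHS(2:N)
  INTEGER :: I, TEMPL1, TEMPU1, TEMPS1

  A = (/ 1, 2, 3, 4, 5 /)
  C = A

  ! evaluate the subscript and stride expressions once, on entry
  TEMPL1 = 2
  TEMPU1 = N
  TEMPS1 = 1

  ! evaluate the lhs subscript and the rhs for every index value
  DO I = TEMPL1, TEMPU1, TEMPS1
    TEMPE1(I)  = I
    TEMPRHS(I) = A(I-1)
  END DO

  ! only then perform the assignments
  DO I = TEMPL1, TEMPU1, TEMPS1
    A(TEMPE1(I)) = TEMPRHS(I)
  END DO

  ! naive DO loop: each iteration sees the previous assignment
  DO I = 2, N
    C(I) = C(I-1)
  END DO

  PRINT *, A     ! FORALL semantics: 1 1 2 3 4
  PRINT *, C     ! naive DO loop:    1 1 1 1 1
END PROGRAM FORALL_SKETCH
\end{verbatim}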

\subsubsection{Consequences of the Definition of the FORALL Statement}

\begin{itemize}

\item The {\it scalar-mask-expr} may depend on the {\it subscript-name}s.

\item Each of the {\it subscript-name}s must appear within the
subscript expression(s) on the left-hand-side.  (This is a syntactic
consequence of the semantic rule that no two execution instances of the
body may assign to the same array element.)

\item Subscripts on the left hand side of a {\it forall-assignment} are
evaluated only for valid combinations of subscript names for which the
scalar mask expression is true.

\item The evaluation of expressions within {\it array-element} or {\it
array-section} must neither affect nor be affected by the evaluation of {\it expr}.

\end{itemize}


\subsection{FORALL Construct}

The FORALL construct is a generalization of the element array
assignment statement allowing multiple {\it forall-assignment}s to be
controlled by a single {\it forall-triplet-spec-list}.  Rule R215 for
{\it executable-construct} is extended to include the {\it forall-construct}.\\

\subsubsection{General Form of the FORALL Construct}

\begin{verbatim}
forall-construct     is FORALL (forall-triplet-spec-list [ ,scalar-mask-expr ])
                          forall-body-stmt-list
                        END FORALL

forall-body-stmt     is forall-assignment

forall-triplet-spec  is subscript-name = subscript : subscript [ : stride]
\end{verbatim}

\noindent
Constraint:  {\it subscript-name} must be a {\it scalar-name} of type integer.\\

\noindent
Constraint:  A {\it subscript} or a {\it stride} in a {\it
forall-triplet-spec} must not contain a reference to any {\it
subscript-name} in the {\it forall-triplet-spec-list}.

\begin{verbatim}
forall-assignment    is array-element = expr
                     or array-section = expr
\end{verbatim}

\noindent
Constraint:  The {\it array-section} or {\it array-element} in a {\it
forall-assignment} must reference all of the {\it forall-triplet-spec
subscript-names}.

For each subscript name in the {\it forall-assignment}s, the set of
permitted values is determined on entry to the construct and is
\[  m1 + (k-1) * m3, \mbox{ where } k = 1, 2, \ldots, \mbox{INT}((m2 - m1 + m3) / m3)  \]

\noindent
and where {\it m1}, {\it m2}, and {\it m3} are the values of the first
subscript, the second subscript, and the stride respectively in the
{\it forall-triplet-spec}.  If {\it stride} is missing, it is as if it
were present with a value of the integer 1.  The expression {\it
stride} must not have the value 0.  If for some subscript name 
$\mbox{INT}((m2 - m1 + m3) / m3) \leq 0$, the {\it forall-assignment}s are not executed.\\

Examples of the FORALL construct are:

\begin{verbatim}
 t o  b e  d o n e
\end{verbatim}

\subsubsection{Interpretation of the FORALL Construct}  

Execution of a FORALL construct consists of the evaluation in any order
of the subscript and stride expressions in the {\it
forall-triplet-spec-list}, the evaluation of the scalar mask
expression, and the evaluation of the {\it forall-assignment}s in
order.  If the scalar mask expression is omitted, it is as if it were
present with the value true.

The evaluation of a {\it forall-assignment} consists of the evaluation
of the expr in the {\it forall-assignment} for all valid combinations
of subscript names for which the scalar mask expression is true,
followed by the assignment of these values to the corresponding
elements of the array being assigned to.

A {\it forall-assignment} must not cause any element of the array being
assigned to be assigned a value more than once.  The scope of the
subscript name is the FORALL construct itself. A function reference
appearing in any expression in a {\it forall-assignment} must not
redefine any subscript name.

\subsubsection{Scalarization of the FORALL Construct}

A {\it forall-construct} of the general form:

\begin{verbatim}
FORALL (... e1 ... e2 ... en ...)
    s1
    s2
     ...
    sn
END FORALL
\end{verbatim}

\noindent
is equivalent to the following sequence of FORALL statements:

\begin{verbatim}
temp1 = e1
temp2 = e2
 ...
tempn = en
FORALL (... temp1 ... temp2 ... tempn ...) s1
FORALL (... temp1 ... temp2 ... tempn ...) s2
   ...
FORALL (... temp1 ... temp2 ... tempn ...) sn
\end{verbatim}
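
For instance, under this definition a two-statement block such as the
following (the arrays A and B and the bounds are hypothetical, chosen
only for illustration)

\begin{verbatim}
FORALL (I = 2:N-1)
    A(I) = B(I-1) + B(I+1)
    B(I) = A(I)
END FORALL
\end{verbatim}

\noindent
would be read as

\begin{verbatim}
temp1 = 2
temp2 = N-1
FORALL (I = temp1:temp2) A(I) = B(I-1) + B(I+1)
FORALL (I = temp1:temp2) B(I) = A(I)
\end{verbatim}

\noindent
so that every selected element of A is assigned before any element of B
is, and the second statement sees the new values of A.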


\subsubsection{Consequences of the Definition of the FORALL Construct}

\begin{itemize}

\item A block FORALL means exactly the same as replicating the FORALL
header in front of each array assignment statement in the block, except
that any expressions in the FORALL header are evaluated only once,
rather than being re-evaluated before each of the statements in the body.

\item One may think of a block FORALL as synchronizing twice per
contained statement: once after handling the rhs and other expressions
but before performing assignments, and once after all assignments have
been performed but before commencing the next statement.  (In practice,
appropriate dependence analysis will often permit the compiler to
eliminate unnecessary synchronizations.)

\item The {\it scalar-mask-expr} may depend on the {\it subscript-name}s.

\item Each of the {\it subscript-name}s must appear within the
subscript expression(s) on the left-hand-side.  (This is a syntactic
consequence of the semantic rule that no two execution instances of the
body may assign to the same array element.)

\item Subscripts on the left hand side of a {\it forall-assignment} are
evaluated only for valid combinations of subscript names for which the
scalar mask expression is true.

\end{itemize}

\subsection{Extending the Definition of forall-body-stmt}

The motivation here is to make it easier to transform nested DO loops
into FORALL constructs and vice versa, despite the fact that they may
contain conditional statements and non-tightly nested sub-loops.  

{\bf !!MORE WORK NEEDED HERE!!}

\subsubsection{IF Statements as forall-body-stmts}

Assume that a block IF is first reduced to single IF statements by
introducing temporary variables and replicating the IF header:

\begin{verbatim}
IF (m) THEN
  s1
    ...
  sm
ELSE
  t1
    ...
  tn
END IF
\end{verbatim}

\noindent
becomes

\begin{verbatim}
tempm = m
IF (tempm) s1
  ...
IF (tempm) sm
IF (.NOT. tempm) t1
  ...
IF (.NOT. tempm) tn
\end{verbatim}

Then we simply define

\begin{verbatim}
FORALL (v1=l1:u1:s1,...,vn=ln:un:sn,fmask) IF (imask) s
\end{verbatim}

to mean

\begin{verbatim}
FORALL (v1=l1:u1:s1,...,vn=ln:un:sn,fmask .AND. imask) s
\end{verbatim}
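
As an illustration of this rule (the arrays A and B and the scalar K are
hypothetical names used only in this example), the statement

\begin{verbatim}
FORALL (I=1:N, I .NE. K) IF (A(I) .NE. 0.0) B(I) = 1.0 / A(I)
\end{verbatim}

\noindent
would mean

\begin{verbatim}
FORALL (I=1:N, I .NE. K .AND. A(I) .NE. 0.0) B(I) = 1.0 / A(I)
\end{verbatim}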


\subsubsection{WHERE Statements as forall-body-stmts}

WHERE can be treated similarly, though the details are a bit more complicated.


\subsubsection{FORALL Statements as forall-body-stmts}


\begin{verbatim}
FORALL (va1=...,...,van=...,maska)  &
  FORALL (vb1=...,...,vbn=...,maskb) s
\end{verbatim}

\noindent
means

\begin{verbatim}
FORALL (va1=...,...,van=...,vb1=...,...,vbn=...,maska.AND.maskb) s
\end{verbatim}

\noindent
assuming there is no duplication of the variable names (a compiler
should logically rename the variables if there is a duplication).
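
As an illustration (the arrays A and C are hypothetical names used only
in this example), the nested statement

\begin{verbatim}
FORALL (I=1:N, A(I) .GT. 0.0)  &
  FORALL (J=1:N, I .NE. J) C(I,J) = A(I)
\end{verbatim}

\noindent
would mean

\begin{verbatim}
FORALL (I=1:N, J=1:N, A(I) .GT. 0.0 .AND. I .NE. J) C(I,J) = A(I)
\end{verbatim}

\noindent
Here the inner mask refers to the outer subscript name I, which is
permitted for masks; the inner bounds do not, in keeping with the
constraint on {\it forall-triplet-spec}.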


\end{document}

From mmdf@clink.co.uk  Mon Jun 15 03:54:37 1992
Received: from eros.uknet.ac.uk by cs.rice.edu (AA06187); Mon, 15 Jun 92 03:54:37 CDT
Received: from compulink.co.uk by eros.uknet.ac.uk with UUCP 
          id <6688-0@eros.uknet.ac.uk>; Mon, 15 Jun 1992 09:53:38 +0100
Date: Mon, 15 Jun 92 09:38 GMT
From: Glossa <glossa@cix.clink.co.uk>
Subject: FORALL Semantics
To: hpff-forall@cs.rice.edu
Cc: loveman@ftn90.enet.dec.com
Reply-To: glossa@cix.clink.co.uk
Message-Id: <memo.474573@cix.compulink.co.uk>


FOR-ALL statement


   When you cannot check a condition why maintain it? We cannot in general
   check that each element of a forall array assignment left-hand side array
   is only assigned once. So why try to assert it?

   And what about side effects on the right (or left) hand side. How do they 
   interact? 

   Why not state that the statement is executed for each combination of the
   for-all indices  and that any interleaving of the separate statement 
   executions gives an acceptable execution? In other words, the programmer
   cannot be sure of the effect unless he has written a program independent
   of which interleaving is chosen.  This subsumes both side effect and 
   multiple assignment issues. 

   Thanks for using STANDARD LaTeX in your excellent paper David!


Tom Lake

From chk@erato.cs.rice.edu  Mon Jun 15 12:35:52 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA17708); Mon, 15 Jun 92 12:35:52 CDT
Received: from localhost.cs.rice.edu by erato.cs.rice.edu (AA02565); Mon, 15 Jun 92 12:35:53 CDT
Message-Id: <9206151735.AA02565@erato.cs.rice.edu>
To: glossa@cix.clink.co.uk
Cc: hpff-forall@cs.rice.edu
Subject: Re: FORALL Semantics 
In-Reply-To: Your message of Mon, 15 Jun 92 09:38:00 +0000.
             <memo.474573@cix.compulink.co.uk> 
Date: Mon, 15 Jun 92 12:35:51 -0500
From: chk@erato.cs.rice.edu


> Date: Mon, 15 Jun 92 09:38 GMT
> From: Glossa <glossa@cix.clink.co.uk>
> Subject: FORALL Semantics
> 
> FOR-ALL statement
> 
> 
>    When you cannot check a condition why maintain it? We cannot in general
>    check that each element of a forall array assignment left-hand side array
>    is only assigned once. So why try to assert it?

Why can't this condition be checked at run-time?  Inefficient check:
	FORALL (I = 1:N) A(F(I)) = B(I)
becomes
	DO J1 = 1, SIZE(A)
	  TEMP_A(J1) = .FALSE.
	ENDDO
	DO I = 1, N
	  TEMP_I = F(I)
	  IF ( TEMP_A(TEMP_I) ) THEN
	    PRINT *, 'ERROR - REASSIGNMENT TO A(',TEMP_I,') AT I=',I
	    STOP
	  ELSE
	    A(TEMP_I) = B(I)
	    TEMP_A(TEMP_I) = .TRUE.
	  ENDIF
	ENDDO
An efficient implementation would make one or more of the following
optimizations:
    1. Don't check LHS subscripts that can be proved independent
	in the compiler (for example A(I))
    2. Detect reassignments using parallel global combining operations
    3. Have a "no seat belt" mode without checking (analogous to
	turning off subscript bounds checks)
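
As a sketch only, the inefficient check can be packaged as a
self-contained free-form Fortran program.  The names FORALL_CHECK,
CHECKED_SCATTER, SEEN and IDX are hypothetical, and IDX simply stands
in for the values F(I); this illustrates the idea and is not a
proposal for what a compiler should generate.

    MODULE FORALL_CHECK
      IMPLICIT NONE
    CONTAINS
      ! Performs A(IDX(I)) = B(I) for I = 1..SIZE(IDX), but refuses to
      ! assign any element of A twice.  OK is .FALSE. if a duplicate
      ! left-hand-side subscript was found (A may then be partly updated).
      SUBROUTINE CHECKED_SCATTER(A, B, IDX, OK)
        REAL, INTENT(INOUT)  :: A(:)
        REAL, INTENT(IN)     :: B(:)
        INTEGER, INTENT(IN)  :: IDX(:)
        LOGICAL, INTENT(OUT) :: OK
        LOGICAL :: SEEN(SIZE(A))
        INTEGER :: I
        SEEN = .FALSE.
        OK = .TRUE.
        DO I = 1, SIZE(IDX)
          IF (SEEN(IDX(I))) THEN
            OK = .FALSE.
            RETURN
          END IF
          A(IDX(I)) = B(I)
          SEEN(IDX(I)) = .TRUE.
        END DO
      END SUBROUTINE CHECKED_SCATTER
    END MODULE FORALL_CHECK

    PROGRAM DEMO
      USE FORALL_CHECK
      IMPLICIT NONE
      REAL :: A(4), B(3)
      LOGICAL :: OK
      A = 0.0
      B = (/ 10.0, 20.0, 30.0 /)
      ! Distinct subscripts: OK comes back .TRUE.
      CALL CHECKED_SCATTER(A, B, (/ 3, 1, 4 /), OK)
      PRINT *, OK, A
      ! A(2) would be assigned twice: OK comes back .FALSE.
      CALL CHECKED_SCATTER(A, B, (/ 2, 4, 2 /), OK)
      PRINT *, OK
    END PROGRAM DEMO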

Also note that there are a lot of other conditions in Fortran that
can't be checked at compile or run-time - for example, aliasing of
function parameters is illegal *if it changes the computed value*.

>    And what about side effects on the right (or left) hand side. How do they 
>    interact? 

This is indeed something that needs to be nailed down.  Part of the
answer can be found in the draft:

\item The evaluation of expressions within {\it array-element} or {\it
array-section} must neither affect nor be affected by the evaluation
of {\it expr}.

For reference, the BNF referred to is:
forall-stmt          is FORALL (forall-triplet-spec-list [ ,scalar-mask-expr ])
                          forall-assignment
forall-triplet-spec  is subscript-name = subscript : subscript [ : stride]
forall-assignment    is array-element = expr
                     or array-section = expr


This disallows side effects in the LHS from affecting the RHS.  It
also disallows RHS side effects from affecting the subscript
expressions in the LHS.  As written now, it says nothing about RHS
side effects interfering with RHS evaluations in other iterations (and
similarly for the LHS) - a hole you can drive finite-element programs
through.

Note that side effects will be more difficult to check than the
condition you object to above.

>    Why not state that the statement is executed for each combination of the
>    for-all indices  and that any interleaving of the separate statement 
>    executions gives an acceptable execution? In other words, the programmer
>    cannot be sure of the effect unless he has written a program independent
>    of which interleaving is chosen.  This subsumes both side effect and 
>    multiple assignment issues. 

This disallows the following
	FORALL (I = 2:N-1) A(I) = (A(I-1) + A(I) + A(I+1)) / 3
which has the same meaning (under the current proposal) as
	A(2:N-1) = (A(1:N-2) + A(2:N-1) + A(3:N)) / 3
Here, the array notation is more compact; for larger expressions, the
FORALL tends to be shorter and clearer.  Feedback from CM-2 users is
that they like the FORALL for expressing this style of computation;
HPF should learn from this kind of experience.
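
To make the difference concrete, here is a small free-form Fortran
sketch (the program name and data are made up for illustration, and it
assumes a compiler that accepts FORALL with the semantics of the
current proposal).  It runs the FORALL above next to a sequential DO
loop, which is just one of the interleavings your semantics would
admit.  With these data the FORALL leaves A = 0, 1, 2, 1, 0, while the
DO loop produces different interior values because iteration I reads
the already-updated C(I-1).

    PROGRAM STENCIL
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 5
      REAL :: A(N), C(N)
      INTEGER :: I
      A = (/ 0.0, 3.0, 0.0, 3.0, 0.0 /)
      C = A
      ! FORALL: every right-hand side uses the old values of A
      FORALL (I = 2:N-1) A(I) = (A(I-1) + A(I) + A(I+1)) / 3
      ! Sequential DO: iteration I sees the new value of C(I-1)
      DO I = 2, N-1
        C(I) = (C(I-1) + C(I) + C(I+1)) / 3
      END DO
      PRINT *, A
      PRINT *, C
    END PROGRAM STENCIL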

I also question your proposal on another ground - taking statements as
the basic unit of granularity.  Consider
	FORALL (I = 1:N-1) A(I) = F(I)
(Assume F is a user-defined function.)  Does your semantics mean that
F must be executed atomically?  Note that arbitrarily interleaving
copies of A(I) = F(I) is not the same as interleaving the statements
executed in F!

I don't mean to dismiss the general thrust of your message - that
FORALL needs a firm definition (at least, that's what I take your
point to be).  You are absolutely right that we haven't discussed many
important cases, and need to in the near future.  I think, however,
that the semantics you propose is too restrictive.

>    Thanks for using STANDARD LaTeX in your excellent paper David!

I'll second that!

> Tom Lake

	Chuck Koelbel

From chk@erato.cs.rice.edu  Tue Jun 16 16:12:23 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA07833); Tue, 16 Jun 92 16:12:23 CDT
Received: from localhost.cs.rice.edu by erato.cs.rice.edu (AA03496); Tue, 16 Jun 92 16:12:20 CDT
Message-Id: <9206162112.AA03496@erato.cs.rice.edu>
To: hpff-forall@erato.cs.rice.edu
Subject: Local Subroutine proposal
Date: Tue, 16 Jun 92 16:12:15 -0500
From: chk@erato.cs.rice.edu


Forwarded from Marc Snir...

------- Forwarded Message

Date: Mon, 15 Jun 92 20:07:43 EDT
From: "Marc Snir" <SNIR%YKTVMV.bitnet@cunyvm.cuny.edu>
To: chk@cs.rice.edu
Subject: Forwarding note
Reply-To: SNIR@ibm.com
Status: RO

- ------------------------------- Referenced Note ---------------------------
%local.tex
%Snir
\documentstyle[11pt]{article}
\pagestyle{plain}
\pagenumbering{arabic}
\marginparwidth 0pt
\oddsidemargin  .25in
\evensidemargin  .25in
\marginparsep 0pt
\topmargin   -.5in
\textwidth 6in
\textheight  9.0in


\title{Proposal for Local Routines}
\author{Marc Snir}
%\date{   }


\begin{document}

\maketitle

\section{Overview}
Local routines are an escape mechanism that allows SPMD code to be called
from an HPF program.  Such a mechanism makes it possible to deal with those
problems that are not handled efficiently in HPF, and to hand-tune
critical kernels.  In addition, the local routine interface makes it possible
to develop message-passing run-time libraries that can be called from HPF.

Local routines are external -- they are not part of the HPF language.   The
local routines need not necessarily be compiled by the same compiler
that compiles HPF -- although HPF compilers may be able to handle F90
local routines.

The
execution model for local routines is different from the HPF execution model:
local routines are executed on each processor; on each processor they can
access the local part of distributed arrays; and they can use
arbitrary message-passing libraries for interprocessor communication. Thus, data
distribution and parallelism are not semantically transparent inside
local routines (this is one reason for their ``external'' status).  However,
if a local routine fulfills the requirements listed below,
then the execution
of the local routine is semantically equivalent to the execution of an HPF routine.

We describe here an interface to F90 routines
-- F77 or C routines can be dealt with in a similar manner.


\section{Requirements from the Caller}
\begin{enumerate}
\item
The caller must include an interface definition (in HPF syntax)
for each local routine called.  The local routine is declared with the keyword
LOCAL.  The interface block may contain HPF align/distribute directives for
alignable/distributable arrays.
\item
If the local routine is a function that returns a scalar value, then
the corresponding HPF function returns a one-dimensional array of size
NUMPROC, distributed one element per processor.  If the local function returns
an array of rank $k$, then the corresponding HPF function returns an array of
rank $k+1$ with the leading dimension of size NUMPROC, block distributed
among the processors.
\end{enumerate}
\section{Calling sequence generated by HPF compiler}
\begin{enumerate}
\item
For each nondistributable argument a corresponding name
(of the same type) must be present in each processor.
\item
Compatibility must be enforced between actual and formal parameters in terms of
type, shape, alignment and distribution prior to the call.
\item
Each argument for which an HPF align/distribute directive appears in the
interface block is realigned/redistributed according to the
directive prior to the call (directives are binding in local routine
interfaces!).  The original distribution is restored after the call returns.
\item
If a variable is replicated, then all copies are updated
prior to the call to contain the current value of
the name according to the sequential semantics of the source program.
\item
All processors are synchronized before and after the call.
\item
The local subroutine is called on each processor, with a parameter list
described below.
\item
For each nondistributable dummy argument, the routine is passed
the local copy of the actual argument. For each distributable dummy argument,
the local routine is passed two actual arguments.  The first one is an array
that consists of the local elements of the distributed array.  The second one is
a {\em distributed array descriptor} (DAD).  The DAD describes the
current distribution of an HPF array.
\end{enumerate}
\section{Requirements from the Callee}
\begin{enumerate}
\item
Arguments are accessed in the conventional manner.
IN/OUT restrictions on them must be obeyed as stated in
the interface.
\item
DADs are accessed using special functions
described below.  DADs can be used to access non-local elements of
distributed arrays.
\item
If a value is being returned into a replicated argument, then all
copies must have identical values when the local subroutine returns.
\end{enumerate}

The execution of a local routine that fulfills these rules is semantically
equivalent to the execution of an HPF routine, assuming that HPF routines can
query the current distribution of their arguments.
\section{Query and access functions}

These functions can be called from local routines to access and manipulate
HPF arrays.  The list below is not exhaustive.   The first
argument in each call is a DAD.

\begin{verbatim}

    arank(a)       rank of the matrix
    alow(a,k)      lower bound of k-th dimension of matrix
    ahigh(a,k)     upper bound of k-th dimension of matrix

    collapse(a,k)  whether the k-th dimension is collapsed or not

    prank(a)       rank of processor array pa that a is distributed to
    phigh(a,k)     number of processors in the k-th dimension of pa
    preplica(a,k)  whether the k-th dimension of pa is replicated
    pdist(a,k)     distribution type in the k-th dimension (blk,cyc)
    pparam(a,k)    distribution parameter (blk size etc)
    plist(a)       linear list of processors for this matrix

    offset(a,i1,i2,..)  if the (i1,i2,..)-th element of the matrix
                        is located in this processor, return its offset
                        into the data area, else return -1
    pnumber(a,i1,i2,..) processor number where (i1,i2,..)-th element
                        is located
    fetch(a,i1,i2,..,v) fetches (i1,i2,..)-th element of HPF matrix
                        and stores it in v (may involve communication
                        with another processor)
    store(a,i1,i2,..,v) stores the value of v into (i1,i2,..)-th
                        element of HPF matrix (if element is replicated
                        value is stored in all copies -- may involve
                        communication with other processors)
\end{verbatim}
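
To illustrate what {\tt pnumber} and {\tt offset} might compute, here is a
minimal free-form Fortran sketch for one special case.  It assumes a
one-dimensional array with global indices 1 to N, block distributed over P
processors numbered 0 to P-1, with block size equal to N/P rounded up and
offsets counted from 0.  The names BLOCKDIST, BLKSIZE, BLKOWNER and BLKOFFSET
are hypothetical, and the functions take N and P directly instead of a DAD;
nothing here is part of the proposed interface.

\begin{verbatim}
MODULE BLOCKDIST
  IMPLICIT NONE
CONTAINS
  ! Block size when N elements are block distributed over P processors
  INTEGER FUNCTION BLKSIZE(N, P)
    INTEGER, INTENT(IN) :: N, P
    BLKSIZE = (N + P - 1) / P
  END FUNCTION BLKSIZE

  ! Processor (0..P-1) that owns global element I (1..N)
  INTEGER FUNCTION BLKOWNER(I, N, P)
    INTEGER, INTENT(IN) :: I, N, P
    BLKOWNER = (I - 1) / BLKSIZE(N, P)
  END FUNCTION BLKOWNER

  ! Offset of global element I in the local data area of processor ME,
  ! or -1 if the element is stored on another processor
  INTEGER FUNCTION BLKOFFSET(I, N, P, ME)
    INTEGER, INTENT(IN) :: I, N, P, ME
    IF (BLKOWNER(I, N, P) == ME) THEN
      BLKOFFSET = MOD(I - 1, BLKSIZE(N, P))
    ELSE
      BLKOFFSET = -1
    END IF
  END FUNCTION BLKOFFSET
END MODULE BLOCKDIST

PROGRAM DEMO
  USE BLOCKDIST
  IMPLICIT NONE
  ! 10 elements over 4 processors: blocks of 3, 3, 3 and 1 elements
  PRINT *, BLKOWNER(7, 10, 4)        ! element 7 lives on processor 2
  PRINT *, BLKOFFSET(7, 10, 4, 2)    ! at offset 0 of its local data
  PRINT *, BLKOFFSET(7, 10, 4, 0)    ! -1: not local to processor 0
END PROGRAM DEMO
\end{verbatim}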

\section{Sample HPF Programs}

\subsection{Matrix multiplication}

\subsubsection{HPF Code}
\begin{verbatim}


C    The MATMULT routine computes C=A*B.  A copy of row A(I,*) and
C    column B(*,J) is broadcast to the processor that computes C(I,J)
C    before the call to MATMULT.


        INTERFACE
           LOCAL   SUBROUTINE  MATMULT(A,B,C)
              REAL,  DIMENSION(:,:), INTENT(IN)  ::  A,B
              REAL,  DIMENSION(:,:), INTENT(OUT) ::  C
*HPF$         ALIGN A(I,J) WITH C(I,*)
*HPF$         ALIGN B(I,J) WITH C(*,J)
           END SUBROUTINE  MATMULT
        END INTERFACE

   ..............
        CALL MATMULT(A,B,C)
   ..............

\end{verbatim}

\subsubsection{Local Subroutine}

\begin{verbatim}


C    Each processor is passed 3 arrays of rank 2.  Assume that the global HPF
C    arrays A,B and C have dimensions LxM, MxN and LxN, respectively.  The
C    local array CC is (a copy of) a rectangular subarray of C.
C    Let I1,I2,...,Ir and J1,J2,...,Js be, respectively, the row and column
C    indices of this subarray at a processor.  Then AA is (a copy of) the
C    subarray of A with row indices I1,...,Ir and column indices 1,...,M; and BB
C    is (a copy of) the subarray of B with row indices 1,...,M and column
C    indices J1,...,Js.   C may be replicated, in which case copies of C(I,J)
C    will be consistently updated at various processors.

     SUBROUTINE  MATMULT(AA,DA,BB,DB,CC,DC)
     REAL AA(:,:), BB(:,:), CC(:,:)

C     DA, DB and DC are the DADs of A, B and C (not needed here)
C     loop uses local indices

     DO I=LBOUND(CC,1), UBOUND(CC,1)
        DO J=LBOUND(CC,2), UBOUND(CC,2)
           CC(I,J) = SUM(AA(I,:)*BB(:,J))
        END DO
     END DO
     RETURN
     END

\end{verbatim}


\subsection{Sum Reduction}

\subsubsection{HPF Code}

\begin{verbatim}


C    The SREDUCE routine computes at each processor the sum of the local
C    elements of an array of rank 1.  It returns an array that consists of
C    one sum per processor.  The sum reduction is completed by reducing this
C    array of partial sums.  The function fails if the array is replicated.


        INTERFACE
           LOCAL REAL FUNCTION SREDUCE(A)
              REAL,  DIMENSION(:), INTENT(IN)  ::  A
           END FUNCTION SREDUCE
        END INTERFACE

   .................
        TOTAL = SUM(SREDUCE(A))
   ..................

\end{verbatim}

\subsubsection{Local Subroutine}

\begin{verbatim}


     REAL FUNCTION  SREDUCE(AA,DA)
     REAL AA(:)

C     the partial sum is only meaningful if A is not replicated
     IF (PREPLICA(DA,1)) THEN
        CALL ERROR()
     ELSE
        SREDUCE = SUM(AA)
     END IF
     RETURN
     END

\end{verbatim}
\section{Miscellany}
\subsection{Common}

If common blocks are distributable in HPF, then some mechanism needs to be
provided for local routines to access the local slice of a distributed common block.
\end{document}

------- End of Forwarded Message


From @cunyvm.cuny.edu:SNIR@YKTVMV  Tue Jun 16 21:24:11 1992
Received: from rice.edu ([128.42.5.1]) by cs.rice.edu (AA14124); Tue, 16 Jun 92 21:24:11 CDT
Received: from CUNYVM.CUNY.EDU by rice.edu (AA21780); Tue, 16 Jun 92 21:23:25 CDT
Message-Id: <9206170223.AA21780@rice.edu>
Received: from YKTVMV by CUNYVM.CUNY.EDU (IBM VM SMTP V2R2) with BSMTP id 4965;
   Tue, 16 Jun 92 22:23:41 EDT
Date: Tue, 16 Jun 92 22:23:48 EDT
From: "Marc Snir" <SNIR%YKTVMV.bitnet@cunyvm.cuny.edu>
To: hpff-forall@rice.edu

%forall.tex
%Snir
\documentstyle[11pt]{article}
\pagestyle{plain}
\pagenumbering{arabic}
\marginparwidth 0pt
\oddsidemargin  .25in
\evensidemargin  .25in
\marginparsep 0pt
\topmargin   -.5in
\textwidth 6in
\textheight  9.0in


\title{Proposal for FORALL}
\author{Marc Snir}
%\date{   }


\begin{document}

\maketitle

\section{Overview}

This document expands on the proposals of Dave Loveman and Guy Steele for a
FORALL statement.  The definition of a single statement FORALL is taken,
almost verbatim, from Dave's proposal.  A new definition is given for block
FORALL.

\section{Single Statement FORALL}

The single statement forall
is used to specify an array assignment in terms of
array elements or array sections.  The array assignment may be
masked with a scalar logical expression.  Rule R215 for {\it
executable-construct} is extended to include the {\it forall-stmt}.

\subsubsection{General Form of single statement FORALL}

\begin{verbatim}
forall-stmt          is FORALL (forall-triplet-spec-list [ ,scalar-mask-expr ])
                          forall-assignment

forall-triplet-spec  is subscript-name = subscript : subscript [ : stride]
\end{verbatim}

\noindent
Constraint:  {\it subscript-name} must be a {\it scalar-name} of type integer.\\

\noindent
Constraint:  A {\it subscript} or a {\it stride} in a {\it
forall-triplet-spec} must not contain a reference to any {\it
subscript-name} in the {\it forall-triplet-spec-list}.

\begin{verbatim}
forall-assignment    is array-element = expr
                     or array-section = expr
\end{verbatim}

\noindent
Constraint:  The {\it array-section} or {\it array-element} in a {\it
forall-assignment} must reference all of the {\it forall-triplet-spec
subscript-names}.

Examples of element array assignments are:

\begin{verbatim}
FORALL (I=1:N, J=1:N) H(I,J) = 1.0 / REAL(I + J - 1)

FORALL (I=1:N, J=1:N, A(I,J) .NE. 0.0) B(I,J) = 1.0 / A(I,J)
\end{verbatim}

\subsubsection{Interpretation of Single Statement FORALL}
\label{forall-single-interpret}
\begin{enumerate}
\item
The subscript and stride expressions in the {\it
forall-triplet-spec-list} are evaluated in any order.
For each subscript name in the {\it forall-assignment}, the set of
permitted values is determined on entry to the statement and is
\[  m1 + (k-1) * m3, \mbox{ where } k = 1, 2, \ldots, {\rm INT}((m2 - m1 + m3) / m3)  \]

\noindent
and where {\it m1}, {\it m2}, and {\it m3} are the values of the first
subscript, the second subscript, and the stride respectively in the
{\it forall-triplet-spec}.  If {\it stride} is missing, it is as if it
were present with a value of the integer 1.  The expression {\it
stride} must not have the value 0.  If for some subscript name
${\rm INT}((m2 - m1 + m3) / m3) \leq 0$, the set of permitted values is empty.
The {\it Iteration Set} for the FORALL construct is
the Cartesian product of the set of permitted values for each subscript.

\item
The scalar mask is evaluated for each combination of subscript values in
the Iteration Set, in any order.   The Iteration Set is restricted to those
combinations of values for which the mask evaluates to true.
If the scalar mask expression is omitted, it is as if it were
present with the value true.

\item
The rhs expr
in the {\it forall-assignment} is evaluated for each combination of subscript
values in the Iteration Set, in any order.

\item
The computed values are assigned
to the corresponding elements of the array being assigned
to.
\end{enumerate}
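
As a small illustration of the first step, the following free-form Fortran
sketch lists the permitted values for three sample triplets.  The program and
subroutine names are hypothetical.  It relies on the fact that Fortran integer
division truncates toward zero, so the integer quotient equals INT of the real
quotient used in the formula above.

\begin{verbatim}
PROGRAM ITERSET
  IMPLICIT NONE
  CALL SHOW(1, 10, 3)     ! 4 values: 1, 4, 7, 10
  CALL SHOW(10, 1, -2)    ! 5 values: 10, 8, 6, 4, 2
  CALL SHOW(5, 1, 1)      ! quotient is -3, so the set is empty
CONTAINS
  ! Print the permitted values for the triplet M1:M2:M3 (M3 nonzero)
  SUBROUTINE SHOW(M1, M2, M3)
    INTEGER, INTENT(IN) :: M1, M2, M3
    INTEGER :: K, NVAL
    NVAL = (M2 - M1 + M3) / M3
    PRINT *, 'number of values =', MAX(NVAL, 0)
    ! A DO loop with NVAL <= 0 executes zero times
    DO K = 1, NVAL
      PRINT *, M1 + (K - 1) * M3
    END DO
  END SUBROUTINE SHOW
END PROGRAM ITERSET
\end{verbatim}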

The {\it forall-assignment} must not cause any element of the array
being assigned to be assigned a value more than once.
The scope of the
subscript name is the FORALL statement itself.


\noindent
Constraint:  The evaluation of a {\it subscript} or a {\it stride} in a {\it
forall-triplet-spec} must not affect or be affected by the evaluation of any
other {\it subscript} or {\it stride}.

\noindent
Constraint:  The evaluation of the {\it scalar-mask-expression} for one
combination of subscript names
may not affect or be affected by the evaluation
of the {\it scalar-mask-expression} for another combination, nor can it
redefine any subscript name.

\noindent
Constraint:  The evaluation of the right-hand-side expression for
one combination of subscript names
may not affect or be affected by the evaluation
of the expression for another combination,
nor can it redefine any subscript name.

\subsubsection{Scalarization of the FORALL Statement}

A {\it forall-stmt} of the general form:

\begin{verbatim}
FORALL (v1=l1:u1:s1, v2=l2:u2:s2, ..., vn=ln:un:sn [, mask] ) &
      a(e1,...,em) = rhs
\end{verbatim}

\noindent
is equivalent to the following standard Fortran 90 code:

\begin{verbatim}
!evaluate subscript and stride expressions in any order
templ1 = l1
tempu1 = u1
temps1 = s1
templ2 = l2
tempu2 = u2
temps2 = s2
  ...
templn = ln
tempun = un
tempsn = sn

!then evaluate the scalar mask expression
DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        tempmask(v1,v2,...,vn) = mask
      END DO
          ...
  END DO
END DO

!then evaluate the expr in the forall-assignment
!(and lhs subscripts)
!for all valid combinations of subscript names
!for which the scalar mask expression is true
DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        IF (tempmask(v1,v2,...,vn)) THEN
          !in any order
          tempe1(v1,v2,...,vn) = e1
            ...
          tempem(v1,v2,...,vn) = em
          temprhs(v1,v2,...,vn) = rhs
        END IF
      END DO
          ...
  END DO
END DO

!then perform the assignment of these values to
!the corresponding elements of the array being assigned to
DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        IF (tempmask(v1,v2,...,vn)) THEN
          a(tempe1(v1,v2,...,vn),...,tempem(v1,v2,...,vn)) = temprhs(v1,v2,...,vn)
        END IF
      END DO
          ...
  END DO
END DO
\end{verbatim}
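
A concrete instance may help.  The free-form Fortran sketch below applies the
scheme above to a one-index FORALL with a mask and an indirect left-hand-side
subscript, and compares the result with the FORALL statement itself.  The
program name, the arrays and the data are invented for the example, and it
assumes a compiler that accepts FORALL; with these data both versions leave
the values 0.5, 8, 4, 2 in the array.

\begin{verbatim}
PROGRAM SCALARIZE
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 4
  INTEGER :: P(N), I
  REAL :: A(N), A2(N), B(N)
  LOGICAL :: TEMPMASK(N)
  INTEGER :: TEMPE1(N)
  REAL :: TEMPRHS(N)

  P = (/ 2, 3, 4, 1 /)        ! a permutation: no element assigned twice
  B = (/ 1.0, 0.0, 2.0, 4.0 /)
  A = (/ 8.0, 6.0, 4.0, 2.0 /)
  A2 = A

  ! The FORALL statement itself
  FORALL (I = 1:N, B(I) /= 0.0) A(P(I)) = A(I) / B(I)

  ! Its scalarization, applied to the copy A2: first the mask ...
  DO I = 1, N
    TEMPMASK(I) = B(I) /= 0.0
  END DO
  ! ... then the lhs subscript and the rhs, using only old values ...
  DO I = 1, N
    IF (TEMPMASK(I)) THEN
      TEMPE1(I) = P(I)
      TEMPRHS(I) = A2(I) / B(I)
    END IF
  END DO
  ! ... and finally the assignments
  DO I = 1, N
    IF (TEMPMASK(I)) A2(TEMPE1(I)) = TEMPRHS(I)
  END DO

  PRINT *, A
  PRINT *, A2
  IF (ANY(A /= A2)) STOP 1
END PROGRAM SCALARIZE
\end{verbatim}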

\subsubsection{Consequences of the Definition of the FORALL Statement}

\begin{itemize}

\item The {\it scalar-mask-expr} may depend on the {\it subscript-name}s.

\item Each of the {\it subscript-name}s must appear within the
subscript expression(s) on the left-hand-side.  (This is a syntactic
consequence of the semantic rule that no two execution instances of the
body may assign to the same array element.)

\item Subscripts on the left hand side of a {\it forall-assignment} are
evaluated only for valid combinations of subscript names for which the
scalar mask expression is true.

\end{itemize}


\section{General FORALL Construct}

The FORALL construct is a generalization of the element array
assignment statement allowing multiple {\it forall-assignment}s to be
controlled by a single {\it forall-triplet-spec-list}.  The body of a
FORALL construct is an arbitrary sequence of executable constructs, possibly
including FORALL assignments.

The execution of a FORALL construct starts with the evaluation of the {\it
forall-triplet-spec-list} and {\it scalar-mask-expr}.
This yields the set of subscript values that binds each {\it
forall-assignment}
in the {\it forall-body}.  The constructs in the {\it forall-body} are then
executed sequentially.
When a {\it forall-assignment} is encountered, the assignment is
executed for all
combinations of subscript values defined by the {\it
forall-triplet-spec-list} and {\it scalar-mask-expr}.

Rule R215 for {\it executable-constructs} is extended to include the
{\it forall-construct}.


\subsection{Block FORALL}

\begin{verbatim}
forall-stmt          is FORALL (forall-triplet-spec-list [ ,scalar-mask-expr ])
                          forall-block
                        END FORALL

forall-triplet-spec  is subscript-name = subscript : subscript [ : stride]
\end{verbatim}

\noindent
Constraint:  {\it subscript-name} must be a {\it scalar-name} of type integer.\\

\noindent
Constraint:  A {\it subscript} or a {\it stride} in a {\it
forall-triplet-spec} must not contain a reference to any {\it
subscript-name} in the {\it forall-triplet-spec-list}.

A {\it forall-block} can be an arbitrary sequence of executable constructs,
treated as a unit (R801).  It can also contain {\it forall-assignment}s.

\begin{verbatim}
forall-assignment    is array-element = expr
                     or array-section = expr
\end{verbatim}

\noindent
Constraint: No statement in a {\it forall-block}, other than the
{\it forall-assignment}s, may reference any of the {\it
forall-triplet-spec subscript-names}.

\noindent
Constraint:  The {\it array-section} or {\it array-element} in a {\it
forall-assignment} must reference all of the {\it forall-triplet-spec
subscript-names}.


Examples of the Block FORALL construct are:

\begin{verbatim}

FORALL(I=1:N)
   B(I,I) = 1
   DO J = 2, N
      A(I,J) = A(I,J-1) + A(I,J)
   END DO
END FORALL

FORALL(I=1:N, J=1:N, I.NE.J)
   IF (COND) THEN
      A(I,J) = I+J
   ELSE
      A(I,J) = I-J
   END IF
END FORALL
\end{verbatim}

\subsubsection{Interpretation of the Block FORALL Construct}


\begin{enumerate}
\item
An Iteration Set is computed, as described in
section~\ref{forall-single-interpret}.

\item
The {\it forall-block} is evaluated sequentially.  Whenever a
{\it forall-assignment} is encountered, the assignment is executed
for the FORALL Iteration Set, as defined in
section~\ref{forall-single-interpret}.
\end{enumerate}

The evaluation of the {\it forall-triplet-spec-list} and {\it
scalar-mask-expr}, and the separate evaluation of each {\it forall-assignment},
must fulfill the constraints listed in section~\ref{forall-single-interpret}.

\subsubsection{Scalarization of the Block FORALL Statement}

A {\it forall-stmt} of the general form:

\begin{verbatim}
FORALL (v1=l1:u1:s1, v2=l2:u2:s2, ..., vn=ln:un:sn [, mask] )
      s1
      s2
      ...
      sn
END FORALL
\end{verbatim}

\noindent
is equivalent to the following standard Fortran 90 code:

\begin{verbatim}
!evaluate subscript and stride expressions in any order
templ1 = l1
tempu1 = u1
temps1 = s1
templ2 = l2
tempu2 = u2
temps2 = s2
  ...
templn = ln
tempun = un
tempsn = sn

!then evaluate the scalar mask expression
DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        tempmask(v1,v2,...,vn) = mask
      END DO
          ...
  END DO
END DO

!then evaluate body
S1
S2
...
Sn

\end{verbatim}

If {\tt si} is a {\it forall-assignment} of the form {\tt a(e1,...,em)=rhs}
then {\tt Si} is the code

\begin{verbatim}
DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        IF (tempmask(v1,v2,...,vn)) THEN
          !in any order
          tempe1(v1,v2,...,vn) = e1
            ...
          tempem(v1,v2,...,vn) = em
          temprhs(v1,v2,...,vn) = rhs
        END IF
      END DO
          ...
  END DO
END DO
DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        IF (tempmask(v1,v2,...,vn)) THEN
          a(tempe1(v1,v2,...,vn),...,tempem(v1,v2,...,vn)) = temprhs(v1,v2,...,vn)
        END IF
      END DO
          ...
  END DO
END DO
\end{verbatim}

If {\tt si} is not a {\it forall-assignment} then {\tt Si=si}.

\subsubsection{Consequences of the Definition of the Block FORALL Statement}

Each construct in a block FORALL, including
{\it forall-assignment}s, may depend on preceding constructs, including
{\it forall-assignment}s, and on the evaluation of the
{\it forall-triplet-spec-list} and the {\it scalar-mask-expr}.

\subsection{WHERE-ELSEWHERE statement}


\begin{verbatim}
forall-where-construct   is WHERE ( scalar-mask-expr )
                                [ forall-block ]
                             [ ELSEWHERE
                                 [ forall-block ] ]
                             END WHERE
\end{verbatim}

The {\it scalar-mask-expr} may use (but not modify) the forall subscripts.


\subsubsection{Interpretation of the WHERE Construct}

\begin{enumerate}
\item
The {\it scalar-mask-expr} is
evaluated for each combination of values in the Iteration Set, in arbitrary
order.
\item
The
WHERE {\it forall-block} is executed, with the Iteration Set restricted to those
combinations of
values for which the {\it scalar-mask-expr} is true.
\item
The ELSEWHERE {\it forall-block} is executed, with the Iteration Set restricted
to those combinations of values for which the {\it scalar-mask-expr} is false.
\end{enumerate}

Each construct within the scope of a WHERE (or ELSEWHERE) may depend on the
evaluation of the {\it scalar-mask-expr} in the WHERE statement.

A construct within the ELSEWHERE {\it forall-block} may depend on constructs in
the WHERE {\it forall-block}.   (This is somewhat strange, but consistent with
Fortran 90 semantics for WHERE.)

Example \\
The following program will zero array {\tt A(1:N)} ({\tt N} even).

\begin{verbatim}

FORALL(I=1:N)
   WHERE(MOD(I,2).EQ.0)
       A(I) = 0
   ELSEWHERE
       A(I) = A(I+1)
   END WHERE
END FORALL

\end{verbatim}
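
The three interpretation steps can be spelled out in standard Fortran, as in
the sketch below.  It is illustrative only: the program name and data are
made up, it assumes a compiler that accepts FORALL, and it assumes that the
ELSEWHERE branch reads the even-indexed neighbour {\tt A(I+1)}, which exists
for every odd {\tt I} when {\tt N} is even.  Because the WHERE block runs
over its whole restricted Iteration Set before the ELSEWHERE block starts,
every odd element copies a value that is already zero.

\begin{verbatim}
PROGRAM WHEREDEMO
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 6
  REAL :: A(N)
  LOGICAL :: MASK(N)
  INTEGER :: I
  A = (/ 1.0, 2.0, 3.0, 4.0, 5.0, 6.0 /)
  ! Step 1: evaluate the mask for the whole Iteration Set
  DO I = 1, N
    MASK(I) = MOD(I, 2) == 0
  END DO
  ! Step 2: WHERE block, restricted to combinations where the mask is true
  FORALL (I = 1:N, MASK(I)) A(I) = 0
  ! Step 3: ELSEWHERE block, restricted to combinations where it is false
  FORALL (I = 1:N, .NOT. MASK(I)) A(I) = A(I+1)
  PRINT *, A
  IF (ANY(A /= 0.0)) STOP 1
END PROGRAM WHEREDEMO
\end{verbatim}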
\subsection{Nested FORALL}

FORALL constructs may be nested.  The Iteration Set for {\it forall-assignment}s
inside nested FORALLs is the direct product of the Iteration Sets defined by each
FORALL.

\subsection{General Form of the FORALL Statement}


\begin{verbatim}
forall-stmt           is FORALL (forall-triplet-spec-list [ ,scalar-mask-expr ])
                              forall-block
                         END FORALL

forall-triplet-spec     is subscript-name = subscript : subscript [ : stride]


forall-block            is forall-construct
                            [forall-block]

forall-construct        is executable-construct
                        or forall-stmt
                        or forall-where-construct

forall-where-construct  is WHERE ( scalar-mask-expr )
                                [ forall-block ]
                             [ ELSEWHERE
                                 [ forall-block ] ]
                            END WHERE
\end{verbatim}

\noindent
Constraint:  {\it subscript-name} must be a {\it scalar-name} of type integer.\\

\noindent
Constraint:  A {\it subscript} or a {\it stride} in a {\it
forall-triplet-spec} must not contain a reference to any {\it
subscript-name} in the {\it forall-triplet-spec-list}.\\

\noindent
Constraint: A {\it subscript-name} that occurs in a {\it
forall-triplet-spec-list} may be referenced inside the scope of the FORALL
statement only by the {\it scalar-mask-expr} of a FORALL or WHERE statement or
in an assignment of the form  {\tt array-element = expr} or
{\tt array-section = expr}.  A {\it subscript-name} may not be redefined
inside the scope of the FORALL. \\

Example \\

\begin{verbatim}

FORALL(I=1:N)
   WHERE(A(I).NE.0)
       A(I) =A(I)/ABS(A(I))
   END WHERE
   FORALL(J=1:N, J.NE.I)
       B(I,J) = A(I) + A(J)
   END FORALL
   A(I) = SUM(B(I,1:N))
END FORALL

\end{verbatim}


\end{document}

From chk@erato.cs.rice.edu  Fri Jun 19 09:39:48 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA24539); Fri, 19 Jun 92 09:39:48 CDT
Received: from localhost.cs.rice.edu by erato.cs.rice.edu (AA05594); Fri, 19 Jun 92 09:39:46 CDT
Message-Id: <9206191439.AA05594@erato.cs.rice.edu>
To: hpff-forall@erato.cs.rice.edu
Word-Of-The-Day: versimilitude : (n) the quality or state of having the
	appearance of truth
Subject: Topics for discussion
Date: Fri, 19 Jun 92 09:39:44 -0500
From: chk@erato.cs.rice.edu


I'll be sending a somewhat longer message later today with technical
comments on the proposals on the floor.  This one is simply to remind
the members of this list that the FORALL group is presenting its
proposal at the next HPFF meeting, and we need to have our work in
good shape by then.  I'll take responsibility for pulling together the
overall draft, but I'm not going to impose my views on the world.

As I see it, we have 5 tasks to do for the next meeting:
	1. Define a single-statement FORALL
		including semantics and constraints for function calls
		from a FORALL assignment
	2. Define a multi-statement FORALL (or explicitly choose not
		to have one)
	3. Define an INDEPENDENT assertion / directive, telling the
		compiler there are no nasty dependences in a program region
	4. Define LOCAL subroutines
	5. Define an ON clause
These do not form a 1-1 correspondence to the proposals made thus far:
	Guy Steele, David Loveman, and Marc Snir have all submitted
		proposals for single- and multi-statement FORALLs.  I
		see convergence happening on single-statement FORALLs,
		but not on multi-statement.
	Min-You Wu has proposed a modification to Guy Steele's
		INDEPENDENT directive.  There hasn't been a lot of
		discussion on it (I'm in favor of the idea, but the
		syntax looks more like parallel regions than what we
		want).
	Guy Steele and Marc Snir have made rather different proposals
		on LOCAL subroutines.
	I still haven't written up my ON clause proposal, but
		hopefully by this weekend...
So we have our work cut out for us.

Logistically, here's what I propose:

	1. Take Marc's single-statement FORALL proposal as the working
document in that area.  The major issue left unaddressed there is
semantics of function calls (maybe not "unaddressed", but certainly
"unmentioned").
	2. Take Marc's multi-statement FORALL proposal as the working
document.  I've got a number of technical comments against this one (see
message later today), but it's as good a stake in the ground as any
other.
	3. Take the INDEPENDENT section of Min-You's proposal as the working
document.  Based on comments at the last meeting, I think we have to
discard the reductions proposed there.
	4. Take Marc Snir's LOCAL proposal as the working document
there.  If possible, pass the intrinsics he mentions to the intrinsics
group.
	5. Beat up on Chuck if he doesn't propose an ON clause by
Monday.

All proposals are, of course, open to comments and counter-proposals
by anybody.  We can start pulling the "official" drafts together after
some more discussion, say around July 10.

	Chuck

From chk@cs.rice.edu  Fri Jun 19 17:10:32 1992
Received: from rice.edu ([128.42.5.1]) by cs.rice.edu (AA08255); Fri, 19 Jun 92 17:10:32 CDT
Received: from cs.rice.edu by rice.edu (AA16160); Fri, 19 Jun 92 17:09:47 CDT
Received: from erato.cs.rice.edu by cs.rice.edu (AA08252); Fri, 19 Jun 92 17:10:28 CDT
Received: from localhost.cs.rice.edu by erato.cs.rice.edu (AA05785); Fri, 19 Jun 92 17:10:23 CDT
Message-Id: <9206192210.AA05785@erato.cs.rice.edu>
To: "Marc Snir" <SNIR%YKTVMV.bitnet@cunyvm.cuny.edu>
Cc: hpff-forall@rice.edu
In-Reply-To: Your message of Tue, 16 Jun 92 22:23:48 -0400.
             <9206170223.AA21780@rice.edu> 
Date: Fri, 19 Jun 92 17:10:21 -0500
From: chk@cs.rice.edu


> Date: Tue, 16 Jun 92 22:23:48 EDT
> From: "Marc Snir" <SNIR%YKTVMV.bitnet@cunyvm.cuny.edu>
> To: hpff-forall@rice.edu
> Status: RO

First off, congratulations on getting a technical document out of the
IBM mail domain :-)

> %forall.tex
> %Snir
> \documentstyle[11pt]{article}
> 
> ...
>
> \subsubsection{Interpretation of Single Statement FORALL}
> \label{forall-single-interpret}
> \begin{enumerate}
> \item
> The subscript and stride expressions in the {\it
> forall-triplet-spec-list} are evaluated in any order.
> For each subscript name in the {\it forall-assignment}, the set of
> permitted values is determined on entry to the statement and is
> \[  m1 + (k-1) * m3, where~k = 1, 2, ..., INT((m2 - m1 + m3) / m3)  \]
> \noindent
> and where {\it m1}, {\it m2}, and {\it m3} are the values of the first
> subscript, the second subscript, and the stride respectively in the
> {\it forall-triplet-spec}.  If {\it stride} is missing, it is as if it
> were present with a value of the integer 1.  The expression {\it
> stride} must not have the value 0.  If for some subscript name
> $INT((m2 -m1 + m3) / m3) \leq 0$, the set of permitted values is empty.
> The {\it Iteration Set} for the FORALL construct is
> the Cartesian product of the set of permitted values for each subscript.

Issue: This description makes it impossible to create a triangular
FORALL.  For example, I can easily imagine someone wanting to say

	FORALL ( I = 1:N, J = 1:I ) A(I,J) = 0.0

(zero out the lower triangle of an array).  You can get the desired
effect by using a mask expression, but that's inelegant (and I suspect
inefficient on most implementations).  Or you can use the INDEPENDENT
directive in a DO loop.
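
As a sketch of the mask workaround mentioned above (not part of any of
the proposals), the lower triangle can be zeroed with a rectangular
index set plus a mask and compared against the nested DO loop form.
The program name, the array size, and the second array B are made up
for illustration, and the FORALL statement is written in the form later
standardized in Fortran 95, which is assumed to be available.

```fortran
program triangular_mask
  implicit none
  integer, parameter :: n = 4      ! hypothetical problem size
  real :: a(n, n), b(n, n)
  integer :: i, j

  a = 1.0
  b = 1.0

  ! Rectangular iteration set; the mask J <= I selects the lower triangle.
  forall (i = 1:n, j = 1:n, j <= i) a(i, j) = 0.0

  ! The same triangle with nested DO loops; the inner bound depends on I.
  do i = 1, n
    do j = 1, i
      b(i, j) = 0.0
    end do
  end do

  if (all(a == b)) then
    print *, 'mask and DO versions agree'
  else
    print *, 'mask and DO versions differ'
  end if
end program triangular_mask
```

The masked form still enumerates all N*N index pairs conceptually,
which is the source of the inefficiency concern above.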

Questions: Are triangular loops important enough to include them as
FORALLs?  Will triangular FORALLs put an undue burden on the compiler
writers?

As this restriction appears on the CMFortran FORALL, perhaps the TMC
people can comment on whether users miss triangular loops.

The other interpretation steps look OK to me.
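
The permitted-value rule in step 1 of the quoted interpretation can be
checked by hand or with a few lines of code.  The following is only a
sketch: the triplet 1:10:3 and the variable NVALS are made up for
illustration, and integer division is assumed to stand in for the INT
in the formula (both truncate toward zero).

```fortran
program permitted_values
  implicit none
  integer :: m1, m2, m3, k, nvals

  ! Hypothetical triplet 1:10:3
  m1 = 1
  m2 = 10
  m3 = 3

  ! Number of permitted values: INT((m2 - m1 + m3) / m3)
  nvals = (m2 - m1 + m3) / m3

  if (nvals <= 0) then
    print *, 'empty set of permitted values'
  else
    ! Values m1 + (k-1)*m3 for k = 1, ..., nvals
    print *, (m1 + (k - 1) * m3, k = 1, nvals)
  end if
end program permitted_values
```

For 1:10:3 the count is (10 - 1 + 3) / 3 = 4 and the values are 1, 4,
7, 10.  A triplet such as 5:1 with the default stride gives
(1 - 5 + 1) / 1 = -3, so the set is empty.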

> Constraint:  The evaluation of a {\it subscript} or a {\it stride} in a {\it
> forall-triplet-spec} must not affect of be affected by the evaluation of any
                                       ^^ "or" not "of" (nitpick!)
> other {\it subscript} or {\it stride}.
> 
> Constraint:  The evaluation of the {\it scalar-mask-expression} for one
> combination of subscript names
                           ^^^^^
> may not affect or be affected by the evaluation
> of the {\it scalar-mask-expression} for another combination, nor can it
> redefine any subscript name.

Shouldn't this be "values"?  The expression will always reference the
same names, i.e. variables.

The "name" at the end is used correctly.

> Constraint:  The evaluation of the right-hand-side expression for
> one combination of subscript names
                               ^^^^^ "values" again, I argue
> may not affect or be affected by the evaluation
> of the expression for another combination,
> nor can it redefine any subscript name.

Don't we need two more constraints?

Constraint: The evaluation of an array subscript on the left-hand side for
one combination of subscript values may not affect nor be affected by
the evaluation of any right-hand-side expression, nor may it redefine
any subscript name.

Constraint: The evaluation of an array subscript on the left-hand side for
one combination of subscript values may not affect nor be affected by
the evaluation of any left-hand side subscript for any other
combination of subscript values.

(Marc's / Dave's constraint rules out RHS-RHS interference, but not
RHS-LHS or LHS-LHS interference.)

> \subsubsection{Consequences of the Definition of the FORALL Statement}
> 
> \begin{itemize}
> 
> \item The {\it scalar-mask-expr} may depend on the {\it subscript-name}s.
> 
> \item Each of the {\it subscript-name}s must appear within the
> subscript expression(s) on the left-hand-side.  (This is a syntactic
> consequence of the semantic rule that no two execution instances of the
> body may assign to the same array element.)
> 
> \item Subscripts on the left hand side of a {\it forall-assignment} are
> evaluated only for valid combinations of subscript names for which the
> scalar mask expression is true.
> 
> \end{itemize}

Another consequence worth noting:
Expressions on the right-hand side are evaluated only for combinations
of subscript values for which the mask is true.
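
One practical use of this consequence is guarding an expression that is
undefined for some elements.  A minimal sketch, assuming the Fortran 95
form of the masked FORALL statement; the array names, sizes, and values
are made up for illustration:

```fortran
program masked_rhs
  implicit none
  integer, parameter :: n = 4
  real :: a(n), b(n)
  integer :: i

  b = (/ 2.0, 0.0, 4.0, 0.0 /)     ! hypothetical data with some zeros
  a = -1.0                         ! marker value for untouched elements

  ! The division is evaluated only for those I where the mask is true,
  ! so no division by zero takes place.
  forall (i = 1:n, b(i) /= 0.0) a(i) = 1.0 / b(i)

  print *, a
end program masked_rhs
```

Elements 2 and 4 of A keep the marker value -1.0, since neither the
right-hand side nor the assignment is executed for them.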


Side issue:
Is anyone else confused by "subscript" meaning both "FORALL index" and
"subscript expression"?  Can anyone suggest a better name?


> \section{General FORALL Construct}
> 
> The FORALL construct is a generalization of the element array
> assignment statement allowing multiple {\it forall-assignment}s to be
> controlled by a single {\it forall-triplet-spec-list}.  The body of a
> FORALL construct is an arbitrary sequence of executable constructs, possibly
> including FORALL assignments.
> 
> The execution of a FORALL construct starts with the evaluation of the {\it
> forall-triplet-spec-list} and {\it scalar-mask-expr}.
> This yields the set of subscript values that binds each {\it
> forall-assignment}
> in the {\it forall-body}.  The constructs in the {\it forall-body} are then
> executed sequentially.
> When a {\it forall-assignment} is encountered then the assignment is
> executed for all
> combinations of subscript values defined by the {\it
> forall-triplet-spec-list} and {\it scalar-mask-expr}.
> 
> Rule R215 for {\it executable-constructs} is extended to include the
> {\it forall-construct}.
> 
> 
> \subsection{Block FORALL}
> 
> \begin{verbatim}
> forall-stmt          is FORALL (forall-triplet-spec-list [ ,scalar-mask-expr ])
>                           forall-block
>                         END FORALL
> 
> forall-triplet-spec  is subscript-name = subscript : subscript [ : stride]
> \end{verbatim}
> 
> \noindent
> Constraint:  {\it subscript-name} must be a {\it scalar-name} of type integer.\\
> 
> \noindent
> Constraint:  A {\it subscript} or a {\it stride} in a {\it
> forall-triplet-spec} must not contain a reference to any {\it
> subscript-name} in the {\it forall-triplet-spec-list}.
> 
> A {\it forall-block} can be an arbitrary sequence of executable constructs,
> treated as a unit (R801).  It can also contain {forall-assignment}'s.
> 
> \begin{verbatim}
> forall-assignment    is array-element = expr
>                      or array-section = expr
> \end{verbatim}
> 
> \noindent
> Constraint: No statement in a {\it forall-block}, other than the
> {\it forall-assignment}'s, may reference any of the {\it
> forall-triplet-spec subscript-names}.
> 
> \noindent
> Constraint:  The {\it array-section} or {\it array-element} in a {\it
> forall-assignment} must reference all of the {\it forall-triplet-spec
> subscript-names}.

I think this is opening a very nasty can of worms.

Worm 1:
The BNF and text need to be fixed to state that only forall-assignment
is allowed in a forall-body.  (I assume this is what Marc means - if
not, then
	A(I) = B(i+1)
can be interpreted as either a regular assignment or a
forall-assignment, which apparently have different meanings (in the
scalarization, only forall-assignments are transformed).  Or, going
the other way,
	X = X + A(I)
is allowed in a forall-body, since it is syntactically not a
forall-assignment but otherwise a legal Fortran statement.)  

Worm 2:
Sort of a generalization on the last point - the grammar and
scalarization you give later don't seem to treat nested constructs
correctly.  Note that 
	IF ( SOME_CONDITION(N) ) THEN
	    A(I) = 1
	ELSE
	    A(I) = 1
	ENDIF
is, according to the Fortran 90 standard, one statement (which happens
to contain two other statements).  The statement 

> If {\tt si} is not a {\it forall-assignment} then {\tt Si=si}

(in the scalarization part) needs to explicitly rewrite any nested
statements.  Similar changes are required throughout the remaining
sections.

Worm 3:
The principle of least astonishment may need invocation here.  One
interesting consequence of Marc's proposal is that this is legal:
	FORALL ( I = 1:N )
	  IF ( N > 1 ) THEN
	    A(I) = 1.0
	  END IF
	END FORALL
but this is not legal:
	FORALL ( I = 1:N )
	  IF ( N > I ) THEN
	    A(I) = 1.0
	  END IF
	END FORALL
(Look carefully at the IF condition...  Marc does not allow the FORALL
index to be referenced in any statement except forall-assignments.)
The intent (I assume) is to force all iterations, and thus all
processors, down the same path at all branch points, and that sure
makes FORALL easier to define.  But it will make some useful
computations difficult to code:
	Iterating to convergence on each point of a grid
	Conditional assignment to points on a grid
	Having different formulas for different grid points
	Guy Steele's ragged array example (ALLOCATE different-sized blocks)
	Nested triangular loops or triangular FORALLs
	Boundary conditions
(Some of these limitations are solved by the WHERE construct.)
Question for users: Are you willing to live with these restrictions?

Worm 4:
If the statements in the forall-block can be arbitrary, we have to
define semantics for
	FORALL ( I = 1:N )
	    CALL FOO( A )	! How many times is FOO called?
				! Scalarization says 1, users will say N
	    OPEN(5,FILE='BAR')	! Parallel I/O, here we come
	    READ (5) B(1:100)
	    			! READ (5) B(I) is illegal, though
	    WRITE (5) C(1:5)
	    			! WRITE (5) C(I) is illegal
	    CLOSE (5)
	END FORALL
(Interestingly, I don't see any trouble from GOTO or STOP, since all
iterations must follow the same control flow.)  There may be other
problematic Fortran statements.

Worm 5:
Inlining a function or procedure may do quite unexpected things to its
meaning.  Of course, every other proposal has had this feature, and as
Guy pointed out "It's not fair to expect that inlining will be easy in
a language not designed for it."

Worm 6:
No other nested construct puts any restrictions on the statements
within its body, except for DO loops not allowing their index
variables to be redefined.  I argue that FORALLs (or other new
constructs) should not restrict their bodies either without good
reason.


Overall, I think the proposal went too far in generalizing the FORALL.
I'll make the following modest (and informal) counterproposal:
	Only "generalized assignment statements" (GASes) are allowed
	in a FORALL body.
	The following statements are GASes - assignment, WHERE,
	FORALL.
	Use Marc's semantics for interpreting GASes (no object
	assigned twice by one statement, assignments executed once for
	each element of the index space, GASes executed in sequence).
	I think that Marc's proposal still disallows triangular
	FORALLs; I propose lifting this restriction.
If there's interest, I'll write this up more formally.
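
A sketch of what such a restricted FORALL body could look like follows.
It is written in the block FORALL form later standardized in Fortran 95
(assumed to be available), which may differ in detail from the
counterproposal above; the program name, arrays, and values are made up
for illustration.

```fortran
program gas_forall
  implicit none
  integer, parameter :: n = 5
  integer :: a(n), b(n), c(n), t(n, n)
  integer :: i, j

  a = (/ 1, 2, 3, 4, 5 /)
  b = 0

  ! Two assignments in one FORALL body: the first is completed for every
  ! I before the second starts, so B is built entirely from the old A.
  forall (i = 2:n-1)
    b(i) = a(i-1) + a(i+1)
    a(i) = b(i)
  end forall

  ! Sequential DO loop for comparison: later iterations see updated values.
  c = (/ 1, 2, 3, 4, 5 /)
  do i = 2, n - 1
    c(i) = c(i-1) + c(i+1)
  end do

  print *, 'FORALL: ', a
  print *, 'DO:     ', c

  ! Nested FORALL whose inner bound depends on the outer index
  ! (a triangular index space).
  t = 1
  forall (i = 1:n)
    forall (j = 1:i) t(i, j) = 0
  end forall
  print *, 'entries still 1 in T: ', sum(t)
end program gas_forall
```

With these values the FORALL version leaves A as 1, 4, 6, 8, 5, while
the sequential loop produces 1, 4, 8, 13, 5.  The nested FORALL zeroes
the 15 lower-triangle entries of T and leaves 10 entries equal to 1.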


	Chuck
	

From choudhar@cat.syr.edu  Sat Jun 20 17:32:42 1992
Received: from rice.edu ([128.42.5.1]) by cs.rice.edu (AA21254); Sat, 20 Jun 92 17:32:42 CDT
Received: from cat.syr.edu (peach.ece.syr.EDU) by rice.edu (AA20952); Sat, 20 Jun 92 17:31:55 CDT
Date: Sat, 20 Jun 92 18:32:31 EDT
From: choudhar@cat.syr.edu (Alok Choudhary)
Received: by cat.syr.edu (4.1/1.0-6/5/90)
	id AA10177; Sat, 20 Jun 92 18:32:31 EDT
Message-Id: <9206202232.AA10177@cat.syr.edu>
To: SNIR@YKTVMV.BITNET, chk@cs.rice.edu
Subject: FORALL
Cc: hpff-forall@rice.edu


The following are my comments on Marc's proposal and on subsequent
comments by Chuck.


>Issue: This description makes it impossible to create a triangular
>FORALL.  For example, I can easily imagine someone wanting to say

>	FORALL ( I = 1:N, J = 1:I ) A(I,J) = 0.0

>(zero out the lower triangle of an array).  You can get the desired
>effect by using a mask expression, but that's inelegant (and I suspect
>inefficient on most implementations).  Or you can use the INDEPENDENT
>directive in a DO loop.

>Questions: Are triangular loops important enough to include them as
>FORALLs?  Will triangular FORALLs put an undue burden on the compiler
>writers?


The generalization may not stop at triangular loops. For example, if
you allow one subscript expression to be a function of another, then
it can get very complicated.

 for example,

 What if you have

        FORALL (I = 1:N, J = f1(I):f2(I), K = f3(I,J) ...) A(I,J,K..) = 0.

 Not only can it become complicated, but it also imposes an order on the
evaluation of subscript values. Therefore, I would go along with what
Marc's proposal contains. Triangular loops can easily be written as
DO INDEPENDENT loops (DO loops impose an ordering on the evaluation
of the loop indices whether or not the iterations are independent).

 Furthermore, in computations like these, the efficiency will depend
more on how data is distributed. Finally, for simple loops like
triangular loops, it may not be (I hope) too difficult for a compiler
to recognize and optimize them!


>Side issue:
>Is anyone else confused by "subscript" meaning both "FORALL index" and
>"subscript expression"?  Can anyone suggest a better name?

 For FORALL Index we may use Iteration-set subscript (ISS) and for
"subscript expression" we may use Array Subscript (AS).


		RE: Marc's proposal on Block Forall

> \subsection{Block FORALL}
> 
> \begin{verbatim}
> forall-stmt          is FORALL (forall-triplet-spec-list [ ,scalar-mask-expr ])
>                           forall-block
>                         END FORALL
> 
> forall-triplet-spec  is subscript-name = subscript : subscript [ : stride]
> \end{verbatim}
> 
> \noindent
> Constraint:  {\it subscript-name} must be a {\it scalar-name} of type integer.\\
> 
> \noindent
> Constraint:  A {\it subscript} or a {\it stride} in a {\it
> forall-triplet-spec} must not contain a reference to any {\it
> subscript-name} in the {\it forall-triplet-spec-list}.
> 
> A {\it forall-block} can be an arbitrary sequence of executable constructs,
> treated as a unit (R801).  It can also contain {forall-assignment}'s.
> 
> \begin{verbatim}
> forall-assignment    is array-element = expr
>                      or array-section = expr
> \end{verbatim}
> 

> \noindent
> Constraint:  The {\it array-section} or {\it array-element} in a {\it
> forall-assignment} must reference all of the {\it forall-triplet-spec
> subscript-names}.


 I am not clear about the block statement. My understanding is, as
Chuck also points out, that if a statement is not an assignment,
then the usual definition applies.

 Consider the following example.

 Let's say A is an array containing 0,1,0,1,0,1 ... (alternating between
0 and 1). What we want to do is to change 0 to 1 and 1 to 0
(something like making the background the foreground and vice versa).

 Is it accomplished by

   FORALL(I=1:N)
    IF (A(I) .EQ. 0) THEN
	A(I)=1
    ELSE
	A(I)=0
    END IF
   END FORALL

Since all the iterations are independent, each statement (IF-THEN-ELSE being
one statement) is supposedly executed in parallel.

My understanding is that the above statement should do the job.

BUT IS THE FOLLOWING CONSTRAINT IN MARC's PROPOSAL VIOLATED? My interpretation
of the constraint is that one may not use conditionals on the
iteration subscripts themselves, but one may use a subscript to index an
array on which the conditional is based. NOTE THE DIFFERENCE. "IF (A(I) .EQ. 0)"
is a content-based conditional, not an index-value-based conditional.

> \noindent
> Constraint: No statement in a {\it forall-block}, other than the
> {\it forall-assignment}'s, may reference any of the {\it
> forall-triplet-spec subscript-names}.
> 

NOTE: A WHERE-ELSEWHERE statement will not do the job because assignment in
the WHERE part can affect the evaluation in the ELSEWHERE part,

 i.e., the following will change the array A to all 0s.

    WHERE (A .EQ. 0)
	A = 1
    ELSEWHERE
	A = 0
    END WHERE
Since the array contains only 0s and 1s, all 0s will be converted to 1s by
the WHERE part, making the ELSEWHERE condition true for all elements and
setting the entire array to 0.

 The above task cannot be accomplished by a WHERE statement unless
temporaries are used. This strange meaning of WHERE in F90 is likely
to produce a lot of buggy programs.
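
Whether the ELSEWHERE part really sees the updated A can be tested
directly.  A minimal sketch, with a made-up program name and array
contents:

```fortran
program where_flip
  implicit none
  integer :: a(6)

  a = (/ 0, 1, 0, 1, 0, 1 /)

  ! The mask A == 0 is evaluated once; ELSEWHERE uses its negation.
  where (a == 0)
    a = 1
  elsewhere
    a = 0
  end where

  print *, a
end program where_flip
```

Fortran 90 specifies that the WHERE mask is evaluated only once (a
point made in a later reply on this list), so a conforming compiler is
expected to print the flipped array 1 0 1 0 1 0 rather than all zeros.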


		CHUCK's COMMENTS

>I think this is opening a very nasty can of worms.

>Worm 1:
>The BNF and text need to be fixed to state that only forall-assignment
>is allowed in a forall-body.  (I assume this is what Marc means - if
>not, then
>	A(I) = B(i+1)
>can be interpreted as either a regular assignment or a
>forall-assignment, which apparently have different meanings (in the
>scalarization, only forall-assignments are transformed).  Or, going
>the other way,
>	X = X + A(I)
>is allowed in a forall-body, since it is syntactically not a
>forall-assignment but otherwise a legal Fortran statement.)  


I agree with Chuck here. Any assignment inside a FORALL should be a FORALL
assignment, meaning that any statement having more than one
reference to the same array location on the LHS is illegal.


>Worm 3:
>The principle of least astonishment may need invocation here.  One
>interesting consequence of Marc's proposal is that this is legal:
>	FORALL ( I = 1:N )
>	  IF ( N > 1 ) THEN
>	    A(I) = 1.0
>	  END IF
>	END FORALL
>but this is not legal:
>	FORALL ( I = 1:N )
>	  IF ( N > I ) THEN
>	    A(I) = 1.0
>	  END IF
>	END FORALL
>(Look carefully at the IF condition...  Marc does not allow the FORALL
>index to be referenced in any statement except forall-assignments.)
>The intent (I assume) is to force all iterations, and thus all
>processors, down the same path at all branch points, and that sure
>makes FORALL easier to define.  But it will make some useful
>computations difficult to code:
>	Iterating to convergence on each point of a grid
>	Conditional assignment to points on a grid
>	Having different formulas for different grid points
>	Guy Steele's ragged array example (ALLOCATE different-sized blocks)
>	Nested triangular loops or triangular FORALLs
>	Boundary conditions


>(Some of these limitations are solved by the WHERE construct.)

I agree with Chuck, although I do not know how a WHERE construct can
solve the problem, because WHERE does not allow explicit reference to
index values.


>Worm 4:
>If the statements in the forall-block can be arbitrary, we have to
>define semantics for
>	FORALL ( I = 1:N )
>	    CALL FOO( A )	! How many times is FOO called?
>				! Scalarization says 1, users will say N
>	    OPEN(5,FILE='BAR')	! Parallel I/O, here we come
>	    READ (5) B(1:100)
>	    			! READ (5) B(I) is illegal, though
>	    WRITE (5) C(1:5)
>	    			! WRITE (5) C(I) is illegal
>	    CLOSE (5)
>	END FORALL
>(Interestingly, I don't see any trouble from GOTO or STOP, since all
>iterations must follow the same control flow.)  There may be other
>problematic Fortran statements.

FOO is called N times. 

I do not know why one would put an OPEN inside a FORALL if it opens the same
file (but is it an error to try to OPEN a file that is already open? The same
question applies to CLOSE).

I am not clear why the READ above is illegal.

Since inside the FORALL each element of C may be assigned only once
(per statement), the WRITE will not produce an erroneous result, although
it may produce several unnecessary writes.

>Worm 6:
>No other nested construct puts any restrictions on the statements
>within its body, except for DO loops not allowing their index
>variables to be redefined.  I argue that FORALLs (or other new
>constructs) should not restrict their bodies either without good
>reason.

I agree

>Overall, I think the proposal went too far in generalizing the FORALL.
>I'll make the following modest (and informal) counterproposal:
>	Only "generalized assignment statements" (GASes) are allowed
>	in a FORALL body.
>	The following statements are GASes - assignment, WHERE,
>	FORALL.

  I do not see a need for a WHERE inside a FORALL if the "IF" statement
is allowed.

 Anyway, I am not sure if the WHERE construct allows masks to be specified
the way they have been used in Marc's proposal. I looked up
the F90 documents, but did not find anything explicitly prohibiting it,
nor did I find an example using it.

 For example, the following illustration used in Marc's proposal
uses MOD(I,2) inside the WHERE Mask Expression. I am not sure
if it is allowed.

PLEASE CHECK.

FORALL(I=1:N)
   WHERE(MOD(I,2)=0)
       A(I) = 0
   ELSEWHERE
       A(I) = A(I-1)
    END WHERE
END FORALL

	
Alok Choudhary


 Alok Choudhary
 Assistant Professor
 ECE Dept., 121 Link Hall
 Syracuse University
 Syracuse, NY 13244
 (315)-443-4280
 Fax: (315)-443-2583
 choudhar@cat.syr.edu


From chk@cs.rice.edu  Mon Jun 22 11:05:22 1992
Received: from rice.edu ([128.42.5.1]) by cs.rice.edu (AA09321); Mon, 22 Jun 92 11:05:22 CDT
Received: from cs.rice.edu by rice.edu (AA29316); Mon, 22 Jun 92 11:04:37 CDT
Received: from erato.cs.rice.edu by cs.rice.edu (AA09318); Mon, 22 Jun 92 11:05:17 CDT
Received: from localhost.cs.rice.edu by erato.cs.rice.edu (AA07077); Mon, 22 Jun 92 11:05:14 CDT
Message-Id: <9206221605.AA07077@erato.cs.rice.edu>
To: choudhar@cat.syr.edu (Alok Choudhary)
Cc: hpff-forall@rice.edu
Subject: Re: FORALL 
In-Reply-To: Your message of Sat, 20 Jun 92 18:32:31 -0400.
             <9206202232.AA10177@cat.syr.edu> 
Date: Mon, 22 Jun 92 11:05:12 -0500
From: chk@cs.rice.edu


> Date: Sat, 20 Jun 92 18:32:31 EDT
> From: choudhar@cat.syr.edu (Alok Choudhary)
> Subject: FORALL
> 
> The following are my comments on Marc's proposal and on subsequent
> comments by Chuck.

Liberally edited by me.

> >Issue: This description makes it impossible to create a triangular
> >FORALL.  ...
> >Questions: Are triangular loops important enough to include them as
> >FORALLs?  Will triangular FORALLs put an undue burden on the compiler
> >writers?
> 
> The generalization may not stop at triangular loops. For example, if
> you allow one subscript expression to be a function of another, then
> it can get very complicated.
>  for example,  What if you have
>         FORALL (I = 1:N, J = f1(I):f2(I), K = f3(I,J) ...) A(I,J,K..) = 0.
>  Not only can it become complicated, but it also imposes an order on the
> evaluation of subscript values. Therefore, I would go along with what
> Marc's proposal contains. Triangular loops can easily be written as
> DO INDEPENDENT loops (DO loops impose an ordering on the evaluation
> of the loop indices whether or not the iterations are independent).
> 
>  Furthermore, in computations like these, the efficiency will depend
> more on how data is distributed. Finally, for simple loops like
> triangular loops, it may not be (I hope) too difficult for a compiler
> to recognize and optimize them!

I'm not going to take a strong stand for or against triangular
FORALLs.  I just want to make a couple points:
    1. As far as I can tell, the only way to express them now is with
	nested DO loops and INDEPENDENT directives.
    2. DO INDEPENDENT has a different semantics from FORALL (although
	not from FORALL INDEPENDENT); therefore, we really are losing
	some sort of expressiveness by dropping the triangular FORALL.
    3. My comments were not directed toward efficiency, but rather
	toward expressing certain computations.  If you can't express
	a computation, it doesn't matter how efficient the compiler
	is...

I'm a little confused about Alok's comment re: compilers recognizing
triangular loops.  Sure, that's an easy pattern match.  The hard part,
though, is recognizing a parallel loop of any shape.  That is,
dependence analysis is just as hard whether the loop is triangular or
square (more or less; arguments over how precise triangular Banerjee's
inequalities are can go elsewhere).  Alok, why do you think triangular
loops are easy to optimize, but we still need FORALL for other loops?

> >Side issue:
> >Is anyone else confused by "subscript" meaning both "FORALL index" and
> >"subscript expression"?  Can anyone suggest a better name?
> 
>  For FORALL Index we may use Iteration-set subscript (ISS) and for
> "subscript expression" we may use Array Subscript (AS).

Minor problem - the Fortran 90 standard uses "subscript" all over the
place (and uses it consistently).  I don't like switching terminology
for one chapter of the Fortran 2001 standard.

"FORALL index" sounds like just what we want, though.  I'll check how
this fits with DO loop terminology.

> 		RE: Marc's proposal on Block Forall
> ...
>  Let's say A is an array containing 0,1,0,1,0,1 ....(alternating between
> 0 and 1). What we want to do is to change 0 to 1 and 1 to 0.
> (something like making background the foreground and vice versa).
>    FORALL(I=1:N)
>     IF (A(I) .EQ. 0) THEN
> 	A(I)=1
>     ELSE
> 	A(I)=0
>     END IF
>    END FORALL
> Since all the iterations are independent, each statement (IF-THEN-ELSE being
> one statement) is supposedly executed in parallel.
> 
> My understanding is that the above statement should do the job.
> 
> BUT IS THE FOLLOWING CONSTRAINT IN MARC's PROPOSAL VIOLATED? My interpretation
> of the constraint is that one may not use conditionals on the
> iteration subscripts themselves, but one may use a subscript to index an
> array on which the conditional is based. NOTE THE DIFFERENCE. "IF (A(I) .EQ. 0)"
> is a content-based conditional, not an index-value-based conditional.

Marc, please comment on how you intended the constraint to be
interpreted.  Also, a word on why you picked those constraints would
be helpful; I fear Alok and I are thinking about different problems.

Alok -
Consider Marc's scalarization applied to the following content-based
conditional:
	FORALL ( I = 1:N )
	  IF ( A(I) < 0.0 ) THEN
	    A(I) = 0.0
	  ELSE
	    STOP
	  END IF
	END FORALL
Why do you think that content-based conditionals create fewer problems
than index-value-based conditionals?  I can simulate index-based
conditionals by defining an array TEMP such that TEMP(I) = I, then using
TEMP in all conditions.
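
The TEMP trick can be sketched in plain Fortran 90.  Because the block
FORALL with a nested IF is only a proposal, the sketch below uses a
WHERE statement as the content-based conditional; TEMP holds the index
values, so testing TEMP < N has the same effect as the index test
N > I.  The array size and program name are made up for illustration.

```fortran
program temp_index
  implicit none
  integer, parameter :: n = 5
  integer :: temp(n), i
  real :: a(n)

  a = 0.0
  temp = (/ (i, i = 1, n) /)       ! TEMP(I) = I

  ! A content-based test on TEMP stands in for the index test N > I.
  where (temp < n) a = 1.0

  print *, a
end program temp_index
```

Every element of A except the last is set to 1.0, exactly what the
index-based condition would select.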

> 
> 		CHUCK's COMMENTS
> 
> >I think this is opening a very nasty can of worms.
> 
> >Worm 1:
> >The BNF and text need to be fixed to state that only forall-assignment
> >is allowed in a forall-body.
> >...
> 
> I agree with Chuck here. Any assignment inside a FORALL should be a FORALL
> assignment, meaning that any statement having more than one
> reference to the same array location on the LHS is illegal.

Also having a scalar on the LHS should be illegal.  In Marc's
proposal, this is a syntax error.

> >Worm 3:
> >...
> >The intent (I assume) is to force all iterations, and thus all
> >processors, down the same path at all branch points, and that sure
> >makes FORALL easier to define.  But it will make some useful
> >computations difficult to code:
> >	Iterating to convergence on each point of a grid
> >	Conditional assignment to points on a grid
> >	Having different formulas for different grid points
> >	Guy Steele's ragged array example (ALLOCATE different-sized blocks)
> >	Nested triangular loops or triangular FORALLs
> >	Boundary conditions
> >(Some of these limitations are solved by the WHERE construct.)
> 
> I agree with Chuck, although I do not know how a WHERE construct can
> solve the problem, because WHERE does not allow explicit reference to
> index values.

I thought that WHERE could reference index values - this is stated
right under the BNF for forall-where-construct (section 3.2 of Marc's
proposal).  This seems in conflict with a constraint in section 3.1
("No statement in a forall-block, other than the forall-assignments,
may reference any of the forall-triplet-spec subscript-names"),
though.  Marc, can you debug this definition?

A WHERE that can't reference the index values seems pretty useless to
me - how can you mask an array element if you can't refer to its
subscripts?

> 
> >Worm 4:
> >	FORALL ( I = 1:N )
> >	    CALL FOO( A )	! How many times is FOO called?
> >				! Scalarization says 1, users will say N
> >	    OPEN(5,FILE='BAR')	! Parallel I/O, here we come
> >	    READ (5) B(1:100)
> >	    			! READ (5) B(I) is illegal, though
> >	    WRITE (5) C(1:5)
> >	    			! WRITE (5) C(I) is illegal
> >	    CLOSE (5)
> >	END FORALL
> FOO is called N times. 

That's my expectation, too, *but not what the scalarization gives*.
Remember, we're commenting on the proposal, not what we wish the
proposal was.  Anybody want to tackle modifying the proposal to give
the "right" semantics to CALL?

> I do not know why one would put an OPEN inside a FORALL if it opens the same
> file (but is it an error to try to OPEN a file that is already open? The same
> question applies to CLOSE).

I suspect it would be a bug, too, *but if we allow arbitrary nested
statements we have to define what they mean*.  According to the
scalarization, the OPEN would only be performed once.

> I am not clear why the READ above is illegal.

"No statement in a forall-block, other than the forall-assignments,
may reference any of the forall-triplet-spec subscript-names."

> Since inside the FORALL each element of C may be assigned only once
> (per statement), the WRITE will not produce an erroneous result, although
> it may produce several unnecessary writes.

The current scalarization produces only one write.  Is this what we want?

> >Overall, I think the proposal went too far in generalizing the FORALL.
> >I'll make the following modest (and informal) counterproposal:
> >	Only "generalized assignment statements" (GASes) are allowed
> >	in a FORALL body.
> >	The following statements are GASes - assignment, WHERE,
> >	FORALL.
>   I do not see a need for a WHERE inside a FORALL if the "IF" statement
> is allowed.

Read my proposal again.  I'm saying IF is not allowed (nor are DO,
WHILE, OPEN, CLOSE, READ, WRITE, ALLOCATE, ...).  Only three types of
statements: assignment, WHERE, FORALL.

>  Anyway, I am not sure if the WHERE construct allows masks to be specified
> the way they have been used in Marc's proposal. I looked up
> the F90 documents, but did not find anything explicitly prohibiting it,
> nor did I find an example using it.

My interpretation is that Marc is defining a new form of WHERE, for
use only in FORALLs.  (Marc, please comment.)

It's a subtle point, but the WHERE in Marc's example

> FORALL(I=1:N)
>    WHERE(MOD(I,2)=0)
>        A(I) = 0
>    ELSEWHERE
>        A(I) = A(I-1)
>     END WHERE
> END FORALL

is not legal Fortran 90.  Quoting section 7.5.3.1 of the standard,
where WHERE is defined,
	Constraint: In each assignment-stmt, the mask-expr and the
	variable being defined must be arrays of the same shape.
So, the mask-expr (in the example, MOD(I,2)=0) must be an array.
Scalars are not arrays in Fortran 90 (language purists can compare
this to APL).

We also saw an example of this in the last meeting, when Pres
suggested his X = SUM(A(I)) syntax in FORALL loops; again, the
difference between scalar arguments and array arguments to SUM was
important in defining the semantics.

Questions for the group: Do we wish to extend WHERE (or any other
statement or intrinsic) with a special form for use in the FORALL
statement?  Do we want to extend WHERE (or any other array statement
or intrinsic) to apply to scalars, regardless of whether it's in a
FORALL or not?

Marc's proposal seems like something we want to be able to
do, but I think the principle of minimal changes applies here.

> Alok Choudhary
> 

	Chuck

From gls@think.com  Mon Jun 22 11:13:12 1992
Received: from rice.edu ([128.42.5.1]) by cs.rice.edu (AA09699); Mon, 22 Jun 92 11:13:12 CDT
Received: from mail.think.com by rice.edu (AA29396); Mon, 22 Jun 92 11:12:27 CDT
Return-Path: <gls@Think.COM>
Received: from Strident.Think.COM by mail.think.com; Mon, 22 Jun 92 12:12:51 -0400
From: Guy Steele <gls@think.com>
Received: by strident.think.com (4.1/Think-1.2)
	id AA22681; Mon, 22 Jun 92 12:12:56 EDT
Date: Mon, 22 Jun 92 12:12:56 EDT
Message-Id: <9206221612.AA22681@strident.think.com>
To: choudhar@cat.syr.edu
Cc: SNIR@YKTVMV.BITNET, chk@cs.rice.edu, hpff-forall@rice.edu
In-Reply-To: Alok Choudhary's message of Sat, 20 Jun 92 18:32:31 EDT <9206202232.AA10177@cat.syr.edu>
Subject: FORALL

   Date: Sat, 20 Jun 92 18:32:31 EDT
   From: choudhar@cat.syr.edu (Alok Choudhary)
   ...
   NOTE: A WHERE-ELSEWHERE statement will not do the job because assignment in
   the WHERE part can affect the evaluation in ELSEWHERE PART,

    i.e., the following will change the array A to all 0s.

       WHERE (A .EQ. 0)
	   A = 1
       ELSEWHERE
	   A = 0
   Since array contains only 0s and 1s, all 0s will be converted to 1s by
   the WHERE Part, making evaluation of ELSEWHERE true for all elements,
   making the entire Array 0.

No, that is not what happens.  While assignments in the WHERE part
indeed can affect the behavior of the ELSEWHERE part, the WHERE mask
is *not* re-evaluated before execution of the ELSEWHERE part; that is,
the condition must be "remembered" in a manner not affected by
execution of the ELSEWHERE part.  Thus, if TEMP is a LOGICAL array
of appropriate size used nowhere else in the program, then

      WHERE (A .EQ. 0)
	A = 1
      ELSEWHERE
	A = 0
      END WHERE

means exactly the same thing as

      TEMP = (A .EQ. 0) 
      WHERE (TEMP)
	A = 1
      END WHERE
      WHERE (.NOT. TEMP)
	A = 0
      END WHERE

This is entirely analogous to the fact that

      IF (A .EQ. 0) THEN
	A = 1
      ELSE
	A = 0
      END IF

means exactly the same thing as

      TEMP = (A .EQ. 0)
      IF (TEMP) THEN
	A = 1
      END IF
      IF (.NOT. TEMP) THEN
	A = 0
      END IF

assuming TEMP is a LOGICAL scalar variable used nowhere else in the program.

--Guy Steele
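
A minimal Python sketch of the rule described above, added for
illustration and not part of the original message: the WHERE mask is
computed once and reused for the ELSEWHERE part.  The function names
are hypothetical, and Python lists stand in for the Fortran array A.

```python
def where_elsewhere(a):
    """Model WHERE (A .EQ. 0) A = 1 ELSEWHERE A = 0 with the mask saved once."""
    temp = [x == 0 for x in a]                        # TEMP = (A .EQ. 0)
    a = [1 if m else x for m, x in zip(temp, a)]      # WHERE (TEMP) part
    a = [0 if not m else x for m, x in zip(temp, a)]  # WHERE (.NOT. TEMP) part
    return a


def reevaluated_mask(a):
    """The misreading: the condition is tested again after the WHERE part ran."""
    a = [1 if x == 0 else x for x in a]   # WHERE part
    a = [0 if x != 0 else x for x in a]   # ELSEWHERE part with a fresh mask
    return a


print(where_elsewhere([0, 1, 0, 1]))   # saved mask: the 0s and 1s swap
print(reevaluated_mask([0, 1, 0, 1]))  # fresh mask: everything becomes 0
```

Only the second function reproduces the all-zeros outcome that the
quoted note worried about; the saved-mask version swaps the values.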

From choudhar@cat.syr.edu  Mon Jun 22 12:03:41 1992
Received: from rice.edu ([128.42.5.1]) by cs.rice.edu (AA11170); Mon, 22 Jun 92 12:03:41 CDT
Received: from cat.syr.edu (peach.ece.syr.EDU) by rice.edu (AA29841); Mon, 22 Jun 92 12:02:55 CDT
Date: Mon, 22 Jun 92 13:03:23 EDT
From: choudhar@cat.syr.edu (Alok Choudhary)
Received: by cat.syr.edu (4.1/1.0-6/5/90)
	id AA10633; Mon, 22 Jun 92 13:03:23 EDT
Message-Id: <9206221703.AA10633@cat.syr.edu>
To: chk@cs.rice.edu
Cc: hpff-forall@rice.edu
In-Reply-To: chk@cs.rice.edu's message of Mon, 22 Jun 92 11:05:12 -0500 <9206221605.AA07077@erato.cs.rice.edu>
Subject: FORALL 


>I'm not going to take a strong stand for or against triangular
>FORALLs.  I just want to make a couple points:
>    1. As far as I can tell, the only way to express them now is with
>	nested DO loops and INDEPENDENT directives.
>    2. DO INDEPENDENT has a different semantics from FORALL (although
>	not from FORALL INDEPENDENT); therefore, we really are losing
>	some sort of expressiveness by dropping the triangular FORALL.
>    3. My comments were not directed toward efficiency, but rather
>	toward expressing certain computations.  If you can't express
>	a computation, it doesn't matter how efficient the compiler
>	is...

You can still specify the computation using 

    FORALL(I=1:N, J=1:N, J<=I) A(I,J) = 0.

The difference is that indices are independently specified and condition
is moved to the mask expression of forall. This allows one to
evaluate the mask for each FORALL index independent of the others
without imposing an order on the evaluation.


>I'm a little confused about Alok's comment re: compilers recognizing
>triangular loops.  Sure, that's an easy pattern match.  The hard part,
>though, is recognizing a parallel loop of any shape.  That is,
>dependence analysis is just as hard whether the loop is triangular or
>square (more or less; arguments over how precise triangular Banerjee's
>inequalities are can go elsewhere).  Alok, why do you think triangular
>loops are easy to optimize, but we still need FORALL for other loops?

 What I meant is also shown in the above example. That is,
using the mask expression, one may still be able to recognize
that it is triangular (or any other) loop, if sequentialized. How
hard it is for a general case, I do not know.


>> 		RE: Marc's proposal on Block Forall
>> ...
>>  Let's say A is an array containing 0,1,0,1,0,1 ....(alternating between
>> 0 and 1). What we want to do is to change 0 to 1 and 1 to 0.
>> (something like making background the foreground and viceversa).
>>    FORALL(I=1:N)
>>     IF(A(I) .EQ. 0)
>> 	THEN A(I)=1
>> 	ELSE A(I)=0
>>     END IF
>>    END FORALL
>> Since, all the iterations are independent, each statement (IF-THEN-ELSE being
>> one statement) is supposedly executed in parallel.
>> 
>> My understanding is that the above statement should do the job.
>> 
>> BUT IS THE FOLLOWING CONSTRAINT IN MARC's PROPOSAL VIOLATED? My interpretation
>> of the constraint is that one may not use conditionals on the
>> iteration subscripts themselves, but one may use them to index an
>> array on which the conditional is based. NOTE THE DIFFERENCE. "IF (A(I) .EQ.0)"
>> is a content-based conditional and not an index-value-based conditional.

>Marc, please comment on how you intended the constraint to be
>interpreted.  Also, a word on why you picked those constraints would
>be helpful; I fear Alok and I are thinking about different problems.

My interpretation of the constraint is that one may not use the
FORALL index subscripts inside a FORALL except as part of an array
being assigned.  Since A(I) is part of an assignment, I can use A(I)
to evaluate the conditional.  I guess Marc may be able to say whether
the above example is legal or illegal according to his proposal.


>Alok -
>Consider Marc's scalarization applied to the following content-based
>conditional:
>	FORALL ( I = 1:N )
>	  IF ( A(I) < 0.0 ) THEN
>	    A(I) = 0.0
>	  ELSE
>	    STOP
>	  END IF
>	END FORALL
>Why do you think that content-based conditionals create fewer problems
>than index value based conditionals?

That is my interpretation of what is allowed and what is not allowed.

Chuck,
I am not saying that content-based conditionals create fewer (or
greater for that matter) problems.

> I can simulate index-based
>conditionals by defining an array TEMP that TEMP(I) = I, then using
>TEMP in all conditions.

True.  Your example of simulating index-based conditionals using
content-based ones will do the job of evaluating the conditional.

> >Worm 3:
> >...
> >The intent (I assume) is to force all iterations, and thus all
> >processors, down the same path at all branch points, and that sure
> >makes FORALL easier to define.  But it will make some useful
> >computations difficult to code:
> >	Iterating to convergence on each point of a grid
> >	Conditional assignment to points on a grid
> >	Having different formulas for different grid points
> >	Guy Steele's ragged array example (ALLOCATE different-sized blocks)
> >	Nested triangular loops or triangular FORALLs
> >	Boundary conditions
> >(Some of these limitations are solved by the WHERE construct.)
> 
> I agree with Chuck, although I do not know how a WHERE construct can
> solve the problem, because WHERE does not allow explicit reference to
> index values.

>I thought that WHERE could reference index values - this is stated
>right under the BNF for forall-where-construct (section 3.2 of Marc's
>proposal).  This seems in conflict with a constraint in section 3.1
>("No statement in a forall-block, other than the forall-assignments,
>may reference any of the forall-triplet-spec subscript-names"),
>though.  Marc, can you debug this definition?

>A WHERE that can't reference the index values seems pretty useless to
>me - how can you mask an array element if you can't refer to its
>subscripts?

According to F90, a mask's shape in WHERE should conform to the array
assignment inside the WHERE.  Does a WHERE inside FORALL have a
different definition (meaning)?

>We also saw an example of this in the last meeting, when Pres
>suggested his X = SUM(A(I)) syntax in FORALL loops; again, the
>difference between scalar arguments and array arguments to SUM was
>important in defining the semantics.

If it is interpreted as a SCAN OPERATION then of course it is 
incorrect. That is, if SUM(A(I)) is considered as a sum of
all numbers from the lower limit of the FORALL UP to I for all 1<=I<=N.

But we may define the meaning of reduction functions inside
a FORALL if a unique answer is obtained. Need more discussion.

>Questions for the group: Do we wish to extend WHERE (or any other
>statement or intrinsic) with a special form for use in the FORALL
>statement?  Do we want to extend WHERE (or any other array statement
>or intrinsic) to apply to scalars, regardless of whether it's in a
>FORALL or not?

I would go for allowing WHERE inside as long as its meaning remains
the same as that defined in F90. Otherwise, it may create a lot
of confusion.


Alok Choudhary
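
A hedged Python sketch, added for illustration, of the claim above that
FORALL(I=1:N, J=1:N, J<=I) A(I,J) = 0. covers the same elements as a
triangular loop nest.  The function names are hypothetical; nested
lists stand in for the Fortran array, with 1-based indices mapped to
0-based storage.

```python
def masked_forall_zero(a):
    """Rectangular index set 1:N x 1:N, filtered by the mask J <= I."""
    n = len(a)
    # The mask is evaluated independently for each (I, J) pair; no order is implied.
    active = [(i, j) for i in range(1, n + 1)
                     for j in range(1, n + 1) if j <= i]
    for i, j in active:
        a[i - 1][j - 1] = 0.0
    return a


def triangular_loops_zero(a):
    """Equivalent triangular DO nest: the inner bound depends on the outer index."""
    n = len(a)
    for i in range(1, n + 1):
        for j in range(1, i + 1):
            a[i - 1][j - 1] = 0.0
    return a


ones = lambda n: [[1.0] * n for _ in range(n)]
print(masked_forall_zero(ones(3)) == triangular_loops_zero(ones(3)))
```

Both versions zero the lower triangle including the diagonal.  The
sketch says nothing about how efficiently a compiler could translate
the masked form.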


From chk@cs.rice.edu  Mon Jun 22 13:08:20 1992
Received: from rice.edu ([128.42.5.1]) by cs.rice.edu (AA12526); Mon, 22 Jun 92 13:08:20 CDT
Received: from cs.rice.edu by rice.edu (AA00408); Mon, 22 Jun 92 13:07:36 CDT
Received: from erato.cs.rice.edu by cs.rice.edu (AA12523); Mon, 22 Jun 92 13:08:15 CDT
Received: from localhost.cs.rice.edu by erato.cs.rice.edu (AA07158); Mon, 22 Jun 92 13:08:12 CDT
Message-Id: <9206221808.AA07158@erato.cs.rice.edu>
To: choudhar@cat.syr.edu (Alok Choudhary)
Cc: hpff-forall@rice.edu
Subject: Re: FORALL 
In-Reply-To: Your message of Mon, 22 Jun 92 13:03:23 -0400.
             <9206221703.AA10633@cat.syr.edu> 
Date: Mon, 22 Jun 92 13:08:10 -0500
From: chk@cs.rice.edu


> From: choudhar@cat.syr.edu (Alok Choudhary)
> Subject: FORALL 
> 
> >I'm not going to take a strong stand for or against triangular
> >FORALLs.  I just want to make a couple points:
> > ...
> 
> You can still specify the computation using 
> 
>     FORALL(I=1:N, J=1:N, J<=I) A(I,J) = 0.
> 
> The difference is that indices are independently specified and condition
> is moved to the mask expression of forall. This allows one to
> evaluate the mask for each FORALL index independent of the others
> without imposing an order on the evaluation.
> 
> >I'm a little confused about Alok's comment re: compilers recognizing
> >triangular loops. ...
> 
>  What I meant is also shown in the above example. That is,
> using the mask expression, one may still be able to recognize
> that it is triangular (or any other) loop, if sequentialized. How
> hard it is for a general case, I do not know.

OK, now I see what you mean.  It appears that any
triangular/trapezoidal/etc loop can be converted into a mask
expression; efficient translation seems possible for at least some
cases.  I withdraw my claim of non-expressibility.  Unless some other
users want to take up the triangular loop fight, Marc's proposal
stands as is on this point.

> >> 		RE: Marc's proposal on Block Forall
> 
> >Marc, please comment on how you intended the constraint to be
> >interpreted.  Also, a word on why you picked those constraints would
> >be helpful; I fear Alok and I are thinking about different problems.
> 
> My interpretation of the constraint is that one may not use the
> FORALL index subscripts inside a FORALL except as part of an array
> being assigned.  Since A(I) is part of an assignment, I can use A(I)
> to evaluate the conditional.  I guess Marc may be able to say whether
> the above example is legal or illegal according to his proposal.

I'll hold off on discussing this until Marc tells us what he meant.
Whatever the interpretation, some rewording or extra English
explanation is probably needed to remove this sort of confusion.

> I am not saying that content-based conditionals create fewer (or
> greater for that matter) problems.

I guess I'm confused over why you draw the distinction in the first
place, then.  What is the difference used for?

> > > ...
> > >(Some of these limitations are solved by the WHERE construct.)
> > 
> > I agree with Chuck, although I do not know how a WHERE construct can
> > solve the problem, because WHERE does not allow explicit reference to
> > index values.
> 
> >I thought that WHERE could reference index values - this is stated
> >right under the BNF for forall-where-construct (section 3.2 of Marc's
> >proposal).  This seems in conflict with a constraint in section 3.1
> >("No statement in a forall-block, other than the forall-assignments,
> >may reference any of the forall-triplet-spec subscript-names"),
> >though.  Marc, can you debug this definition? ...
> 
> According to F90, a mask's shape in WHERE should conform to the array
> assignment inside the WHERE.  Does a WHERE inside FORALL have a
> different definition (meaning)?

Again, I'll wait for Marc's response.

> >We also saw an example of this in the last meeting, when Pres
> >suggested his X = SUM(A(I)) syntax in FORALL loops; again, the
> >difference between scalar arguments and array arguments to SUM was
> >important in defining the semantics.
> 
> If it is interpreted as a SCAN OPERATION then of course it is 
> incorrect. That is, if SUM(A(I)) is considered as a sum of
> all numbers from the lower limit of the FORALL UP to I for all 1<=I<=N.
> 
> But we may define the meaning of reduction functions inside
> a FORALL if a unique answer is obtained. Need more discussion.

Avoiding sidetrack discussion: I meant this as an example of what was
being done, not as a proposal for extending/changing FORALL.  I
believe that the straw polls taken at the last meeting strongly advise
us to not consider reductions in FORALL and DO INDEPENDENT.

> >Questions for the group: Do we wish to extend WHERE (or any other
> >statement or intrinsic) with a special form for use in the FORALL
> >statement?  Do we want to extend WHERE (or any other array statement
> >or intrinsic) to apply to scalars, regardless of whether it's in a
> >FORALL or not?
> 
> I would go for allowing WHERE inside as long as its meaning remains
> the same as that defined in F90. Otherwise, it may create a lot
> of confusion.
> 
> 
> Alok Choudhary
> 
> 

Well said.  I think that Marc's semantics extend the meaning of WHERE
in an obvious way (to scalar masks and assignments).  The meaning for
cases that were legal in F90 remains the same, and the meaning of the
new cases is closely related to the old and is useful in the FORALL
context (at least --- I haven't thought about WHERE in scalar code).
I propose that we extend WHERE to this new case.


	Chuck

From wu@cs.buffalo.edu  Mon Jun 22 14:47:03 1992
Received: from rice.edu ([128.42.5.1]) by cs.rice.edu (AA15289); Mon, 22 Jun 92 14:47:03 CDT
Received: from ruby.cs.Buffalo.EDU by rice.edu (AA01333); Mon, 22 Jun 92 14:46:17 CDT
Received: by ruby.cs.Buffalo.EDU (4.1/1.01)
	id AA11856; Mon, 22 Jun 92 15:46:39 EDT
Date: Mon, 22 Jun 92 15:46:39 EDT
From: wu@cs.buffalo.edu (Min-You Wu)
Message-Id: <9206221946.AA11856@ruby.cs.Buffalo.EDU>
To: SNIR%YKTVMV.bitnet@cunyvm.cuny.edu, chk@cs.rice.edu
Subject: Forall
Cc: hpff-forall@rice.edu, wu@cs.buffalo.edu


Some of my comments on Marc's proposal and Chuck and Alok's comments.

> Worm 3:
> The principle of least astonishment may need invocation here.  One
> interesting consequence of Marc's proposal is that this is legal:
> 	FORALL ( I = 1:N )
> 	  IF ( N > 1 ) THEN
> 	    A(I) = 1.0
> 	  END IF
> 	END FORALL
> but this is not legal:
> 	FORALL ( I = 1:N )
> 	  IF ( N > I ) THEN
> 	    A(I) = 1.0
> 	  END IF
> 	END FORALL
> (Look carefully at the IF condition...  Marc does not allow the FORALL
> index to be referenced in any statement except forall-assignments.)
> The intent (I assume) is to force all iterations, and thus all
> processors, down the same path at all branch points, and that sure
> makes FORALL easier to define.  But it will make some useful
> computations difficult to code:
> 	Iterating to convergence on each point of a grid
> 	Conditional assignment to points on a grid
> 	Having different formulas for different grid points
> 	Guy Steele's ragged array example (ALLOCATE different-sized blocks)
> 	Nested triangular loops or triangular FORALLs
> 	Boundary conditions
> (Some of these limitations are solved by the WHERE construct.)
> Question for users: Are you willing to live with these restrictions?
> -- Chuck

We can allow FORALL index in the IF statement, as long as the user
can ensure there is no dependence between FORALL instances.
If there is a dependence, we have to define the semantics first
(IF-THEN and ELSE executed in order or simultaneously).

> Worm 4:
> If the statements in the forall-block can be arbitrary, we have to
> define semantics for
> 	FORALL ( I = 1:N )
> 	    CALL FOO( A )	! How many times is FOO called?
> 				! Scalarization says 1, users will say N
> 	    OPEN(5,FILE='BAR')	! Parallel I/O, here we come
> 	    READ (5) B(1:100)
> 	    			! READ (5) B(I) is illegal, though
> 	    WRITE (5) C(1:5)
> 	    			! WRITE (5) C(I) is illegal
> 	    CLOSE (5)
> 	END FORALL
> (Interestingly, I don't see any trouble from GOTO or STOP, since all
> iterations must follow the same control flow.)  There may be other
> problematic Fortran statements.
> -- Chuck

I agree that in this FORALL, FOO is called N times.  It can be solved
by using a different function:
       FORALL ( I = 1:N )
           CALL foo( a(I) )

> Overall, I think the proposal went too far in generalizing the FORALL.
> I'll make the following modest (and informal) counterproposal:
> 	Only "generalized assignment statements" (GASes) are allowed
> 	in a FORALL body.
> 	The following statements are GASes - assignment, WHERE,
> 	FORALL.
> 	Use Marc's semantics for interpreting GASes (no object
> 	assigned twice by one statement, assignments executed once for
> 	each element of the index space, GASes executed in sequence).
> 	I think that Marc's proposal still disallows triangular
> 	FORALLs; I propose lifting this restriction.
> If there's interest, I'll write this up more formally.
> 
> 
> 	Chuck
> 	
>A {\it forall-block} can be an arbitrary sequence of executable constructs,
>treated as a unit (R801).  It can also contain {forall-assignment}'s.
> -- Marc

My understanding is that, in Marc's proposal, all statements other than
GASes follow the same flow.  The good thing is that an ARBITRARY sequence 
of statements is allowed, but these statements can only follow the same
control flow.  Moreover, it is not clear whether those statements should be 
scalar or not.  For example, the statements
       FORALL(I=1:N, J=1:N, I.NE.J)
          IF(COND)
             THEN A(I,J) = I+J
             ELSE A(I,J) = I-J
          END IF
       END FORALL
and 
       FORALL(I=1:N)
          WHERE(MOD(I,2)=0)
              A(I) = 0
          ELSEWHERE
              A(I) = A(I-1)
          END WHERE
       END FORALL
are scalar, and this statement
       FORALL ( I = 1:N )
           CALL FOO( A )
       END FORALL
is not scalar.

>  I do not see a need for a WHERE inside a Forall if "IF" statement
>is allowed
>
> Anyway, I am not sure if WHERE Construct allows MASKS to be specified
>the way they have been used in Marc's proposal. I looked up
>the F90 documents, but did not find anything explicitly prohibiting it,
>nor did I find an example using it.
>
> For example, the following illustration used in Marc's proposal
>uses MOD(I,2) inside the WHERE Mask Expression. I am not sure
>if it is allowed.
>
>PLEASE CHECK.
>
>FORALL(I=1:N)
>   WHERE(MOD(I,2)=0)
>       A(I) = 0
>   ELSEWHERE
>       A(I) = A(I-1)
>    END WHERE
>END FORALL
>
> -- Alok

Alok, I think Marc uses IF for a uniform flow but WHERE allows
different conditions applied to different grid points.  
However, Marc's WHERE in FORALL is not consistent with that
outside of FORALL.  So if it is necessary, I suggest a different
name should be used to avoid confusion.


Min-You Wu

From chk@erato.cs.rice.edu  Mon Jun 22 15:38:24 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA17295); Mon, 22 Jun 92 15:38:24 CDT
Received: from localhost.cs.rice.edu by erato.cs.rice.edu (AA07301); Mon, 22 Jun 92 15:38:05 CDT
Message-Id: <9206222038.AA07301@erato.cs.rice.edu>
To: wu@cs.buffalo.edu (Min-You Wu)
Cc: hpff-forall@erato.cs.rice.edu
Subject: Re: Forall 
In-Reply-To: Your message of Mon, 22 Jun 92 15:46:39 -0400.
             <9206221946.AA11856@ruby.cs.Buffalo.EDU> 
Date: Mon, 22 Jun 92 15:38:01 -0500
From: chk@erato.cs.rice.edu


> Date: Mon, 22 Jun 92 15:46:39 EDT
> From: wu@cs.buffalo.edu (Min-You Wu)
> Subject: Forall
> 
> 
> > The principle of least astonishment may need invocation here.  One
> > interesting consequence of Marc's proposal is that this is legal:
> > 	FORALL ( I = 1:N )
> > 	  IF ( N > 1 ) THEN
> > 	    A(I) = 1.0
> > 	  END IF
> > 	END FORALL
> > but this is not legal:
> > 	FORALL ( I = 1:N )
> > 	  IF ( N > I ) THEN
> > 	    A(I) = 1.0
> > 	  END IF
> > 	END FORALL
> > (Look carefully at the IF condition...  Marc does not allow the FORALL
> > index to be referenced in any statement except forall-assignments.)
> > Question for users: Are you willing to live with these restrictions?
> > -- Chuck
> We can allow FORALL index in the IF statement, as long as the user
> can ensure there is no dependence between FORALL instances.
> If there is a dependence, we have to define the semantics first
> (IF-THEN and ELSE executed in order or simultaneously).

Which of the following are you advocating?
    1. IF-THEN-ELSE is allowed in a FORALL, but it is the user's
	responsibility to ensure that array elements assigned in the THEN
	branch are not referenced in the ELSE branch on another iteration.
    2. IF-THEN-ELSE is allowed in a FORALL, and assignments made in
	the THEN branch are visible in the ELSE branch of another
	iteration (i.e. WHERE semantics).
    3. IF-THEN-ELSE is allowed in a FORALL, and the effect of an
	assignment in one branch to an element used in the other
	branch on another iteration is undefined (i.e. asynchronous
	semantics).
    4. IF-THEN-ELSE is allowed in a FORALL, and references made in
	either branch always refer to array values that were current before
	the IF or assigned in the same branch (almost Fortran D
	semantics, but not quite).
    5. Something else.  (If you choose this one, please tell us what
	you have in mind! :-)


> > 	FORALL ( I = 1:N )
> > 	    CALL FOO( A )	! How many times is FOO called?
> > 				! Scalarization says 1, users will say N
> > 	END FORALL
> I agree that in this FORALL, FOO is called N times.  It can be solved
> by using a different function:
>        FORALL ( I = 1:N )
>            CALL foo( a(I) )

Now I'm confused.  Why should changing the function's argument change
the number of times it is called?

Let me say that I'm in favor of a semantics that would call FOO N times.
It's not the semantics that Marc defined, however.  We need to
    1. Allow the CALL to reference FORALL indices. (Otherwise the
	subroutine is called with the same arguments every time; not
	efficient!)
    2. Define some restrictions on FOO to either disallow dependences
	between iterations, or define the behavior when dependences
	occur.
Part 1 is easy, part 2 is hard.  It's things like this that led me to
propose the very restricted block FORALL.  Anybody willing to put in
the time to define these semantics is welcome to, and I'll support
more general proposals if they look reasonable.

> My understanding is that, in Marc's proposal, all statements other than
> GASes follow the same flow.  The good thing is that an ARBITRARY sequence 
> of statements is allowed, but these statements can only follow the same
> control flow.  Moreover, it is not clear whether those statements should be 
> scalar or not.  For example, the statements
>        FORALL(I=1:N, J=1:N, I.NE.J)
>           IF(COND)
>              THEN A(I,J) = I+J
>              ELSE A(I,J) = I-J
>           END IF
>        END FORALL
> and 
>        FORALL(I=1:N)
>           WHERE(MOD(I,2)=0)
>               A(I) = 0
>           ELSEWHERE
>               A(I) = A(I-1)
>           END WHERE
>        END FORALL
> are scalar, and this statement
>        FORALL ( I = 1:N )
>            CALL FOO( A )
>        END FORALL
> is not scalar.

Please define what you mean by "scalar" in this context.  In Fortran
90, "scalar" is a noun meaning roughly "a data object that is not an
array."

> >  I do not see a need for a WHERE inside a Forall if "IF" statement
> >is allowed
> >
> > Anyway, I am not sure if WHERE Construct allows MASKS to be specified
> >the way they have been used in Marc's proposal. ...
> 
> Alok, I think Marc uses IF for a uniform flow but WHERE allows
> different conditions applied to different grid points.  
> However, Marc's WHERE in FORALL is not consistent with that
> outside of FORALL.  So if it is necessary, I suggest a different
> name should be used to avoid confusion.
> 
> Min-You Wu

As I argued before, Marc's proposal extends WHERE in a natural way.
If you're saying that there is a case in Marc's semantics in which a
vector WHERE statement has a different meaning than in standard
Fortran 90, then please show us an example.  (That would be grounds
for modifying the treatment of WHERE or throwing it out entirely,
depending on how hard it was to fix.)  If you are against extending
WHERE for some reason, then say that.

I'm against adding any new statement except for FORALL.  If Marc's
WHERE is different from Fortran 90 WHERE (except for extending the
domain), then we should fix or abandon it.

	Chuck
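
A hedged Python sketch, added for illustration, of how options 2 and 4
in the list above can give different answers.  It models a two-branch
conditional shaped like Marc's example (even I: A(I) = 0, otherwise
A(I) = A(I-1)).  The function names and the list layout (index 0 holds
A(0)) are assumptions, and the two functions are one reading of those
options rather than a definition from any proposal.

```python
def where_style(a, n):
    """Option 2 reading: stores from the first branch are visible to the second."""
    a = list(a)
    cond = {i: i % 2 == 0 for i in range(1, n + 1)}  # conditions fixed up front
    for i in range(1, n + 1):                        # first branch, all selected I
        if cond[i]:
            a[i] = 0
    # Second branch: right-hand sides are read after the first branch has stored.
    rhs = {i: a[i - 1] for i in range(1, n + 1) if not cond[i]}
    for i, v in rhs.items():
        a[i] = v
    return a


def snapshot_style(a, n):
    """Option 4 reading: every read sees the values from before the conditional."""
    old = list(a)
    a = list(a)
    for i in range(1, n + 1):
        a[i] = 0 if i % 2 == 0 else old[i - 1]
    return a


start = [10, 11, 12, 13, 14]     # A(0) .. A(4)
print(where_style(start, 4))     # A(3) picks up the new A(2), which is 0
print(snapshot_style(start, 4))  # A(3) picks up the old A(2), which is 12
```

In this example no element read in the second branch is also assigned
in that branch, so the "assigned in the same branch" clause of option 4
does not come into play.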

From wu@cs.buffalo.edu  Mon Jun 22 21:30:50 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA23778); Mon, 22 Jun 92 21:30:50 CDT
Received: from ruby.cs.Buffalo.EDU by erato.cs.rice.edu (AA07462); Mon, 22 Jun 92 21:30:38 CDT
Received: by ruby.cs.Buffalo.EDU (4.1/1.01)
	id AA12180; Mon, 22 Jun 92 22:30:34 EDT
Date: Mon, 22 Jun 92 22:30:34 EDT
From: wu@cs.buffalo.edu (Min-You Wu)
Message-Id: <9206230230.AA12180@ruby.cs.Buffalo.EDU>
To: chk@cs.rice.edu, wu@cs.buffalo.edu
Subject: Re: Forall
Cc: hpff-forall@erato.cs.rice.edu, wu@cs.buffalo.edu


> > We can allow FORALL index in the IF statement, as long as the user
> > can ensure there is no dependence between FORALL instances.
> > If there is a dependence, we have to define the semantics first
> > (IF-THEN and ELSE executed in order or simultaneously).
> 
> Which of the following are you advocating?
>     1. IF-THEN-ELSE is allowed in a FORALL, but it is the user's
> 	responsibility to ensure that array elements assigned in the THEN
> 	branch are not referenced in the ELSE branch on another iteration.
>     2. IF-THEN-ELSE is allowed in a FORALL, and assignments made in
> 	the THEN branch are visible in the ELSE branch of another
> 	iteration (i.e. WHERE semantics).
>     3. IF-THEN-ELSE is allowed in a FORALL, and the effect of an
> 	assignment in one branch to an element used in the other
> 	branch on another iteration is undefined (i.e. asynchronous
> 	semantics).
>     4. IF-THEN-ELSE is allowed in a FORALL, and references made in
> 	either branch always refer to array values that were current before
> 	the IF or assigned in the same branch (almost Fortran D
> 	semantics, but not quite).
>     5. Something else.  (If you choose this one, please tell us what
> 	you have in mind! :-)

We can allow IF-THEN-ELSE in a FORALL within a pair of INDEPENDENT 
directives.  The user's responsibility is to ensure that:
1. array elements assigned in the THEN branch are not referenced in 
   the IF condition and the ELSE branch on another instance; and
2. array elements assigned in the ELSE branch are not referenced in 
   the IF condition and the THEN branch on another instance.

Take Alok's example:

   FORALL(I=1:N)
!HPF$INDEPENDENT  
    IF(A(I) .EQ. 0)
	THEN A(I)=1
	ELSE A(I)=0
    END IF
!HPF$ENDINDEPENDENT  
   END FORALL

The user can ensure the above two conditions are satisfied.

So far I believe we don't have a reasonable semantics for IF-THEN-ELSE
with *dependence* between instances.  Semantics 2 and 3 above are not 
reasonable.  Semantics 4 is interesting but may confuse users.  
It is worth putting time into defining a reasonable semantics.  

> > > 	FORALL ( I = 1:N )
> > > 	    CALL FOO( A )	! How many times is FOO called?
> > > 				! Scalarization says 1, users will say N
> > > 	END FORALL
> > I agree that in this FORALL, FOO is called N times.  It can be solved
> > by using a different function:
> >        FORALL ( I = 1:N )
> >            CALL foo( a(I) )
> 
> Now I'm confused.  Why should changing the function's argument change
> the number of times it is called?

Sorry I didn't make the argument clear.  What I meant was that a
subroutine called in a FORALL should be called N times, so we should make
the call with an array element as the argument instead of the whole array.
(And a subroutine with an array element as its argument is easier to write.)

> Let me say that I'm in favor of a semantics that would call FOO N times.
> It's not the semantics that Marc defined, however.  We need to
>     1. Allow the CALL to reference FORALL indices. (Otherwise the
> 	subroutine is called with the same arguments every time; not
> 	efficient!)
>     2. Define some restrictions on FOO to either disallow dependences
> 	between iterations, or define the behavior when dependences
> 	occur.
> Part 1 is easy, part 2 is hard.  It's things like this that led me to
> propose the very restricted block FORALL.  Anybody willing to put in
> the time to define these semantics is welcome to, and I'll support
> more general proposals if they look reasonable.

I agree.  For part 2, I believe that until we can define the semantics for 
subroutine calls with dependence between instances, the call must be in 
the INDEPENDENT part.

Marc didn't make it clear how we can make a subroutine call.
From his proposal:
>Constraint: No statement in a {\it forall-block}, other than the
>{\it forall-assignment}'s, may reference any of the {\it
>forall-triplet-spec subscript-names}.
I assume he doesn't allow the call to reference FORALL indices.

> > My understanding is that, in Marc's proposal, all statements other than
> > GASes follow the same flow.  The good thing is that an ARBITRARY sequence 
> > of statements is allowed, but these statements can only follow the same
> > control flow.  Moreover, it is not clear whether those statements should be 
> > scalar or not.  For example, the statements
> >        FORALL(I=1:N, J=1:N, I.NE.J)
> >           IF(COND)
> >              THEN A(I,J) = I+J
> >              ELSE A(I,J) = I-J
> >           END IF
> >        END FORALL
> > and 
> >        FORALL(I=1:N)
> >           WHERE(MOD(I,2)=0)
> >               A(I) = 0
> >           ELSEWHERE
> >               A(I) = A(I-1)
> >           END WHERE
> >        END FORALL
> > are scalar, and this statement
> >        FORALL ( I = 1:N )
> >            CALL FOO( A )
> >        END FORALL
> > is not scalar.
> 
> Please define what you mean by "scalar" in this context.  In Fortran
> 90, "scalar" is a noun meaning roughly "a data object that is not an
> array."

Roughly, I meant whether the statement references an array element
("scalar") or a whole array ("not scalar").


> As I argued before, Marc's proposal extends WHERE in a natural way.
> If you're saying that there is a case in Marc's semantics in which a
> vector WHERE statement has a different meaning than in standard
> Fortran 90, then please show us an example.  (That would be grounds
> for modifying the treatment of WHERE or throwing it out entirely,
> depending on how hard it was to fix.)  If you are against extending
> WHERE for some reason, then say that.
> 
> I'm against adding any new statement except for FORALL.  If Marc's
> WHERE is different from Fortran 90 WHERE (except for extending the
> domain), then we should fix or abandon it.
> 
> 	Chuck
> 

In Marc's proposal, the definition of WHERE in FORALL is not consistent 
with that outside of FORALL.  Just as you pointed out before:
>It's a subtle point, but the WHERE in Marc's example
>
>> FORALL(I=1:N)
>>    WHERE(MOD(I,2)=0)
>>        A(I) = 0
>>    ELSEWHERE
>>        A(I) = A(I-1)
>>     END WHERE
>> END FORALL
>
>is not legal Fortran 90.  Quoting section 7.5.3.1 of the standard,
>where WHERE is defined,
>	Constraint: In each assignment-stmt, the mask-expr and the
>	variable being defined must be arrays of the same shape.
>So, the mask-expr (in the example, MOD(I,2)=0) must be an array.
I don't think we need to extend WHERE, since we have another way
to do it.  Actually, if we allow IF-THEN-ELSE to reference the FORALL index,
we don't need WHERE:
      FORALL(I=1:N)
!HPF$INDEPENDENT  
         IF (MOD(I,2).EQ.0) THEN
           A(I) = 0
         ELSE
           A(I) = 1
         END IF
!HPF$ENDINDEPENDENT  
      END FORALL
(I modified the ELSE part to remove the dependence.)

However, I don't suggest abandoning WHERE completely.  WHERE can be used
for other purposes.  See the following example:
      INTEGER, DIMENSION(N,N) :: A, B

      FORALL(I=1:N)
        WHERE(A(I,:).EQ.B(I,:))
          A(I,:) = 0
        ELSEWHERE
          A(I,:) = B(I,:)
        END WHERE
      END FORALL


Min-You
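
A minimal Python sketch of what the last example above computes,
assuming the usual Fortran 90 reading of WHERE applied to each row
section A(I,:).  The function name forall_row_where and the
list-of-lists representation are illustrative assumptions, not part
of any proposal.

```python
def forall_row_where(a, b):
    """Model FORALL(I=1:N) with a row-wise WHERE/ELSEWHERE on A(I,:)."""
    result = [row[:] for row in a]
    for i, (row_a, row_b) in enumerate(zip(a, b)):
        # The mask for row I is evaluated once, before any store to that row.
        mask = [x == y for x, y in zip(row_a, row_b)]
        for j, equal in enumerate(mask):
            # WHERE branch stores 0, ELSEWHERE branch copies B(I,J).
            result[i][j] = 0 if equal else row_b[j]
    return result
```

Each row touches only its own elements, so the rows can be processed
in any order without changing the result.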

From gls@think.com  Tue Jun 23 10:56:20 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA01949); Tue, 23 Jun 92 10:56:20 CDT
Received: from mail.think.com by erato.cs.rice.edu (AA07757); Tue, 23 Jun 92 10:56:15 CDT
Return-Path: <gls@Think.COM>
Received: from Strident.Think.COM by mail.think.com; Tue, 23 Jun 92 11:56:01 -0400
From: Guy Steele <gls@think.com>
Received: by strident.think.com (4.1/Think-1.2)
	id AA09376; Tue, 23 Jun 92 11:56:06 EDT
Date: Tue, 23 Jun 92 11:56:06 EDT
Message-Id: <9206231556.AA09376@strident.think.com>
To: wu@cs.buffalo.edu
Cc: chk@cs.rice.edu, wu@cs.buffalo.edu, hpff-forall@erato.cs.rice.edu,
        wu@cs.buffalo.edu
In-Reply-To: Min-You Wu's message of Mon, 22 Jun 92 22:30:34 EDT <9206230230.AA12180@ruby.cs.Buffalo.EDU>
Subject: Forall

   Date: Mon, 22 Jun 92 22:30:34 EDT
   From: wu@cs.buffalo.edu (Min-You Wu)
   ...
   In Marc's proposal, the definition of WHERE in FORALL is not consistent 
   with that outside of FORALL.  Just as you pointed out before:
   >It's a subtle point, but the WHERE in Marc's example
   >
   >> FORALL(I=1:N)
   >>    WHERE(MOD(I,2)=0)
   >>        A(I) = 0
   >>    ELSEWHERE
   >>        A(I) = A(I-1)
   >>     END WHERE
   >> END FORALL
   >
   >is not legal Fortran 90.  Quoting section 7.5.3.1 of the standard,
   >where WHERE is defined,
   >	Constraint: In each assignment-stmt, the mask-expr and the
   >	variable being defined must be arrays of the same shape.
   >So, the mask-expr (in the example, MOD(I,2)=0) must be an array.
   I don't think we need to extend WHERE since we have another way
   to do it.  Actually, if we allow IF-THEN-ELSE reference FORALL index, 
   we don't need WHERE:
	 FORALL(I=1:N)
   !HPF$INDEPENDENT  
	    IF(MOD(I,2)=0) 
	      THEN  A(I) = 0
	      ELSE  A(I) = 1
   !HPF$ENDINDEPENDENT  
	 END FORALL
   (I modified the ELSE part to make it no dependence).

... and so we come full circle.  This line of reasoning was exactly
what caused me to propose IF within FORALL several months ago; but
I retracted it after a discussion in the meeting showed that it puzzled
some people.  Not that I object to hauling it out again for another look.

--Guy

From zrlp09@trc.amoco.com  Tue Jun 23 11:37:15 1992
Received: from noc.msc.edu by cs.rice.edu (AA03430); Tue, 23 Jun 92 11:37:15 CDT
Received: from uc.msc.edu by noc.msc.edu (5.65/MSC/v3.0.1(920324))
	id AA04504; Tue, 23 Jun 92 11:37:14 -0500
Received: from [129.230.11.2] by uc.msc.edu (5.65/MSC/v3.0z(901212))
	id AA19118; Tue, 23 Jun 92 11:37:11 -0500
Received: from trc.amoco.com (apctrc.trc.amoco.com) by netserv2 (4.1/SMI-4.0)
	id AA10216; Tue, 23 Jun 92 11:37:07 CDT
Received: from backus.trc.amoco.com by trc.amoco.com (4.1/SMI-4.1)
	id AA20921; Tue, 23 Jun 92 11:37:03 CDT
Received: from localhost by backus.trc.amoco.com (4.1/SMI-4.1)
	id AA02729; Tue, 23 Jun 92 11:37:01 CDT
Message-Id: <9206231637.AA02729@backus.trc.amoco.com>
To: hpff-forall@cs.rice.edu
Cc: rpage@trc.amoco.com
Subject: triangular array access and assignment-only block foralls
Date: Tue, 23 Jun 92 11:36:59 -0500
From: "Rex Page" <zrlp09@trc.amoco.com>


     forall (i=1:n)                      ! assigns new values
       forall (j=i:n) A(j,j-i+1) = ...   ! to the lower triangle
     end forall                          ! of a matrix


Array sections provide access to rectilinear portions of arrays
(i.e., portions whose index sets are Cartesian products of sets
of subscripts).

Forall statements provide access to more general portions of
arrays (the diagonal of a matrix, for example).  Block forall
constructs generalize accessible portions of arrays still further
(e.g., to slices taken at angles).
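
The index set touched by the nested forall at the top of this message
can be enumerated with a short Python sketch.  The helper names
triangular_targets and lower_triangle are illustrative assumptions;
indices are 1-based to match the Fortran.

```python
def triangular_targets(n):
    """(row, col) pairs assigned by
    forall (i=1:n) forall (j=i:n) A(j, j-i+1) = ...   (1-based)."""
    return [(j, j - i + 1) for i in range(1, n + 1) for j in range(i, n + 1)]


def lower_triangle(n):
    """All (row, col) pairs with col <= row, i.e. the lower triangle."""
    return {(r, c) for r in range(1, n + 1) for c in range(1, r + 1)}
```

For a fixed i the inner forall walks one diagonal (i=1 is the main
diagonal, i=2 the first subdiagonal, and so on), which is the "slice
taken at an angle" that an array section cannot express.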

If a forall construct permits the block of statements instantiated
by one value of the index set to affect the result of another such
block, the overall effect of the construct will fail to be
deterministic.  Acceptable nondeterministic results would be fairly
easy to describe if all of the statements in the block were
assignments (... must be the same as repeated execution of the block
according to a sequence established by some permutation of the index
set, assuming all rhs's are pre-evaluated and stored in temporaries
... blah blah mumble mumble).

Fortran 90, as it stands, avoids even this level of nondeterminism.
That is why it prohibits duplicate entries in vector-valued subscripts
that occur in lhs's.
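
A small Python sketch of why duplicate entries on the left-hand side
would be nondeterministic.  It assumes the element stores of A(v) = ...
may happen in any order; the function name and the 0-based indices are
illustrative only.

```python
def assign_vector_subscript(a, v, values, order):
    """Perform A(v) = values one element at a time, in the given order."""
    out = list(a)
    for k in order:
        out[v[k]] = values[k]   # later stores overwrite earlier ones
    return out


a = [0, 0, 0]
v = [1, 1, 2]                   # index 1 appears twice
values = [10, 20, 30]
forward = assign_vector_subscript(a, v, values, [0, 1, 2])
backward = assign_vector_subscript(a, v, values, [2, 1, 0])
```

With the duplicate in v, the two orders leave different values in
element 1; without duplicates the order would not matter.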

Acceptable non-determinism in a block forall becomes more difficult
to describe in the presence of statements other than assignments.
CALLs are especially difficult to deal with.  The quantity of
misunderstandings and incorrect programs that would result from
such a complex linguistic structure would, I think, negate its
potential advantages.

HPF Fortran might try to avoid this sort of complexity by requiring
that the evaluation of a forall block for one element in the forall's
index set neither affect nor be affected by the evaluation of the
block for any other element of the index set.  This almost works,
but not quite because rhs's in assignments need to be pre-evaluated,
and this exception invalidates, unfortunately, inline substitution as
a correct method of invoking subroutines.  A subroutine containing
assignments could produce a different effect if its executable
statements were substituted in place of a CALL to the subroutine
in a forall.
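
The inline-substitution problem can be illustrated with a Python
sketch.  It assumes a hypothetical subroutine whose body is
A(I) = A(I-1), and one plausible reading in which each CALL instance
runs to completion in index order; neither assumption comes from a
proposal.

```python
def shift(a, i):
    """Hypothetical subroutine body: A(I) = A(I-1)."""
    a[i] = a[i - 1]


def forall_with_call(a):
    """Each instance calls shift and stores immediately."""
    out = list(a)
    for i in range(1, len(out)):
        shift(out, i)
    return out


def forall_inlined(a):
    """Body substituted inline: a forall assignment whose right-hand
    sides are all pre-evaluated before any store."""
    old = list(a)
    out = list(a)
    for i in range(1, len(out)):
        out[i] = old[i - 1]
    return out
```

For the input [1, 2, 3, 4] the called version propagates the first
element everywhere, while the inlined version shifts the old values by
one place, so the two forms are not interchangeable.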

I conclude that forall should be used for array assignment only,
including forall assignment.  This would provide a way to access
non-rectilinear portions of arrays, but it would not provide a
parallel loop.  (A forall assignment is not a loop because it is
not a control structure.)  Perhaps HPF Fortran could get by with
a directive indicating independence in DO loops as its sole
parallel-loop facility, using block foralls simply as an array
access method.

Rex Page      Amoco Production Company              918-660-3935
              Research Center, 41&Yale zip 74135           -4163 FAX
              POBox 3385
              Tulsa OK  74102

From Keith.Bierman@eng.sun.com  Tue Jun 23 20:39:04 1992
Received: from Sun.COM by cs.rice.edu (AA20436); Tue, 23 Jun 92 20:39:04 CDT
Received: from Eng.Sun.COM (zigzag-bb.Corp.Sun.COM) by Sun.COM (4.1/SMI-4.1)
	id AA02459; Tue, 23 Jun 92 18:39:03 PDT
Received: from chiba.Eng.Sun.COM by Eng.Sun.COM (4.1/SMI-4.1)
	id AA13684; Tue, 23 Jun 92 18:39:08 PDT
Received: by chiba.Eng.Sun.COM (4.1/SMI-4.1)
	id AA05542; Tue, 23 Jun 92 18:38:57 PDT
Date: Tue, 23 Jun 92 18:38:57 PDT
From: Keith.Bierman@eng.sun.com (Keith Bierman fpgroup)
Message-Id: <9206240138.AA05542@chiba.Eng.Sun.COM>
To: hpff-forall@cs.rice.edu
Subject: significant blanks


>> Douglas Walls brought the recent discussions to my attention. I may
>> not have studied them carefully enough, so if I've missed the main
>> points I apologize in advance.
>> 
>> In the free form source (ISO 1539:1991) form, blanks are already
>> significant (3.3.1).
>> 
>> I know that there has been a lot of effort expended to make the HPF
>> additions to the language fit nicely in a subset which looks
>> remarkably like some extended FORTRAN 77s, it would seem that we would
>> be much better off with the Free Form source form (in light of the
>> current discussion).
>> 
>> A separate question, but related in the sense that it too is
>> "spelling" is that of directives versus syntax. When X3H5 faced this
>> question, the vote went to syntax. X3J3 was asked for a sense of the
>> committee (straw vote) and syntax won hands down.
>> 
>> If HPF is successful, we'll have codes with these constructs for at
>> least 10-20 years; so using syntax might be a good idea.
>> 
>> Sorry to jump into the middle of things.
>> 
>> khb


From chk@erato.cs.rice.edu  Wed Jun 24 09:49:06 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA28365); Wed, 24 Jun 92 09:49:06 CDT
Received: from localhost.cs.rice.edu by erato.cs.rice.edu (AA08249); Wed, 24 Jun 92 09:49:02 CDT
Message-Id: <9206241449.AA08249@erato.cs.rice.edu>
To: Keith.Bierman@eng.sun.com (Keith Bierman fpgroup)
Cc: hpff-distribute@cs.rice.edu, hpff-forall@cs.rice.edu
Subject: Re: significant blanks 
In-Reply-To: Your message of Tue, 23 Jun 92 18:28:49 -0700.
             <9206240128.AA05465@chiba.Eng.Sun.COM> 
Date: Wed, 24 Jun 92 09:49:00 -0500
From: chk@erato.cs.rice.edu


> Date: Tue, 23 Jun 92 18:28:49 PDT
> From: Keith.Bierman@eng.sun.com (Keith Bierman fpgroup)
> To: hpff-distribute@cs.rice.edu, hpff-foralle@cs.rice.edu
> Subject: significant blanks


> In the free form source (ISO 1539:1991) form, blanks are already
> significant (3.3.1).
> 
> I know that there has been a lot of effort expended to make the HPF
> additions to the language fit nicely in a subset which looks
> remarkably like some extended FORTRAN 77s, it would seem that we would
> be much better off with the Free Form source form (in light of the
> current discussion).

You're right, there's no problem with significant blanks in free form source.

The problem is, supporting fixed form source is not optional in
Fortran 90.  Nor is it a deprecated feature.  In other words, there
will be programs that count on insignificant blanks for many moons
yet.  As screwy as they make lexing, I don't think we can define HPF
to get rid of insignificant blanks.  
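
A Python sketch of the classic lexing hazard in fixed form source,
where blanks are insignificant.  The classifier below is a deliberate
simplification (it ignores character constants and statement labels in
the label field), and its name is an illustrative assumption.

```python
def classify_fixed_form(stmt):
    """Classify a fixed-form statement beginning with DO, blanks removed."""
    text = stmt.replace(" ", "").upper()
    if text.startswith("DO") and "=" in text:
        lhs, rhs = text.split("=", 1)
        depth = 0
        for ch in rhs:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch == "," and depth == 0:
                # A top-level comma after "=" marks loop bounds.
                return "DO loop"
        # No top-level comma: the whole left side is a variable name.
        return "assignment to " + lhs
    return "other"
```

The statements "DO 10 I = 1, 5" and "DO 10 I = 1.5" differ by one
character, yet the first is a loop and the second assigns to the
variable DO10I; a lexer cannot tell which until it has scanned past
the "=".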

> A separate question, but related in the sense that it too is
> "spelling" is that of directives versus syntax. When X3H5 faced this
> question, the vote went to syntax. X3J3 was asked for a sense of the
> committee (straw vote) and syntax won hands down.
> 
> If HPF is successful, we'll have codes with these constructs for at
> least 10-20 years; so using syntax might be a good idea.

My understanding was that there has been a fair amount of dissension
at X3J3 on this point; apparently I'm wrong.

HPFF went for the following strategy:
	Features that do not affect the value semantics of a program
	(i.e. they don't change the printed answer, like ALIGN) are
	directives that non-HPF compilers can ignore.
	Features that do introduce new semantics (FORALL, new
	intrinsics) are new syntax elements; non-HPF compilers can't
	handle them without change anyway.
The reasoning was to maintain as much backward compatibility (to
workstations, for example) as possible.  I'm happy with that decision,
but if you want to reopen the discussion in this forum I won't stop you.

> Sorry to jump into the middle of things.
> 
> khb

What makes you think you're the only one? :-)

	Chuck

From chk@erato.cs.rice.edu  Wed Jun 24 11:18:41 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA01156); Wed, 24 Jun 92 11:18:41 CDT
Received: from localhost.cs.rice.edu by erato.cs.rice.edu (AA08302); Wed, 24 Jun 92 11:18:36 CDT
Message-Id: <9206241618.AA08302@erato.cs.rice.edu>
To: hpff-forall@cs.rice.edu, rpage@trc.amoco.com
Subject: Re: triangular array access and assignment-only block foralls 
In-Reply-To: Your message of Tue, 23 Jun 92 11:36:59 -0500.
             <9206231637.AA02729@backus.trc.amoco.com> 
Date: Wed, 24 Jun 92 11:18:34 -0500
From: chk@erato.cs.rice.edu


> To: hpff-forall@cs.rice.edu
> Subject: triangular array access and assignment-only block foralls
> Date: Tue, 23 Jun 92 11:36:59 -0500
> From: "Rex Page" <zrlp09@trc.amoco.com>
> 
> ...
> 
> Array sections provide access to rectilinear portions of arrays
> (i.e., portions whose index sets are Cartesian products of sets
> of subscripts).
> 
> Forall statements provide access to more general portions of
> arrays (the diagonal of a matrix, for example).  Block forall
> constructs generalize accessible portions of arrays still further
> (e.g., to slices taken at angles).

My reading of Marc's proposal is that you can't use the outer FORALL
indices as inner FORALL bounds (as your example does).  To quote Marc:
	Constraint: A subscript-name that occurs in a
	forall-triplet-spec-list may be referenced inside the scope of
	the FORALL only by the scalar-mask-expr of a FORALL or WHERE
	statement or in an assignment of the form array-element = expr
	or array-section = expr.  A subscript-name may not be
	redefined inside the scope of the FORALL.
So I don't think the block FORALL generalizes the array sections more
than the regular FORALL.

See also the previous discussion of faking triangular FORALLs using masks.

On the other hand, I like your version of block FORALL better than Marc's.

> ...
> 
> Acceptable non-determinism in a block forall becomes more difficult
> to describe in the presence of statements other than assignments.
> CALLs are especially difficult to deal with.  The quantity of
> misunderstandings and incorrect programs that would result from
> such a complex linguistic structure would, I think, negate its
> potential advantages.

Like Ken, I don't want nondeterminism in the HPF language.  The cases
that Rex calls "acceptable non-determinism" I would label as
"non-standard-conforming".  [Asbestos suit on]

Let me observe that function calls have essentially all the problems
that CALL statements do within FORALLs.  I want to be able to say
	FORALL ( I = 1:N ) A(I) = SIN( 2*I*PI / N )
but be protected from surprises in
	FORALL ( I = 1:N ) A(I) = MESSY_FUNCTION( I, B(I), INDX(:) )
where MESSY_FUNCTION could potentially 
	Assign to B(I) [ no problem, they're independent ]
	Assign to B(INDX(I)) [ big problem, in general ]
	Reference all of INDX [ possibly complicated if INDX is distributed ]
	Assign anything in INDX [ big problem ]
We need to figure out some restrictions on functions to avoid
ambiguities and problems.  Pres, weren't you going to circulate a
definition of user elemental functions?  If nothing else, that sounds
like a good starting point.
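
A Python sketch of the "big problem" case, assuming a hypothetical
MESSY_FUNCTION that reads B(I) and also assigns to B(INDX(I)).  The
bodies and names below are illustrative assumptions; indices are
0-based.

```python
def messy_function(i, b, indx):
    """Hypothetical function with a side effect on B(INDX(I))."""
    value = b[i]
    b[indx[i]] = b[indx[i]] + 10    # side effect visible to other instances
    return value


def forall_assign(n, b, indx, order):
    """Evaluate A(I) = MESSY_FUNCTION(I, B, INDX) in the given order."""
    b = list(b)                     # work on a private copy of B
    a = [None] * n
    for i in order:
        a[i] = messy_function(i, b, indx)
    return a


forward = forall_assign(3, [1, 2, 3], [1, 2, 0], [0, 1, 2])
backward = forall_assign(3, [1, 2, 3], [1, 2, 0], [2, 1, 0])
```

The two evaluation orders produce different arrays A, which is the
kind of ambiguity a restriction on functions would need to rule out.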

> I conclude that forall should be used for array assignment only,
> including forall assignment.  This would provide a way to access
> non-rectilinear portions of arrays, but it would not provide a
> parallel loop.  (A forall assignment is not a loop because it is
> not a control structure.)  Perhaps HPF Fortran could get by with
> a directive indicating independence in DO loops as its sole
> parallel-loop facility, using block foralls simply as an array
> access method.
> 
> Rex Page      Amoco Production Company              918-660-3935
>               Research Center, 41&Yale zip 74135           -4163 FAX
>               POBox 3385
>               Tulsa OK  74102

This sounds like something I can support, with a couple
qualifications:
    I support WHERE in FORALL loops (which requires extending
	WHERE to scalar assignments as well).
    It appears that nested single-statement FORALL is allowed, but
	nested multi-statement FORALL is not.  What is the reason for
	this?
    We still need firm agreement on what to do for functions and DO
	INDEPENDENT.  There's been almost no discussion of Min-You's
	INDEPENDENT proposal; should I read this as general agreement
	or general "too busy/don't care"?

	Chuck

From chk@erato.cs.rice.edu  Wed Jun 24 12:06:42 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA00333); Wed, 24 Jun 92 12:06:42 CDT
Received: from localhost.cs.rice.edu by erato.cs.rice.edu (AA08312); Wed, 24 Jun 92 11:39:53 CDT
Message-Id: <9206241639.AA08312@erato.cs.rice.edu>
To: Guy Steele <gls@think.com>
Cc: wu@cs.buffalo.edu, chk@cs.rice.edu, hpff-forall@erato.cs.rice.edu
Subject: Re: Forall 
In-Reply-To: Your message of Tue, 23 Jun 92 11:56:06 -0400.
             <9206231556.AA09376@strident.think.com> 
Date: Wed, 24 Jun 92 11:39:52 -0500
From: chk@erato.cs.rice.edu


> From: Guy Steele <gls@think.com>
> Date: Tue, 23 Jun 92 11:56:06 EDT
> Subject: Forall
> 
>    Date: Mon, 22 Jun 92 22:30:34 EDT
>    From: wu@cs.buffalo.edu (Min-You Wu)
>    I don't think we need to extend WHERE since we have another way
>    to do it.  Actually, if we allow IF-THEN-ELSE reference FORALL index, 
>    we don't need WHERE:
> 	 FORALL(I=1:N)
>    !HPF$INDEPENDENT  
> 	    IF(MOD(I,2)=0) 
> 	      THEN  A(I) = 0
> 	      ELSE  A(I) = 1
>    !HPF$ENDINDEPENDENT  
> 	 END FORALL
>    (I modified the ELSE part to make it no dependence).
> 
> ... and so we come full circle.  This line of reasoning was exactly
> what caused me to propose IF within FORALL several months ago; but
> I retracted it after a discussion in the meeting showed that it puzzled
> some people.  Not that I object to hauling it out again for another look.
> 
> --Guy

Almost full circle.  You defined a meaning for the original example:

	FORALL ( I = 1:N )
	  IF (MOD(I,2)=0) THEN
	    A(I) = 0
	  ELSE
	    A(I) = A(I-1)
	  END IF
	END FORALL

(the infamous WHERE semantics).  Min-You, on the other hand, makes
this case undefined because of the interaction between the branches.
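
A Python sketch contrasting two readings of the example above.  It
assumes A has elements 0..N, that the WHERE-like reading executes the
two masked assignments one after the other, and that the alternative
reading lets every iteration see only old values.  The function names
are illustrative.

```python
def where_semantics(a):
    """Two masked assignments executed in sequence over I = 1..N."""
    out = list(a)
    n = len(a) - 1
    mask = [i % 2 == 0 for i in range(n + 1)]   # MOD(I,2) == 0, evaluated once
    for i in range(1, n + 1):                   # A(I) = 0 where mask is true
        if mask[i]:
            out[i] = 0
    # A(I) = A(I-1) where mask is false; right-hand sides are evaluated
    # before any store, but they already see the zeros stored above.
    rhs = {i: out[i - 1] for i in range(1, n + 1) if not mask[i]}
    for i, v in rhs.items():
        out[i] = v
    return out


def old_value_semantics(a):
    """Every iteration reads only the values A had on entry."""
    old = list(a)
    out = list(a)
    for i in range(1, len(a)):
        out[i] = 0 if i % 2 == 0 else old[i - 1]
    return out
```

For A(0:4) = [5, 6, 7, 8, 9] the two readings differ at I = 3: the
sequential reading stores the new A(2) = 0, the old-value reading
stores the original A(2) = 7.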

I don't like the "branches may not affect each other" restriction
because it keeps me from handling inhomogeneous local computations in
the obvious way:

	FORALL ( I = 1:N )
	  IF ( ABS(A(I)) < EPS ) THEN
	    A(I) = 0.0
	  ELSE
	    A(I) = A(I-1) + A(I+1)
	  END IF
	END FORALL

In an ideal world (or in Fortran D FORALLs), this would be legal and
would perform independent computations on all points.  Those semantics
are difficult to explain (everywhere) and implement (on some machines).

The WHERE semantics are at least clear, if counterintuitive in some
cases.  I would rather use them, and call the construct in the FORALL
"WHERE", than do without.  Unfortunately, there is no WHERE-CASE
construct for more complex situations; sigh.  I can't in good
conscience propose adding one.

I am against allowing IF (or any other control-flow construct) if its
interpretation in the FORALL is markedly different from its interpretation
outside the FORALL.  Judging from the reaction to Guy's original proposal,
I'm not alone in this.

It would be a shame if we limited block FORALL to only assignment
statements.

	Chuck

From joelw@mozart.convex.com  Wed Jun 24 12:51:28 1992
Received: from convex.convex.com by cs.rice.edu (AA02014); Wed, 24 Jun 92 12:51:28 CDT
Received: from mozart.convex.com by convex.convex.com (5.64/1.35)
	id AA01334; Wed, 24 Jun 92 12:51:13 -0500
Received: by mozart.convex.com (5.64/1.28)
	id AA07408; Wed, 24 Jun 92 12:51:12 -0500
From: joelw@mozart.convex.com (Joel Williamson)
Message-Id: <9206241751.AA07408@mozart.convex.com>
Subject: Re: triangular array access and assignment-only block foralls
To: chk@cs.rice.edu
Date: Wed, 24 Jun 92 12:51:11 CDT
Cc: hpff-forall@cs.rice.edu
In-Reply-To: <9206241618.AA08302@erato.cs.rice.edu>; from "chk@cs.rice.edu" at Jun 24, 92 11:18 am
X-Mailer: ELM [version 2.3 PL11]

chk@cs.rice.edu writes:
> 
>     We still need firm agreement on what to do for functions and DO
> 	INDEPENDENT.  There's been almost no discussion of Min-You's
> 	INDEPENDENT proposal; should I read this as general agreement
> 	or general "too busy/don't care"?
> 
> 	Chuck
> 
I continue to prefer Min-You's INDEPENDENT proposal.  Being able to
code:

	forall (...)
	...synchronous-forall-stuff...
CHPF$ INDEPENDENT
	...freeforall-stuff...
CHPF$ END INDEPENDENT
	...more-synchronous-forall-stuff...
	end forall

instead of three separate foralls with the second declared INDEPENDENT,
is concise, clear, and "natural."

Joel

From wu@cs.buffalo.edu  Wed Jun 24 14:17:26 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA04653); Wed, 24 Jun 92 14:17:26 CDT
Received: from ruby.cs.Buffalo.EDU by erato.cs.rice.edu (AA08390); Wed, 24 Jun 92 14:17:24 CDT
Received: by ruby.cs.Buffalo.EDU (4.1/1.01)
	id AA13117; Wed, 24 Jun 92 15:17:06 EDT
Date: Wed, 24 Jun 92 15:17:06 EDT
From: wu@cs.buffalo.edu (Min-You Wu)
Message-Id: <9206241917.AA13117@ruby.cs.Buffalo.EDU>
To: chk@cs.rice.edu, gls@think.com
Subject: Re: Forall
Cc: hpff-forall@erato.cs.rice.edu, wu@cs.buffalo.edu


> I don't like the "branches may not affect each other" restriction
> because it keeps me from handling inhomogeneous local computations in
> the obvious way:
> 
> 	FORALL ( I = 1:N )
> 	  IF ( ABS(A(I)) < EPS ) THEN
> 	    A(I) = 0.0
> 	  ELSE
> 	    A(I) = A(I-1) + A(I+1)
> 	  END IF
> 	END FORALL
> 
> In an ideal world (or in Fortran D FORALLs), this would be legal and
> would perform independent computations on all points.  Those semantics
> are difficult to explain (everywhere) and implement (on some machines).
> 
> The WHERE semantics are at least clear, if counterintuitive in some
> cases.  I would rather use them, and call the construct in the FORALL
> "WHERE", than do without.  Unfortunately, there is no WHERE-CASE
> construct for more complex situations; sigh.  I can't in good
> conscience propose adding one.
> 
> I am against allowing IF (or any other control-flow construct) if its
> interpretation in the FORALL is markedly different from its outside
> the FORALL.  Judging from the reaction to Guy's original proposal, I'm
> not alone in this.
> 
> It would be a shame if we limited block FORALL to only assignment
> statements.
> 
> 	Chuck

I agree that IF with INDEPENDENT doesn't work for this example because
of interactions between branches.  Let's find out how we can solve
this problem.  Fortran D semantics seems to be a good solution; however,
Fortran D semantics can only be applied when using *old* values.  Of course
the above example is a good one for Fortran D semantics.  How about the
following example?

 	FORALL ( I = 1:N )
          DO J = 1,N
   	    IF ( ABS(A(I)) < EPS ) THEN
 	      A(I) = 0.0
 	    ELSE
 	      A(I) = A(I-1) + A(I+1)
 	    END IF
          END DO
 	END FORALL
 
The WHERE semantics, as you indicated, is *counterintuitive* in some
cases.  Let's consider the tightly synchronous semantics.  It is difficult
to specify the corresponding synchronization points between branches.
For the above example, we may say the RHS of the THEN part can be synchronous
with the RHS of the ELSE part, and likewise the LHS of THEN and ELSE.  However,
in general, it is not easy to do so.  I believe we need to have
an explicit method to indicate the synchronization points.

Moreover, if we allow IF in Forall and assume we can define a reasonable
semantics for it, then CASE can be treated similarly.

Min-You

From chk@erato.cs.rice.edu  Wed Jun 24 15:27:54 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA07604); Wed, 24 Jun 92 15:27:54 CDT
Received: from localhost.cs.rice.edu by erato.cs.rice.edu (AA08420); Wed, 24 Jun 92 15:27:44 CDT
Message-Id: <9206242027.AA08420@erato.cs.rice.edu>
To: wu@cs.buffalo.edu (Min-You Wu)
Cc: chk@cs.rice.edu, gls@think.com, hpff-forall@erato.cs.rice.edu
Subject: Re: Forall 
In-Reply-To: Your message of Wed, 24 Jun 92 15:17:06 -0400.
             <9206241917.AA13117@ruby.cs.Buffalo.EDU> 
Date: Wed, 24 Jun 92 15:27:41 -0500
From: chk@erato.cs.rice.edu


> From: wu@cs.buffalo.edu (Min-You Wu)
> Subject: Re: Forall
> 
> > I don't like the "branches may not affect each other" restriction
> > because it keeps me from handling inhomogeneous local computations in
> > the obvious way:
> > 
> > 	FORALL ( I = 1:N )
> > 	  IF ( ABS(A(I)) < EPS ) THEN
> > 	    A(I) = 0.0
> > 	  ELSE
> > 	    A(I) = A(I-1) + A(I+1)
> > 	  END IF
> > 	END FORALL
> > 
> > ...
> > 
> > 	Chuck
> 
> I agree that IF with INDENPENDENT doesn't work for this example because
> of interactions between branches.   Let's find out how can we solve
> this problem.  It seems Fortran D semantics is a good solution, however,
> Fortran D semantics can only be apply to using *old* values.  Of course 
> the above example is a good one for Fortran D semantics.  How about the 
> following example?
> 
>  	FORALL ( I = 1:N )
>           DO J = 1,N
>    	    IF ( ABS(A(I)) < EPS ) THEN
>  	      A(I) = 0.0
>  	    ELSE
>  	      A(I) = A(I-1) + A(I+1)
>  	    END IF
>           END DO
>  	END FORALL
>  
> ...
> 
> Moreover, if we allow IF in Forall and assume we can define a reasonable
> semantics for it, then CASE can be treated similarly.
> 
> Min-You

Leaving nested DO loops out of the discussion for a moment...

I agree that a reasonable semantics for IF is very likely to be
reasonable for CASE.  Let's find one.

Unfortunately, "synchronization points" look like a bad starting point
for defining IF semantics.  As soon as there are two statements in one
branch of the IF, the following conflict pops up:

	Users want synchronization between statements in the same branch
	Users do not want synchronization between the branches

Case study:

	FORALL ( I = 1 : N )
	  IF ( A(I) < EPS ) THEN		! S0
	    A(I) = 0.0				! S1
	    B(I) = 0.0				! S2
	  ELSE
	    A(I) = (A(I-1) + A(I+1)) / 2	! S3
	    B(I) = A(I) * 3			! S4
	  END IF
	END FORALL

As a programmer, I expect S4 to use the value of A(I) that S3
computes.  I do not expect S3 to see the values of A(I-1) and A(I+1)
assigned by S1.  (It's not clear to me what users expect to happen if
S4 is "B(I) = A(I-1)" instead.  But let's handle that after we get
agreement on this case.)

The only interpretation that I can see to get out of this conflict
involves creating copies of any objects referenced in both branches of
the IF, then executing each branch using its own copy.  Some language
has to be added to the effect that the same element cannot be assigned
by both branches of the IF (or a resolution rule for those conflicts)
as well.  I'll write something up formally if there's interest.
This is not Fortran D semantics, because each branch still has SIMD
semantics (it's close, though).

I'm not sure what this says in terms of synchronization points;
probably there's a global synchronization before (and after?)
evaluating the condition, a global synch after the END IF, and
synchronizations around S1, S2, S3, and S4 *of only the iterations
executing that branch*.  To the best of my recollection, we haven't
needed to talk about partial synchronizations before, although they
would certainly be used heavily in the underlying implementation on
MIMD machines.
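
A Python sketch of the per-branch-copy interpretation described above,
applied to the case study.  It assumes A has elements 0..N+1 so that
A(I-1) and A(I+1) exist, and that no element is assigned by both
branches.  The function name and data layout are illustrative, not a
formal write-up.

```python
def forall_if_with_copies(a, b, eps):
    """Run S0..S4 with each branch working on its own copy of A."""
    n = len(a) - 2
    cond = {i: a[i] < eps for i in range(1, n + 1)}    # S0, on old values
    then_set = [i for i in range(1, n + 1) if cond[i]]
    else_set = [i for i in range(1, n + 1) if not cond[i]]
    then_a, else_a = list(a), list(a)                  # private copies of A
    new_b = list(b)
    for i in then_set:                                 # S1
        then_a[i] = 0.0
    for i in then_set:                                 # S2
        new_b[i] = 0.0
    # S3: right-hand sides pre-evaluated over the ELSE iterations only.
    s3 = {i: (else_a[i - 1] + else_a[i + 1]) / 2 for i in else_set}
    for i, v in s3.items():
        else_a[i] = v
    for i in else_set:                                 # S4 sees S3's A(I)
        new_b[i] = else_a[i] * 3
    new_a = list(a)                                    # merge the copies
    for i in then_set:
        new_a[i] = then_a[i]
    for i in else_set:
        new_a[i] = else_a[i]
    return new_a, new_b
```

With A = [1.0, 0.5, 4.0, 0.5, 2.0] and EPS = 1.0, S3 at I = 2 averages
the old neighbours 0.5 and 0.5 rather than the zeros stored by S1, and
S4 then uses the A(2) that S3 produced.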


Straw poll for the group:

1. Should HPF define a scalar WHERE statement (with semantics as in Guy
Steele's proposal) in a FORALL...
	A) If we cannot agree on a semantics for IF?
	B) If we do agree on a semantics for IF that is different
	   from WHERE semantics?
	C) If we agree to use WHERE semantics for IF?

(My votes are A - yes, B - yes, C - no.  I don't think we'll agree to
C, though...)

2. Which of the following statements should be allowed in a block FORALL?
	(Let's assume that assignment makes the cut :-)
	A) single-statement FORALL
	B) block FORALL
	C) array WHERE (i.e. current Fortran 90 WHERE)
	D) scalar WHERE (i.e. Marc's proposal)
	E) IF-THEN-ELSE with FORALL-invariant conditions
	F) IF-THEN-ELSE, conditions can depend on FORALL indices
	G) DO loop with FORALL-invariant bounds
	H) DO loop, bounds can depend on FORALL indices
	J) GOTO
	K) STOP and PAUSE
	L) I/O statements (OPEN, CLOSE, READ, WRITE, ...)
	M) ALLOCATE and DEALLOCATE
	N) CALL

(
My reading of the current proposals is:
	Snir - all of above except F and H
	Page - A for sure; maybe B, C, and D
	Me - A, B, C, D for sure; E and F if we can agree on some meaning
)

	Chuck

From zrlp09@trc.amoco.com  Wed Jun 24 15:29:50 1992
Received: from noc.msc.edu by cs.rice.edu (AA07702); Wed, 24 Jun 92 15:29:50 CDT
Received: from uc.msc.edu by noc.msc.edu (5.65/MSC/v3.0.1(920324))
	id AA13636; Wed, 24 Jun 92 15:29:50 -0500
Received: from [129.230.11.2] by uc.msc.edu (5.65/MSC/v3.0z(901212))
	id AA10525; Wed, 24 Jun 92 15:29:48 -0500
Received: from trc.amoco.com (apctrc.trc.amoco.com) by netserv2 (4.1/SMI-4.0)
	id AA16176; Wed, 24 Jun 92 15:29:43 CDT
Received: from backus.trc.amoco.com by trc.amoco.com (4.1/SMI-4.1)
	id AA29444; Wed, 24 Jun 92 15:29:37 CDT
Received: from localhost by backus.trc.amoco.com (4.1/SMI-4.1)
	id AA00374; Wed, 24 Jun 92 15:29:34 CDT
Message-Id: <9206242029.AA00374@backus.trc.amoco.com>
To: chk@cs.rice.edu
Cc: hpff-forall@cs.rice.edu
Subject: Re: triangular array access and assignment-only block foralls 
In-Reply-To: Your message of Wed, 24 Jun 92 11:18:34 -0500.
             <9206241618.AA08302@erato.cs.rice.edu> 
Date: Wed, 24 Jun 92 15:29:33 -0500
From: "Rex Page" <zrlp09@trc.amoco.com>

In response to Chuck's comments (>):

>> 
>> ...
>> 
>> Array sections provide access to rectilinear portions of arrays
>> (i.e., portions whose index sets are Cartesian products of sets
>> of subscripts).
>> 
>> Forall statements provide access to more general portions of
>> arrays (the diagonal of a matrix, for example).  Block forall
>> constructs generalize accessible portions of arrays still further
>> (e.g., to slices taken at angles).

> My reading of Marc's proposal is that you can't use the outer FORALL
> indices as inner FORALL bounds (as your example does).  To quote Marc:
> Constraint: A subscript-name that occurs in a
>	forall-triplet-spec-list may be referenced inside the scope of
>	the FORALL only by the scalar-mask-expr of a FORALL or WHERE
>	statement or in an assignment of the form array-element = expr
>	or array-section = expr.  A subscript-name may not be
>	redefined inside the scope of the FORALL.
> So I don't think the block FORALL generalizes the array sections more
> than the regular FORALL.

I'm not sure why Marc's proposal restricts the use of FORALL indices.
I view the block FORALL as a semantic equivalent to a collection of
statement blocks executing concurrently.  The collection contains one
instance of the block for each value in the index set, and in that
instance the appropriate values of the indices replace the index variables.
An index variable cannot be redefined (that would be like redefining a
constant), but it should be usable in any way that an integer constant
could be used.

I also don't understand the rationale for prohibiting FORALL-type
array-assignments in block FORALLs, especially when section-type
assignments are permitted.

>> ...
>> 
>> Acceptable non-determinism in a block forall becomes more difficult
>> to describe in the presence of statements other than assignments.
>> CALLs are especially difficult to deal with.  The quantity of
>> misunderstandings and incorrect programs that would result from
>> such a complex linguistic structure would, I think, negate its
>> potential advantages.

>Like Ken, I don't want nondeterminism in the HPF language.  The cases
>that Rex calls "acceptable non-determinism" I would label as
>"non-standard-conforming".  [Asbestos suit on]

I don't have a strong feeling about nondeterminism, but it doesn't make
me uncomfortable as long as the collection of correct results from a
non-deterministic piece of code is easy to describe.  Actually, Fortran
has had nondeterminism from the outset in the 1966 standard because it
permitted expressions to be computed without full evaluation of all the
operands as long as the value delivered for the expression is mathematically
equivalent to the one that would have been delivered if all operands had
been evaluated.

An example from the Fortran 90 document implies that the expression 
    .TRUE. .OR. f(x)
need not invoke the function f.  If f contains a write statement,
Fortran 90 admits both the computation that executes the write and the
one that doesn't as legitimate interpretations of the code.  (I think
there is actually a theoretical problem with this philosophy because f
might be non-terminating, in which case the value delivered if f is not
evaluated, namely .TRUE., seems mathematically different from the value
delivered when f is evaluated, namely none.)  A similar example in
Fortran 66 would be the expression 0*f(x).
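
A small sketch of that example, for anyone who wants to try it on
their own compiler.  The function f and its message are made up for
illustration.  Whether the message appears is up to the processor, so
no particular output is claimed beyond r being true.

```fortran
program short_circuit
  implicit none
  logical :: r

  ! The processor may or may not invoke f here; either choice is a
  ! legitimate interpretation of the expression.
  r = .true. .or. f(1.0)
  print *, 'r =', r

contains

  ! Hypothetical function with a visible side effect (a write).
  logical function f(x)
    real, intent(in) :: x
    print *, 'f was invoked with x =', x
    f = x > 0.0
  end function f

end program short_circuit
```

The call is placed in an assignment rather than directly in the PRINT
list so that f's own output does not occur inside another I/O
statement.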

Nevertheless, prohibiting further instances of nondeterminism is ok by me.
Assignment statements of the form A(v)=..., where v is a vector, are
non-standard if v contains duplicate values.  The reason it's non-standard
is to avoid nondeterminism. On the other hand, if Fortran 90 had not made
this restriction, I don't think we'd need intrinsic subroutines for
defining send.  (By the way, don't defining sends introduce some
nondeterminism?).


>Let me observe that function calls have essentially all the problems
>that CALL statements do within FORALLs.  I want to be able to say
>	FORALL ( I = 1:N ) A(I) = SIN( 2*I*PI / N )
>but be protected from surprises in
>	FORALL ( I = 1:N ) A(I) = MESSY_FUNCTION( I, B(I), INDX(:) )
>where MESSY_FUNCTION could potentially 
>	Assign to B(I) [ no problem, they're independent ]
>	Assign to B(INDX(I)) [ big problem, in general ]
>	Reference all of INDX [ possibly complicated if INDX is distributed ]
>	Assign anything in INDX [ big problem ]
>We need to figure out some restrictions on functions to avoid
>ambiguities and problems.  Pres, weren't you going to circulate a
>definition of user elemental functions?  If nothing else, that sounds
>like a good starting point.

One restriction I would favor is to prohibit side-effects in functions
used in FORALLs (such as assigning to B(I) in your MESSY_FUNCTION).


>> I conclude that forall should be used for array assignment only,
>> including forall assignment.  This would provide a way to access
>> non-rectilinear portions of arrays, but it would not provide a
>> parallel loop.  (A forall assignment is not a loop because it is
>> not a control structure.)  Perhaps HPF Fortran could get by with
>> a directive indicating independence in DO loops as its sole
>> parallel-loop facility, using block foralls simply as an array
>> access method.
> ...
>    It appears that nested single-statement FORALL is allowed, but
>	nested multi-statement FORALL is not.  What is the reason for
>	this?

I would include block FORALL as a form of FORALL assignment.
That is, I did not intend to exclude nested multi-statement FORALL.

Rex Page


From chk@erato.cs.rice.edu  Wed Jun 24 15:57:05 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA08714); Wed, 24 Jun 92 15:57:05 CDT
Received: from localhost.cs.rice.edu by erato.cs.rice.edu (AA08441); Wed, 24 Jun 92 15:57:02 CDT
Message-Id: <9206242057.AA08441@erato.cs.rice.edu>
To: "Rex Page" <zrlp09@trc.amoco.com>
Cc: chk@cs.rice.edu, hpff-forall@cs.rice.edu
Subject: Re: triangular array access and assignment-only block foralls 
In-Reply-To: Your message of Wed, 24 Jun 92 15:29:33 -0500.
             <9206242029.AA00374@backus.trc.amoco.com> 
Date: Wed, 24 Jun 92 15:57:00 -0500
From: chk@erato.cs.rice.edu


> Subject: Re: triangular array access and assignment-only block foralls 
> Date: Wed, 24 Jun 92 15:29:33 -0500
> From: "Rex Page" <zrlp09@trc.amoco.com>
> 
> In response to Chuck's comments (>):
> > My reading of Marc's proposal is that you can't use the outer FORALL
> > indices as inner FORALL bounds (as your example does).  To quote Marc:
> > Constraint: A subscript-name that occurs in a
> >	forall-triplet-spec-list may be referenced inside the scope of
> >	the FORALL only by the scalar-mask-expr of a FORALL or WHERE
> >	statement or in an assignment of the form array-element = expr
> >	or array-section = expr.  A subscript-name may not be
> >	redefined inside the scope of the FORALL.
> > So I don't think the block FORALL generalizes the array sections more
> > than the regular FORALL.
> 
> I'm not sure why Marc's proposal restricts the use of FORALL indices.
> I view the block FORALL as a semantic equivalent to a collection of
> statement blocks executing concurrently.  The collection contains one
> instance of the block for each value in the index set, and in that
> instance the appropriate values of the indices replace the index variables.
> An index variable cannot be redefined (that would be like redefining a
> constant), but it should be usable in any way that an integer constant
> could be used.
> 
> I also don't understand the rationale for prohibiting FORALL-type
> array-assignments in block FORALLs, especially when section-type
> assignments are permitted.

Marc, where are you?  Please lead us out of this darkness!

I agree with Rex that FORALL bounds shouldn't be restricted thus.


> >Like Ken, I don't want nondeterminism in the HPF language.  The cases
> >that Rex calls "acceptable non-determinism" I would label as
> >"non-standard-conforming".  [Asbestos suit on]
> 
> I don't have a strong feeling about nondeterminism, but it doesn't make
> me uncomfortable as long as the collection of correct results from a
> non-deterministic piece of code is easy to describe.  Actually, Fortran
> has had nondeterminism from the outset in the 1966 standard because it
> permitted expressions to be computed without full evaluation of all the
> operands as long as the value delivered for the expression is mathematically
> equivalent to the one that would have been delivered if all operands had
> been evaluated.
> 
> ... example ...
> 
> Nevertheless, prohibiting further instances of nondeterminism is ok by me.
> Assignment statements of the form A(v)=..., where v is a vector, are
> non-standard if v contains duplicate values.  The reason it's non-standard
> is to avoid nondeterminism. On the other hand, if Fortran 90 had not made
> this restriction, I don't think we'd need intrinsic subroutines for
> defining send.  (By the way, don't defining sends introduce some
> nondeterminism?).

Sounds like we basically agree on nondeterminism.

Yes, a Fortran 90 definition would have eliminated the need for the
COPY_SEND intrinsic.  I think SUM_SEND and the others would still have
been needed, because Fortran 90 doesn't have C-style += operators.

Yes, COPY_SEND is nondeterministic (as defined now, it could probably
be changed).  SUM_SEND can be implemented deterministically on all
architectures I'm aware of as long as you keep the number of
processors constant.  A few have hefty performance penalties, though.
Changing the number of processors can change the answer due to
roundoff errors; then again, this is also true of the SUM intrinsic.
Changing data distributions probably has similar kinds of effects to
changing number of processors.
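
The roundoff point can be seen in a few lines.  The sketch below is
not SUM_SEND or any proposed intrinsic; chunked_sum is a hypothetical
stand-in for a reduction in which each of nchunks "processors" sums
its own block and the partial sums are then combined.  The data values
are chosen only to provoke cancellation in default REAL arithmetic.

```fortran
program sum_grouping
  implicit none
  integer, parameter :: n = 8
  real :: x(n)
  integer :: p

  x = (/ 1.0e8, 1.0, -1.0e8, 1.0, 1.0e8, 1.0, -1.0e8, 1.0 /)

  ! Same data, same operation, different number of blocks.
  do p = 1, 4
    if (mod(n, p) == 0) print *, 'chunks =', p, ' total =', chunked_sum(x, p)
  end do

contains

  ! Sum v in nchunks equal blocks, then add up the block totals.
  ! Assumes size(v) is divisible by nchunks.
  real function chunked_sum(v, nchunks)
    real, intent(in) :: v(:)
    integer, intent(in) :: nchunks
    integer :: c, k, blk
    real :: part

    blk = size(v) / nchunks
    chunked_sum = 0.0
    do c = 1, nchunks
      part = 0.0
      do k = (c - 1)*blk + 1, c*blk
        part = part + v(k)
      end do
      chunked_sum = chunked_sum + part
    end do
  end function chunked_sum

end program sum_grouping
```

The totals printed for different chunk counts need not agree with each
other, which is the same effect one would expect from changing the
number of processors or the data distribution.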


> >We need to figure out some restrictions on functions to avoid
> >ambiguities and problems.  Pres, weren't you going to circulate a
> >definition of user elemental functions?  If nothing else, that sounds
> >like a good starting point.
> 
> One restriction I would favor is to prohibit side-effects in functions
> used in FORALLs (such as assigning to B(I) in your MESSY_FUNCTION).

I assume you mean at least "all side effects visible in the caller"
(we'll discuss side-effects in SAVE variables later).  Sounds rational
to me.  The floor is open for users to complain that this is too
restrictive...

> >    It appears that nested single-statement FORALL is allowed, but
> >	nested multi-statement FORALL is not.  What is the reason for
> >	this?
> 
> I would include block FORALL as a form of FORALL assignment.
> That is, I did not intend to exclude nested multi-statement FORALL.
> 
> Rex Page

My misunderstanding.  Objection overruled.

	Chuck

From joelw@mozart.convex.com  Wed Jun 24 15:59:29 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA08788); Wed, 24 Jun 92 15:59:29 CDT
Received: from convex.convex.com by erato.cs.rice.edu (AA08447); Wed, 24 Jun 92 15:59:27 CDT
Received: from mozart.convex.com by convex.convex.com (5.64/1.35)
	id AA09945; Wed, 24 Jun 92 15:59:20 -0500
Received: by mozart.convex.com (5.64/1.28)
	id AA20981; Wed, 24 Jun 92 15:59:17 -0500
From: joelw@mozart.convex.com (Joel Williamson)
Message-Id: <9206242059.AA20981@mozart.convex.com>
Subject: Re: Forall
To: wu@cs.buffalo.edu (Min-You Wu)
Date: Wed, 24 Jun 92 15:59:16 CDT
Cc: chk@cs.rice.edu, gls@think.com, hpff-forall@erato.cs.rice.edu,
        wu@cs.buffalo.edu
In-Reply-To: <9206241917.AA13117@ruby.cs.Buffalo.EDU>; from "Min-You Wu" at Jun 24, 92 3:17 pm
X-Mailer: ELM [version 2.3 PL11]

Min-You Wu writes:
> 
> I agree that IF with INDEPENDENT doesn't work for this example because
> of interactions between branches.   Let's find out how we can solve
> this problem.  It seems Fortran D semantics is a good solution; however,
> Fortran D semantics can only be applied when using *old* values.  Of course 
> the above example is a good one for Fortran D semantics.  How about the 
> following example?
> 
>  	FORALL ( I = 1:N )
>           DO J = 1,N
>    	    IF ( ABS(A(I)) < EPS ) THEN
>  	      A(I) = 0.0
>  	    ELSE
>  	      A(I) = A(I-1) + A(I+1)
>  	    END IF
>           END DO
>  	END FORALL

Please explain to me the purpose of the "DO J = 1,N" loop.

Joel

From wu@cs.buffalo.edu  Wed Jun 24 21:00:01 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA15284); Wed, 24 Jun 92 21:00:01 CDT
Received: from ruby.cs.Buffalo.EDU by erato.cs.rice.edu (AA08536); Wed, 24 Jun 92 20:59:56 CDT
Received: by ruby.cs.Buffalo.EDU (4.1/1.01)
	id AA13425; Wed, 24 Jun 92 21:59:45 EDT
Date: Wed, 24 Jun 92 21:59:45 EDT
From: wu@cs.buffalo.edu (Min-You Wu)
Message-Id: <9206250159.AA13425@ruby.cs.Buffalo.EDU>
To: chk@cs.rice.edu, wu@cs.buffalo.edu
Subject: Re: Forall
Cc: gls@think.com, hpff-forall@erato.cs.rice.edu


> Leaving nested DO loops out of the discussion for a moment...
> 
> I agree that a reasonable semantics for IF is very likely to be
> reasonable for CASE.  Let's find one.
> 
> Unfortunately, "synchronization points" look like a bad starting point
> for defining IF semantics.  As soon as there are two statements in one
> branch of the IF, the following conflict pops up:
> 
> 	Users want synchronization between statements in the same branch
> 	Users do not want synchronization between the branches
> 
> Case study:
> 
> 	FORALL ( I = 1 : N )
> 	  IF ( A(I) < EPS ) THEN		! S0
> 	    A(I) = 0.0				! S1
> 	    B(I) = 0.0				! S2
> 	  ELSE
> 	    A(I) = (A(I-1) + A(I+1)) / 2	! S3
> 	    B(I) = A(I) * 3			! S4
> 	  END IF
> 	END FORALL
> 
> As a programmer, I expect S4 to use the value of A(I) that S3
> computes.  I do not expect S3 to see the values of A(I-1) and A(I+1)
> assigned by S1.  (It's not clear to me what users expect to happen if
> S4 is "B(I) = A(I-1)" instead.  But let's handle that after we get
> agreement on this case.)
> 
> The only interpretation that I can see to get out of this conflict
> involves creating copies of any objects referenced in both branches of
> the IF, then executing each branch using its own copy.  Some language
> has to be added to the effect that the same element cannot be assigned
> by both branches of the IF (or a resolution rule for those conflicts)
> as well.  I'll write something up formally if there's interest.
> This is not Fortran D semantics, because each branch still has SIMD
> semantics (it's close, though).

Case study:
(1) Let me use barriers for synchronization, with the barriers in the THEN
and ELSE branches corresponding one-to-one:
        THEN                             ELSE
          BARRIER  --------------------    BARRIER
          BARRIER  --------------------    BARRIER
          BARRIER  --------------------    BARRIER

Here is the example:

 	FORALL ( I = 1 : N )
 	  IF ( A(I) < EPS ) THEN		! S0
            BARRIER
 	    A(I) = 0.0				! S1
 	    B(I) = 0.0				! S2
 	  ELSE
 	    TEMP1(I) = (A(I-1) + A(I+1)) / 2	! S3.1
            BARRIER
            A(I) = TEMP1(I)			! S3.2
 	    B(I) = A(I) * 3			! S4
 	  END IF
 	END FORALL

S1 will not start execution before S3.1 completes. 

We don't need to create copies of every object.  A copy is necessary
only when a statement reads and writes the same array 
at different grid points, as in A(I) = A(I-1) + A(I+1).  
(Interpreting a single-statement FORALL also requires a temporary
for such a statement; the case is similar, though not the same.)

(2) Let's take a look at the case B(I) = A(I-1).  Assume the user wants
to use the new value assigned by S1, then:

 	FORALL ( I = 1 : N )
 	  IF ( A(I) < EPS ) THEN		! S0
            BARRIER
 	    A(I) = 0.0				! S1
            BARRIER
 	    B(I) = 0.0				! S2
 	  ELSE
 	    TEMP1(I) = (A(I-1) + A(I+1)) / 2	! S3.1
            BARRIER
            A(I) = TEMP1(I)			! S3.2
            BARRIER
 	    B(I) = A(I-1)			! S4
 	  END IF
 	END FORALL

If the user wants to use the old value, one more temporary is created:

 	FORALL ( I = 1 : N )
 	  IF ( A(I) < EPS ) THEN		! S0
            BARRIER
 	    A(I) = 0.0				! S1
 	    B(I) = 0.0				! S2
 	  ELSE
 	    TEMP1(I) = (A(I-1) + A(I+1)) / 2	! S3.1
            TEMP2(I) = A(I-1)    		! S4.1
            BARRIER
            A(I) = TEMP1(I)			! S3.2
 	    B(I) = TEMP2(I)			! S4.2
 	  END IF
 	END FORALL

Here I just show the general method.  Of course, an implementation could
optimize the two temporaries into one.

> I'm not sure what this says in terms of synchronization points;
> probably there's a global synchronization before (and after?)
> evaluating the condition, a global synch after the END IF, and
> synchronizations around S1, S2, S3, and S4 *of only the iterations
> executing that branch*.  To the best of my recollection, we haven't
> needed to talk about partial synchronizations before, although they
> would certainly be used heavily in the underlying implementation on
> MIMD machines.
> 
> 
> 
> Straw poll for the group:
> 
> 1. Should HPF define a scalar WHERE statement (with semantics as in Guy
> Steele's proposal) in a FORALL...
> 	A) If we cannot agree on a semantics for IF?
> 	B) If we do agree on a semantics for IF that is different
> 	   from WHERE semantics?
> 	C) If we agree to use WHERE semantics for IF?
> 
> (My votes are A - yes, B - yes, C - no.  I don't think we'll agree to
> C, though...)

My votes are A - maybe, B - maybe, C - no. 

> 2. Which of the following statements should be allowed in a block FORALL?
> 	(Let's assume that assignment makes the cut :-)
> 	A) single-statement FORALL
> 	B) block FORALL
> 	C) array WHERE (i.e. current Fortran 90 WHERE)
> 	D) scalar WHERE (i.e. Marc's proposal)
> 	E) IF-THEN-ELSE with FORALL-invariant conditions
> 	F) IF-THEN-ELSE, conditions can depend on FORALL indices
> 	G) DO loop with FORALL-invariant bounds
> 	H) DO loop, bounds can depend on FORALL indices
> 	J) GOTO
> 	K) STOP and PAUSE
> 	L) I/O statements (OPEN, CLOSE, READ, WRITE, ...)
> 	M) ALLOCATE and DEALLOCATE
> 	N) CALL
> 
> (
> My reading of the current proposals is:
> 	Snir - all of above except F and H
> 	Page - A for sure; maybe B, C, and D
> 	Me - A, B, C, D for sure; E and F if we can agree on some meaning
> )

My votes are A, B, C, E, F, G, H, N - yes, D - maybe or no, 
J, K, M - don't know, L - don't know the meaning, parallel I/O?

> 
> 	Chuck
> 

From wu@cs.buffalo.edu  Wed Jun 24 21:13:22 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA15452); Wed, 24 Jun 92 21:13:22 CDT
Received: from ruby.cs.Buffalo.EDU by erato.cs.rice.edu (AA08542); Wed, 24 Jun 92 21:13:17 CDT
Received: by ruby.cs.Buffalo.EDU (4.1/1.01)
	id AA13443; Wed, 24 Jun 92 22:13:10 EDT
Date: Wed, 24 Jun 92 22:13:10 EDT
From: wu@cs.buffalo.edu (Min-You Wu)
Message-Id: <9206250213.AA13443@ruby.cs.Buffalo.EDU>
To: joelw@mozart.convex.com, wu@cs.buffalo.edu
Subject: Re: Forall
Cc: chk@cs.rice.edu, gls@think.com, hpff-forall@erato.cs.rice.edu

> 
> Min-You Wu writes:
> > 
> > I agree that IF with INDEPENDENT doesn't work for this example because
> > of interactions between branches.   Let's find out how we can solve
> > this problem.  It seems Fortran D semantics is a good solution; however,
> > Fortran D semantics can only be applied when using *old* values.  Of course 
> > the above example is a good one for Fortran D semantics.  How about the 
> > following example?
> > 
> >  	FORALL ( I = 1:N )
> >           DO J = 1,N
> >    	    IF ( ABS(A(I)) < EPS ) THEN
> >  	      A(I) = 0.0
> >  	    ELSE
> >  	      A(I) = A(I-1) + A(I+1)
> >  	    END IF
> >           END DO
> >  	END FORALL
> 
> Please explain to me the purpose of the "DO J = 1,N" loop.
> 
> Joel
> 

Well, it is just there to make a code in which not only the old values but
also the new values assigned in the FORALL are needed.

Min-You

From zrlp09@trc.amoco.com  Thu Jun 25 08:26:20 1992
Received: from noc.msc.edu by cs.rice.edu (AA00732); Thu, 25 Jun 92 08:26:20 CDT
Received: from uc.msc.edu by noc.msc.edu (5.65/MSC/v3.0.1(920324))
	id AA13657; Thu, 25 Jun 92 08:26:19 -0500
Received: from [129.230.11.2] by uc.msc.edu (5.65/MSC/v3.0z(901212))
	id AA10329; Thu, 25 Jun 92 08:26:17 -0500
Received: from trc.amoco.com (apctrc.trc.amoco.com) by netserv2 (4.1/SMI-4.0)
	id AA18747; Thu, 25 Jun 92 08:26:13 CDT
Received: from backus.trc.amoco.com by trc.amoco.com (4.1/SMI-4.1)
	id AA26751; Thu, 25 Jun 92 08:26:12 CDT
Received: from localhost by backus.trc.amoco.com (4.1/SMI-4.1)
	id AA03585; Thu, 25 Jun 92 08:26:10 CDT
Message-Id: <9206251326.AA03585@backus.trc.amoco.com>
To: chk@cs.rice.edu
Cc: hpff-forall@cs.rice.edu
Subject: Re: Forall 
In-Reply-To: Your message of Wed, 24 Jun 92 15:27:41 -0500.
             <9206242027.AA08420@erato.cs.rice.edu> 
Date: Thu, 25 Jun 92 08:26:09 -0500
From: "Rex Page" <zrlp09@trc.amoco.com>

> > 
> > 	FORALL ( I = 1:N )
> > 	  IF ( ABS(A(I)) < EPS ) THEN
> > 	    A(I) = 0.0
> > 	  ELSE
> > 	    A(I) = A(I-1) + A(I+1)
> > 	  END IF
> > 	END FORALL
> > 
> > ...

>	Users want synchronization between statements in the same branch
>	Users do not want synchronization between the branches


Right.  Are we also agreed that the above FORALL containing IF-THEN-ELSE
should have the same meaning as the following?
   TEMP = A(1:N)
   FORALL (I=1:N, ABS(TEMP(I))< EPS) A(I)=0.0
   FORALL (I=1:N, ABS(TEMP(I))>=EPS) A(I)=TEMP(I-1)+TEMP(I+1)

I think this would be the most appealing analog to the meaning of
array assignments without IF-THEN-ELSE, such as the following:
   FORALL (I=1:N) A(I) = A(I-1) + A(I+1) ! both forms use current values,
   A(1:N) = A(0:N-1) + A(2:N+1)          ! build an array, then assign
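
A runnable sketch of the two masked FORALLs above, under one
assumption that the fragment leaves open: the snapshot (called temp
here) also holds the boundary elements A(0) and A(N+1), since the
second FORALL reads TEMP(I-1) and TEMP(I+1).  The size, EPS, and data
values are illustrative only.

```fortran
program if_as_masked_foralls
  implicit none
  integer, parameter :: n = 5
  real, parameter :: eps = 0.5
  real :: a(0:n+1), temp(0:n+1)
  integer :: i

  a = (/ 1.0, 0.1, 2.0, 0.2, 4.0, 8.0, 1.0 /)

  ! Snapshot of A, including a(0) and a(n+1) (assumption, see lead-in).
  temp = a

  ! Both the masks and the right-hand sides read only the snapshot,
  ! so neither statement can see a value stored by the other.
  forall (i = 1:n, abs(temp(i)) <  eps) a(i) = 0.0
  forall (i = 1:n, abs(temp(i)) >= eps) a(i) = temp(i-1) + temp(i+1)

  ! a(2) is built from the old a(1) and a(3), not from the zeros
  ! that the first FORALL stored into those elements.
  print '(7f6.1)', a
end program if_as_masked_foralls
```

Because every element of A(1:N) satisfies exactly one of the two
masks, no element is assigned twice.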

            
Straw poll for the group:

rlp votes:

1. Should HPF define a scalar WHERE statement (with semantics as in Guy
Steele's proposal) in a FORALL...
 no              A) If we cannot agree on a semantics for IF?
 probably no     B) If we do agree on a semantics for IF that is different
	            from WHERE semantics?
 no              C) If we agree to use WHERE semantics for IF?

2. Which of the following statements should be allowed in a block FORALL?
	(Let's assume that assignment makes the cut :-)
 yes    A) single-statement FORALL
 yes	B) block FORALL
 no	C) array WHERE (i.e. current Fortran 90 WHERE)
 no	D) scalar WHERE (i.e. Marc's proposal)
 maybe	E) IF-THEN-ELSE with FORALL-invariant conditions
 maybe	F) IF-THEN-ELSE, conditions can depend on FORALL indices
 maybe	G) DO loop with FORALL-invariant bounds
 maybe	H) DO loop, bounds can depend on FORALL indices
 no	J) GOTO
 no	K) STOP and PAUSE
 maybe	L) I/O statements (OPEN, CLOSE, READ, WRITE, ...)
 maybe	M) ALLOCATE and DEALLOCATE
 maybe	N) CALL

     ("Maybe" goes to yes if adding these facilities makes it possible
      to avoid local subroutines.)

 - Rex Page

From zrlp09@trc.amoco.com  Thu Jun 25 09:03:12 1992
Received: from noc.msc.edu by cs.rice.edu (AA01226); Thu, 25 Jun 92 09:03:12 CDT
Received: from uc.msc.edu by noc.msc.edu (5.65/MSC/v3.0.1(920324))
	id AA15778; Thu, 25 Jun 92 09:03:12 -0500
Received: from [129.230.11.2] by uc.msc.edu (5.65/MSC/v3.0z(901212))
	id AA27836; Thu, 25 Jun 92 09:03:09 -0500
Received: from trc.amoco.com (apctrc.trc.amoco.com) by netserv2 (4.1/SMI-4.0)
	id AA18898; Thu, 25 Jun 92 09:03:05 CDT
Received: from backus.trc.amoco.com by trc.amoco.com (4.1/SMI-4.1)
	id AA27239; Thu, 25 Jun 92 09:03:00 CDT
Received: from localhost by backus.trc.amoco.com (4.1/SMI-4.1)
	id AA03627; Thu, 25 Jun 92 09:02:41 CDT
Message-Id: <9206251402.AA03627@backus.trc.amoco.com>
To: chk@cs.rice.edu
Cc: hpff-forall@cs.rice.edu
Subject: Re: triangular array access and assignment-only block foralls 
In-Reply-To: Your message of Wed, 24 Jun 92 15:57:00 -0500.
             <9206242057.AA08441@erato.cs.rice.edu> 
Date: Thu, 25 Jun 92 09:02:40 -0500
From: "Rex Page" <zrlp09@trc.amoco.com>

Chuck said:
  Yes, a Fortran 90 definition would have eliminated the need for the
  COPY_SEND intrinsic.  I think SUM_SEND and the others would still have
  been needed, because Fortran 90 doesn't have C-style += operators.

It doesn't seem to me that the lack of += operators should be an obstacle.
What's wrong with  A(v)=A(v)+B instead of A(v)+=B?
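
To see what each form would compute if duplicates in v were allowed,
here is a sketch that emulates both with explicit loops.  It assumes
the usual array-assignment rule that the whole right-hand side is
evaluated before any element is stored, and it picks index order for
the stores (any other order would be an equally valid emulation).  It
does not reproduce the actual definition of SUM_SEND.

```fortran
program dup_subscript
  implicit none
  integer, parameter :: v(4) = (/ 1, 2, 2, 3 /)   ! index 2 appears twice
  real :: b(4), rhs(4)
  real :: a_assign(3), a_accum(3)
  integer :: k

  b = (/ 10.0, 20.0, 30.0, 40.0 /)
  a_assign = 1.0
  a_accum  = 1.0

  ! Emulation of A(v) = A(v) + B: evaluate the full right-hand side
  ! from the old values, then store element by element.
  rhs = a_assign(v) + b
  do k = 1, 4
    a_assign(v(k)) = rhs(k)      ! for the duplicate, the last store wins
  end do

  ! Emulation of an accumulating update (the "+=" reading):
  ! every contribution to a duplicated element is added in.
  do k = 1, 4
    a_accum(v(k)) = a_accum(v(k)) + b(k)
  end do

  print *, 'assign-style:    ', a_assign   ! element 2 is 1 + 30 = 31
  print *, 'accumulate-style:', a_accum    ! element 2 is 1 + 20 + 30 = 51
end program dup_subscript
```

Under these assumptions the two forms differ exactly at the duplicated
element: the assignment form keeps one contribution, while the
accumulating form keeps all of them.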

Chuck said:
  Yes, COPY_SEND is nondeterministic (as defined now, it could probably
  be changed).  SUM_SEND can be implemented deterministically on all
  architectures I'm aware of as long as you keep the number of
  processors constant.  A few have hefty performance penalties, though.

Performance is the key issue.  Nondeterminism similarly can be defined away
in assignments like A(v)=... when v contains duplicates, but why restrict
the processor?  A sequential program is always an option when the programmer
needs the extra synchronization.  Defining away the nondeterminism sacrifices
performance in the cases where the programmer is satisfied with any
interleaving of the updates.  Another advantage of permitting nondeterminism
over defining it away is that the definition will be clumsy and difficult
to remember; description of the nondeterministic semantics will be simpler
(operationally, at least).

On side effects:
> One restriction I would favor is to prohibit side-effects in functions
> used in FORALLs (such as assigning to B(I) in your MESSY_FUNCTION).

Chuck responded:
  I assume you mean at least "all side effects visible in the caller"
  (we'll discuss side-effects in SAVE variables later).  Sounds rational
  to me.  The floor is open for users to complain that this is too
  restrictive...

No, I mean side effects that change the subsequent result of the
computation.  This includes assignments to SAVE variables, writes,
reads, assignments to COMMON or MODULE variables, assignments to dummy
arguments, and probably a bunch of stuff I haven't thought of.
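
As an illustration of a function with none of those effects, here is a
sketch that borrows the PURE prefix from later Fortran standards
(Fortran 95 onward).  That keyword is not part of any proposal in this
thread; it is used only because it lets a compiler check the property.
The module and function names are hypothetical.

```fortran
module side_effect_free
  implicit none
contains

  ! No SAVE variables, no I/O, no assignment to COMMON or module data,
  ! and no assignment to dummy arguments (both are INTENT(IN)).
  pure real function scaled(i, x)
    integer, intent(in) :: i
    real, intent(in) :: x
    scaled = real(i) * x
  end function scaled

end module side_effect_free

program forall_with_function
  use side_effect_free
  implicit none
  integer, parameter :: n = 4
  real :: a(n), b(n)
  integer :: i

  b = 2.0
  ! Each instance depends only on its own I and B(I), so the calls
  ! can run in any order, or be skipped if the value is otherwise known.
  forall (i = 1:n) a(i) = scaled(i, b(i))
  print *, a
end program forall_with_function
```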

I don't want the processor to have to execute any statements in a function
at all if the processor can figure out the value of the expression
containing the function's invocation in some other way.  (I know you didn't
mean to open the floor to people who thought it wasn't restrictive enough,
Chuck, but as you can see I have strong feelings about this.) 


 - Rex Page

From wu@cs.buffalo.edu  Thu Jun 25 09:26:07 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA01791); Thu, 25 Jun 92 09:26:07 CDT
Received: from ruby.cs.Buffalo.EDU by erato.cs.rice.edu (AA08802); Thu, 25 Jun 92 09:26:01 CDT
Received: by ruby.cs.Buffalo.EDU (4.1/1.01)
	id AA13717; Thu, 25 Jun 92 10:25:49 EDT
Date: Thu, 25 Jun 92 10:25:49 EDT
From: wu@cs.buffalo.edu (Min-You Wu)
Message-Id: <9206251425.AA13717@ruby.cs.Buffalo.EDU>
To: chk@cs.rice.edu, wu@cs.buffalo.edu
Subject: Re: Forall
Cc: gls@think.com, hpff-forall@erato.cs.rice.edu

> 
> Straw poll for the group:
> 
> 1. Should HPF define a scalar WHERE statement (with semantics as in Guy
> Steele's proposal) in a FORALL...
> 	A) If we cannot agree on a semantics for IF?
> 	B) If we do agree on a semantics for IF that is different
> 	   from WHERE semantics?
> 	C) If we agree to use WHERE semantics for IF?
> 
> (My votes are A - yes, B - yes, C - no.  I don't think we'll agree to
> C, though...)
> 
> 2. Which of the following statements should be allowed in a block FORALL?
> 	(Let's assume that assignment makes the cut :-)
> 	A) single-statement FORALL
> 	B) block FORALL
> 	C) array WHERE (i.e. current Fortran 90 WHERE)
> 	D) scalar WHERE (i.e. Marc's proposal)
> 	E) IF-THEN-ELSE with FORALL-invariant conditions
> 	F) IF-THEN-ELSE, conditions can depend on FORALL indices
> 	G) DO loop with FORALL-invariant bounds
> 	H) DO loop, bounds can depend on FORALL indices
> 	J) GOTO
> 	K) STOP and PAUSE
> 	L) I/O statements (OPEN, CLOSE, READ, WRITE, ...)
> 	M) ALLOCATE and DEALLOCATE
> 	N) CALL
> 
> (
> My reading of the current proposals is:
> 	Snir - all of above except F and H
> 	Page - A for sure; maybe B, C, and D
> 	Me - A, B, C, D for sure; E and F if we can agree on some meaning
> )
> 
> 	Chuck
> 

I suggest you distinguish "array CALL" and "scalar CALL".  That is:
 
      FORALL(I=1:N)
        A(I) = B(I) - 1
        CALL FOO(A)
      ENDFORALL

or

      FORALL(I=1:N)
        A(I) = B(I) - 1
        CALL FOO(A(I))
      ENDFORALL

Min-You 

From chk@erato.cs.rice.edu  Thu Jun 25 10:18:42 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA03254); Thu, 25 Jun 92 10:18:42 CDT
Received: from localhost.cs.rice.edu by erato.cs.rice.edu (AA08830); Thu, 25 Jun 92 10:18:39 CDT
Message-Id: <9206251518.AA08830@erato.cs.rice.edu>
To: hpff-forall@erato.cs.rice.edu
Subject: Re: Forall 
In-Reply-To: Your message of Wed, 24 Jun 92 21:59:45 -0400.
             <9206250159.AA13425@ruby.cs.Buffalo.EDU> 
Date: Thu, 25 Jun 92 10:18:37 -0500
From: chk@erato.cs.rice.edu


> Date: Wed, 24 Jun 92 21:59:45 EDT
> From: wu@cs.buffalo.edu (Min-You Wu)
> Subject: Re: Forall
> 
> > I agree that a reasonable semantics for IF is very likely to be
> > reasonable for CASE.  Let's find one.
> > 
> > Unfortunately, "synchronization points" look like a bad starting point
> > for defining IF semantics.  As soon as there are two statements in one
> > branch of the IF, the following conflict pops up:
> > 
> > 	Users want synchronization between statements in the same branch
> > 	Users do not want synchronization between the branches
> > 
> > Case study:
> > 
> > 	FORALL ( I = 1 : N )
> > 	  IF ( A(I) < EPS ) THEN		! S0
> > 	    A(I) = 0.0				! S1
> > 	    B(I) = 0.0				! S2
> > 	  ELSE
> > 	    A(I) = (A(I-1) + A(I+1)) / 2	! S3
> > 	    B(I) = A(I) * 3			! S4
> > 	  END IF
> > 	END FORALL
> > 
> > ...
> 
> Case study:
> (1) Let me use barriers for synchronization, with the barriers in the THEN
> and ELSE branches corresponding one-to-one:
>         THEN                             ELSE
>           BARRIER  --------------------    BARRIER
>           BARRIER  --------------------    BARRIER
>           BARRIER  --------------------    BARRIER

Let me get this straight.  Are you proposing yet another new
statement/directive?  I'm assuming so.

If BARRIER is supposed to be a directive, then it changes the
semantics of the program.  It's not asserting that no dependence
exists, it is forcing a particular resolution of the dependence.
We've been over this ground before: directives cannot affect the
correctness of a program.

If BARRIER is supposed to be a new statement, then we're adding even
more syntax, and adding really major complexity to the implementation.
We now have to deal with nested IFs (each level of which may need a
barrier structure), BARRIERs outside of FORALL (inside DO INDEPENDENT,
for example), ...  We seem to be complicating our lives a lot, and
it's not clear to me that the gain is worth it.

I'll mention in passing that if we add BARRIER, then users will want
EVENTs and all the other PCF synchronization statements.  Plus, I'm
not convinced that BARRIER here is the same as the "usual" BARRIER
statement - I thought that generally all *processors* had to reach the
*same* barrier, not all *iterations* reaching *different* BARRIERs.

Finally, what is your interpretation of the case study without
BARRIERs?  Non-standard-conforming?  Nondeterminate?  Some specific
combination of BARRIERs?  Even assuming that we accept BARRIER, we
have to know what happens when the programmer forgets to put them in.

> Here is the example:
> ...
> We don't need to create copies of every object.  A copy is necessary
> only when a statement reads and writes the same array 
> at different grid points, as in A(I) = A(I-1) + A(I+1).  
> (Interpreting a single-statement FORALL also requires a temporary
> for such a statement; the case is similar, though not the same.)

I probably didn't make myself clear.  My semantics would read
something like "The effect of an IF statement nested within a FORALL
would be as if two copies of all data referenced in the IF were copied
before evaluation of the condition, and each branch of the IF would
execute using its own copy."  This is similar to the scalarization of
FORALL itself, which generates half a dozen temporary arrays; a real
implementation will minimize this memory use.
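
One possible scalarization of the case study under that wording,
written with ordinary array statements.  This is a sketch of my
reading, not proposed standard text: the copy names, the final MERGE
step, and the data values are all illustrative, and a real
implementation would not materialize every copy.

```fortran
program if_branch_copies
  implicit none
  integer, parameter :: n = 5
  real, parameter :: eps = 0.5
  real :: a(0:n+1), b(1:n)
  real :: a_then(0:n+1), a_else(0:n+1)
  real :: b_then(1:n), b_else(1:n)
  logical :: mask(1:n)

  a = (/ 1.0, 0.1, 2.0, 0.2, 4.0, 8.0, 1.0 /)
  b = -1.0

  ! Copy everything referenced in the IF before evaluating the condition.
  a_then = a
  b_then = b
  a_else = a
  b_else = b
  mask = a(1:n) < eps                                   ! S0

  ! THEN branch, on its own copy, one statement at a time.
  where (mask) a_then(1:n) = 0.0                        ! S1
  where (mask) b_then = 0.0                             ! S2

  ! ELSE branch, on its own copy.  S3 sees old neighbor values, never
  ! the zeros stored by S1; S4 sees the a_else(i) computed by S3.
  where (.not. mask) a_else(1:n) = (a_else(0:n-1) + a_else(2:n+1)) / 2  ! S3
  where (.not. mask) b_else = a_else(1:n) * 3                           ! S4

  ! Each element was assigned by exactly one branch; pick that result.
  a(1:n) = merge(a_then(1:n), a_else(1:n), mask)
  b      = merge(b_then, b_else, mask)

  print '(a, 7f6.2)', 'a =', a
  print '(a, 5f6.2)', 'b =', b
end program if_branch_copies
```

Statements within a branch are still synchronized with each other,
while the two branches never observe each other's stores.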

> (2) Let's take a look at the case B(I) = A(I-1).  Assume the user wants
> to use the new value assigned by S1, then:
> ...

I agree that BARRIERs allow both of the reasonable deterministic
interpretations of this case.  If we assume that programmers want this
power, Min-You's proposal seems to provide it.  My question is, how
much control do we want to give programmers, and how complex are we
willing to make the language to handle that?

> > Straw poll for the group:
>... 
> > 2. Which of the following statements should be allowed in a block FORALL?
> > 	(Let's assume that assignment makes the cut :-)
> > 	A) single-statement FORALL
> > 	B) block FORALL
> > 	C) array WHERE (i.e. current Fortran 90 WHERE)
> > 	D) scalar WHERE (i.e. Marc's proposal)
> > 	E) IF-THEN-ELSE with FORALL-invariant conditions
> > 	F) IF-THEN-ELSE, conditions can depend on FORALL indices
> > 	G) DO loop with FORALL-invariant bounds
> > 	H) DO loop, bounds can depend on FORALL indices
> > 	J) GOTO
> > 	K) STOP and PAUSE
> > 	L) I/O statements (OPEN, CLOSE, READ, WRITE, ...)
> > 	M) ALLOCATE and DEALLOCATE
> > 	N) CALL
> 
> My votes are A, B, C, E, F, G, H, N - yes, D - maybe or no, 
> J, K, M - don't know, L - don't know the meaning, parallel I/O?

Sentiment seems to be running strongly in favor of A, B, and C; would
someone besides Min-You and myself comment on D?

For case H:
Min-You, where do you envision the synchronization points in this
example:
	FORALL ( K = 2:N )
	  DO J = 1, K
	    A(K) = A(K) + A(J)
	  END DO
	END FORALL
Fortran D semantics would produce A(I) = SUM(A(1:I))
SIMD semantics, masking K iterations that had finished their DO loops,
would give tmp=SUM_PREFIX(A(1:N)); A(I)=SUM(tmp(1:I)), I think
PCF semantics would produce seriously nondeterminate behavior

Note that SIMD and PCF semantics produce different results for
	FORALL ( K = 2:N )
	  DO J = K, 1, -1
	    A(K) = A(K) + A(J)
	  END DO
	END FORALL
(and I'm not talking about roundoff behavior!)

	Chuck

From joelw@mozart.convex.com  Thu Jun 25 10:18:45 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA03256); Thu, 25 Jun 92 10:18:45 CDT
Received: from convex.convex.com by erato.cs.rice.edu (AA08831); Thu, 25 Jun 92 10:18:42 CDT
Received: from mozart.convex.com by convex.convex.com (5.64/1.35)
	id AA24097; Thu, 25 Jun 92 10:18:40 -0500
Received: by mozart.convex.com (5.64/1.28)
	id AA27261; Thu, 25 Jun 92 10:18:39 -0500
From: joelw@mozart.convex.com (Joel Williamson)
Message-Id: <9206251518.AA27261@mozart.convex.com>
Subject: Re: Forall
To: chk@cs.rice.edu
Date: Thu, 25 Jun 92 10:18:38 CDT
Cc: hpff-forall@erato.cs.rice.edu (HPFF FORALL Group)
In-Reply-To: <9206242027.AA08420@erato.cs.rice.edu>; from "chk@cs.rice.edu" at Jun 24, 92 3:27 pm
X-Mailer: ELM [version 2.3 PL11]

chk@cs.rice.edu writes:
> 
> 
> Straw poll for the group:
> 
> 1. Should HPF define a scalar WHERE statement (with semantics as in Guy
> Steele's proposal) in a FORALL...
> 	A) If we cannot agree on a semantics for IF?
> 	B) If we do agree on a semantics for IF that is different
> 	   from WHERE semantics?
> 	C) If we agree to use WHERE semantics for IF?
> 
> (My votes are A - yes, B - yes, C - no.  I don't think we'll agree to
> C, though...)
> 
> 2. Which of the following statements should be allowed in a block FORALL?
> 	(Let's assume that assignment makes the cut :-)
> 	A) single-statement FORALL
> 	B) block FORALL
> 	C) array WHERE (i.e. current Fortran 90 WHERE)
> 	D) scalar WHERE (i.e. Marc's proposal)
> 	E) IF-THEN-ELSE with FORALL-invariant conditions
> 	F) IF-THEN-ELSE, conditions can depend on FORALL indices
> 	G) DO loop with FORALL-invariant bounds
> 	H) DO loop, bounds can depend on FORALL indices
> 	J) GOTO
> 	K) STOP and PAUSE
> 	L) I/O statements (OPEN, CLOSE, READ, WRITE, ...)
> 	M) ALLOCATE and DEALLOCATE
> 	N) CALL
> 
> (
> My reading of the current proposals is:
> 	Snir - all of above except F and H
> 	Page - A for sure; maybe B, C, and D
> 	Me - A, B, C, D for sure; E and F if we can agree on some meaning
> )
> 
> 	Chuck
> 

I believe that FORALL should be constrained to act as just a flexible
extension of triplet notation and nothing more.  Most of the proposed
extensions to this notion seem to be trying to forcefit MIMD semantics
into a SIMD construct.  I strongly believe that a "high performance
Fortran" needs these semantics, but not in FORALL.  Within our current
set of proposals I think that all else should be relegated to
INDEPENDENT DO loops.  Beyond that I personally believe that an
amalgamation of HPF and X3H5 is what the user community will ultimately
demand.  (Just to thoroughly muddy the waters.)

Best regards,

Joel Williamson


From shapiro@think.com  Thu Jun 25 10:54:10 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA04116); Thu, 25 Jun 92 10:54:10 CDT
Received: from mail.think.com by erato.cs.rice.edu (AA08874); Thu, 25 Jun 92 10:54:04 CDT
Return-Path: <shapiro@Think.COM>
Received: from Django.Think.COM by mail.think.com; Thu, 25 Jun 92 11:54:00 -0400
From: Richard Shapiro <shapiro@think.com>
Received: by django.think.com (4.1/Think-1.2)
	id AA07186; Thu, 25 Jun 92 11:53:59 EDT
Date: Thu, 25 Jun 92 11:53:59 EDT
Message-Id: <9206251553.AA07186@django.think.com>
To: joelw@mozart.convex.com
Cc: chk@cs.rice.edu, hpff-forall@erato.cs.rice.edu
In-Reply-To: Joel Williamson's message of Thu, 25 Jun 92 10:18:38 CDT <9206251518.AA27261@mozart.convex.com>
Subject: Forall


   I believe that FORALL should be constrained to act as just a flexible
   extension of triplet notation and nothing more.  Most of the proposed
   extensions to this notion seem to be trying to forcefit MIMD semantics
   into a SIMD construct.  I strongly believe that a "high performance
   Fortran" needs these semantics, but not in FORALL.  Within our current
   set of proposals I think that all else should be relegated to
   INDEPENDENT DO loops.  Beyond that I personally believe that an
   amalgamation of HPF and X3H5 is what the user community will ultimately
   demand.  (Just to thoroughly muddy the waters.)

   Best regards,

   Joel Williamson

I agree with the above statement as well.  Given our ambitious schedule, I
don't see us resolving the MIMD issues very well. We should make sure we
have a well-defined escape mechanism, and come back to MIMD in HPF 2.


		Richard Shapiro,
		Thinking Machines Corporation
		(shapiro@think.com)

From @eros.uknet.ac.uk,@camra.ecs.soton.ac.uk:jhm@ecs.southampton.ac.uk  Thu Jun 25 12:47:33 1992
Received: from sun2.nsfnet-relay.ac.uk by cs.rice.edu (AA09800); Thu, 25 Jun 92 12:47:33 CDT
Via: uk.ac.uknet-relay; Thu, 25 Jun 1992 18:47:17 +0100
Received: from ecs.soton.ac.uk by eros.uknet.ac.uk via JANET with NIFTP (PP) 
          id <23651-0@eros.uknet.ac.uk>; Thu, 25 Jun 1992 18:18:33 +0100
Via: camra.ecs.soton.ac.uk; Thu, 25 Jun 92 18:14:35 BST
From: John Merlin <jhm@ecs.southampton.ac.uk>
Received: from bacchus.ecs.soton.ac.uk by camra.ecs.soton.ac.uk;
          Thu, 25 Jun 92 18:19:45 BST
Date: Thu, 25 Jun 92 18:17:13 BST
Message-Id: <6071.9206251717@bacchus.ecs.soton.ac.uk>
To: chk@cs.rice.edu
Subject: Re: Forall
Cc: hpff-forall@cs.rice.edu

From chk@edu.rice.cs Wed Jun 24 21:46:24 1992

> Unfortunately, "synchronization points" look like a bad starting point
> for defining IF semantics.  As soon as there are two statements in one
> branch of the IF, the following conflict pops up:
> 
> 	Users want synchronization between statements in the same branch
> 	Users do not want synchronization between the branches
> 
> Case study:
> 
> 	FORALL ( I = 1 : N )
> 	  IF ( A(I) < EPS ) THEN		! S0
> 	    A(I) = 0.0				! S1
> 	    B(I) = 0.0				! S2
> 	  ELSE
> 	    A(I) = (A(I-1) + A(I+1)) / 2	! S3
> 	    B(I) = A(I) * 3			! S4
> 	  END IF
> 	END FORALL
> 
> As a programmer, I expect S4 to use the value of A(I) that S3
> computes.  I do not expect S3 to see the values of A(I-1) and A(I+1)
> assigned by S1.  (It's not clear to me what users expect to happen if
> S4 is "B(I) = A(I-1)" instead.  But let's handle that after we get
> agreement on this case.)

This seems to be a case for allowing conditional expressions (as in C)
in forall-assignments, which is really what you want to express 
in the above example.  (I proposed this once before in the context of
alignment and distribution directives).  Using these, the above example
without stmts S2 and S4 could be written:

 	FORALL ( I = 1 : N )
 	    A(I) = (A(I) < EPS) ? 0.0 : (A(I-1) + A(I+1)) / 2
 	END FORALL

The full example becomes:

 	FORALL ( I = 1 : N )
 	    MASK(I) = (A(I) < EPS)
 	    A(I) = MASK (I) ? 0.0 : (A(I-1) + A(I+1)) / 2
 	    B(I) = MASK (I) ? 0.0 : 3  * A(I)
 	END FORALL

Admittedly it's a nuisance to introduce the mask, but I expect
it must be done implicitly anyway to implement this example.

I don't know how serious I am about this -- it's just my initial 
reaction on seeing this example.  I do have a general feeling that
it's better to introduce new syntax to express new semantics,
rather than overloading existing syntax by making its semantics
context dependent (which many users will find very confusing)

If you introduce the conditional expressions for this case, then
the other case, where you want S3 to use the *new* values computed by
S1, can be expressed with the IF construct using its normal meaning:

 	FORALL ( I = 1 : N )
 	  IF ( A(I) < EPS ) THEN		! S0
 	    A(I) = 0.0				! S1
 	  ELSE
 	    A(I) = (A(I-1) + A(I+1)) / 2	! S3
 	  END IF
 	END FORALL

Then you don't need to introduce a scalar WHERE construct within
FORALL for this case, which I also think should be avoided on the
same grounds as above.


> Straw poll for the group:
 
 1. Should HPF define a scalar WHERE statement (with semantics as in Guy
 Steele's proposal) in a FORALL...
no 	A) If we cannot agree on a semantics for IF?
no 	B) If we do agree on a semantics for IF that is different
 	   from WHERE semantics?
no	C) If we agree to use WHERE semantics for IF?
 
(I hope that A and B don't arise)
 
 2. Which of the following statements should be allowed in a block FORALL?
 	(Let's assume that assignment makes the cut :-)
yes 	A) single-statement FORALL
yes 	B) block FORALL
yes	C) array WHERE (i.e. current Fortran 90 WHERE)
no 	D) scalar WHERE (i.e. Marc's proposal)
yes 	E) IF-THEN-ELSE with FORALL-invariant conditions
yes 	F) IF-THEN-ELSE, conditions can depend on FORALL indices
 	G) DO loop with FORALL-invariant bounds
 	H) DO loop, bounds can depend on FORALL indices
 	J) GOTO
 	K) STOP and PAUSE
 	L) I/O statements (OPEN, CLOSE, READ, WRITE, ...)
	M) ALLOCATE and DEALLOCATE
yes 	N) CALL

I haven't really thought about the others.

                    John.


P.S.1: A colleague of mine, David Pritchard, suggests repeating labels
to indicate when stmts in different branches of an IF are to
be executed concurrently; thus...

 	FORALL ( I = 1 : N )
 	  IF ( A(I) < EPS ) THEN		! S0
1 	    A(I) = 0.0				! S1
2 	    B(I) = 0.0				! S2
 	  ELSE
1 	    A(I) = (A(I-1) + A(I+1)) / 2	! S3
2 	    B(I) = A(I) * 3			! S4
 	  END IF
 	END FORALL
 
corresponds to your desired interpretation.  I don't think he's
very serious about it though! :-)

P.S.2:  I plan to input something about user-defined elemental 
procedures soon.  It's just a matter of knuckling down to it!


From chk@erato.cs.rice.edu  Thu Jun 25 13:53:47 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA15484); Thu, 25 Jun 92 13:53:47 CDT
Received: from localhost.cs.rice.edu by erato.cs.rice.edu (AA08927); Thu, 25 Jun 92 13:53:44 CDT
Message-Id: <9206251853.AA08927@erato.cs.rice.edu>
To: hpff-forall@cs.rice.edu
Subject: Re: triangular array access and assignment-only block foralls 
In-Reply-To: Your message of Thu, 25 Jun 92 08:47:48 -0500.
             <9206251347.AA05401@willow14.cray.com> 
Date: Thu, 25 Jun 92 13:53:40 -0500
From: chk@erato.cs.rice.edu


Andy, I noticed your mail didn't go to hpff-forall; hope you don't
mind me forwarding it

> Date: Thu, 25 Jun 92 08:47:48 CDT
> From: meltzer@tamarack.cray.com (Andy Meltzer)
> To: chk@cs.rice.edu
> Subject: Re: triangular array access and assignment-only block foralls
> 
> I haven't been available to read the hpff mailings for a few weeks, but
> it strikes me that Chuck's statement goes to the heart of one of the
> things that we have to decide before we should go much further.
> 
> > Like Ken, I don't want nondeterminism in the HPF language.  The cases
> > that Rex calls "acceptable non-determinism" I would label as
> > "non-standard-conforming".  [Asbestos suit on]
> 
> I don't think there is much desire in HPF for non-determinism (I could be
> wrong, but I seem to be one of the people whose proposals come closest to
> it).  The issue is how far we go to ensure that the programmer cannot
> introduce non-determinism.  
> 
> Now, to put my two cents in:  I think that non-deterministic programs should
> be labelled "non-standard conforming" or even "erroneous", but I don't
> think that we should mandate any semantics which restricts the user from
> doing things that might cause these problems.  When we restrict the 
> semantics, we disable an enormous number of useful constructs.  
> 
> > Acceptable non-determinism in a block forall becomes more difficult
> > to describe in the presence of statements other than assignments.
> > CALLs are especially difficult to deal with.  (Rex Page)
> 
> I'd suggest that the only restriction of this sort that we want for FORALL 
> is one which states something like (using Marc's proposal for syntax):
> 
> 	"Any storage location which is updated by one forall-block 
> 	 (in a function, subroutine, or assignment) may not be 
> 	 updated or read by any other forall-block."
> 
> I think this is clear and it leaves a very powerful construct for the user.
> It also makes illegal all non-deterministic programs.  Its only drawback
> (albeit a major one) is that the compiler cannot enforce it.
> 
> BTW, this restriction removes the ambiguity in Chuck's IF example.  It 
> makes it "erroneous".
> 
> > 2. Which of the following statements should be allowed in a block FORALL?
> > 	(Let's assume that assignment makes the cut :-)
> > 	A) single-statement FORALL
> > 	B) block FORALL
> > 	C) array WHERE (i.e. current Fortran 90 WHERE)
> > 	D) scalar WHERE (i.e. Marc's proposal)
> > 	E) IF-THEN-ELSE with FORALL-invariant conditions
> > 	F) IF-THEN-ELSE, conditions can depend on FORALL indices
> > 	G) DO loop with FORALL-invariant bounds
> > 	H) DO loop, bounds can depend on FORALL indices
> > 	J) GOTO
> > 	K) STOP and PAUSE
> > 	L) I/O statements (OPEN, CLOSE, READ, WRITE, ...)
> > 	M) ALLOCATE and DEALLOCATE
> > 	N) CALL
> 
> Allowed: Everything except H, K.
> (only restricting K because I don't know how to do a good job at it)
> 
> 
> 
> 
> 						Andy Meltzer


From chk@erato.cs.rice.edu  Thu Jun 25 14:04:44 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA15817); Thu, 25 Jun 92 14:04:44 CDT
Received: from localhost.cs.rice.edu by erato.cs.rice.edu (AA08936); Thu, 25 Jun 92 14:04:42 CDT
Message-Id: <9206251904.AA08936@erato.cs.rice.edu>
To: meltzer@tamarack.cray.com (Andy Meltzer)
Cc: hpff-forall@cs.rice.edu
Subject: Re: triangular array access and assignment-only block foralls 
In-Reply-To: Your message of Thu, 25 Jun 92 08:47:48 -0500.
             <9206251347.AA05401@willow14.cray.com> 
Date: Thu, 25 Jun 92 14:04:41 -0500
From: chk@erato.cs.rice.edu


> Date: Thu, 25 Jun 92 08:47:48 CDT
> From: meltzer@tamarack.cray.com (Andy Meltzer)
> Subject: Re: triangular array access and assignment-only block foralls
> 
> ...
> I don't think there is much desire in HPF for non-determinism (I could be
> wrong, but I seem to be one of the people whose proposals come closest to
> it).  The issue is how far we go to ensure that the programmer cannot
> introduce non-determinism.  
> 
> Now, to put my two cents in:  I think that non-deterministic programs should
> be labelled "non-standard conforming" or even "erroneous", but I don't
> think that we should mandate any semantics which restricts the user from
> doing things that might cause these problems.  When we restrict the 
> semantics, we disable an enormous number of useful constructs.  

This sounds like a vote for "non-standard-conforming" - if we make
those programs "erroneous" then they have to be rejected (just like
subscripts out of bounds have to be detected).

> ...
> I'd suggest that the only restriction of this sort that we want for FORALL 
> is one which states something like (using Marc's proposal for syntax):
> 
> 	"Any storage location which is updated by one forall-block 
> 	 (in a function, subroutine, or assignment) may not be 
> 	 updated or read by any other forall-block."
> 
> I think this is clear and it leaves a very powerful construct for the user.
> It also makes illegal all non-deterministic programs.  Its only drawback
> (albeit a major one) is that the compiler cannot enforce it.
> 

Uh, wouldn't that restriction also make this non-standard-conforming?
	FORALL ( I = 2:N-1 )
	  A(I) = A(I) * 2		! S1
	  B(I) = A(I-1) + A(I+1)	! S2
	END FORALL
A(I) in S1 is read by S2 on different iterations...

Just pointing out we have to be careful in making these restrictions...

> > 2. Which of the following statements should be allowed in a block FORALL?
> > 	(Let's assume that assignment makes the cut :-)
> > 	A) single-statement FORALL
> > 	B) block FORALL
> > 	C) array WHERE (i.e. current Fortran 90 WHERE)
> > 	D) scalar WHERE (i.e. Marc's proposal)
> > 	E) IF-THEN-ELSE with FORALL-invariant conditions
> > 	F) IF-THEN-ELSE, conditions can depend on FORALL indices
> > 	G) DO loop with FORALL-invariant bounds
> > 	H) DO loop, bounds can depend on FORALL indices
> > 	J) GOTO
> > 	K) STOP and PAUSE
> > 	L) I/O statements (OPEN, CLOSE, READ, WRITE, ...)
> > 	M) ALLOCATE and DEALLOCATE
> > 	N) CALL
> 
> Allowed: Everything except H, K.
> (only restricting K because I don't know how to do a good job at it)
> 
> 						Andy Meltzer

Does that mean you can do a good job on this?
	X = 0.0
	FORALL ( I = 1:N )
	  IF (A(I) .EQ. 0.0) THEN
	    GOTO 100
	  END IF
	END FORALL
	X = 1.0
100	PRINT *, X
While we're on the subject, what's printed?

	Chuck

From zrlp09@trc.amoco.com  Thu Jun 25 14:22:14 1992
Received: from noc.msc.edu by cs.rice.edu (AA16819); Thu, 25 Jun 92 14:22:14 CDT
Received: from uc.msc.edu by noc.msc.edu (5.65/MSC/v3.0.1(920324))
	id AA05179; Thu, 25 Jun 92 14:22:12 -0500
Received: from [129.230.11.2] by uc.msc.edu (5.65/MSC/v3.0z(901212))
	id AA06487; Thu, 25 Jun 92 14:22:11 -0500
Received: from trc.amoco.com (apctrc.trc.amoco.com) by netserv2 (4.1/SMI-4.0)
	id AA19878; Thu, 25 Jun 92 14:22:07 CDT
Received: from backus.trc.amoco.com by trc.amoco.com (4.1/SMI-4.1)
	id AA02934; Thu, 25 Jun 92 14:22:04 CDT
Received: from localhost by backus.trc.amoco.com (4.1/SMI-4.1)
	id AA04610; Thu, 25 Jun 92 14:22:03 CDT
Message-Id: <9206251922.AA04610@backus.trc.amoco.com>
To: hpff-forall@cs.rice.edu
Subject: forall, IF-THEN-ELSE
Date: Thu, 25 Jun 92 14:22:03 -0500
From: "Rex Page" <zrlp09@trc.amoco.com>

John Merlin says:
      If you introduce the conditional expressions for this case, then
      the other case, where you want S3 to use the *new* values computed by
      S1, can be expressed with the IF construct using its normal meaning:

 	FORALL ( I = 1 : N )
 	  IF ( A(I) < EPS ) THEN		! S0
 	    A(I) = 0.0				! S1
 	  ELSE
 	    A(I) = (A(I-1) + A(I+1)) / 2	! S3
 	  END IF
 	END FORALL

I don't think what John calls the normal meaning of IF makes sense in this
context.

FORALL is not a loop.  It is a bunch of statement-blocks being carried out
simultaneously (conceptually, at least).

  - Rex


From chk@erato.cs.rice.edu  Thu Jun 25 14:42:29 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA17676); Thu, 25 Jun 92 14:42:29 CDT
Received: from localhost.cs.rice.edu by erato.cs.rice.edu (AA08986); Thu, 25 Jun 92 14:42:26 CDT
Message-Id: <9206251942.AA08986@erato.cs.rice.edu>
To: "Rex Page" <zrlp09@trc.amoco.com>
Cc: chk@cs.rice.edu, hpff-forall@cs.rice.edu
Subject: Re: triangular array access and assignment-only block foralls 
In-Reply-To: Your message of Thu, 25 Jun 92 09:02:40 -0500.
             <9206251402.AA03627@backus.trc.amoco.com> 
Date: Thu, 25 Jun 92 14:42:24 -0500
From: chk@erato.cs.rice.edu


> Subject: Re: triangular array access and assignment-only block foralls 
> Date: Thu, 25 Jun 92 09:02:40 -0500
> From: "Rex Page" <zrlp09@trc.amoco.com>
> 
> Chuck said:
>   Yes, a Fortran 90 definition would have eliminated the need for the
>   COPY_SEND intrinsic.  I think SUM_SEND and the others would still have
>   been needed, because Fortran 90 doesn't have C-style += operators.
> 
> It doesn't seem to me that the lack of += operators should be an obstacle.
> What's wrong with  A(v)=A(v)+B instead of A(v)+=B?

Consider the Fortran 90 interpretation of any array assignment
statement: first the RHS is evaluated, then the LHS is assigned.  By
this definition, the RHS evaluation doesn't depend on where it's
going; by the time you have to choose between COPY_SEND and SUM_SEND,
the expression has already been evaluated, and the (wrong) additions
done.

This could be fixed by a suitable (albeit more complex) interpretation
in the standard.

> Chuck said:
>   Yes, COPY_SEND is nondeterministic (as defined now, it could probably
>   be changed).  SUM_SEND can be implemented deterministically on all
>   architectures I'm aware of as long as you keep the number of
>   processors constant.  A few have hefty performance penalties, though.
> 
> Performance is the key issue.  Nondeterminism similarly can be defined away
> in assignments like A(v)=... when v contains duplicates, but why restrict
> the processor?  A sequential program is always an option when the programmer
> needs the extra synchronization.  Defining away the nondeterminism sacrifices
> performance in the cases where the programmer is satisfied with any
> interleaving of the updates.  Another advantage of permitting nondeterminism
> over defining it away is that the definition will be clumsy and difficult
> to remember; description of the nondeterministic semantics will be simpler
> (operationally, at least).

The performance penalties come in on machines where data moves among
processors dynamically (virtual paging, etc.); then it's hard to
control order of evaluation, hence overhead.  Of course, it's not
really clear what static data distributions mean on hardware like
that, anyway.

Numerical users: How important is controlling roundoff to you?  This
is the classic question "Do you want the right answers, or do you want
them fast?"

> On side effects:
> ...
> No, I mean side effects that change the subsequent result of the
> computation.  This includes assignments to SAVE variables, writes,
> reads, assignments to COMMON or MODULE variables, assignments to dummy
> arguments, and probably a bunch of stuff I haven't thought of.
> 
> I don't want the processor to have to execute any statements in a function
> at all if the processor can figure out the value of the expression
> containing the function's invocation in some other way.  (I know you didn't
> mean to open the floor to people who thought it wasn't restrictive enough,
> Chuck, but as you can see I have strong feelings about this.) 
> 
> 
>  - Rex Page

OK, a strong stand against *any* side effects.  Pending some
codification of this condition (anyone want to list all possible side
effects?), does anyone want to argue against this restriction?

	Chuck

From meltzer@tamarack.cray.com  Thu Jun 25 14:48:14 1992
Received: from timbuk.cray.com by cs.rice.edu (AA17785); Thu, 25 Jun 92 14:48:14 CDT
Received: from willow14.cray.com by timbuk.cray.com (4.1/CRI-MX 1.6ag)
	id AA16475; Thu, 25 Jun 92 14:48:13 CDT
Received: by willow14.cray.com
	id AA06086; 4.1/CRI-5.6; Thu, 25 Jun 92 14:48:12 CDT
Date: Thu, 25 Jun 92 14:48:12 CDT
From: meltzer@tamarack.cray.com (Andy Meltzer)
Message-Id: <9206251948.AA06086@willow14.cray.com>
To: hpff-forall@cs.rice.edu
Subject: Re: triangular array access and assignment-only block foralls

> > 	"Any storage location which is updated by one forall-block 
> > 	 (in a function, subroutine, or assignment) may not be 
> > 	 updated or read by any other forall-block."
> > 
> > I think this is clear and it leaves a very powerful construct for the user.
> > It also makes illegal all non-deterministic programs.  Its only drawback
> > (albeit a major one) is that the compiler cannot enforce it.
> > 
> 
> Uh, wouldn't that restriction also make this non-standard-conforming?
> 	FORALL ( I = 2:N-1 )
> 	  A(I) = A(I) * 2		! S1
> 	  B(I) = A(I-1) + A(I+1)	! S2
> 	END FORALL
> A(I) in S1 is read by S2 on different iterations...
> 

Yes, this would be non-standard conforming.  In my not-too-careful reading
of Marc's proposal I had missed that S1 completes before S2 starts.  If you
take my restriction to consider only single statements within a FORALL loop
at a time, it probably works.  But you're right, it has to be carefully 
worded.


> Does that mean you can do a good job on this?
>	X = 0.0
>	FORALL ( I = 1:N )
>	  IF (A(I) .EQ. 0.0) THEN
>	    GOTO 100
>	  END IF
>	END FORALL
>	X = 1.0
> 100	PRINT *, X
> While we're on the subject, what's printed?

Ahhh,  your question states:

> Which of the following statements should be allowed in a block FORALL?

which I read to mean "within".  In my opinion,  GOTO's should be allowed 
only within the bounds of the FORALL block, not arbitrarily in or out, 
otherwise they have the same problem as STOP and PAUSE.  In other words,
a FORALL must be entered through the FORALL header and exited via the END
FORALL.


							Andy


From zrlp09@trc.amoco.com  Thu Jun 25 14:52:46 1992
Received: from noc.msc.edu by cs.rice.edu (AA18081); Thu, 25 Jun 92 14:52:46 CDT
Received: from uc.msc.edu by noc.msc.edu (5.65/MSC/v3.0.1(920324))
	id AA07302; Thu, 25 Jun 92 14:52:39 -0500
Received: from [129.230.11.2] by uc.msc.edu (5.65/MSC/v3.0z(901212))
	id AA07111; Thu, 25 Jun 92 14:52:36 -0500
Received: from trc.amoco.com (apctrc.trc.amoco.com) by netserv2 (4.1/SMI-4.0)
	id AA19982; Thu, 25 Jun 92 14:52:32 CDT
Received: from backus.trc.amoco.com by trc.amoco.com (4.1/SMI-4.1)
	id AA03609; Thu, 25 Jun 92 14:52:29 CDT
Received: from localhost by backus.trc.amoco.com (4.1/SMI-4.1)
	id AA04818; Thu, 25 Jun 92 14:52:28 CDT
Message-Id: <9206251952.AA04818@backus.trc.amoco.com>
To: hpff-forall@cs.rice.edu
Subject: Re: Forall 
In-Reply-To: Your message of Wed, 24 Jun 92 15:27:41 -0500.
             <9206242027.AA08420@erato.cs.rice.edu> 
Date: Thu, 25 Jun 92 14:52:28 -0500
From: "Rex Page" <zrlp09@trc.amoco.com>

> > 
> > 	FORALL ( I = 1:N )
> > 	  IF ( ABS(A(I)) < EPS ) THEN
> > 	    A(I) = 0.0
> > 	  ELSE
> > 	    A(I) = A(I-1) + A(I+1)
> > 	  END IF
> > 	END FORALL
> > 
> > ...

>	Users want synchronization between statements in the same branch
>	Users do not want synchronization between the branches


Right.  Are we also agreed that the above FORALL containing IF-THEN-ELSE
should have the same meaning as the following?
   TEMP = A(1:N)
   FORALL (I=1:N, ABS(TEMP(I))< EPS) A(I)=0.0
   FORALL (I=1:N, ABS(TEMP(I))>=EPS) A(I)=TEMP(I-1)+TEMP(I+1)

I think this would be the most appealing analog to the meaning of
array assignments without IF-THEN-ELSE, such as the following:
   FORALL (I=1:N) A(I) = A(I-1) + A(I+1) ! both forms use current values,
   A(1:N) = A(0:N-1) + A(2:N+1)          ! build an array, then assign

            
Straw poll for the group:

rlp votes:

1. Should HPF define a scalar WHERE statement (with semantics as in Guy
Steele's proposal) in a FORALL...
 no              A) If we cannot agree on a semantics for IF?
 probably no     B) If we do agree on a semantics for IF that is different
	            from WHERE semantics?
 no              C) If we agree to use WHERE semantics for IF?

2. Which of the following statements should be allowed in a block FORALL?
	(Let's assume that assignment makes the cut :-)
 yes    A) single-statement FORALL
 yes	B) block FORALL
 no	C) array WHERE (i.e. current Fortran 90 WHERE)
 no	D) scalar WHERE (i.e. Marc's proposal)
 maybe	E) IF-THEN-ELSE with FORALL-invariant conditions
 maybe	F) IF-THEN-ELSE, conditions can depend on FORALL indices
 maybe	G) DO loop with FORALL-invariant bounds
 maybe	H) DO loop, bounds can depend on FORALL indices
 no	J) GOTO
 no	K) STOP and PAUSE
 maybe	L) I/O statements (OPEN, CLOSE, READ, WRITE, ...)
 maybe	M) ALLOCATE and DEALLOCATE
 maybe	N) CALL

     ("Maybe" goes to yes if adding these facilities makes it possible
      to avoid local subroutines.)

 - Rex Page
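
A minimal sketch of the snapshot-then-masked-FORALL form above, for anyone
who wants to try it.  The program name, the values of N and EPS, the sample
data, and the halo elements A(0) and A(N+1) are all assumptions made for
illustration; the fragment above copies only A(1:N) into TEMP, so its
TEMP(0) and TEMP(N+1) would be out of bounds as written.

```fortran
program forall_if_sketch
  implicit none
  integer, parameter :: n = 6          ! assumed size
  real, parameter :: eps = 0.5         ! assumed threshold
  real :: a(0:n+1), temp(0:n+1)        ! assumed halo at 0 and n+1
  integer :: i

  ! Assumed sample data: eight values for indices 0..n+1.
  a = (/ 1.0, 0.1, 2.0, 0.2, 3.0, 0.3, 4.0, 5.0 /)

  ! Snapshot the old values so both masked FORALLs read the same state.
  temp = a

  ! THEN branch: elements whose old magnitude is below eps.
  forall (i = 1:n, abs(temp(i)) <  eps) a(i) = 0.0

  ! ELSE branch: all other elements, computed from old neighbours only.
  forall (i = 1:n, abs(temp(i)) >= eps) a(i) = temp(i-1) + temp(i+1)

  print *, a(1:n)
end program forall_if_sketch
```

Because both statements read only TEMP, the order of the two FORALLs does
not matter here, which is the property the equivalence relies on.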

From zrlp09@trc.amoco.com  Thu Jun 25 14:53:25 1992
Received: from noc.msc.edu by cs.rice.edu (AA18152); Thu, 25 Jun 92 14:53:25 CDT
Received: from uc.msc.edu by noc.msc.edu (5.65/MSC/v3.0.1(920324))
	id AA07306; Thu, 25 Jun 92 14:53:24 -0500
Received: from [129.230.11.2] by uc.msc.edu (5.65/MSC/v3.0z(901212))
	id AA07137; Thu, 25 Jun 92 14:53:23 -0500
Received: from trc.amoco.com (apctrc.trc.amoco.com) by netserv2 (4.1/SMI-4.0)
	id AA19988; Thu, 25 Jun 92 14:53:19 CDT
Received: from backus.trc.amoco.com by trc.amoco.com (4.1/SMI-4.1)
	id AA03621; Thu, 25 Jun 92 14:53:16 CDT
Received: from localhost by backus.trc.amoco.com (4.1/SMI-4.1)
	id AA04826; Thu, 25 Jun 92 14:53:15 CDT
Message-Id: <9206251953.AA04826@backus.trc.amoco.com>
To: hpff-forall@cs.rice.edu
Subject: Re: triangular array access and assignment-only block foralls 
In-Reply-To: Your message of Wed, 24 Jun 92 15:57:00 -0500.
             <9206242057.AA08441@erato.cs.rice.edu> 
Date: Thu, 25 Jun 92 14:53:15 -0500
From: "Rex Page" <zrlp09@trc.amoco.com>

Chuck said:
  Yes, a Fortran 90 definition would have eliminated the need for the
  COPY_SEND intrinsic.  I think SUM_SEND and the others would still have
  been needed, because Fortran 90 doesn't have C-style += operators.

It doesn't seem to me that the lack of += operators should be an obstacle.
What's wrong with  A(v)=A(v)+B instead of A(v)+=B?

Chuck said:
  Yes, COPY_SEND is nondeterministic (as defined now, it could probably
  be changed).  SUM_SEND can be implemented deterministically on all
  architectures I'm aware of as long as you keep the number of
  processors constant.  A few have hefty performance penalties, though.

Performance is the key issue.  Nondeterminism similarly can be defined away
in assignments like A(v)=... when v contains duplicates, but why restrict
the processor?  A sequential program is always an option when the programmer
needs the extra synchronization.  Defining away the nondeterminism sacrifices
performance in the cases where the programmer is satisfied with any
interleaving of the updates.  Another advantage of permitting nondeterminism
over defining it away is that the definition will be clumsy and difficult
to remember; description of the nondeterministic semantics will be simpler
(operationally, at least).

On side effects:
> One restriction I would favor is to prohibit side-effects in functions
> used in these FORALLs (such as assigning to B(I) in your MESSY_FUNCTION).

Chuck responded:
  I assume you mean at least "all side effects visible in the caller"
  (we'll discuss side-effects in SAVE variables later).  Sounds rational
  to me.  The floor is open for users to complain that this is too
  restrictive...

No, I mean side effects that change the subsequent result of the
computation.  This includes assignments to SAVE variables, writes,
reads, assignments to COMMON or MODULE variables, assignments to dummy
arguments, and probably a bunch of stuff I haven't thought of.

I don't want the processor to have to execute any statements in a function
at all if the processor can figure out the value of the expression
containing the function's invocation in some other way.  (I know you didn't
mean to open the floor to people who thought it wasn't restrictive enough,
Chuck, but as you can see I have strong feelings about this.) 


 - Rex Page

From chk@erato.cs.rice.edu  Thu Jun 25 15:01:28 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA18597); Thu, 25 Jun 92 15:01:28 CDT
Received: from localhost.cs.rice.edu by erato.cs.rice.edu (AA09005); Thu, 25 Jun 92 15:01:25 CDT
Message-Id: <9206252001.AA09005@erato.cs.rice.edu>
To: John Merlin <jhm@ecs.southampton.ac.uk>
Cc: hpff-forall@erato.cs.rice.edu
Subject: Re: Forall 
In-Reply-To: Your message of Thu, 25 Jun 92 18:17:13 -0000.
             <6071.9206251717@bacchus.ecs.soton.ac.uk> 
Date: Thu, 25 Jun 92 15:01:23 -0500
From: chk@erato.cs.rice.edu


> From: John Merlin <jhm@ecs.southampton.ac.uk>
> Date: Thu, 25 Jun 92 18:17:13 BST
> Subject: Re: Forall
> 
> >From chk@edu.rice.cs Wed Jun 24 21:46:24 1992
> 
> > Case study:
> > 	FORALL ( I = 1 : N )
> > 	  IF ( A(I) < EPS ) THEN		! S0
> > 	    A(I) = 0.0				! S1
> > 	    B(I) = 0.0				! S2
> > 	  ELSE
> > 	    A(I) = (A(I-1) + A(I+1)) / 2	! S3
> > 	    B(I) = A(I) * 3			! S4
> > 	  END IF
> > 	END FORALL
> > ...
> 
> This seems to be a case for allowing conditional expressions (as in C)
> in forall-assignments, which is really what you want to express 
> in the above example.  (I proposed this once before in the context of
> alignment and distribution directives).  Using these, the above example
> without stmts S2 and S4 could be written:

I agree that this would be a better language design (well, I'd use a
less terse syntax than C).  However, we are trying to make minimal
changes to Fortran.  I think that leaves us with the options of
agreeing on a semantics for IF for the troublesome cases, agreeing on
restrictions to IF inside FORALL, or not allowing IF in FORALL at all.

> ...
> If you introduce the conditional expressions for this case, then
> the other case, where you want S3 to use the *new* values computed by
> S1, can be expressed with the IF construct using its normal meaning:
> 
>  	FORALL ( I = 1 : N )
>  	  IF ( A(I) < EPS ) THEN		! S0
>  	    A(I) = 0.0				! S1
>  	  ELSE
>  	    A(I) = (A(I-1) + A(I+1)) / 2	! S3
>  	  END IF
>  	END FORALL
> 
> Then you don't need to introduce a scalar WHERE construct within
> FORALL for this case, which I also think should be avoided on the
> same grounds as above.

The problem is, I'm unconvinced that this *is* the "normal" meaning of
IF.  My intuition says that reversing the sense of the condition and
exchanging the THEN and ELSE branches shouldn't change the meaning of
the IF; that's not true with the semantics here.

> 
>                     John.
> ...
> P.S.2:  I plan to input something about user-defined elemental 
> procedures soon.  It's just a matter of knuckling down to it!
> 

Glad to have the input.

	Chuck

From zrlp09@trc.amoco.com  Thu Jun 25 15:29:58 1992
Received: from noc.msc.edu by cs.rice.edu (AA19586); Thu, 25 Jun 92 15:29:58 CDT
Received: from uc.msc.edu by noc.msc.edu (5.65/MSC/v3.0.1(920324))
	id AA09673; Thu, 25 Jun 92 15:29:57 -0500
Received: from [129.230.11.2] by uc.msc.edu (5.65/MSC/v3.0z(901212))
	id AA07989; Thu, 25 Jun 92 15:29:56 -0500
Received: from trc.amoco.com (apctrc.trc.amoco.com) by netserv2 (4.1/SMI-4.0)
	id AA20120; Thu, 25 Jun 92 15:29:52 CDT
Received: from backus.trc.amoco.com by trc.amoco.com (4.1/SMI-4.1)
	id AA04516; Thu, 25 Jun 92 15:29:49 CDT
Received: from localhost by backus.trc.amoco.com (4.1/SMI-4.1)
	id AA05031; Thu, 25 Jun 92 15:29:48 CDT
Message-Id: <9206252029.AA05031@backus.trc.amoco.com>
To: hpff-forall@cs.rice.edu
Subject: forall, Andy Meltzer's update restrictions
Date: Thu, 25 Jun 92 15:29:48 -0500
From: "Rex Page" <zrlp09@trc.amoco.com>


> > 	"Any storage location which is updated by one forall-block 
> > 	 (in a function, subroutine, or assignment) may not be 
> > 	 updated or read by any other forall-block."
> > 
> 
> Uh, wouldn't that restriction also make this non-standard-conforming?
> 	FORALL ( I = 2:N-1 )
> 	  A(I) = A(I) * 2		! S1
> 	  B(I) = A(I-1) + A(I+1)	! S2
> 	END FORALL
> A(I) in S1 is read by S2 on different iterations...
> 

Andy says:
  Yes, this would be non-standard conforming.  In my not-too-careful reading
  of Marc's proposal I had missed that S1 completes before S2 starts.  If you
  take my restriction to consider only single statements within a FORALL loop
  at a time, it probably works.  But you're right, it has to be carefully 
  worded.


Well, shouldn't the following array assignments have the same effect?
And doesn't the forall violate your constraint?

    forall (i=2:n-1) A(i) = A(i-1) + A(i+1)
       and
    A(2:n-1) = A(1:n-2) + A(3:n)

     
  - Rex


From meltzer@tamarack.cray.com  Thu Jun 25 16:34:33 1992
Received: from timbuk.cray.com by cs.rice.edu (AA21572); Thu, 25 Jun 92 16:34:33 CDT
Received: from willow14.cray.com by timbuk.cray.com (4.1/CRI-MX 1.6ag)
	id AA23438; Thu, 25 Jun 92 16:34:32 CDT
Received: by willow14.cray.com
	id AA06175; 4.1/CRI-5.6; Thu, 25 Jun 92 16:34:30 CDT
Date: Thu, 25 Jun 92 16:34:30 CDT
From: meltzer@tamarack.cray.com (Andy Meltzer)
Message-Id: <9206252134.AA06175@willow14.cray.com>
To: hpff-forall@cs.rice.edu
Subject: Re:  forall, Andy Meltzer's update restrictions

> > 	FORALL ( I = 2:N-1 )
> > 	  A(I) = A(I) * 2		! S1
> > 	  B(I) = A(I-1) + A(I+1)	! S2
> > 	END FORALL
> > A(I) in S1 is read by S2 on different iterations...
> > 
> 
> Andy says:
>   Yes, this would be non-standard conforming.  In my not-too-careful reading
>   of Marc's proposal I had missed that S1 completes before S2 starts.  If you
>   take my restriction to consider only single statements within a FORALL loop
>   at a time, it probably works.  But you're right, it has to be carefully 
>   worded.
> 
> 
> Well, shouldn't the following array assignments have the same effect?
> And doesn't the forall violate your constraint?
> 
>     forall (i=2:n-1) A(i) = A(i-1) + A(i+1)
>        and
>     A(2:n-1) = A(1:n-2) + A(3:n)
> 
>      
>   - Rex
> 

No, the effect is different.  In Chuck's example B ends up with the "final"
write.  In yours the final value is in A.   


							Andy


From wu@cs.buffalo.edu  Thu Jun 25 22:12:19 1992
Received: from ruby.cs.Buffalo.EDU by cs.rice.edu (AA26455); Thu, 25 Jun 92 22:12:19 CDT
Received: by ruby.cs.Buffalo.EDU (4.1/1.01)
	id AA14560; Thu, 25 Jun 92 23:12:20 EDT
Date: Thu, 25 Jun 92 23:12:20 EDT
From: wu@cs.buffalo.edu (Min-You Wu)
Message-Id: <9206260312.AA14560@ruby.cs.Buffalo.EDU>
To: chk@cs.rice.edu, hpff-forall@cs.rice.edu, rpage@trc.amoco.com
Subject: Re: triangular array access and assignment-only block foralls


> Let me observe that function calls have essentially all the problems
> that CALL statements do within FORALLs.  I want to be able to say
> 	FORALL ( I = 1:N ) A(I) = SIN( 2*I*PI / N )
> but be protected from surprises in
> 	FORALL ( I = 1:N ) A(I) = MESSY_FUNCTION( I, B(I), INDX(:) )
> where MESSY_FUNCTION could potentially 
> 	Assign to B(I) [ no problem, they're independent ]
> 	Assign to B(INDX(I)) [ big problem, in general ]
> 	Reference all of INDX [ possibly complicated if INDX is distributed ]
> 	Assign anything in INDX [ big problem ]

Could you write a MESSY_FUNCTION to include all of these cases?

Min-You

From glossa@cix.clink.co.uk  Fri Jun 26 04:59:14 1992
Received: from eros.uknet.ac.uk by cs.rice.edu (AA28628); Fri, 26 Jun 92 04:59:14 CDT
Received: from compulink.co.uk by eros.uknet.ac.uk with UUCP 
          id <16683-0@eros.uknet.ac.uk>; Fri, 26 Jun 1992 10:27:35 +0100
Date: Fri, 26 Jun 92 10:12 GMT
From: Glossa <glossa@cix.clink.co.uk>
Subject: TRIANGULAR FORALL
To: hpff-forall@cs.rice.edu
Reply-To: glossa@cix.clink.co.uk
Message-Id: <memo.493667@cix.compulink.co.uk>


WHY IS FOR-ALL SO CONFUSING?

One of the reasons is that many people are not at all familiar with
the SIMD style of programming. Unfortunately, the FORTRAN 90
definers, who were being shouted down by some of those suppliers who
are now most keen on the HPFF which has resulted, were not able to
add a full set of array constructors to their language. In
particular, they didn't include the APL IOTA function or generator
of 1..N as an array. This makes use of WHERE in FOR-ALL rather less
elegant. The program usually looks nice once you have these index
arrays present.

      INTEGER IOTA(N)

C I shouldn't have to write the following loop, there should have been an 
C intrinsic. Sometimes it's convenient to have such arrays globally available.

      DO I = 1,N
        IOTA(I) = I
      END DO

      FORALL ( I = 1:N, J = 1:N )
        WHERE ( SPREAD(IOTA,1,N) .GE. SPREAD(IOTA,2,N)) THERE
           body of loop 
        END WHERE
      END FORALL

I can't take very seriously the objection that this is inefficient because
the loop body may consist simply of A(I,J) = 0.0. If this were the main 
processing load then we wouldn't need an HPF!  

Tom Lake


From zrlp09@trc.amoco.com  Fri Jun 26 07:11:27 1992
Received: from noc.msc.edu by cs.rice.edu (AA00315); Fri, 26 Jun 92 07:11:27 CDT
Received: from uc.msc.edu by noc.msc.edu (5.65/MSC/v3.0.1(920324))
	id AA04483; Fri, 26 Jun 92 07:11:25 -0500
Received: from [129.230.11.2] by uc.msc.edu (5.65/MSC/v3.0z(901212))
	id AA05682; Fri, 26 Jun 92 07:11:24 -0500
Received: from trc.amoco.com (apctrc.trc.amoco.com) by netserv2 (4.1/SMI-4.0)
	id AA22249; Fri, 26 Jun 92 07:11:20 CDT
Received: from backus.trc.amoco.com by trc.amoco.com (4.1/SMI-4.1)
	id AA00693; Fri, 26 Jun 92 07:11:19 CDT
Received: from localhost by backus.trc.amoco.com (4.1/SMI-4.1)
	id AA07678; Fri, 26 Jun 92 07:11:18 CDT
Message-Id: <9206261211.AA07678@backus.trc.amoco.com>
To: hpff-forall@cs.rice.edu
Subject: Re:  forall, Andy Meltzer's update restrictions
Date: Fri, 26 Jun 92 07:11:17 -0500
From: "Rex Page" <zrlp09@trc.amoco.com>


> > 	FORALL ( I = 2:N-1 )
> > 	  A(I) = A(I) * 2		! S1
> > 	  B(I) = A(I-1) + A(I+1)	! S2
> > 	END FORALL
> > A(I) in S1 is read by S2 on different iterations...
> > 
> 
> Andy says:
>   Yes, this would be non-standard conforming.  In my not-too-careful reading
>   of Marc's proposal I had missed that S1 completes before S2 starts.  If you
>   take my restriction to consider only single statements within a FORALL loop
>   at a time, it probably works.  But you're right, it has to be carefully 
>   worded.
> 
> 
> Well, shouldn't the following array assignments have the same effect?
> And doesn't the forall violate your constraint?
> 
>     forall (i=2:n-1) A(i) = A(i-1) + A(i+1)
>        and
>     A(2:n-1) = A(1:n-2) + A(3:n)
> 
>      
>   - Rex
> 

Andy replied:
   No, the effect is different.  In Chuck's example B ends up with the "final"
   write.  In yours the final value is in A.   

Sorry, Andy, I didn't say what I meant.
My point was that the two array assignments in my example have the same
effect as each other (not the same effect as your example), that they have
a desirable effect (one that we would want to retain), and that the forall
violates your constraint, even when reworded to consider only single
statements within a forall.
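
A minimal sketch of the equivalence claimed above, assuming the usual
"evaluate every right-hand side, then store" rule for both forms.  The
program name, the array size and the initial values are made up for
illustration; the sequential DO loop is included only for contrast.

```fortran
program forall_vs_section
  implicit none
  integer, parameter :: n = 8
  real :: a(n), b(n), c(n)
  integer :: i

  ! Same starting values (1, 4, 9, ...) in all three arrays.
  a = (/ (real(i*i), i = 1, n) /)
  b = a
  c = a

  ! Single-statement FORALL: all right-hand sides are evaluated
  ! before any element of a is stored.
  forall (i = 2:n-1) a(i) = a(i-1) + a(i+1)

  ! Array-section assignment: same evaluate-then-store rule.
  b(2:n-1) = b(1:n-2) + b(3:n)

  ! Sequential DO loop: c(i-1) has already been overwritten
  ! by the time c(i) is computed, so the result differs.
  do i = 2, n-1
    c(i) = c(i-1) + c(i+1)
  end do

  if (all(a == b)) print '(a)', 'forall matches section assignment'
  if (any(a /= c)) print '(a)', 'sequential DO differs'
end program forall_vs_section
```

Both the forall and the section assignment read elements of A that the
same statement also updates, which is the pattern the constraint would
rule out.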

  - Rex


From zrlp09@trc.amoco.com  Fri Jun 26 07:54:50 1992
Received: from noc.msc.edu by cs.rice.edu (AA00530); Fri, 26 Jun 92 07:54:50 CDT
Received: from uc.msc.edu by noc.msc.edu (5.65/MSC/v3.0.1(920324))
	id AA06974; Fri, 26 Jun 92 07:54:49 -0500
Received: from [129.230.11.2] by uc.msc.edu (5.65/MSC/v3.0z(901212))
	id AA23562; Fri, 26 Jun 92 07:54:47 -0500
Received: from trc.amoco.com (apctrc.trc.amoco.com) by netserv2 (4.1/SMI-4.0)
	id AA22303; Fri, 26 Jun 92 07:54:43 CDT
Received: from backus.trc.amoco.com by trc.amoco.com (4.1/SMI-4.1)
	id AA01130; Fri, 26 Jun 92 07:54:42 CDT
Received: from localhost by backus.trc.amoco.com (4.1/SMI-4.1)
	id AA07965; Fri, 26 Jun 92 07:54:41 CDT
Message-Id: <9206261254.AA07965@backus.trc.amoco.com>
To: hpff-forall@cs.rice.edu
Subject: APL iota function in Fortran: (/ (i, i=1,n) /)
Date: Fri, 26 Jun 92 07:54:40 -0500
From: "Rex Page" <zrlp09@trc.amoco.com>

Fortran 90 array constructors provide the iota function of APL.

Three ways to assign   APL-iota n   to a Fortran array:

   DO i=1,n           !Tom 
     iota(i)=i        ! Lake's
   END DO             !  loop

   iota = (/ (i,i=1,n) /)    ! using array constructor

   forall(i=1:n) iota(i)=i   ! using forall
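
The three forms can be checked against each other in one small program.
This is only a sketch; the program name, the array names and n = 10 are
assumptions chosen for the example.

```fortran
program iota_three_ways
  implicit none
  integer, parameter :: n = 10
  integer :: iota_do(n), iota_ac(n), iota_fa(n)
  integer :: i

  ! 1. Explicit DO loop.
  do i = 1, n
    iota_do(i) = i
  end do

  ! 2. Array constructor with an implied DO.
  iota_ac = (/ (i, i = 1, n) /)

  ! 3. Single-statement FORALL.
  forall (i = 1:n) iota_fa(i) = i

  if (all(iota_do == iota_ac) .and. all(iota_ac == iota_fa)) then
    print '(a)', 'all three iota arrays agree'
  end if
end program iota_three_ways
```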

  - Rex

From chk@erato.cs.rice.edu  Fri Jun 26 09:28:38 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA01935); Fri, 26 Jun 92 09:28:38 CDT
Received: from localhost.cs.rice.edu by erato.cs.rice.edu (AA09431); Fri, 26 Jun 92 09:28:34 CDT
Message-Id: <9206261428.AA09431@erato.cs.rice.edu>
To: glossa@cix.clink.co.uk
Cc: hpff-forall@erato.cs.rice.edu
Subject: Re: TRIANGULAR FORALL 
In-Reply-To: Your message of Fri, 26 Jun 92 10:12:00 +0000.
             <memo.493667@cix.compulink.co.uk> 
Date: Fri, 26 Jun 92 09:28:33 -0500
From: chk@erato.cs.rice.edu


> Date: Fri, 26 Jun 92 10:12 GMT
> From: Glossa <glossa@cix.clink.co.uk>
> Subject: TRIANGULAR FORALL
> 
> WHY IS FOR-ALL SO CONFUSING?
> 
> One of the reasons is that many people are not at all familiar with
> the SIMD style of programming. Unfortunately, the FORTRAN 90
> definers, who were being shouted down by some of those suppliers who
> are now most keen on the HPFF which has resulted, were not able to
> add a full set of array constructors to their language. In
> particular, they didn't include the APL IOTA function or generator
> of 1..N as an array. This makes use of WHERE in FOR-ALL rather less
> elegant. The program usually looks nice once you have these index
> arrays present.

Can I suggest you propose your favorite set of array constructors to
hpff-intrinsics?  Rob Schreiber there is a friend of SIMD and array
constructs, and would probably welcome a rich set of such functions.

>       INTEGER IOTA(N)
> 
> C I shouldn't have to write the following loop, there should have been an 
> C intrinsic. Sometimes it's convenient to have such arrays globally available.
> 
>       DO I = 1,N
>         IOTA(I) = I
>       END DO
> 
>       FORALL ( I = 1:N, J = 1:N )
>         WHERE ( SPREAD(IOTA,1,N) .GE. SPREAD(IOTA,2,N)) THERE
>            body of loop 
>         END WHERE
>       END FORALL
> 
> I can't take very seriously the objection that this is inefficient because
> the loop body may consist simply of A(I,J) = 0.0. If this were the main 
> processing load then we wouldn't need an HPF!  
> 
> Tom Lake
> 

Are you seriously claiming that the FORALL loop in your example is
clearer than
	FORALL ( I = 1:N, J = 1:N, I.GE.J )
	    body of loop
	END FORALL
(I'm guessing this is what you mean, let me know if you have a
different indexing pattern in mind.)
I agree that array constructors are useful, but I don't think this
example should require them.  For starters, why do you want to
generate two (presumably large) 2-D arrays in order to run a
triangular loop?
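
To make the comparison concrete, here is a hedged sketch that clears the
I.GE.J triangle both ways: once with a masked FORALL and once with a
whole-array WHERE whose mask is built from SPREAD index arrays.  The
program name, n = 5 and the arrays a and b are assumptions for the
example, and the WHERE form is my reading of the index-array approach,
not necessarily the exact pattern intended.

```fortran
program triangular_mask
  implicit none
  integer, parameter :: n = 5
  integer :: iota(n), i, j
  real :: a(n,n), b(n,n)

  a = 1.0
  b = 1.0
  iota = (/ (i, i = 1, n) /)

  ! Masked FORALL: only index pairs with i >= j are assigned.
  forall (i = 1:n, j = 1:n, i >= j) a(i,j) = 0.0

  ! Whole-array WHERE with two temporary n-by-n index arrays.
  ! spread(iota, 2, n) holds i at position (i,j);
  ! spread(iota, 1, n) holds j at position (i,j).
  where (spread(iota, 2, n) >= spread(iota, 1, n)) b = 0.0

  if (all(a == b)) print '(a)', 'masked FORALL and WHERE agree'
  if (count(a == 0.0) == n*(n+1)/2) print '(a)', 'triangle cleared'
end program triangular_mask
```

The masked FORALL needs no index arrays at all, while the WHERE form
materialises two n-by-n integer arrays just to build the mask.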

	Chuck

From @eros.uknet.ac.uk,@camra.ecs.soton.ac.uk:jhm@ecs.southampton.ac.uk  Fri Jun 26 11:41:55 1992
Received: from sun2.nsfnet-relay.ac.uk by cs.rice.edu (AA07551); Fri, 26 Jun 92 11:41:55 CDT
Via: uk.ac.uknet-relay; Fri, 26 Jun 1992 17:41:00 +0100
Received: from ecs.soton.ac.uk by eros.uknet.ac.uk via JANET with NIFTP (PP) 
          id <5711-0@eros.uknet.ac.uk>; Fri, 26 Jun 1992 17:40:17 +0100
Via: camra.ecs.soton.ac.uk; Fri, 26 Jun 92 17:36:22 BST
From: John Merlin <jhm@ecs.southampton.ac.uk>
Received: from bacchus.ecs.soton.ac.uk by camra.ecs.soton.ac.uk;
          Fri, 26 Jun 92 17:41:34 BST
Date: Fri, 26 Jun 92 17:39:02 BST
Message-Id: <6443.9206261639@bacchus.ecs.soton.ac.uk>
To: chk@cs.rice.edu
Subject: Re: Forall
Cc: hpff-forall@cs.rice.edu

> Chuck says:
>
> > From: John Merlin <jhm@ecs.southampton.ac.uk>
> >
> > ...
> > This seems to be a case for allowing conditional expressions (as in C)
> > in forall-assignments, which is really what you want to express 
> > in the above example.  (I proposed this once before in the context of
> > alignment and distribution directives).  Using these, the above example
> > without stmts S2 and S4 could be written:
> 
> I agree that this would be a better language design (well, I'd use a
> less terse syntax than C).  However, we are trying to make minimal
> changes to Fortran. 


But at least it would lessen the culture shock if and when you introduce 
HPF bindings for C:-)


> > ...
> > If you introduce the conditional expressions for this case, then
> > the other case, where you want S3 to use the *new* values computed by
> > S1, can be expressed with the IF construct using its normal meaning:
> > 
> >  	FORALL ( I = 1 : N )
> >  	  IF ( A(I) < EPS ) THEN		! S0
> >  	    A(I) = 0.0				! S1
> >  	  ELSE
> >  	    A(I) = (A(I-1) + A(I+1)) / 2	! S3
> >  	  END IF
> >  	END FORALL
> > 
> > Then you don't need to introduce a scalar WHERE construct within
> > FORALL for this case, which I also think should be avoided on the
> > same grounds as above.
> 
> The problem is, I'm unconvinced that this *is* the "normal" meaning of
> IF.  My intuition says that reversing the sense of the condition and
> exchanging the THEN and ELSE branches shouldn't change the meaning of
> the IF; that's not true with the semantics here.

(and Rex Page makes the same point).  Yes, I take your point.  
Basically I'm uncomfortable with the above example whatever the semantics 
(which is why I was tempted to suggest conditional expressions in the 
first place, as a more obvious way of expressing one possible meaning 
of the above, in a way that appears compatible with forall-assignments).

Really I'm inclined to think that each instance (iteration?) of the IF 
construct should be data independent, otherwise the result should be 
undefined (and perhaps not standard conforming).  Also, there should be 
only one statement in each branch of the IF.  

In fact, my support for IF constructs in FORALL is only at the level 
of 'maybe'.  Sorry I couldn't be of more assistance!

                            John.

From chk@erato.cs.rice.edu  Fri Jun 26 18:50:37 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA19405); Fri, 26 Jun 92 18:50:37 CDT
Received: from localhost.cs.rice.edu by erato.cs.rice.edu (AA09850); Fri, 26 Jun 92 18:50:35 CDT
Message-Id: <9206262350.AA09850@erato.cs.rice.edu>
To: hpff-forall@erato.cs.rice.edu
Word-Of-The-Day: prosaic : (adj) 1. characteristic of prose; factual
	2. having a dull, flat, or unimaginative quality
Subject: Auf Wiedersehen
Date: Fri, 26 Jun 92 18:50:34 -0500
From: chk@erato.cs.rice.edu


I'm heading for Germany and Austria for two weeks (business, not
pleasure!), so my email access is likely to be spotty at best.  Please
don't let my absence stop the discussions.  Although I disagree with
some of these ideas, the conversation is definitely healthy.

As I see it, the straw poll now indicates:
	Strong support for generalized assignments, including nested
		FORALL, vector assignment, and vector WHERE
	No better than 50-50 support for scalar WHERE
	Better than 50-50 support for some form of conditional
		assignment, although we disagree on syntax and
		semantics for it
	A vocal minority in support of allowing very general
		statements in a FORALL, but definitely in the minority.
In outline, here's my current feeling:
	Only generalized assignments should be allowed in FORALL
	Keep talking about conditionals, we're nowhere near
		consensus.  I still favor scalar WHERE, but it seems
		I'm outvoted.  Nested IF with WHERE semantics is (barely)
		acceptable, but only if no better proposals gain support.
	All other statements should not be allowed in FORALL.  They
		can always be used in DO INDEPENDENT.

Nobody has suggested any concrete restrictions on function calls in
FORALLs, but John Merlin says he's working on something.  This should
be top priority after conditionals in FORALL.

The only comments I've gotten re: Min-You's INDEPENDENT proposal have
been positive.  I think the syntax needs some adjustment, but
otherwise it seems OK.  (INDEPENDENT is now used to mark loops and the
start of blocks; the second use should be changed to BEGIN INDEPENDENT
to avoid ambiguity.)  I'll try to generate an amended version while
I'm on the road (in the air).

I hesitate to bring up the LOCAL SUBROUTINE proposals from Guy and
Marc until we get the FORALL issues settled.

	Chuck

From wu@cs.buffalo.edu  Sat Jun 27 09:30:58 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA24519); Sat, 27 Jun 92 09:30:58 CDT
Received: from ruby.cs.Buffalo.EDU by erato.cs.rice.edu (AA10116); Sat, 27 Jun 92 09:30:55 CDT
Received: by ruby.cs.Buffalo.EDU (4.1/1.01)
	id AA15242; Sat, 27 Jun 92 10:30:53 EDT
Date: Sat, 27 Jun 92 10:30:53 EDT
From: wu@cs.buffalo.edu (Min-You Wu)
Message-Id: <9206271430.AA15242@ruby.cs.Buffalo.EDU>
To: chk@cs.rice.edu, hpff-forall@erato.cs.rice.edu
Subject: Re:  Auf Wiedersehen
Cc: wu@cs.buffalo.edu

> 
> The only comments I've gotten re: Min-You's INDEPENDENT proposal have
> been positive.  I think the syntax needs some adjustment, but
> otherwise it seems OK.  (INDEPENDENT is now used to mark loops and the
> start of blocks; the second use should be changed to BEGIN INDEPENDENT
> to avoid ambiguity.)  I'll try to generate an amended version while
> I'm on the road (in the air).
> 
> 	Chuck
> 

This point is taken.  BEGIN INDEPENDENT is clear, though it seems
too long.  I am still thinking about better words to substitute for
BEGIN INDEPENDENT and END INDEPENDENT.  Any suggestions?

Min-You

From wu@cs.buffalo.edu  Sat Jun 27 13:06:46 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA26509); Sat, 27 Jun 92 13:06:46 CDT
Received: from ruby.cs.Buffalo.EDU by erato.cs.rice.edu (AA10140); Sat, 27 Jun 92 13:06:43 CDT
Received: by ruby.cs.Buffalo.EDU (4.1/1.01)
	id AA15630; Sat, 27 Jun 92 14:06:42 EDT
Date: Sat, 27 Jun 92 14:06:42 EDT
From: wu@cs.buffalo.edu (Min-You Wu)
Message-Id: <9206271806.AA15630@ruby.cs.Buffalo.EDU>
To: chk@cs.rice.edu, hpff-forall@erato.cs.rice.edu
Subject: Re: Forall
Cc: wu@cs.buffalo.edu


> > > 	FORALL ( I = 1:N ) A(I) = MESSY_FUNCTION( I, B(I), INDX(:) )
> 
> 	REAL FUNCTION MESSY_FUNCTION( K, A, IA )
> 	INTEGER K
> 	INTEGER, DIMENSION(:) :: IA
> 	REAL A(K)
>S1: 	A(1) = 1.0		! assign to B(I)
>S2: 	A(IA(K)-K+1) = 2.0	! assign to B(INDX(K)) [ note
> 				! subscript offset tricks ] 
> 	DO I = LBOUND(IA), UBOUND(IA)
>S3: 	  PRINT IA(I)		! reference all of INDX
>S4: 	  IA(I) = 3.0		! assign all of INDX
> 	ENDDO
>S5: 	MESSY_FUNCTION = A(IA(K))
> 	END
> 
> Chuck
> 

I will use this example to illustrate some thoughts about function calls
in FORALL.

First, this MESSY_FUNCTION cannot be called from an INDEPENDENT block
because there is an output dependence (S1 and S2), an anti-dependence
(S2 and S4), and data (true) dependences (S4 and S5, S1,S2 and S5).  
Furthermore, S4 is not valid in a FORALL, since each element of INDX 
will be assigned N times (though with the same value).  Although S3 is a 
valid statement, INDX will be printed N times.  S2 is valid assuming 
INDX is a permutation.  

Let me make a few functions that are valid to be called from an INDEPENDENT
block:

=========================================================================
 	REAL FUNCTION NOT_MESSY_FUNCTION1( K, A, IA )
 	INTEGER K
 	INTEGER, DIMENSION(:) :: IA
 	REAL A(K)
S1: 	A(1) = 1.0		! assign to B(I)
S6: 	NOT_MESSY_FUNCTION1 = A(1)
 	END

=========================================================================
 
 	REAL FUNCTION NOT_MESSY_FUNCTION2( K, A, IA )
 	INTEGER K
 	INTEGER, DIMENSION(:) :: IA
 	REAL A(K)
S2: 	A(IA(K)-K+1) = 2.0	! assign to B(INDX(K)) [ note
 				! subscript offset tricks ] 
S7: 	NOT_MESSY_FUNCTION2 = A(IA(K)-K+1)
 	END
 
=========================================================================

 	REAL FUNCTION NOT_MESSY_FUNCTION3( K, A, IA )
 	INTEGER K
 	INTEGER, DIMENSION(:) :: IA
 	REAL A(K)
 	DO I = LBOUND(IA), UBOUND(IA)
S3: 	  PRINT IA(I)		! reference all of INDX
 	ENDDO
S5: 	NOT_MESSY_FUNCTION3 = A(IA(K))
 	END
 
=========================================================================

The function calls (as well as subroutine calls) to be included in an
INDEPENDENT block must have some restrictions.  Simply speaking, the 
restriction is "NO DEPENDENCE".  In my INDEPENDENT proposal, no function 
calls can be made from a non-independent block.  However, I am not sure 
if it is possible to do so.  Let's examine Chuck's example:

 	REAL FUNCTION MESSY_FUNCTION( K, A, IA )
 	INTEGER K
 	INTEGER, DIMENSION(:) :: IA
 	REAL A(K)
S1: 	A(1) = 1.0		! assign to B(I)
S2: 	A(IA(K)-K+1) = 2.0	! assign to B(INDX(K)) [ note
 				! subscript offset tricks ] 
 	DO I = LBOUND(IA), UBOUND(IA)
S3: 	  PRINT IA(I)		! reference all of INDX
S4: 	  IA(I) = 3.0		! assign all of INDX
 	ENDDO
S5: 	MESSY_FUNCTION = A(IA(K))
 	END
 
It cannot be called from an INDEPENDENT block.  When it is considered 
to be called from a non-independent block, there are two possible 
semantics (as far as I can see).  The first one is to force each instance 
of the call to be independent, that is, no interaction is permitted between 
instances of parallel calls (for simplicity, I will use "call" instead 
of "instance of parallel calls" in the following context).  Then we 
need to make copies for each call.  Each call will execute independently, 
and at the end of each call, some rule must be enforced to resolve possible
contention.  This is not the original Fortran semantics, but it is close 
to the Fortran D semantics. (Note that with this semantics S4 is valid).

The second semantics is to allow interaction between calls.  Then
we have to make some synchronizations in the function.  The function 
can be executed synchronously.  On the other hand, to reduce the number 
of synchronizations, we may apply INDEPENDENT directives inside of 
the function. 

 	REAL FUNCTION MESSY_FUNCTION( K, A, IA )
 	INTEGER K
 	INTEGER, DIMENSION(:) :: IA
 	REAL A(K)
!HPF$BEGIN INDEPENDENT
S1: 	A(1) = 1.0		! assign to B(I)
!HPF$END INDEPENDENT
!HPF$BEGIN INDEPENDENT
S2: 	A(IA(K)-K+1) = 2.0	! assign to B(INDX(K)) [ note
 				! subscript offset tricks ] 
!HPF$END INDEPENDENT
!HPF$BEGIN INDEPENDENT
 	DO I = LBOUND(IA), UBOUND(IA)
S3: 	  PRINT IA(I)		! reference all of INDX
 	ENDDO
S5: 	MESSY_FUNCTION = A(IA(K))
!HPF$END INDEPENDENT
 	END
 
Note that S4 is not valid and has been eliminated from the function.  I am not 
proposing INDEPENDENT directives in a function definition here.
I only want to see if INDEPENDENT blocks can be applied.  

I don't have a strong position at this moment on which semantics 
is better for HPF, or on whether calls from a non-independent block
should simply be disallowed.  However, as I see it, a function call from
a non-independent block is possible.

Min-You

From @eros.uknet.ac.uk,@camra.ecs.soton.ac.uk:jhm@ecs.southampton.ac.uk  Mon Jun 29 03:30:41 1992
Received: from sun2.nsfnet-relay.ac.uk by cs.rice.edu (AA21239); Mon, 29 Jun 92 03:30:41 CDT
Via: uk.ac.uknet-relay; Mon, 29 Jun 1992 09:30:09 +0100
Received: from ecs.soton.ac.uk by eros.uknet.ac.uk via JANET with NIFTP (PP) 
          id <7229-0@eros.uknet.ac.uk>; Mon, 29 Jun 1992 09:29:32 +0100
Via: camra.ecs.soton.ac.uk; Mon, 29 Jun 92 09:25:18 BST
From: John Merlin <jhm@ecs.southampton.ac.uk>
Received: from bacchus.ecs.soton.ac.uk by camra.ecs.soton.ac.uk;
          Sat, 27 Jun 92 17:28:19 BST
Date: Sat, 27 Jun 92 17:25:47 BST
Message-Id: <6615.9206271625@bacchus.ecs.soton.ac.uk>
To: hpff-forall@cs.rice.edu
Subject: Conditional assignments in FORALL

I've realised I was being very thick when I suggested introducing 
C-style conditional expressions into HPF, as Fortran 90 already
provides exactly this functionality with the 'MERGE' intrinsic:

	e1 ? e2 : e3         in C is equivalent to
	MERGE (e2, e3, e1)   in F90.

It's elemental, so it can be used in 'forall-assignment's (I would
maintain!).

Thus, one interpretation of Chuck's example:

 	FORALL ( I = 1:N )
 	  IF ( ABS(A(I)) < EPS ) THEN
 	    A(I) = 0.0                 ! s1
 	  ELSE
 	    A(I) = A(I-1) + A(I+1)     ! s2
 	  END IF
 	END FORALL

(the interpretation where concurrency extends over both branches
of the IF, so old values are always used on the RHS of 's1' and 's2'), 
can be expressed simply as:

	FORALL (i=1:n)  a(i) = MERGE (0.0, a(i-1)+a(i+1), ABS(a(i)) < eps)

The case where no 's2' branch is performed until all 's1' branches
have executed (but using the old values in the mask) can be written:

	temp = ABS (a) < eps
	FORALL (i=1:n, temp)       a(i) = 0.0
	FORALL (i=1:n, .NOT.temp)  a(i) = a(i-1) + a(i+1)
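
A small sketch showing that the two readings really do give different
answers.  The data values, eps = 0.5 and the program name are made up
for illustration; the index range is narrowed to 2:n-1 so that a(i-1)
and a(i+1) stay in bounds, and the mask is written as temp(i) on the
assumption that the FORALL mask must be a scalar expression.

```fortran
program conditional_forall
  implicit none
  integer, parameter :: n = 5
  real, parameter :: eps = 0.5
  real :: a(n), b(n)
  logical :: temp(n)
  integer :: i

  a = (/ 4.0, 0.25, 2.0, 6.0, 8.0 /)
  b = a

  ! Reading 1: one assignment, old values everywhere on the RHS.
  forall (i = 2:n-1) a(i) = merge(0.0, a(i-1) + a(i+1), abs(a(i)) < eps)

  ! Reading 2: mask taken from the old values, all 's1' stores
  ! finish before any 's2' right-hand side is evaluated.
  temp = abs(b) < eps
  forall (i = 2:n-1, temp(i))       b(i) = 0.0
  forall (i = 2:n-1, .not. temp(i)) b(i) = b(i-1) + b(i+1)

  ! Element 3 sees the old a(2) = 0.25 in reading 1 and the new
  ! b(2) = 0.0 in reading 2.
  if (a(3) == 6.25 .and. b(3) == 6.0) print '(a)', 'readings differ at element 3'
  if (a(4) == b(4)) print '(a)', 'readings agree at element 4'
end program conditional_forall
```

Note that MERGE evaluates both of its value arguments, so it is not a
way of guarding an expression that must not be evaluated when the mask
is false.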


I'm beginning to reach a very conservative opinion about what should be
allowed in a FORALL construct:  that IF and DO should be restricted to 
forbid references to the FORALL indices, that stmts should be executed 
in strict sequential order and be severely limited in type (perhaps just 
forall-assignments, elemental subroutine calls, IF and DO -- and perhaps 
excluding the last 2), that 'forall-subscript-expr's can't reference 
'forall-indices', etc  (i.e. pretty much as in Marc Snir's proposal but 
restricting the allowable stmts in a FORALL to those whose semantics 
are very obvious!)

                      Cheers,
                          John Merlin.

From @eros.uknet.ac.uk,@camra.ecs.soton.ac.uk:jhm@ecs.southampton.ac.uk  Mon Jun 29 03:51:03 1992
Received: from sun2.nsfnet-relay.ac.uk by cs.rice.edu (AA21291); Mon, 29 Jun 92 03:51:03 CDT
Via: uk.ac.uknet-relay; Mon, 29 Jun 1992 09:50:32 +0100
Received: from ecs.soton.ac.uk by eros.uknet.ac.uk via JANET with NIFTP (PP) 
          id <9151-0@eros.uknet.ac.uk>; Mon, 29 Jun 1992 09:44:11 +0100
Via: camra.ecs.soton.ac.uk; Mon, 29 Jun 92 09:25:23 BST
From: John Merlin <jhm@ecs.southampton.ac.uk>
Received: from bacchus.ecs.soton.ac.uk by camra.ecs.soton.ac.uk;
          Sat, 27 Jun 92 17:01:19 BST
Date: Sat, 27 Jun 92 16:58:46 BST
Message-Id: <6600.9206271558@bacchus.ecs.soton.ac.uk>
To: hpff-forall@cs.rice.edu
Subject: User-defined elemental procedures

I'd like to propose extending the Fortran 90 concept of 'elemental 
procedures' for the purpose of HP Fortran -- the extension being to 
allow programmers to define such procedures, which in F90 are restricted 
to a subset of the intrinsic procedures.  I believe there are several 
grounds for introducing such procedures in HPF (and indeed in F90 
generally), namely: enhanced expressiveness and elegance; efficiency; 
and the possibility of expressing a limited form of MIMD parallelism 
in a simple and elegant way.

(Incidentally, I've heard on the grapevine, and in one of Chuck's 
messages, that this proposal was made at the last HPF meeting, 
so this is probably more of a case of 'seconding' than 'proposing'.
However, since I don't know anything about that proposal I still have 
an excuse for giving my two-pennies worth:-)

In this message I'll outline the proposal, explain the constraints,
and summarise what I see as the uses and advantages of this feature.

Introduction
------------

Fortran 90 introduces the concept of 'elemental procedures',
which are defined for scalar arguments but may also be applied to
conforming array-valued arguments.  For an elemental function,
each element of the result, if any, is as would have been obtained by
applying the function to corresponding elements of the arguments.
Examples are the mathematical intrinsics, e.g. SIN(X).

Unfortunately, Fortran 90 restricts the application of elemental
procedures to a subset of the intrinsic procedures---the programmer
cannot write his own.  This note therefore proposes the extension of 
allowing the programmer to define elemental procedures.

Informal proposal
-----------------
To define an elemental procedure, the subroutine or function statement 
is prefixed with the new keyword 'ELEMENTAL', e.g.:

	ELEMENTAL SUBROUTINE S (X)

	ELEMENTAL REAL FUNCTION F (X)

Henceforth, most of the discussion is in terms of elemental functions;
the properties of elemental subroutines follow in an obvious fashion.


For a function to be valid for use as an elemental function, it must 
obey the following constraints:

	1. Its arguments must have intent '(IN)'.
	2. Local arrays cannot be SAVEd.
	3. It cannot define any global variable.
	4. Its arguments, result and local variables must not be distributed.
	5. It can only reference a global variable if it is *not distributed*.

('local' and 'global' are used here in the normal programming sense, 
rather than referring to distribution!).
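
For illustration only, here is a sketch of a function that satisfies
constraints 1-3, written in the ELEMENTAL form that later Fortran
standards (Fortran 95 onward) adopted.  That standard form requires
scalar dummy arguments and says nothing about distribution, so
constraints 4 and 5 are not expressed here; the module, the function
clip and the test values are all made up for the example.

```fortran
module clip_mod
  implicit none
contains
  ! Scalar INTENT(IN) arguments, no SAVEd locals, no assignment
  ! to global variables, no I/O.
  elemental real function clip(x, eps)
    real, intent(in) :: x, eps
    if (abs(x) < eps) then
      clip = 0.0
    else
      clip = x
    end if
  end function clip
end module clip_mod

program elemental_demo
  use clip_mod
  implicit none
  real :: a(4), b(2,2)

  a = (/ 0.25, 2.0, -0.125, -3.0 /)
  b = reshape(a, (/ 2, 2 /))

  ! The same scalar definition is applied element by element
  ! to a rank-1 and a rank-2 actual argument.
  a = clip(a, 0.5)
  b = clip(b, 0.5)

  if (all(a == (/ 0.0, 2.0, 0.0, -3.0 /))) print '(a)', 'rank-1 call ok'
  if (all(reshape(b, (/ 4 /)) == a)) print '(a)', 'rank-2 call ok'
end program elemental_demo
```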

Incidentally, I've chosen not to insist that the dummy arguments and 
result are scalar---that's another extension to the F90 definition!  
Rather, each dummy argument and the result can be an array of
constant shape, and their shapes need not be related.  An elemental
invocation of the function corresponds to passing it actual arguments
conforming with the corresponding dummy arguments, in the normal way.
However, the function can also be invoked such that each actual argument
is an array of shape (s1,...sn) of the corresponding dummy argument
-- where (s1,...sn) is the same for all arguments.  This is equivalent 
to an array of elemental function invocations, which are conceptually 
computed in parallel, and whose result is an array of shape (s1,...sn) 
of 'elemental' results, each of which is the same as would have been 
obtained by applying the function to the corresponding 'elements' of 
the arguments.   (This generalisation, if acceptable, would mean that 
the proposed functions are a combination of 'elemental' and 
'transformational' functions in the F90 terminology.  I don't foresee 
any problems with this generalisation of the F90 concept -- except for 
one restriction on its use -- but I may be wrong!  I'll say why I think 
this extension may be useful, and mention the restriction, below).

This can be expressed more formally as follows:

	In an invocation of a user-defined elemental function,
	the shape (ai_1,ai_2,...ai_ni) of each actual argument Ai 
	must be related to the shape (di_1,di_2,...di_mi) of the 
	corresponding dummy argument Di as follows:

	For all arguments i:
		ni >= mi
		ai_k = di_k,   k = [1, mi]

	(i.e. the lowest dimensions of each actual argument must conform 
	with the corresponding dummy argument).

	For all argument pairs i,j:
		(ni - mi) = (nj - mj)
		ai_k = aj_k',    k > mi,  k' > mj,  (k - mi) = (k' - mj). 

	(i.e. the 'extra' dimensions of all arguments must conform).

	The shape of the actual function result, (f_1,f_2,...f_n), is
	then related to that of the dummy function result, (r_1,r_2,...r_m)
	as follows:
		(n - m) = (ni - mi)
		f_k = r_k,    k = [1, m]
		f_k = ai_k,   k > m
	
	(i.e. the lowest dimensions of the actual result conform with the 
	dummy result, and the 'extra' dimensions of the result conform with
	the 'extra' dimensions of each argument).
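
To make the shape rule concrete, here is a minimal sketch in plain
Fortran 90 of what one elemental invocation would mean.  The function
name 'normalise' and the extents n1 and n2 are invented for illustration.
The dummy argument and result have shape (3) and the actual argument has
shape (3,n1,n2), so the 'extra' dimensions are (n1,n2), and the
hypothetical statement 'r = normalise (a)' would stand for the explicit
loops below:

```fortran
program shape_rule_sketch
  implicit none
  integer, parameter :: n1 = 4, n2 = 5   ! illustrative 'extra' extents
  real :: a(3, n1, n2), r(3, n1, n2)
  integer :: i, j

  a = 1.0

  ! Explicit-loop meaning of the proposed elemental call  r = normalise (a):
  ! the lowest dimension (extent 3) matches the dummy argument, and the
  ! function is applied once per point of the remaining (n1,n2) grid.
  do j = 1, n2
    do i = 1, n1
      r(:, i, j) = normalise(a(:, i, j))
    end do
  end do

  print *, shape(r)   ! the result shape is (3, n1, n2)

contains

  ! Ordinary (non-elemental) function on one length-3 vector.
  function normalise(x) result(y)
    real, intent(in) :: x(3)
    real :: y(3)
    y = x / sqrt(sum(x * x))
  end function normalise

end program shape_rule_sketch
```

Each iteration touches only its own slice of 'a' and 'r', which is the
property that would let the invocations run concurrently.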

Comments
--------
The constraints are obviously designed so that the function can
be invoked concurrently at each of a set of grid points.  (By 'grid 
points' I really mean elements of an underlying array, but with the 
understanding that the data objects at each grid point can themselves 
be arrays).

Constraints 1-3 ensure that the function has no side effects.
(Actually not quite -- I should also add the rule that the function 
performs no I/O!).  The last two concern the use of the function in 
an environment with distributed data, and ensure that it does not 
perform any data communications.  Since by definition an elemental 
function is invoked independently in different processes (and may be
invoked in some processes but not others), this constraint is essential, 
for otherwise the function could not be used safely in environments 
where both the sending and receiving processes must participate in a 
communication (e.g. most message-passing machines).

These rules show that a compiler cannot deduce from a procedure's 
interface whether it can validly be used as an elemental procedure, 
as that depends on its local declarations and internal operations.  
Hence, it is necessary to use a specifier like 'ELEMENTAL' in the 
procedure interface to identify such procedures.  The compiler can 
check that the procedure satisfies all the necessary constraints 
when it compiles the procedure itself.


Uses and MIMD aspects
---------------------

1. Array expressions

Obviously such functions can be used in array expressions with the 
same interpretation as F90 elemental intrinsic functions.
The function result must be scalar or conform with the expression.
In the former case, the result is effectively 'broadcast' to an array
of the required shape (although in the HPF context no data communications 
are required, as the function would be invoked identically in all 
processes where the result is required).  In the latter case, it's as 
if the function is invoked elementally at each element of the 'underlying' 
array (i.e. the array left after omitting the first 'm' dimensions, 
where 'm' is the rank of the result).

I've allowed the generalisation of array-valued dummy arguments and 
result to handle applications that are data-parallel over an underlying 
grid, but where the data objects at each grid point are arrays rather 
than scalars.  An example is QCD (a physics application), which is
defined on a 4d grid, and where the data objects are a vector of
length 3 at each grid point and a 3*3 array on each link.  The vectors
and matrices can be initialised independently at each grid point
(viz. with an elemental function), but each vector and matrix must be
initialised as a whole as their individual elements are not independent
(e.g. the vectors must have length 1).

Incidentally, I should mention that elemental procedures can be used 
to perform an operation at a grid point even if the operation is a 
function of values at *other* grid points;  this can be done in parallel 
as long as only the old values at the other sites are required.  In the 
implementation I envisage, all data communications are done *outside* 
the elemental procedure prior to the call, and the required values are 
received by the procedure locally via its argument list.  E.g. in QCD,
the updating of the matrix on a link is a function of the (old values of)
the matrices on the 3 neighbouring links forming a 'staple', e.g:

                        Ub
                     --------
                     |      |
                  Ua |      | Uc
                     |      |
                     ........
                      U_new

This can be computed using an elemental function, e.g.:

	U_new = link_update (Ua, Ub, Uc)

where the arguments and result are arrays of shape (3,3).

It can be done in parallel over the whole grid in an array assignment.
(Here I update the links in direction 1 of a 2d plane for simplicity.
'U1' and 'U2' are arrays containing the link matrices in directions
1 and 2 respectively.  They have dimensions (3,3,n1,n2) -- the first 
two dimensions being the link matrix and the last two the lattice indices):

	U1 = link_update (U2,                  &    ! Ua
	                  CSHIFT (U1, 1, 3),   &    ! Ub--shifted in dim 3
	                  CSHIFT (U2, 1, 4))        ! Uc--shifted in dim 4

The data communications are done (by 'CSHIFT()') prior to calling 
the elemental function.
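
The same 'communicate first, then compute locally' pattern can be tried
out on a 1d array with standard Fortran 90.  This is only a sketch: the
averaging stands in for a real link update, and all names and values are
invented.

```fortran
program shift_then_compute
  implicit none
  real :: u(5), right(5), avg(5)

  u = (/ 1.0, 2.0, 3.0, 4.0, 5.0 /)

  ! Data movement happens here, outside the per-point computation:
  ! right(i) holds the old value u(i+1), wrapping round at the end.
  right = cshift(u, 1)

  ! Purely local work at each point, using only values already gathered.
  avg = 0.5 * (u + right)

  print *, avg   ! the last element pairs u(5) with the wrapped u(1)
end program shift_then_compute
```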

2. WHERE statement & construct

Because of its elemental property, an elemental function can also be used 
in masked array assignments in WHERE statements and constructs (whereas 
normal array-valued functions cannot).

Constraint:  An elemental function used in a WHERE statement or construct
must have a scalar result.  (This is because the 'mask' array can select 
individual elements of the assignment for execution.  I don't think that 
any restriction on the shape of the dummy arguments is necessary though).

In the HPF context, this may be one way of obtaining MIMD parallelism, e.g.:


	REAL x (10,10)
	LOGICAL edges (10,10)
	INTERFACE
	  ELEMENTAL REAL FUNCTION f_edge (x)
	    REAL x
	  END FUNCTION f_edge
	  ELEMENTAL REAL FUNCTION f_interior (x)
	    REAL x
	  END FUNCTION f_interior
	END INTERFACE

	...  initialise mask array 'edges'

	WHERE (edges)
	  x = f_edge (x)
	ELSE WHERE
	  x = f_interior (x)
	END WHERE


(Of course, MIMD parallelism is only obtained if the compiler
can establish that the two assignments are independent and doesn't
force a synchronisation at the ELSEWHERE, which is probably a reasonable 
assumption in simple cases).


3. FORALL statement & construct

These functions can also be used in a FORALL.  Because a 'forall-assignment'
may be an 'array-assignment' (in most definitions anyway) the elemental
function can have an array result.  E.g. if a certain problem
is data-parallel over a 2d grid, and the data structure at each grid
point is a vector of length 3 (2d QCD?), we could have:


	REAL  v (3,10,10)
	INTERFACE
	  ELEMENTAL FUNCTION f (x)
	    REAL, DIMENSION(3) :: f, x
	  END FUNCTION f
	END INTERFACE
	...
	FORALL (i=1:10, j=1:10)  v(:,i,j) = f (v(:,i,j)) 

	
If 'IF' constructs are allowed in FORALL (a thorny subject at the moment!)
one has more opportunities for MIMD parallelism, e.g.:


	FORALL (i=1:n, j=1:n)
	  IF (i==1 .OR. i==n .OR. j==1 .OR. j==n) THEN
	    x (i,j) = f_edge (x(i,j))
	  ELSE
	    x (i,j) = f_interior (x(i,j))
	  ENDIF
	END FORALL


(Incidentally, this can also be coded without an IF construct, using the 
F90 elemental intrinsic function MERGE(), thus:

	FORALL (i=1:n, j=1:n)                                        &
	    x (i,j) =  MERGE (f_edge (x(i,j)),                       &
	                      f_interior (x(i,j)),                   &
	                      (i==1 .OR. i==n .OR. j==1 .OR. j==n))
)

I would propose that only elemental functions (both user-defined and
intrinsic) are allowed in FORALL, just as in array expressions.  
Elemental subroutine calls could also be allowed in FORALL, with a very 
similar interpretation to elemental functions.  The main reason for using 
them would be to allow the return of multiple results.  
(Incidentally, arguments returning a result can have intent '(IN)' or 
'(INOUT)' as in F90).

4. MIMD parallelism

Perhaps the most direct way of obtaining MIMD parallelism is by
means of branches within an elemental function which depend on argument 
values.  These branches can be governed by content-based or index-based 
conditionals (the latter in a FORALL context).  For example:


	ELEMENTAL FUNCTION f (x, i)
	  IF (x > 0) THEN     ! content-based conditional
	    ...
	  ELSE IF (i==1 .OR. i==n) THEN    ! index-based conditional
	    ...
	  ENDIF
	END FUNCTION

	...
	FORALL (i=1:n)  x (i) = f(x(i), i)
	...


Content-based conditionals can be exploited generally, including in
array assignments, which may sometimes obviate the need for WHERE 
statements and constructs with their potential synchronisation overhead. 
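
A small sketch of the content-based case, assuming a compiler that
accepts the ELEMENTAL prefix for functions with scalar dummy arguments
(Fortran 95 and later do).  The function body and data are invented for
illustration.

```fortran
module branch_mod
  implicit none
contains

  ! Each invocation chooses its branch from its own argument value only.
  elemental function f(x) result(y)
    real, intent(in) :: x
    real :: y
    if (x > 0.0) then      ! content-based conditional
      y = sqrt(x)
    else
      y = 0.0
    end if
  end function f

end module branch_mod

program content_branch_sketch
  use branch_mod
  implicit none
  real :: x(4)

  x = (/ 4.0, -1.0, 9.0, 0.0 /)

  ! Plain array assignment; no WHERE construct is needed to get
  ! different behaviour at different elements.
  x = f(x)

  print *, x   ! elements become 2, 0, 3, 0
end program content_branch_sketch
```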

Other advantages
----------------

There are other advantages to user-defined elemental procedures, apart 
from their MIMD potential and the ability to use them in WHERE and FORALL.

They would be a very convenient programming tool, as the same elemental 
procedure can be applied to arguments of any rank.

In addition, the implementation of an elemental function array in an
array expression is likely to be more efficient than that of an 
equivalent array function.  One reason is that it requires less temporary
storage for the result (i.e. storage for a single result versus storage 
for the entire array of results).  Another is that it saves on
looping if an array expression is implemented by sequential iteration
over the component elemental expressions (as may be done for the 'segment'
of the array expression local to each process).  This is because,
in the sequential version, the elemental function can be invoked
elementally in situ within the expression.  The array function,
on the other hand, must be executed before the expression is evaluated, 
storing its result in a temporary array for use within the expression.
Looping is then required during the execution of the array function
body as well as the expression evaluation.


                        John Merlin.


(P.S. I'm away for the next two weeks so I won't be able to answer your 
comments till mid July.  Sorry about that!)
-----------------------------------------------------------------------
John H. Merlin                               email: jhm@uk.ac.soton.ecs
Dept. of Electronics and Computer Science,   tel:   +44 703 593368
University of Southampton,                   fax:   +44 703 593045
Southampton S09 5NH,  U.K.

From joelw@mozart.convex.com  Mon Jun 29 07:49:25 1992
Received: from convex.convex.com by cs.rice.edu (AA23619); Mon, 29 Jun 92 07:49:25 CDT
Received: from mozart.convex.com by convex.convex.com (5.64/1.35)
	id AA05401; Mon, 29 Jun 92 07:49:11 -0500
Received: by mozart.convex.com (5.64/1.28)
	id AA06105; Mon, 29 Jun 92 07:49:06 -0500
From: joelw@mozart.convex.com (Joel Williamson)
Message-Id: <9206291249.AA06105@mozart.convex.com>
Subject: Re: User-defined elemental procedures
To: jhm@ecs.southampton.ac.uk (John Merlin)
Date: Mon, 29 Jun 92 7:49:06 CDT
Cc: hpff-forall@cs.rice.edu
In-Reply-To: <6600.9206271558@bacchus.ecs.soton.ac.uk>; from "John Merlin" at Jun 27, 92 4:58 pm
X-Mailer: ELM [version 2.3 PL11]

John Merlin writes:
> 

	...stuff deleted...
> 
> Informal proposal
> -----------------
> To define an elemental procedure, the subroutine or function statement 
> is prefixed with the new keyword 'ELEMENTAL', e.g.:
> 
> 	ELEMENTAL SUBROUTINE S (X)
> 
> 	ELEMENTAL REAL FUNCTION F (X)
> 
> Henceforth, most of the discussion is in terms of elemental functions;
> the properties of elemental subroutines follow in an obvious fashion.
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> 
> For a function to be valid for use as an elemental function, it must 
> obey the following constraints:
> 
> 	1. Its arguments must have intent '(IN)'.
> 	2. Local arrays cannot be SAVEd.
> 	3. It cannot define any global variable.
> 	4. Its arguments, result and local variables must not be distributed.
> 	5. It can only reference a global variable if it is *not distributed*.
> 
> ('local' and 'global' are used here in the normal programming sense, 
> rather than referring to distribution!).

It seems that the above constraints preclude the possibility of
elemental subroutines unless constraint 1 is relaxed.
> 
	...stuff deleted...
> 
> I would propose that only elemental functions (both user-defined and
> intrinsic) are allowed in FORALL, just as in array expressions.  
> Elemental subroutine calls could also be allowed in FORALL, with a very 
> similar interpretation to elemental functions.  The main reason for using 
> them would be to allow the return of multiple results.  
> (Incidentally, arguments returning a result can have intent '(IN)' or 
> '(INOUT)' as in F90).                                         ^^
                                                               OUT? 

This seems to be the needed relaxation, but allowing modification of
subroutine arguments probably opens yet another proverbial can of worms,
since a pathological and insidious programmer can find ways to introduce
innumerable side effects via this mechanism.  I think we'll be far
better off restricting ourselves to elemental functions only.


In any case, if we choose to allow function references within FORALL,
this proposal appears to define it in a clean, implementable way.

Best regards,

Joel Williamson

From zrlp09@trc.amoco.com  Mon Jun 29 09:33:52 1992
Received: from noc.msc.edu by cs.rice.edu (AA25038); Mon, 29 Jun 92 09:33:52 CDT
Received: from uc.msc.edu by noc.msc.edu (5.65/MSC/v3.0.1(920324))
	id AA10547; Mon, 29 Jun 92 09:33:49 -0500
Received: from [129.230.11.2] by uc.msc.edu (5.65/MSC/v3.0z(901212))
	id AA06575; Mon, 29 Jun 92 09:33:48 -0500
Received: from trc.amoco.com (apctrc.trc.amoco.com) by netserv2 (4.1/SMI-4.0)
	id AA29304; Mon, 29 Jun 92 09:33:43 CDT
Received: from backus.trc.amoco.com by trc.amoco.com (4.1/SMI-4.1)
	id AA02789; Mon, 29 Jun 92 09:33:41 CDT
Received: from localhost by backus.trc.amoco.com (4.1/SMI-4.1)
	id AA18309; Mon, 29 Jun 92 09:33:40 CDT
Message-Id: <9206291433.AA18309@backus.trc.amoco.com>
To: hpff-forall@cs.rice.edu
Subject: User-defined elemental procedures (> John Merlin)
Date: Mon, 29 Jun 92 09:33:40 -0500
From: "Rex Page" <zrlp09@trc.amoco.com>


> I'd like to propose extending the Fortran 90 concept of 'elemental 
> procedures' for the purpose of HP Fortran -- the extension being to 
> allow programmers to define such procedures, which in F90 are restricted 
> to a subset of the intrinsic procedures.  

True, programmers cannot define elemental procedures in a technical sense,
but they can get the effect of such procedures by overloading a procedure
name.  Overloading is resolved by matching the type/kind/rank pattern of
the actual arguments in an invocation against that of the dummy arguments
in a procedure definition.  For elemental procedures, rank is the relevant
issue.  Since rank must be between zero and seven, the programmer can
overload a procedure name with eight definitions, one for each rank, to
get the effect of elemental functions.
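
A minimal sketch of this overloading device in Fortran 90, covering only
ranks 0 and 1.  The names 'twice', 'twice_r0' and 'twice_r1' are invented
for illustration; each further rank needs one more specific function of
the same form.

```fortran
module scale_mod
  implicit none

  ! One generic name, resolved by the rank of the actual argument.
  interface twice
    module procedure twice_r0, twice_r1
  end interface

contains

  function twice_r0(x) result(y)      ! rank 0 (scalar) version
    real, intent(in) :: x
    real :: y
    y = 2.0 * x
  end function twice_r0

  function twice_r1(x) result(y)      ! rank 1 version
    real, intent(in) :: x(:)
    real :: y(size(x))
    y = 2.0 * x
  end function twice_r1

end module scale_mod

program overload_sketch
  use scale_mod
  implicit none
  real :: s, v(4)

  s = 1.5
  v = (/ 1.0, 2.0, 3.0, 4.0 /)

  print *, twice(s)   ! resolves to twice_r0
  print *, twice(v)   ! resolves to twice_r1
end program overload_sketch
```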

HPFF seems reluctant to add syntax to Fortran 90 (fortunately, in my view).
Adding syntax to make an existing capability less clumsy will, I hope,
have a low priority in our list of tasks.


> Informal proposal
> To define an elemental procedure, the subroutine or function statement 
> is prefixed with the new keyword 'ELEMENTAL', e.g.:
> 	ELEMENTAL SUBROUTINE S (X)
>	ELEMENTAL REAL FUNCTION F (X)

Fortran 8x contained, at one time, a facility for user-defined, elemental
procedures.  If HPF is to have this feature, we should review old drafts
of Fortran 8x to get the benefit of earlier thinking on the matter.


> Incidentally, I've chosen not to insist that the dummy arguments and 
> result are scalar---that's another extension to the F90 definition!  

The overloading device covers this extension.


> For a function to be valid for use as an elemental function, it must 
> obey the following constraints:
>	1. Its arguments must have intent '(IN)'.
>	2. Local arrays cannot be SAVEd.
>	3. It cannot define any global variable.
>	4. Its arguments, result and local variables must not be distributed.
>	5. It can only reference a global variable if it is *not distributed*.

I like constraints 1-3 (plus the prohibition of i/o, which John mentions
later in his proposal).  I don't see the need for constraint 4.  (Does "local
variable" mean dummy argument?)  Elemental functions operate locally on
arrays.  The result has the same shape as the arguments.  As long as the
argument and result arrays all have the same distribution, this seems a
perfect opportunity for parallel computation on distributed arrays.

Constraint 5 applies no more strongly to elemental functions than to ordinary
ones.  A computation will be more efficient when a function executes on the
processor whose fastest-access memory contains the data the function refers
to.  When the data resides elsewhere, the computation takes a performance hit.
Programmers wishing to avoid this will express themselves in some other way.


> These functions can also be used in a FORALL.  Because a 'forall-assignment'
> may be an 'array-assignment' (in most definitions anyway) the elemental
> function can have an array result.
>  ...
>
>	REAL  v (3,10,10)
>	INTERFACE
>	  ELEMENTAL FUNCTION f (x)
>	    REAL, DIMENSION(3) :: f, x
>	  END FUNCTION f
>	END INTERFACE
>	...
>	FORALL (i=1:10, j=1:10)  v(:,i,j) = f (v(:,i,j)) 

The invocation of f in the preceding FORALL is an ordinary invocation.
For this, f need not be elemental.

If f were invoked as    v=f(v)   then it would need to be elemental.
The question of the shape of the object f(v) needs careful attention.
John Merlin's proposal matches the first subscript of the actual and
dummy arguments and makes f elemental on the last two subscripts in
this invocation, so the shape of the result would be [10,10].

However, other matchings are possible.  Why not match the last subscripts?
The result shape would then be [3,10]?  The decision seems arbitrary; it
will be easy for programmers to forget which decision the language takes
-- a likely source of errors.


> I would propose that only elemental functions (both user-defined and
> intrinsic) are allowed in FORALL, just as in array expressions.  

I don't understand what this means.  Array expressions, even in FORALL
assignment, may refer to non-elemental functions.  John, will you clarify
this for me?


> Elemental subroutine calls could also be allowed in FORALL, with a very 
> similar interpretation to elemental functions.  The main reason for using 
> them would be to allow the return of multiple results.  

Structure-valued functions provide a way to return multiple results, so
this doesn't seem to be a strong reason for including subroutine calls
in FORALL.


> In addition, the implementation of an elemental function array in an
> array expression is likely to be more efficient than that of an 
> equivalent array function.  One reason is that it requires less temporary
> storage for the result ... .  Another is that it saves on looping ... .
> ... the elemental function can be invoked elementally in situ within
> the expression.

True, the ELEMENTAL designation provides more specific information to the
processor.  It raises the level of communication possible in the language.
This is a good argument for adding syntax.  It sways me, for one, but not
quite enough to support the idea.

  -  Rex Page

From jim@meiko.co.uk  Mon Jun 29 10:53:59 1992
Received: from marge.meiko.com by cs.rice.edu (AA27499); Mon, 29 Jun 92 10:53:59 CDT
Received: from hub.meiko.co.uk by marge.meiko.com (4.1/SMI-4.1)
	id AA03661; Mon, 29 Jun 92 11:50:14 EDT
Received: from spica.co.uk (spica.meiko.co.uk) by hub.meiko.co.uk (4.1/SMI-4.1)
	id AA03374; Mon, 29 Jun 92 16:51:40 BST
Date: Mon, 29 Jun 92 16:51:40 BST
From: jim@meiko.co.uk (James Cownie)
Message-Id: <9206291551.AA03374@hub.meiko.co.uk>
Received: by spica.co.uk (4.1/SMI-4.1)
	id AA15323; Mon, 29 Jun 92 16:51:39 BST
To: zrlp09@trc.amoco.com
Cc: hpff-forall@cs.rice.edu
In-Reply-To: "Rex Page"'s message of Mon, 29 Jun 92 09:33:40 -0500 <9206291433.AA18309@backus.trc.amoco.com>
Subject: User-defined elemental procedures (> John Merlin)

> > For a function to be valid for use as an elemental function, it must 
> > obey the following constraints:
> >	1. Its arguments must have intent '(IN)'.
> >	2. Local arrays cannot be SAVEd.
> >	3. It cannot define any global variable.
> >	4. Its arguments, result and local variables must not be distributed.
> >	5. It can only reference a global variable if it is *not distributed*.
> 
> I like constraints 1-3 (plus the prohibition of i/o, which John mentions
> later in his proposal).  I don't see the need for constraint 4.  (Does "local
> variable" mean dummy argument?)  Elemental functions operate locally on
> arrays.  The result has the same shape as the arguments.  As long as the
> argument and result arrays all have the same distribution, this seems a
> perfect opportunity for parallel computation on distributed arrays.

The local variables of a function are the ones which are declared
inside the function, and are not visible outside it. (In most
recursive languages they are declared on the stack). Rather different
from the dummy arguments.

> 
> Constraint 5 applies no more strongly to elemental functions than to ordinary
> ones.  A computation will be more efficient when a function executes on the
> processor whose fastest-access memory contains the data the function refers
> to.  When the data resides elsewhere, the computation takes a performance hit.
> Programmers wishing to avoid this will express themselves in some other way.

Quite right !

I think John is trying to achieve two separate effects here (since
he's away I think I can say this without fear of immediate
contradiction !)

1) To insist that the functions are "mathematical" in that they always
produce the same results when called with the same arguments, and have
no side effects. Rules 1,2,3 and "no I/O" are intended to achieve this.
(This is required to allow them in the context of FORALL, where the
order of evaluation of the function calls is explicitly not
specified). This is the major attribute of their being elemental; the
ability to deal with arguments of different rank is syntactical sugar
which makes sense because of the underlying property.

2) To ensure that all data references made by the function can be
achieved locally. (Rules four and five are trying to achieve this)

Point 1) is important if we're not throwing away a semantic
specification of what happens inside the FORALL.

Point 2) is "solely an implementation issue" :-). John wants it
because he wants a run-time which doesn't require demand driven
communication. It is easy to construct elemental functions which
access non-local data.  (Ignoring John's rules 4 and 5 for now)
e.g.
	REAL ELEMENTAL FUNCTION REMOTE (X)
	REAL X
	REAL Y(1000)
	COMMON /FOO/Y   ! Assuming Y can be distributed, add
			! appropriate MODULE if required 
CHPF$   TEMPLATE T(1000)
CHPF$   ALIGN Y WITH T
CHPF$   PROCESSORS P(10)
CHPF$   DISTRIBUTE T BLOCK ONTO P

 	REMOTE = Y(INT(ABS(X)))

	END

C Some call of this
	REAL Z(1000),BAH(1000)	
	FORALL(I = 1:1000)
	   BAH(I) = REMOTE(Z(I))
	ENDFORALL

This function is definitely elemental, but will also require an
arbitrary communication pattern (which depends on the values in the
actuals), when the 1000 instances are called on the 10 processors. In
particular there is no knowledge local to any particular processor
that its values of Y will be required by any instance of the
function. To implement this (in parallel) one needs demand
driven communication, so either a send/receive model with a remote
store access daemon, or a direct (unsynchronised) remote read, remote
write model.

It is actually unclear that there is any easy way of making the
restrictions John wants visible in hpf, since "every array is created
with an alignment to some template, and every template is created
with some distribution onto some arrangement of processors."
Therefore my example would stand EVEN without any explicit mapping
statements for the global data.

In summary :-

Beware of confusing the semantic requirement of ELEMENTALITY with the
(MIMD) implementation issue of local access.

--Jim
James Cownie 
Meiko Limited
650 Aztec West
Bristol BS12 4SD
England

Phone : +44 454 616171
FAX   : +44 454 618188
E-Mail: jim@meiko.co.uk or jim@meiko.com


From gls@think.com  Mon Jun 29 12:54:28 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA01204); Mon, 29 Jun 92 12:54:28 CDT
Received: from mail.think.com by erato.cs.rice.edu (AA10740); Mon, 29 Jun 92 12:54:25 CDT
Return-Path: <gls@Think.COM>
Received: from Strident.Think.COM by mail.think.com; Mon, 29 Jun 92 13:54:01 -0400
From: Guy Steele <gls@think.com>
Received: by strident.think.com (4.1/Think-1.2)
	id AA10591; Mon, 29 Jun 92 13:54:00 EDT
Date: Mon, 29 Jun 92 13:54:00 EDT
Message-Id: <9206291754.AA10591@strident.think.com>
To: wu@cs.buffalo.edu
Cc: chk@cs.rice.edu, hpff-forall@erato.cs.rice.edu, wu@cs.buffalo.edu
In-Reply-To: Min-You Wu's message of Sat, 27 Jun 92 10:30:53 EDT <9206271430.AA15242@ruby.cs.Buffalo.EDU>
Subject:  Auf Wiederstehen

   Date: Sat, 27 Jun 92 10:30:53 EDT
   From: wu@cs.buffalo.edu (Min-You Wu)

   > 
   > The only comments I've gotten re: Min-You's INDEPENDENT proposal have
   > been positive.  I think the syntax needs some adjustment, but
   > otherwise it seems OK.  (INDEPENDENT is now used to mark loops and the
   > start of blocks; the second use should be changed to BEGIN INDEPENDENT
   > to avoid ambiguity.)  I'll try to generate an amended version while
   > I'm on the road (in the air).
   > 
   > 	Chuck
   > 

   This point is taken.  BEGIN INDEPENDENT is clear, though it seems
   too long.  I am still thinking about better words to substitute for
   BEGIN INDEPENDENT and END INDEPENDENT.  Any suggestions?

I can't resist:  BEGINDEPENDENT and ENDEPENDENT.
--Guy

From zrlp09@trc.amoco.com  Mon Jun 29 14:49:39 1992
Received: from noc.msc.edu by cs.rice.edu (AA05091); Mon, 29 Jun 92 14:49:39 CDT
Received: from uc.msc.edu by noc.msc.edu (5.65/MSC/v3.0.1(920324))
	id AA29059; Mon, 29 Jun 92 14:49:38 -0500
Received: from [129.230.11.2] by uc.msc.edu (5.65/MSC/v3.0z(901212))
	id AA04297; Mon, 29 Jun 92 14:49:35 -0500
Received: from trc.amoco.com (apctrc.trc.amoco.com) by netserv2 (4.1/SMI-4.0)
	id AA00246; Mon, 29 Jun 92 14:49:30 CDT
Received: from backus.trc.amoco.com by trc.amoco.com (4.1/SMI-4.1)
	id AA08485; Mon, 29 Jun 92 14:49:28 CDT
Received: from localhost by backus.trc.amoco.com (4.1/SMI-4.1)
	id AA19333; Mon, 29 Jun 92 14:49:28 CDT
Message-Id: <9206291949.AA19333@backus.trc.amoco.com>
To: hpff-forall@cs.rice.edu
Subject: User-defined elemental procedures (>Cownie, >>>Merlin)
Date: Mon, 29 Jun 92 14:49:27 -0500
From: "Rex Page" <zrlp09@trc.amoco.com>


> > > For a function to be valid for use as an elemental function, it must 
> > > obey the following constraints:
> > >	1. Its arguments must have intent '(IN)'.
> > >	2. Local arrays cannot be SAVEd.
> > >	3. It cannot define any global variable.
> > >	4. Its arguments, result and local variables must not be distributed.
> > >	5. It can only reference a global variable if it is *not distributed*.

> I think John is trying to achieve two separate effects here (since
> he's away I think I can say this without fear of immediate
> contradiction !)

> 1) To insist that the functions are "mathematical" ...

> 2) To ensure that all data references made by the function can be
> achieved locally. (Rules four and five are trying to achieve this)

> ...

> Point 2) is "solely an implementation issue" :-). John wants it
> because he wants a run-time which doesn't require demand driven
> communication. It is easy to construct elemental functions which
> access non-local data.

>  ...

> It is actually unclear that there is any easy way of making the
> restrictions John wants visible in hpf, since "every array is created
> with an alignment to some template, and every template is created
> with some distribution onto some arrangement of processors."

This seems to leave the distribution of some arrays up to the processor.
It doesn't seem to require that the distribution will be the same for
all arrays where distribution is not specified.  A processor could,
for example, allocate local arrays locally, giving the effect John wants
when the function is being evaluated in elemental mode.


  - Rex Page

From gls@think.com  Mon Jun 29 15:25:13 1992
Received: from mail.think.com by cs.rice.edu (AA06571); Mon, 29 Jun 92 15:25:13 CDT
Return-Path: <gls@Think.COM>
Received: from Strident.Think.COM by mail.think.com; Mon, 29 Jun 92 16:25:09 -0400
From: Guy Steele <gls@think.com>
Received: by strident.think.com (4.1/Think-1.2)
	id AA14353; Mon, 29 Jun 92 16:25:09 EDT
Date: Mon, 29 Jun 92 16:25:09 EDT
Message-Id: <9206292025.AA14353@strident.think.com>
To: zrlp09@trc.amoco.com
Cc: chk@cs.rice.edu, hpff-forall@cs.rice.edu
In-Reply-To: "Rex Page"'s message of Thu, 25 Jun 92 09:02:40 -0500 <9206251402.AA03627@backus.trc.amoco.com>
Subject: triangular array access and assignment-only block foralls 

   Date: Thu, 25 Jun 92 09:02:40 -0500
   From: "Rex Page" <zrlp09@trc.amoco.com>

   Chuck said:
     Yes, a Fortran 90 definition would have eliminated the need for the
     COPY_SEND intrinsic.  I think SUM_SEND and the others would still have
     been needed, because Fortran 90 doesn't have C-style += operators.

   It doesn't seem to me that the lack of += operators should be an obstacle.
   What's wrong with  A(v)=A(v)+B instead of A(v)+=B?

If v has duplicate values, then  A(v)=A(v)+B  is forbidden by Fortran 90.
SUM_SEND is intended to cover precisely this possibility.

Chuck is correct that COPY_SEND would not be needed if  A(v)=...  were
in fact defined to result in storing some one of the assigned values
in the case of duplicate values in v.  (Note that such a definition might
present some implementation difficulties in some architectures--consider
that explicit synchronization might be required, even in a shared-memory
machine, if elements of A were large enough that they could not be
updated in a single atomic memory operation.)  However, SUM_SEND would
still be needed, because if v has duplicates,  A(v)=A(v)+B  would result
in adding *one* of the B values to A(i), where i is some value duplicated
in v, rather than adding in *all* the B values for which the corresponding
element of v is i.  (You can find more discussion of this topic in the
context of the C language in:

  Rose, John R., and Steele, Guy L. Jr.  "C*: An Extended C Language for
  Data Parallel Programming."  Proc. Second International Conference on
  Supercomputing.  International Supercomputing Institute, Inc. (Santa
  Clara, 1987) Volume II, 2-16.

See especially section 5.4.)
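
The accumulating behaviour can be seen in a small sequential sketch.  It
does not call SUM_SEND (whose interface is not given here); it only shows
the result that an 'add in all the B values' rule would give when v has
duplicates.  The array sizes and values are invented for illustration.

```fortran
program sum_send_sketch
  implicit none
  integer, parameter :: v(4) = (/ 1, 2, 2, 3 /)   ! index 2 appears twice
  real :: a(3), b(4)
  integer :: k

  a = 0.0
  b = (/ 10.0, 20.0, 30.0, 40.0 /)

  ! Sequential accumulation: every b(k) is added into a(v(k)), so the
  ! duplicated index receives both of its contributions.
  do k = 1, size(v)
    a(v(k)) = a(v(k)) + b(k)
  end do

  print *, a   ! a(2) holds 20.0 + 30.0; a(1) and a(3) hold 10.0 and 40.0
end program sum_send_sketch
```

A 'store some one of the assigned values' rule for A(v)=A(v)+B would
instead leave a(2) equal to either 20.0 or 30.0.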

--Guy

From jim@meiko.co.uk  Tue Jun 30 05:03:16 1992
Received: from marge.meiko.com by cs.rice.edu (AA16414); Tue, 30 Jun 92 05:03:16 CDT
Received: from hub.meiko.co.uk by marge.meiko.com (4.1/SMI-4.1)
	id AA13077; Tue, 30 Jun 92 05:59:32 EDT
Received: from spica.co.uk (spica.meiko.co.uk) by hub.meiko.co.uk (4.1/SMI-4.1)
	id AA05686; Tue, 30 Jun 92 11:00:58 BST
Date: Tue, 30 Jun 92 11:00:58 BST
From: jim@meiko.co.uk (James Cownie)
Message-Id: <9206301000.AA05686@hub.meiko.co.uk>
Received: by spica.co.uk (4.1/SMI-4.1)
	id AA19386; Tue, 30 Jun 92 11:00:55 BST
To: zrlp09@trc.amoco.com
Cc: hpff-forall@cs.rice.edu
In-Reply-To: "Rex Page"'s message of Mon, 29 Jun 92 14:49:27 -0500 <9206291949.AA19333@backus.trc.amoco.com>
Subject: User-defined elemental procedures (>> Cownie > Page)

>> It is actually unclear that there is any easy way of making the
>> restrictions John wants visible in hpf, since "every array is created
>> with an alignment to some template, and every template is created
>> with some distribution onto some arrangement of processors."
>
>This seems to leave the distribution of some arrays up to the processor.
>It doesn't seem to require that the distribution will be the same for
>all arrays where distribution is not specified.  A processor could,
>for example, allocate local arrays locally, giving the effect John wants
>when the function is being evaluated in elemental mode.

Terminology
===========
"local" is very confusing in this discussion, as it refers in closely
related contexts both to scope ("local array") and to the mapping of
the data onto processors ("the array can be locally accessed").
I think we should keep "local" for the scoping property (since they
got there first), so does anyone have a good suggestion for the 
property of being in the processor's memory ? (poor suggestion : "resident").

Content
=======
The local arrays for elemental functions should certainly be allocated
resident; however, the problem also arises with the global data (see
example in previous mail).

A solution which would meet John's objective (but is not permissible
under his restrictions) would be to insist that the global data
accessed by the elemental function was replicated, thus ensuring that
it is resident on all processors.

(maybe resident isn't so poor !)

--Jim
James Cownie 
Meiko Limited
650 Aztec West
Bristol BS12 4SD
England

Phone : +44 454 616171
FAX   : +44 454 618188
E-Mail: jim@meiko.co.uk or jim@meiko.com


From zrlp09@trc.amoco.com  Tue Jun 30 07:52:29 1992
Received: from noc.msc.edu by cs.rice.edu (AA18073); Tue, 30 Jun 92 07:52:29 CDT
Received: from uc.msc.edu by noc.msc.edu (5.65/MSC/v3.0.1(920324))
	id AA27775; Tue, 30 Jun 92 07:52:28 -0500
Received: from [129.230.11.2] by uc.msc.edu (5.65/MSC/v3.0z(901212))
	id AA02198; Tue, 30 Jun 92 07:52:27 -0500
Received: from trc.amoco.com (apctrc.trc.amoco.com) by netserv2 (4.1/SMI-4.0)
	id AA02619; Tue, 30 Jun 92 07:52:23 CDT
Received: from backus.trc.amoco.com by trc.amoco.com (4.1/SMI-4.1)
	id AA05340; Tue, 30 Jun 92 07:52:22 CDT
Received: from localhost by backus.trc.amoco.com (4.1/SMI-4.1)
	id AA22358; Tue, 30 Jun 92 07:52:21 CDT
Message-Id: <9206301252.AA22358@backus.trc.amoco.com>
To: hpff-forall@cs.rice.edu
Subject: User-defined elemental (> Cownie) - resident; local; global
Date: Tue, 30 Jun 92 07:52:19 -0500
From: "Rex Page" <zrlp09@trc.amoco.com>

> Terminology
> ===========
> "local" is very confusing in this discussion, as it refers in closely
> related contexts both to scope ("local array") and to the mapping of
> the data onto processors ("the array can be locally accessed").
> I think we should keep "local" for the scoping property (since they
> got there first), so does anyone have a good suggestion for the 
> property of being in the processor's memory ?
> (poor suggestion : "resident").

We do need a new term, and I think "resident" is a good choice.


> Content
> =======
> The local arrays for elemental functions should certainly be allocated
> resident, however the problem also arises with the global data (see
> example in previous mail).

> A solution which would meet John's objective (but is not permissible
> under his restrictions) would be to insist that the global data
> accessed by the elemental function was replicated, thus ensuring that
> it is resident on all processors.

If the demand-driven-data-access-problem is bad,
the replicated-data-coherence-problem is worse.
I hope we don't try to tackle that one this go-round.

How about requiring that global data accessed by elemental functions
be constant data (identifiers defined in PARAMETER statements,
made accessible through MODULE USE)?  That would circumvent the
coherence problem.
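
A minimal sketch of that restriction, not taken from any of the
proposals: the module names, the constant "gain" and the function
"scaled" are hypothetical, and the ELEMENTAL prefix is assumed in the
form later standardised in Fortran 95, which may differ from whatever
spelling the group settles on.

   MODULE constants
     IMPLICIT NONE
     ! Hypothetical constant; a PARAMETER can never be redefined
     REAL, PARAMETER :: gain = 2.5
   END MODULE constants

   MODULE kernels
     USE constants
     IMPLICIT NONE
   CONTAINS
     ! Reads only its INTENT(IN) argument and the constant "gain"
     ELEMENTAL REAL FUNCTION scaled(x)
       REAL, INTENT(IN) :: x
       scaled = gain * x
     END FUNCTION scaled
   END MODULE kernels

   PROGRAM demo
     USE kernels
     IMPLICIT NONE
     REAL :: a(4), b(4)
     INTEGER :: i
     a = (/ 1.0, 2.0, 3.0, 4.0 /)
     FORALL (i = 1:4) b(i) = scaled(a(i))
     PRINT *, b     ! 2.5, 5.0, 7.5, 10.0
   END PROGRAM demo

Because "gain" is a constant, every processor could hold its own copy
without any coherence traffic, which is the point of the suggestion.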

   - Rex Page

From jim@meiko.co.uk  Tue Jun 30 09:42:37 1992
Received: from marge.meiko.com by cs.rice.edu (AA20772); Tue, 30 Jun 92 09:42:37 CDT
Received: from hub.meiko.co.uk by marge.meiko.com (4.1/SMI-4.1)
	id AA15715; Tue, 30 Jun 92 10:38:53 EDT
Received: from spica.co.uk (spica.meiko.co.uk) by hub.meiko.co.uk (4.1/SMI-4.1)
	id AA06127; Tue, 30 Jun 92 15:40:18 BST
Date: Tue, 30 Jun 92 15:40:18 BST
From: jim@meiko.co.uk (James Cownie)
Message-Id: <9206301440.AA06127@hub.meiko.co.uk>
Received: by spica.co.uk (4.1/SMI-4.1)
	id AA08139; Tue, 30 Jun 92 15:38:46 BST
To: hpff-forall@cs.rice.edu
In-Reply-To: "Rex Page"'s message of Tue, 30 Jun 92 07:52:19 -0500 <9206301252.AA22358@backus.trc.amoco.com>
Subject: User-defined elemental (> Page) - replicated data coherence

> If the demand-driven-data-access-problem is bad,
> the replicated-data-coherence-problem is worse.
> I hope we don't try to tackle that one this go-round.

Ahh, but we already are trying to tackle that one. The distribution
proposal shows precisely how to express this, e.g. quoting the example
on page 7 (of the June 9 version):
"
CHPF$ ALIGN A(:) WITH D(:,*)	
means that a copy of A is aligned with every column of D.
"

Since we're already accepting that we have to solve the "replicated
data coherence problem", allowing READ ONLY access to such replicated
global data inside ELEMENTAL functions in FORALL loops seems fine.

In particular the ELEMENTAL assertion assures us that we don't have to
worry about any updates to this replicated global data, since
elemental functions are forbidden to update global data. (John's rule
3). Therefore this is actually a simpler case than the general one,
where replicated data can be updated. 

It's actually the interaction of loosely synchronised MIMD parallelism
with the replicated data which leads to the problems.  In the SIMD (or
strongly synchronised MIMD) model which is supported by HPF at the
moment, the replicated data is not a problem, since read/write or
write/write races cannot occur because all the processors are
conceptually executing the same program with the same data in lock
step. Naively obvious implementations either do exactly this and
replicate the execution of the code and all the scalars on all nodes,
or centralise it (onto the "host node") and then broadcast the updates
before the next "parallel" operation. Serious implementations will
doubtless do a lot of data-flow analysis to determine exactly which
values are required where to reduce the amount of communication they
have to perform.

These store races only become possible when there are many threads of
control, each of which can be generating a different memory access
pattern, as inside the functions called from a FORALL (and possibly
inside an HPF$INDEPENDENT do loop, or an HPF$BEGINDEPENDENT,
HPF$ENDEPENDENT block).

It's not clear to me (yet) whether there are any restrictions on what
one is allowed to put in an INDEPENDENT loop, and whether similar
issues re-appear in that context.

--Jim
James Cownie 
Meiko Limited
650 Aztec West
Bristol BS12 4SD
England

Phone : +44 454 616171
FAX   : +44 454 618188
E-Mail: jim@meiko.co.uk or jim@meiko.com


From loveman@ftn90.enet.dec.com  Tue Jun 30 11:04:44 1992
Received: from enet-gw.pa.dec.com by cs.rice.edu (AA23035); Tue, 30 Jun 92 11:04:44 CDT
Received: by enet-gw.pa.dec.com; id AA03721; Tue, 30 Jun 92 09:04:36 -0700
Message-Id: <9206301604.AA03721@enet-gw.pa.dec.com>
Received: from ftn90.enet; by decwrl.enet; Tue, 30 Jun 92 09:04:42 PDT
Date: Tue, 30 Jun 92 09:04:42 PDT
From: David Loveman <loveman@ftn90.enet.dec.com>
To: jhm@ecs.southampton.ac.uk
Cc: hpff-forall@cs.rice.edu, loveman@ftn90.enet.dec.com
Apparently-To: jhm@ecs.southampton.ac.uk, hpff-forall@cs.rice.edu
Subject: RE: Conditional assignments in FORALL


Your example:

>The case where no 's2' branch is performed until all 's1' branches
>have executed (but using the old values in the mask) can be written:
>
>	temp = ABS (a) < eps
>	FORALL (i=1:n, temp)       a(i) = 0.0
>	FORALL (i=1:n, .NOT.temp)  a(i) = a(i-1) + a(i+1)

doesn't quite work - the FORALL requires a scalar mask expression.  The
fixup is easy:

	temp = ABS (a) < eps
	FORALL (i=1:n, temp(i))       a(i) = 0.0
	FORALL (i=1:n, .NOT.temp(i))  a(i) = a(i-1) + a(i+1)
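
A self-contained sketch of the corrected form.  The bounds 0:n+1 (so
that a(i-1) and a(i+1) stay in range), the value of eps and the test
data are illustrative assumptions, not part of the original example:

	PROGRAM masked
	  IMPLICIT NONE
	  INTEGER, PARAMETER :: n = 4
	  REAL, PARAMETER :: eps = 0.5
	  REAL :: a(0:n+1)
	  LOGICAL :: temp(0:n+1)
	  INTEGER :: i
	  a = (/ 1.0, 0.1, 2.0, 0.2, 3.0, 1.0 /)
	  ! Capture the mask once, from the old values of a
	  temp = ABS (a) < eps
	  FORALL (i=1:n, temp(i))       a(i) = 0.0
	  ! The mask still reflects the old values; the right-hand side
	  ! sees the zeros stored by the first FORALL
	  FORALL (i=1:n, .NOT.temp(i))  a(i) = a(i-1) + a(i+1)
	  PRINT *, a(1:n)     ! 0.0, 0.0, 0.0, 1.0
	END PROGRAM masked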

-David

From loveman@ftn90.enet.dec.com  Tue Jun 30 12:31:14 1992
Received: from enet-gw.pa.dec.com by cs.rice.edu (AA25028); Tue, 30 Jun 92 12:31:14 CDT
Received: by enet-gw.pa.dec.com; id AA11357; Tue, 30 Jun 92 10:31:05 -0700
Message-Id: <9206301731.AA11357@enet-gw.pa.dec.com>
Received: from ftn90.enet; by decwrl.enet; Tue, 30 Jun 92 10:31:06 PDT
Date: Tue, 30 Jun 92 10:31:06 PDT
From: David Loveman <loveman@ftn90.enet.dec.com>
To: hpff-forall@cs.rice.edu
Cc: loveman@ftn90.enet.dec.com
Apparently-To: hpff-forall@cs.rice.edu
Subject: 2 cents on functions in FORALL


The 3 current FORALL proposals, Guy's, mine, and Marc's, all provide
roughly comparable definitions of single assignment FORALL.  These
definitions all allow functions, including user functions, subject only
to a restriction of the form:

     A function reference appearing in any expression in the 
     forall-assignment must not redefine any subscript name.

and the general Fortran 90 restriction (7.1.7):

     The evaluation of a function reference must neither affect nor be 
     affected by the evaluation of any other entity within the statement.

Thus the issue of functions in FORALL is more one of implementation -
finding cases where a parallel implementation is correct and a scalar
implementation is not required.  The meaning of function references in
FORALL is given in all 3 proposals by "scalarizations" such as the one
following.  I believe it is *very* important to preserve this form of
scalar semantics for FORALL.  It both allows for a straightforward
scalar implementation and answers questions such as "how many times is
the mask evaluated."  

   A forall-stmt of the general form:

   FORALL (v1=l1:u1:s1, v2=l2:u2:s2, ..., vn=ln:un:sn [, mask] ) &
         a(e1,...,em) = rhs

   is equivalent to the following standard Fortran 90 code:

   !evaluate subscript and stride expressions in any order
   templ1 = l1
   tempu1 = u1
   temps1 = s1
   templ2 = l2
   tempu2 = u2
   temps2 = s2
     ...
   templn = ln
   tempun = un
   tempsn = sn

   !then evaluate the scalar mask expression
   DO v1=templ1,tempu1,temps1
     DO v2=templ2,tempu2,temps2
       ...
         DO vn=templn,tempun,tempsn
           tempmask(v1,v2,...,vn) = mask
         END DO
   	  ...
     END DO
   END DO

   !then evaluate the expr in the forall-assignment 
   !(and lhs subscripts) 
   !for all valid combinations of subscript names 
   !for which the scalar mask expression is true
   DO v1=templ1,tempu1,temps1
     DO v2=templ2,tempu2,temps2
       ...
         DO vn=templn,tempun,tempsn
           IF (tempmask(v1,v2,...,vn)) THEN
             !in any order
             tempe1(v1,v2,...,vn) = e1
               ...
             tempem(v1,v2,...,vn) = em
             temprhs(v1,v2,...,vn) = rhs
           END IF
         END DO
   	  ...
     END DO
   END DO

   !then perform the assignment of these values to 
   !the corresponding elements of the array being assigned to
   DO v1=templ1,tempu1,temps1
     DO v2=templ2,tempu2,temps2
       ...
         DO vn=templn,tempun,tempsn
           IF (tempmask(v1,v2,...,vn)) THEN
             a(tempe1(v1,v2,...,vn),...,tempem(v1,v2,...,vn)) &
                  = temprhs(v1,v2,...,vn)
           END IF
         END DO
	     ...
     END DO
   END DO


It's worth noting(1) that there is a more efficient scalarized definition of FORALL:

templ1 = l1
tempu1 = u1
temps1 = s1
templ2 = l2
tempu2 = u2
temps2 = s2
  ...
templn = ln
tempun = un
tempsn = sn
tempa = a   ! array assignment
DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        IF (mask) THEN
          tempa(e1,...,em) = rhs
        END IF
      END DO
	  ...
  END DO
END DO
a = tempa   ! array assignment

In many cases, compiler optimizations such as copy propagation can
eliminate the requirement for temporaries for lower bounds, upper
bounds and strides.  Similarly, dependence analysis may eliminate the
requirement for the array temporary.  Thus a FORALL statement such as (2)

FORALL (I=1:N, J=1:M:2) A(I,J) = I * B(J)

could be implemented on a scalar machine as

DO I=1,N
  DO J=1,M,2
    A(I,J) = I * B(J)
  END DO
END DO

On a parallel SIMD or MIMD machine it can, of course, be implemented in parallel.
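
As a hedged illustration of when the array temporary cannot be
eliminated, the sketch below compares a FORALL, its scalarization with
an explicit temporary, and a naive DO loop.  The program and the array
names b, c and tempa are made up for this example:

PROGRAM scalarize
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 5
  REAL :: a(0:n), b(0:n), c(0:n), tempa(0:n)
  INTEGER :: i
  a = 1.0
  b = 1.0
  c = 1.0
  ! FORALL: every right-hand side is evaluated before any assignment
  FORALL (i=1:n) a(i) = a(i-1) + 1.0
  ! Scalarized form with an explicit array temporary
  tempa = b
  DO i = 1, n
    tempa(i) = b(i-1) + 1.0
  END DO
  b = tempa
  ! Naive DO loop without the temporary: a loop-carried dependence
  DO i = 1, n
    c(i) = c(i-1) + 1.0
  END DO
  PRINT *, a(1:n)   ! all 2.0
  PRINT *, b(1:n)   ! all 2.0, same as the FORALL
  PRINT *, c(1:n)   ! 2.0, 3.0, 4.0, 5.0, 6.0
END PROGRAM scalarize

Here the element read, a(i-1), is written by an earlier iteration, so
dropping the temporary changes the result; in the A(I,J) = I * B(J)
example no assigned element is ever read, so the plain DO nest is safe.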


(1) Loveman, David.  "Element Array Assignment - the FORALL Statement,"
presented at Third Workshop on Compilers for Parallel Computers,
Vienna, Austria, July 6-9, 1992.

(2) Eugene Albert, Joan D. Lukas, and Guy L. Steele, Jr.  "Data
Parallel Computers and the FORALL Statement,"  Journal of Parallel and
Distributed Computing, October 1991.

From joelw@mozart.convex.com  Wed Jul  8 10:06:39 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA22804); Wed, 8 Jul 92 10:06:39 CDT
Received: from convex.convex.com by erato.cs.rice.edu (AA13218); Wed, 8 Jul 92 10:06:32 CDT
Received: from mozart.convex.com by convex.convex.com (5.64/1.35)
	id AA02895; Wed, 8 Jul 92 10:07:45 -0500
Received: by mozart.convex.com (5.64/1.28)
	id AA28010; Wed, 8 Jul 92 10:07:42 -0500
From: joelw@mozart.convex.com (Joel Williamson)
Message-Id: <9207081507.AA28010@mozart.convex.com>
Subject: Testing, testing...
To: hpff-distribute@cs.rice.edu (HPFF Distribute Subgroup),
        hpff-forall@erato.cs.rice.edu (HPFF FORALL Group)
Date: Wed, 8 Jul 92 10:07:42 CDT
X-Mailer: ELM [version 2.3 PL11]

This page unintentionally left non-blank.

I haven't received any email from the working groups for several weeks,
so I'm just testing the net.  Perhaps the whole world is on vacation.

Cheers,

Joel

From karp@hplahk.hpl.hp.com  Tue Jul 14 11:05:43 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA26297); Tue, 14 Jul 92 11:05:43 CDT
Received: from hplms2.hpl.hp.com by erato.cs.rice.edu (AA15330); Tue, 14 Jul 92 11:05:20 CDT
Received: from hplahk.hpl.hp.com by hplms2.hpl.hp.com with SMTP
	(16.5/15.5+IOS 3.20) id AA20404; Tue, 14 Jul 92 09:05:13 -0700
Received: by hplahk.hpl.hp.com
	(16.6/15.5+IOS 3.14) id AA02037; Tue, 14 Jul 92 09:05:04 -0700
Date: Tue, 14 Jul 92 09:05:04 -0700
From: Alan Karp <karp@hplahk.hpl.hp.com>
Message-Id: <9207141605.AA02037@hplahk.hpl.hp.com>
To: hpff-forall@erato.cs.rice.edu
Reply-To: "Alan Karp" <karp@hplms2.hpl.hp.com>
Subject: Back to square one

I just recently got a copy of all the discussion on FORALL and have
plowed through most of it. It seems that people are asking the wrong
question. I don't want to decide what kind of FORALL is to be adopted
until we decide if there should be any kind of FORALL at all. 

Attached is a copy of a report I wrote while I was with IBM, before I
read all the discussion. IBM did not want it widely distributed
because they were afraid someone might think it was an official IBM
position. (How anyone could believe that my rantings are the official
position of anyone is beyond me.)

What will it take to convince me that FORALL is necessary? Some code
examples where compiler parallelization assisted with directives won't
work would be a start. However, only examples from real programs are
allowed. You must also give me the opportunity to rewrite them as I
would do if I were tuning for a new machine. Notational convenience
won't do it since I think there are other mechanisms that would do the
job at least as well as FORALL.

-----------------------------------------------------------------

%%**start of header
%forall.tex

\documentstyle[11pt]{article}
\pagestyle{plain}
\pagenumbering{arabic}
\marginparwidth 0pt
\oddsidemargin  .25in
\evensidemargin  .25in
\marginparsep 0pt
\topmargin   -.5in
\textwidth 6in
\textheight  9.0in
\def\forall{{\tt FORALL}}


\title{The Case Against FORALL}
\author{Alan Karp}
%\date{   }


\begin{document}

\maketitle

%%**end of header
%
\section{Introduction} 	
\label{sec:intro}

Programming parallel processors is hard.
One way to make the job easier is to put the burden on the compiler.
While this approach won't work for ``dusty decks'',
it shows promise for newly written programs.
A number of extensions to Fortran have been proposed to make this
job easier.
These extensions include array language,\cite{f90explained}
data distribution statements,\cite{dpf}
and various forms of \forall.\cite{forall,fortrand}

In this note I will start with a discussion of the various
\forall\ proposals.
Next, I will explain why I think \forall\ is both unnecessary
and dangerous.

\section{\forall}
\label{sec:forall}

The \forall\ statement is a convenient way to denote certain
array operations.
At various times it was included in the Fortran 8x language\cite{f8xstandard}
although it did not make it into the Fortran 90 standard.\cite{f90standard}
\forall\ is an important part of CM Fortran\cite{forall},
Data Parallel Fortran\cite{dpf}, and both
Fortran D and Fortran 90D.\cite{fortrand}

\forall\ was introduced for a couple of reasons.
Firstly, there is no way to express certain constructs in Fortran 90.
For example, there is no way of assigning the diagonal of a matrix to a 
vector or of working with non-conformable arrays.\cite{forall}

\begin{figure*}
  \begin{verbatim}
  a(2:n-1,2:n-1,2:n-1) =
 *   c(2:n-1,2:n-1,2:n-1)*(f(2:n-1,2:n-1,2:n-1)-f(1:n-2,2:n-1,2:n-1))
 * + d(2:n-1,2:n-1,2:n-1)*(f(2:n-1,2:n-1,2:n-1)-f(2:n-1,1:n-2,2:n-1))
 * + e(2:n-1,2:n-1,2:n-1)*(f(2:n-1,2:n-1,2:n-1)-f(2:n-1,2:n-1,1:n-2))
  \end{verbatim}
  \caption[Typical PDE statement in Fortran 90]
    {Fortran 90 statement that might be found in a code solving
     a partial differential equation.}
  \label{fig:pdef90}
\end{figure*}

\begin{figure*}
  \begin{verbatim}
   forall(i=2,n-1;j=2,n-1;k=2,n-1)
      a(i,j,k) = c(i,j,k)*(f(i,j,k)-f(i-1,j,k))
  *            + d(i,j,k)*(f(i,j,k)-f(i,j-1,k))
  *            + e(i,j,k)*(f(i,j,k)-f(i,j,k-1))
   endforall
  \end{verbatim}
  \caption[Typical PDE statement using a \forall]
    {The statement of Figure~\ref{fig:pdef90} written using a \forall.}
  \label{fig:pdeforall}
\end{figure*}

Another reason is notational convenience.
Consider a discretization problem that treats the boundary
variables differently from those on the interior.
A typical Fortran 90 statement is shown in Figure~\ref{fig:pdef90}.
Figure~\ref{fig:pdeforall} shows how it could be written with a \forall.
Clearly, the latter form is both easier to write and easier to read.

Thinking Machines Corporation had a different motivation for
introducing \forall.\cite{cmfortran}
Their SIMD back end is controlled by a front end processor.
In CM Fortran, any code not in an array language statement
is executed sequentially on the front end processor;
only code written in array language is executed in parallel
on the SIMD part of the machine.
Thus, the only way to get parallel execution of a construct
that is not expressible in Fortran 90 array notation is to use \forall.

There are two flavors proposed for \forall\ --
the single statement and block forms.
A single statement \forall\ takes as its range a single line of code,
but there are restrictions on what can be in that line.
For example, in CM Fortran any reference to an external function or an 
array stored on the front end machine causes the loop to be executed serially.

The block \forall\ takes a block of code as its argument,
but there is disagreement on the semantics of the block of code.
A standard Fortran 90 statement is executed as if the entire right hand
side is evaluated before any assignments are made.
This {\em copy-in} semantics side steps the problem of
statements with dependences.
The controversy centers on whether or not the block of code should
be understood in this way.

The block \forall\ of Fortran 90D extends copy-in semantics to the entire block with
an exception.
Each loop iteration is considered to be executed by a separate task.
The task sees the current value of array elements that it owns;
it sees the value at loop entry for all others.
For example,

\begin{verbatim}
   a(0:n) = 1.0
   forall i=1,n
      a(i) = a(i-1) + 1.0
      b(i) = a(i-1)
      c(i) = a(i)
   endforall
\end{verbatim}
%
At the completion of this loop,
{\tt a(1:n) = 2.0}, {\tt b(1:n) = 1.0}, and {\tt c(1:n) = 2.0}.

\section{Dangerous}
\label{sec:dangerous}

I believe that \forall\ is dangerous because the meaning of a
statement depends on its context.
Show the loop at the end of the last section to 10 Fortran programmers,
and ask them to tell you the values of the arrays at the
end of the loop.
You are likely to get 10 different answers.
(Well, maybe not 10.)
If this problem occurs for a simple loop,
what will happen with a more complicated one?

Imagine debugging an application code,
especially one you did not write.
Let's say that there is a loop coded with a \forall\ 
with one statement that should have been in a serial loop.
How do you find the error?
You can't simply change the \forall\ to a conventional {\tt DO}
because that would change the meaning of the other statements.
Asking the compiler to do a dependence analysis for you
probably won't help because there will be many reported
dependences that the copy-in semantics handles correctly.
Visual examination might find the error,
but the fact that statements within a \forall\ are interpreted
differently from those in a {\tt DO} will make the job hard.

The one line \forall\ is not as bad.
The copy-in semantics is not as serious a problem,
but could still result in bugs that are hard to find.
The line of code {\tt A(I)=A(I-1)+1.0} means something different in a
\forall\ than it does in a {\tt DO}.
A program containing one when the other was intended would not run correctly.
Finding such an error would be difficult.

\section{Unnecessary}
\label{sec:unnecessary}

\forall\ is unnecessary.
Modern compilers do a very good job parallelizing loops.\cite{pfparticle}
They are much better than programmers in doing dependence analysis.
When the compiler doesn't have enough information to decide if
a data dependence exists,
programmers can insert directives.

There are very few loop constructs expressible with \forall\
that a compiler could not automatically parallelize except
for those that depend on copy-in semantics.
It is my feeling that a programmer who wants copy-in semantics
can introduce temporary variables.

\begin{verbatim}
   a(0:n) = 1.0
   t = a
   do i=1,n
      a(i) = t(i-1) + 1.0
      b(i) = t(i-1)
      c(i) = a(i)
   enddo
\end{verbatim}
%
We can even expect the compiler to recognize that {\tt t} is a temporary
and not use any more storage than required with \forall.
A programmer who later changes the {\tt t} in the first line
of the loop to an {\tt a} might get a serial loop.
What won't happen is a statement that gives different results  depending
on its context.

\begin{figure*}
  \begin{verbatim}
    m = spread ( p(:,:,ic) .lt.  spread(amid,1,nn), 3, 10 )
    px(1:nn/2,:,:) = reshape( pack(p,m), (/(nn/2),nc,10/) )
    px(nn/2+1:nn,:,:) = reshape( pack(p,.not.m), (/(nn/2),nc,10/) )
  \end{verbatim}
  \caption[Fortran 90 version of part of an N-body code.]
    {Part of a tree-based N-body code that recursively partitions
     the particles into two groups.}
  \label{fig:nbodyf90}
\end{figure*}

Some very complicated constructs become simple to express
if we combine array language with conventional {\tt DO} loops.
For example, the code in Figure~\ref{fig:nbodyf90} is used to construct part 
of the data structure needed for an N-body simulation that
uses a tree structure.\cite{barneshut}
%
\begin{figure*}
  \begin{verbatim}
    iota = (/ (i, i=1,nn) /)
    do k = 1, nc
       m = p(:,k,ic) .lt. amid
       p(:,k,:) = p( (/ pack(iota,m), pack(iota,.not.m) /), k, : )
    enddo
  \end{verbatim}
  \caption[Mixed version of tree code.]
    {The code of Figure~\ref{fig:nbodyf90} mixing Fortran 90 and
     an ordinary {\tt DO} loop.}
  \label{fig:nbodyforall}
\end{figure*}
% 
An APL programmer can figure out what is going on,
but most Fortran programmers would find it cryptic,
to say the least.
(This version is much clearer than the one line version I created
on the first pass.)

Compare it with the mixed version shown in Figure~\ref{fig:nbodyforall}.
Although the code is a bit longer,
it is easier to understand,
and the compiler should have no trouble parallelizing over {\tt k}.
The \forall\ version of this loop gained me nothing in
either expressiveness or clarity.

\section{Conclusions}
\label{sec:conclusions}

People are smart and can learn most anything.
(As a graduate student I was able to use an APL type ball on
a 1050 terminal to do my Fortran work.)
However, \forall\ is particularly insidious because it
forces us to look at familiar statements and interpret them
differently.
Worse yet, the meaning of any statement depends on its context;
the output from {\tt a(i) = a(i-1) + 1.0} depends on whether
it is contained in a {\tt DO} loop or a \forall.
It is the schizophrenia \forall\ forces on programmers that
is the most dangerous part of the construct.

The only argument in favor of \forall\ that makes sense to me
is the notational convenience.
I think we should be able to find a more direct way to 
simplify the expressions, such as the {\tt DOMAIN} statement in Data
Parallel Fortran.\cite{dpf}

In conclusion, I think I have shown ways in which \forall\ is
dangerous.
I have given examples where I have shown that it is unnecessary.
I have even suggested a way to simplify the expression of
complicated statements.

\forall\ should not be adopted as a standard,
de facto or otherwise,
without much more discussion than has been given the question to date.
In describing High Performance Fortran,
the statement

\begin{quote}
What sort of \forall\ should be included?
\end{quote}
%
has been made.\cite{hpfsummary}
I would be happier if the statement read

\begin{quote}
What sort of \forall , {\em if any}, should be included?
\end{quote}
%
% Bibliography
%
\begin{thebibliography}{10}

\bibitem{forall}
E.~Albert, J.~D. Lukas, and G.~L. {Steele, Jr.}
\newblock {Data Parallel Computers and the FORALL Statement}.
\newblock {\em Journal of Parallel and Distributed Computing}, 13(2):185--192,
  October 1991.

\bibitem{f8xstandard}
{American National Standards Institute, Inc.}
\newblock {American National Standards for Information Systems Programming
  Language Fortran (Fortran 8x)}.
\newblock Technical Report Draft S8, Version 109 (X3.9-198x), American National
  Standards Institute, Washington, DC, August 1988.

\bibitem{f90standard}
{American National Standards Institute, Inc.}
\newblock {American National Standards for Information Systems Programming
  Language Fortran (Fortran 90)}.
\newblock Technical Report ISO/IEC 1539: 1991(E), American National Standards
  Institute, Washington, DC, 1991.

\bibitem{barneshut}
J.~Barnes and P.~Hut.
\newblock {A Hierarchical O(NlogN) Force-Calculation Algorithm}.
\newblock {\em Nature}, 324:446--449, 1986.

\bibitem{hpfsummary}
Walt Brainerd.
\newblock {High Performance Fortran}.
\newblock {\em Fortran Journal}, 4(1):14--15, January/February 1992.

\bibitem{cmfortran}
Thinking~Machines Corporation.
\newblock {\em {CM Fortran}}.
\newblock Thinking Machines Corporation, Cambridge, Mass., 1990.

\bibitem{dpf}
P.~M. Elustondo, L.~A. Vazquez, O.~J. Nestares, J.~S. Avalos, G.~A. Alvarez,
  C.-T. Ho, and J.~L.~C. Sanz.
\newblock {Data Parallel Fortran}.
\newblock Technical report, IBM RJ 8690, March 1992.

\bibitem{fortrand}
G.~Fox, S.~Hiranandani, K.~Kennedy, C.~Koelbel, U.~Kremer, C.~Tseng, and
  M.~Wu.
\newblock {Fortran D Language Specification}.
\newblock Technical Report TR90--141, Computer Science Dept., Rice University,
  December 1990.

\bibitem{f90explained}
M.~Metcalf and J.~Reid.
\newblock {\em {Fortran 90 Explained}}.
\newblock Oxford Science Publishers, Oxford, 1990.

\bibitem{pfparticle}
Leslie~J. Toomey, Emily~C. Plachy, Randolf~G. Scarborough, Richard~J. Sahulka,
  Jin~F. Shaw, and Alfred~W. Shannon.
\newblock {IBM Parallel Fortran}.
\newblock {\em IBM Systems Journal}, 27(4):416--435, 1988.

\end{thebibliography}
%
\end{document}

From @eros.uknet.ac.uk,@camra.ecs.soton.ac.uk:jhm@ecs.southampton.ac.uk  Tue Jul 14 12:18:43 1992
Received: from sun2.nsfnet-relay.ac.uk by cs.rice.edu (AA28762); Tue, 14 Jul 92 12:18:43 CDT
Via: uk.ac.uknet-relay; Tue, 14 Jul 1992 18:17:49 +0100
Received: from ecs.soton.ac.uk by eros.uknet.ac.uk via JANET with NIFTP (PP) 
          id <1786-0@eros.uknet.ac.uk>; Tue, 14 Jul 1992 18:15:43 +0100
Via: camra.ecs.soton.ac.uk; Tue, 14 Jul 92 18:11:08 BST
From: John Merlin <jhm@ecs.southampton.ac.uk>
Received: from bacchus.ecs.soton.ac.uk by camra.ecs.soton.ac.uk;
          Tue, 14 Jul 92 18:16:49 BST
Date: Tue, 14 Jul 92 18:14:02 BST
Message-Id: <257.9207141714@bacchus.ecs.soton.ac.uk>
To: jim@meiko.co.uk, joelw@mozart.convex.com, zrlp09@trc.amoco.com
Subject: Re: User-defined elemental procedures
Cc: hpff-forall@cs.rice.edu

Firstly, thanks to everyone who read my lengthy proposal for 
'user-defined elemental procedures', and particularly to Joel 
Williamson, Rex Page and Jim Cownie for their comments.  Here 
I'll dip in with some responses to the discussion I've seen so far.


Joel Williamson <joelw@com.convex.mozart Mon Jun 29 13:53:00 1992> writes:

* > For a function to be valid for use as an elemental function, it must 
* > obey the following constraints:
* > 
* > 	1. Its arguments must have intent '(IN)'.
* > 	2. Local arrays cannot be SAVEd.
* > 	3. It cannot define any global variable.
* > 	4. Its arguments, result and local variables must not be distributed.
* > 	5. It can only reference a global variable if it is *not distributed*.
* > 
* > ('local' and 'global' are used here in the normal programming sense, 
* > rather than referring to distribution!).
* 
* It seems that the above constraints preclude the possibility of
* elemental subroutines unless constraint 1 is relaxed.

Agreed.  Perhaps I should modify this to:

     1. For an elemental function, every argument must have intent '(IN)'.


* > I would propose that only elemental functions (both user-defined and
* > intrinsic) are allowed in FORALL, just as in array expressions.  
* > Elemental subroutine calls could also be allowed in FORALL, with a very 
* > similar interpretation to elemental functions.  The main reason for using 
* > them would be to allow the return of multiple results.  
* > (Incidentally, arguments returning a result can have intent '(IN)' or 
* > '(INOUT)' as in F90).                                         ^^
*                                                                OUT? 

Well spotted!  (although requiring the subroutine results to have intent 
'(IN)' would eliminate the undesirable side effects you mention next:-))

* This seems to be the needed relaxation, but allowing modification of
* subroutine arguments probably opens yet another proverbial can of worms,
* since a pathological and insidious programmer can find ways to introduce
* innumerable side effects via this mechanism.  I think we'll be far
* better off restricting ourselves to elemental functions only.

Perhaps you're right.  However, Fortran 90 has a rule that any dummy
argument that is modified may not be accessed via an alias 
(e.g. via another dummy argument, or a common block or module, or
by host association).  Explicit interfaces, which I think should be 
mandatory in this context, actually allow this rule to be partially 
checked (i.e. with respect to dummy argument aliases).  

I believe that this rule prevents any undesirable side effects  (or are 
there other problems that I've missed?).  If so, then one could allow 
elemental subroutines but emphasise that they must adhere to the 
standard rules---taking the view that non standard-conforming programs 
can't be expected to give well-defined results, and that their authors 
deserve what they get.  Perhaps elemental subroutines would be sufficiently 
useful to make this course of action preferable.


Rex Page <zrlp09@com.amoco.trc Mon Jun 29 16:08:53 1992> writes:

* > I'd like to propose extending the Fortran 90 concept of 'elemental 
* > procedures' for the purpose of HP Fortran -- the extension being to 
* > allow programmers to define such procedures, which in F90 are restricted 
* > to a subset of the intrinsic procedures.  
* 
* True, programmers cannot define elemental procedures in a technical sense,
* but they can get the effect of such procedures by overloading a procedure
* name.  Overloading is resolved by matching the type/kind/rank pattern of
* the actual arguments in an invocation against that of the dummy arguments
* in a procedure definition.  For elemental procedures, rank is the relevant
* issue.  Since rank must be between zero and seven, the programmer can
* overload a procedure name with eight definitions, one for each rank, to
* get the effect of elemental functions.
* 
* HPFF seems reluctant to add syntax to Fortran 90 (fortunately, in my view).
* Adding syntax to make an existing capability less clumsy will, I hope,
* have a low priority in our list of tasks.

I don't think that the equivalence you suggest is correct:  an elemental 
procedure call with array-valued arguments isn't equivalent to an 
array-valued procedure.  Elemental procedures have (or rather can be 
defined to have) restrictions that permit independent and concurrent 
execution over the elements of the arguments, which ordinary procedures 
don't have.  Evidence of this is that you can use the former in a WHERE 
construct, but not the latter.

The additional syntax I'm suggesting is very minor---just the additional
keyword 'ELEMENTAL'.


* Fortran 8x contained, at one time, a facility for user-defined, elemental
* procedures.  If HPF is to have this feature, we should review old drafts
* of Fortran 8x to get the benefit of earlier thinking on the matter.

I agree, and confess that I haven't done so yet!


* > Incidentally, I've chosen not to insist that the dummy arguments and 
* > result are scalar---that's another extension to the F90 definition!  
* 
* The overloading device covers this extension.

See above.


* > For a function to be valid for use as an elemental function, it must 
* > obey the following constraints:
* >	1. Its arguments must have intent '(IN)'.
* >	2. Local arrays cannot be SAVEd.
* >	3. It cannot define any global variable.
* >	4. Its arguments, result and local variables must not be distributed.
* >	5. It can only reference a global variable if it is *not distributed*.
* 
* I like constraints 1-3 (plus the prohibition of i/o, which John mentions
* later in his proposal).  I don't see the need for constraint 4.  (Does "local
* variable" mean dummy argument?)  Elemental functions operate locally on
* arrays.  The result has the same shape as the arguments.  As long as the
* argument and result arrays all have the same distribution, this seems a
* perfect opportunity for parallel computation on distributed arrays.


Firstly, I agree that I should add the prohibition of I/O to this list of
constraints.  (I had it in my first draft but deleted it, for reasons
of cowardice really---I didn't want to turn the reader off by giving
too many constraints!  My thinking was that allowing I/O would permit
the old-fashioned technique for debugging, but supporting it would really 
complicate the implementation, and I don't think it's worth it!)

My apologies about rule 4 -- I should have written it more clearly
as follows:

	4. Within the procedure body, its dummy arguments, result and 
local variables must not be distributed.

That is, I mean that nothing that is referenced *within the procedure body*
may be distributed---all accessed data must be available locally without 
communication.  This is to avoid race conditions and the need to implement 
virtual shared memory for distributed memory platforms (but see later for 
a partial withdrawal from this stance).

Of course, on the *caller's* side these procedures may be invoked with 
distributed actual arguments---indeed this is a perfect opportunity for 
parallel computation as you say, and even for MIMD computation.  
In fact, I see no need for the actual arguments and result to have the 
*same* distribution---the implementation should be such that the caller 
automatically redistributes the actual arguments into temporary arrays 
if necessary, and passes in these temporaries, so that at the point of 
call corresponding elements of each argument reside on the same processor.  
Likewise, the dummy result can be re-distributed if necessary into the 
actual result array or section.  This implementation would require no 
extra mechanisms beyond those already required in HPF for other purposes.


* Constraint 5 applies no more strongly to elemental functions than to ordinary
* ones.  A computation will be more efficient when a function executes on the
* processor whose fastest-access memory contains the data the function refers
* to.  When the data resides elsewhere, the computation takes a performance hit.
* Programmers wishing to avoid this will express themselves in some other way.


I would argue that condition 5 is *required* for elemental procedures
in HPF, otherwise the language can only be implemented on platforms with 
(virtual) shared memory, which I believe is contrary to its main objective.
I'll say more about this later, in reply to Jim Cownie's message on the 
same subject.

It isn't required for 'ordinary' procedures, whose data communications are
handled by the normal mechanisms that must be used to implement HPF.

To summarise, the essential difference between an 'ordinary' array-valued 
function and an 'elemental' one is as follows.  In the HPF program, 
a single textual occurrence of an 'ordinary' array function corresponds 
to a single identical call in all processors, which executes the same 
code in SPMD mode.  Therefore, if a data communication is required, 
both sending and receiving processes are aware of this and can cooperate.  
By contrast, an elemental function is invoked independently for each 
element of its arguments.  Thus, a single textual occurrence of 
an elemental function in an array expression may be equivalent to several 
calls on each processor (because it stores several elements of the array 
arguments);  the number of calls may depend on the processor 
(because of non-uniform distribution of the selected elements, which 
may come from an array section and may be masked in a WHERE or FORALL); 
and different calls may execute different code  (at least for user-defined 
elemental functions, because of data-dependent branches and loop bounds).

At the considerable risk (or certainty!) of labouring the point, 
I can show the transformation to sequential Fortran 77 of:

	Y = X + F (X)

in the two cases.  Here, 'X', 'Y' and function 'F()' are array-valued
with dimension 'N'.  If 'F()' is an 'ordinary' array-valued function, 
the transformation is:

	CALL F_SUB (TEMP, X)      ! F_SUB() is a subroutine equivalent to F().
	                          ! TEMP is a temp array of dimension N.
	DO I = LOCAL_LB, LOCAL_UB ! restrict index range to the local segment.
	  Y(I) = X(I) + TEMP(I)
	ENDDO

If 'F()' is elemental, the transformation is:

	DO I = LOCAL_LB, LOCAL_UB
	  Y(I) = X(I) + F (X(I))
	ENDDO
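
The elemental case can be tried out on any compiler that accepts an 
ELEMENTAL prefix of the kind proposed here.  The following is only a 
sketch under stated assumptions: the module name, the body of 'f' 
(it simply doubles its argument) and the test values are made up for 
illustration, and the whole array sits on one processor, so that 
LOCAL_LB:LOCAL_UB is just 1:N.

```fortran
module elemental_demo
  implicit none
contains
  ! Hypothetical elemental function: scalar in, scalar out, INTENT(IN)
  ! argument, no SAVEd data, no global data, no I/O.
  elemental function f(x) result(r)
    real, intent(in) :: x
    real :: r
    r = 2.0 * x
  end function f
end module elemental_demo

program demo
  use elemental_demo
  implicit none
  integer, parameter :: n = 4
  real :: x(n), y(n), y_loop(n)
  integer :: i

  x = (/ 1.0, 2.0, 3.0, 4.0 /)

  ! Array expression: f is applied element by element, in situ.
  y = x + f(x)

  ! The transformation for the elemental case, written out by hand.
  do i = 1, n
    y_loop(i) = x(i) + f(x(i))
  end do

  ! Compare the two forms element by element.
  print *, all(y == y_loop)
end program demo
```

The array expression and the hand-written loop should agree element 
for element, which is all the transformation claims.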


* > These functions can also be used in a FORALL.  Because a 'forall-assignment'
* > may be an 'array-assignment' (in most definitions anyway) the elemental
* > function can have an array result.
* >  ...
* >
* >	REAL  v (3,10,10)
* >	INTERFACE
* >	  ELEMENTAL FUNCTION f (x)
* >	    REAL, DIMENSION(3) :: f, x
* >	  END FUNCTION f
* >	END INTERFACE
* >	...
* >	FORALL (i=1:10, j=1:10)  v(:,i,j) = f (v(:,i,j)) 
* ...
*
* The question of the shape of the object f(v) needs careful attention.
* John Merlin's proposal matches the first subscript of the actual and
* dummy arguments and makes f elemental on the last two subscripts in
* this invocation, so the shape of the result would be [10,10].
* 
* However, other matchings are possible.  Why not match the last subscripts?
* The result shape would then be [3,10]?  The decision seems arbitrary; it
* will be easy for programmers to forget which decision the language takes
* -- a likely source of errors.

Here you're making a case against the generalisation of allowing the 
'elements' of elemental procedures to be compound data objects as well as
scalars.  That's certainly a debatable point.  I proposed this 
generalisation because I found it to be very useful for some applications 
that I'm familiar with, one of which I outlined in the proposal.
In full F90 implementations of HPF, the 'elements' of elemental
procedures could be data structures as well as arrays, so these procedures 
could be applied to arrays of structures as well as arrays of arrays.  
I think this flexibility makes these procedures much more powerful and 
widely applicable.  The disadvantage, as you point out, is that it may 
be confusing for programmers.

However, if HPF were to adopt this generalisation, there's definitely no 
arbitrariness about which indices of an actual argument match those of the 
corresponding dummy argument, and which are elemental.  Viewing the actual 
argument as an array of compound data objects, its leading indices must be 
matched with the dummy argument, as dictated by Fortran's column-major 
ordering.
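
To make the leading-index rule concrete, here is a sketch in plain 
Fortran 90 of what the FORALL in the quoted example would mean under 
that rule.  It does not use the proposed ELEMENTAL extension: 'f' is 
an ordinary function on 3-vectors, and its body (a normalisation) and 
the module and program names are invented purely for illustration.

```fortran
module vec3_demo
  implicit none
contains
  ! Hypothetical stand-in for the proposed elemental 'f':
  ! the 'element' is a 3-vector rather than a scalar.
  function f(x) result(r)
    real, intent(in) :: x(3)
    real :: r(3)
    r = x / sqrt(sum(x * x))   ! normalise; assumes x is non-zero
  end function f
end module vec3_demo

program leading_index_demo
  use vec3_demo
  implicit none
  real :: v(3,10,10)
  integer :: i, j

  v = 1.0

  ! The leading index (extent 3) is matched with the dummy argument;
  ! the trailing indices i and j are the 'elemental' ones.
  do j = 1, 10
    do i = 1, 10
      v(:,i,j) = f(v(:,i,j))
    end do
  end do

  print *, shape(v)
end program leading_index_demo
```

Viewed this way, 'v' is a 10 x 10 array whose elements are 3-vectors, 
and each of the 100 calls is independent of the others.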


* > I would propose that only elemental functions (both user-defined and
* > intrinsic) are allowed in FORALL, just as in array expressions.  
* 
* I don't understand what this means.  Array expressions, even in FORALL
* assignment, may refer to non-elemental functions.  John, will you clarify
* this for me?


I don't know what I meant by the clause  '...just as in array expressions'!
I guess it was a mistake and should be deleted!

As for your statement that array expressions in FORALL may refer to
non-elemental functions, I would dispute that---you need the constraints 
I listed to ensure that the function may be evaluated concurrently over 
all elements of the FORALL assignment.


* > Elemental subroutine calls could also be allowed in FORALL, with a very 
* > similar interpretation to elemental functions.  The main reason for using 
* > them would be to allow the return of multiple results.  
* 
* Structure-valued functions provide a way to return multiple results, so
* this doesn't seem to be a strong reason for including subroutine calls
* in FORALL.


But it's a nuisance to have to clump together logically distinct
data objects into a single structure simply to achieve this effect.
Also, only one grouping of data objects can be achieved this way statically
---other groupings must be created by assignment to a structure,
which will be expensive in time and storage.  (Also, early HPF 
implementations may not support structures, though that's not a 
strong argument!).


* > In addition, the implementation of an elemental function array in an
* > array expression is likely to be more efficient than that of an 
* > equivalent array function.  One reason is that it requires less temporary
* > storage for the result ... .  Another is that it saves on looping ... .
* > ... the elemental function can be invoked elementally in situ within
* > the expression.
* 
* True, the ELEMENTAL designation provides more specific information to the
* processor.  It raises the level of communication possible in the language.
* This is a good argument for adding syntax.  It sways me, for one, but not
* quite enough to support the idea.
* 
*   -  Rex Page

I'm glad you're partly swayed :-))


James Cownie <jim@uk.co.meiko Mon Jun 29 17:29:08 1992> writes
(In-Reply-To: "Rex Page"'s message of Mon, 29 Jun 92 09:33:40 -0500):

* > Constraint 5 applies no more strongly to elemental functions than to ordinary
* > ones.  A computation will be more efficient when a function executes on the
* > processor whose fastest-access memory contains the data the function refers
* > to.  When the data resides elsewhere, the computation takes a performance hit.
* > Programmers wishing to avoid this will express themselves in some other way.
* 
* Quite right !
* 
* I think John is trying to achieve two separate effects here (since
* he's away I think I can say this without fear of immediate
* contradiction !)
* 
* 1) To insist that the functions are "mathematical" in that they always
* produce the same results when called with the same arguments, and have
* no side effects. Rules 1,2,3 and "no I/O" are intended to achieve this.
* ...
* 2) To ensure that all data references made by the function can be
* achieved locally. (Rules four and five are trying to achieve this)
* 
* Point 1) is important if we're not throwing away a semantic
* specification of what happens inside the FORALL.
* 
* Point 2) is "solely an implementation issue" :-). John wants it
* because he wants a run-time which doesn't require demand driven
* communication. It is easy to construct elemental functions which
* access non-local data.  (Ignoring John's rules 4 and 5 for now)
* e.g.
* 	REAL ELEMENTAL FUNCTION REMOTE (X)
* 	REAL X
* 	REAL Y(1000)
* 	COMMON /FOO/Y   ! Assuming Y can be distributed, add
* 			! appropriate MODULE if required 
* CHPF$   TEMPLATE T(1000)
* CHPF$   ALIGN Y WITH T
* CHPF$   PROCESSORS P(10)
* CHPF$   DISTRIBUTE T BLOCK ONTO P
* 
*  	REMOTE = Y(INT(ABS(X)))
* 
* 	END
* 
* C Some call of this
* 	REAL Z(1000),BAH(1000)	
* 	FORALL(I = 1:1000)
* 	   BAH(I) = REMOTE(Z(I))
* 	ENDFORALL
* 
* This function is definitely elemental, but will also require an
* arbitrary communication pattern (which depends on the values in the
* actuals), when the 1000 instances are called on the 10 processors. In
* particular there is no knowledge local to any particular processor
* that its values of Y will be required by any instance of the
* subroutine. To implement this (in parallel) one needs demand
* driven communication, so either a send receive model with a remote
* store access daemon, or a direct (unsynchronised) remote read, remote
* write model.


Of course you're perfectly right---constraints 4 and 5 are designed to 
avoid data communication within the body of elemental procedures, to
allow them to be implemented on distributed memory message-passing 
architectures without the need for virtual shared memory (which you 
call 'demand driven communication').

I believe it's crucial that HPF *doesn't* require virtual shared memory
for its implementation---indeed, I thought that was a principal objective
of the exercise.  Providing VSM would be very expensive on some
platforms, and if you have it it's not clear that HPF is the appropriate 
programming or execution model: why not use PCF or some BSP-type model?

Therefore, I would say it's more than just an implementation issue; 
rather, it's a question of adhering to (what I believe to be) the 
underlying execution model and purpose of HPF, which is that it's 
implementable on DM message-passing machines, where both sender 
and receiver co-operate in communications, without shared memory
emulation.
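
The data dependence of the communication pattern in Jim's example can 
be sketched as follows.  The assumptions are mine: BLOCK is taken to 
give each of the 10 processors a contiguous block of 100 elements of Y, 
instance I of the FORALL is taken to run on the processor that would 
own element I under the same blocking, and the sample positions and 
values of Z are invented.

```fortran
program owner_demo
  implicit none
  integer, parameter :: n = 1000, np = 10, blk = n / np
  integer :: pos(4)          ! hypothetical FORALL indices I
  real    :: z(4)            ! hypothetical values of Z(I) at those indices
  integer :: k, caller, owner

  pos = (/ 1, 150, 500, 1000 /)
  z   = (/ 950.0, -120.3, 501.7, 3.2 /)

  do k = 1, 4
    ! Processor (numbered from 0) assumed to execute instance I.
    caller = (pos(k) - 1) / blk
    ! Processor holding Y(INT(ABS(Z(I)))), known only once Z(I) is known.
    owner  = (int(abs(z(k))) - 1) / blk
    print *, pos(k), caller, owner, caller /= owner
  end do
end program owner_demo
```

The last column says whether the reference is remote.  The owner can 
only be computed from the run-time value of Z(I), so the processor 
holding the required element of Y has no local means of knowing that 
a request is coming.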


* It is actually unclear that there is any easy way of making the
* restrictions John wants visible in hpf, since "every array is created
* with an alignment to some template, and every template is created
* with some distribution onto some arrangement of processors."
* Therefore my example would stand EVEN without any explicit mapping
* statements for the global data.


That's a good point!  According to the current HPF proposal, the 
lack of ALIGN and DISTRIBUTE directives for a data object doesn't 
necessarily mean it's undistributed---it's implementation dependent.  
Perhaps that doesn't matter, provided that the absence of mapping 
directives can be interpreted, on architectures without shared memory
support, to mean that the data is undistributed (i.e. replicated or 
private).

I must confess that when I wrote this proposal I was only thinking
of distributed memory platforms.  Now you've raised the point, 
it seems unnecessarily restrictive to forbid non-local data accesses
on all platforms---they probably cause no problems on SM machines.  
These considerations suggest that perhaps constraints 4 and 5 should 
be changed to the following:

	4. Within the procedure body, its dummy arguments, result and 
local variables must not be subject to data mapping directives.
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

	5. It can only reference a global variable if it is not 
subject to data mapping directives.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

(i.e. replacing the word 'distributed' by the underscored words).
That way we avoid restrictions that are inappropriate on some 
architectures, while allowing the implementation to impose them 
when necessary.  Any comments?

(Incidentally, I'm not really happy with the number of different
meanings attached to the absence of mapping directives.  For 
dummy arguments they seem to mean 'inherited mapping' -- except, 
if this proposal is adopted, in the case of elemental procedures;
in other contexts they mean implementation-dependent mapping.)


James Cownie <jim@uk.co.meiko Tue Jun 30 11:38:27 1992> writes:

* Content
* =======
* The local arrays for elemental functions should certainly be allocated
* resident, however the problem also arises with the global data (see
* example in previous mail).
* 
* A solution which would meet John's objective (but is not permissible
* under his restrictions) would be to insist that the global data
* accessed by the elemental function was replicated, thus ensuring that
* it is resident on all processors.
* 
* (maybe resident isn't so poor !)
* 
* --Jim
* James Cownie 

Exactly --- I was thinking that the global data accessed by elemental 
procedures must be replicated (at least on the architectures I'm 
considering).  Why isn't that permissible under my restrictions?


Rex Page <zrlp09@com.amoco.trc Tue Jun 30 14:06:43 1992> writes
(in reply to Jim Cownie):
* > Content
* > =======
* > ...
* > A solution which would meet John's objective (but is not permissible
* > under his restrictions) would be to insist that the global data
* > accessed by the elemental function was replicated, thus ensuring that
* > it is resident on all processors.
* 
* If the demand-driven-data-access-problem is bad,
* the replicated-data-coherence-problem is worse.
* I hope we don't try to tackle that one this go-round.
* 
* How about requiring that global data accessed by elemental functions
* be constant data (identifiers defined in PARAMETER statements,
* made accessible through MODULE USE)?  That would circumvent the
* coherence problem.
* 
*    - Rex Page

As Jim Cownie subsequently replied, there's no coherence problem for 
read-only accesses; and for updates data coherence has to be ensured anyway.

I think it's probably too restrictive to forbid access to global data.
Certainly accessing constants causes no problems, but that's quite 
different from the global data issue.

                   Best regards,
                        John Merlin.

From chk@cs.rice.edu  Wed Jul 15 07:55:28 1992
Received: from  by cs.rice.edu (AB23701); Wed, 15 Jul 92 07:55:28 CDT
Message-Id: <9207151255.AB23701@cs.rice.edu>
Date: Wed, 15 Jul 1992 08:14:48 -0600
To: hpff-distribute, hpff-forall
From: chk@cs.rice.edu
Subject: Thoughts from Europe


While at the 3rd Workshop on Compiling for Parallel Computers in Vienna, 
several of us took advantage of the opportunity for a face-to-face 
meeting of European and American HPF "enthusiasts".  This note includes 
some of the questions, concerns, and other issues raised there; since no 
firm conclusions were reached, I didn't include any of them.  I haven't 
caught up on the HPFF mail yet, so some of this may have already been 
discussed.

Meeting participants:
Chuck Koelbel
David Loveman
Hans Zima
Barbara Chapman
Piyush Mehrotra
Henk Sips
Tom Lake
John Merlin
Rob Schreiber

The meeting started with a rundown from Henk of the last HPFF-Europe 
workshop.  They had decided to focus on the issues that were not active 
in the US group at the moment, including MIMD support, irregular 
distributions, and parallel I/O.  The intent was to make the groups 
complementary rather than competitive; the Americans were very supportive 
of this approach.

Clemens Thole is starting an electronic discussion group on the topic of 
MIMD support.  Several Americans indicated they would be interested in 
joining this discussion.  It was clear to all that these features would 
be important in the future, but could not be finalized in time for the 
January HPF draft.

There is apparently some discussion in Europe regarding parallel I/O; the 
US group is stagnant.  There is some hope that discussion can be revived 
here.  Hopefully the discussions will not diverge.

Hans Zima reopened the question of the TEMPLATE construct and whether it 
is needed in HPF.  Some discussion ensued about the complexity of 
distributions with and without TEMPLATEs (they are the same) and 
distributions over subsets of the processor array or replicated 
distributions.  TEMPLATE is still controversial in Europe, but it is now 
accepted into HPF.

The global naming issue was raised again.  Without F90 modules, global 
naming produces too many surprises.  The group at this meeting agreed to 
keep local names; now if only the rest of the discussion groups can agree 
to this...

Subroutine interfaces were a hot topic.  The group recognized three 
features they wanted:
	Redistribute any actual to match the dummy's distribution
	Declare (assert) the actual to have the dummy's distribution
	Declare the dummy to inherit the actual's distribution
The first two are possible in HPF as it stands, the last is not.  Some of 
the meeting attendees thought this should be the default behavior (i.e. 
when there are no distributions), but this is not practical on some 
machines.  We recommend that the proposal be expanded to allow this 
possibility.

Some discussion centered on allocated arrays.  The current proposal talks 
about all arrays being "created with an alignment"; it is unclear whether 
arrays are created when they are declared or when they are allocated.  
Either choice causes some problems.  Clarification is sought.

There needs to be some defined relation between multiple processor 
arrays.  This is most useful for arrays of different dimensions but with 
the same number of elements, and may be covered by the proposals already 
circulating around the group.

More control over partitioning of computation and load balancing is 
needed.  Some form of the ON clause may be enough, maybe not.  Local 
subroutines may also provide this.  Some question how this is related to 
MIMD support.

OK, that should start the discussions rolling (as if they needed much help).

	Chuck


From zrlp09@trc.amoco.com  Wed Jul 15 10:58:43 1992
Received: from noc.msc.edu by cs.rice.edu (AA27475); Wed, 15 Jul 92 10:58:43 CDT
Received: from uc.msc.edu by noc.msc.edu (5.65/MSC/v3.0.1(920324))
	id AA22474; Wed, 15 Jul 92 10:58:41 -0500
Received: from [129.230.11.2] by uc.msc.edu (5.65/MSC/v3.0z(901212))
	id AA07818; Wed, 15 Jul 92 10:58:37 -0500
Received: from trc.amoco.com (apctrc.trc.amoco.com) by netserv2 (4.1/SMI-4.0)
	id AA04929; Wed, 15 Jul 92 10:58:33 CDT
Received: from backus.trc.amoco.com by trc.amoco.com (4.1/SMI-4.1)
	id AA03516; Wed, 15 Jul 92 10:58:30 CDT
Received: from localhost by backus.trc.amoco.com (4.1/SMI-4.1)
	id AA18975; Wed, 15 Jul 92 10:58:29 CDT
Message-Id: <9207151558.AA18975@backus.trc.amoco.com>
To: hpff-forall@cs.rice.edu
Subject: User-defined elemental procedures
         (Page reply to Merlin response)
Date: Wed, 15 Jul 92 10:58:28 -0500
From: "Rex Page" <zrlp09@trc.amoco.com>


In my comments on John's proposal for elemental procedures, I claimed
that programmers could get the effect of elemental procedures through
overloading.  I should have said "one of the effects."  What I had
in mind was that a programmer writing a function could use overloading
to make it possible for invocations of the function to operate on
either scalars or arrays.  People using the function could invoke
it, then, as if it were an elemental function.  

John pointed out that the version of such a function that operates
on array arguments (and delivers an array value) will probably be
viewed by the compiler in the same way as other array-valued functions.
That is, the compiler probably won't view the invocation as a
collection of parallel invocations unless it does a tremendous amount
of inter-procedural analysis.  Yet that view would be obvious if
the invocation were elemental.  The point here is that the elemental
facility allows programmers to add precision to their specifications,
that is to raise their level of communication.

I agree that this is a powerful argument, and I was opposed to
removing elementals from Fortran 90.  Nevertheless they were removed.
Putting them back in raises the ante for compiler developers, which
is a trade-off HPFF must consider if it is important for people to be
able to run their code on both sequential workstations and high
performance systems.

The last version of Fortran 8x to contain a facility for user-defined
elemental functions did so by defining elemental invocations.  The
language included no syntax for designating a function as elemental.
It required the interface to be visible, making it possible for the
compiler to recognize an elemental invocation by matching arrays in
the actual argument list with scalars in corresponding positions of
the dummy argument list.

If we add elementals to HPF using the F8x approach, then we will be
adding semantics to Fortran 90, but not syntax.  This simplifies the
plight of someone trying to run an HPF program on a Fortran 90 system.
People lacking an HPF compiler can link in a collection of overloads
to simulate HPF elemental functions; they won't need to change any of
the invoking code.
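
As a sketch of what such a collection of overloads might look like in 
standard Fortran 90 (the names and the function body are invented for 
illustration, and only ranks 0 and 1 are shown of the eight that a full 
simulation would need):

```fortran
module overload_demo
  implicit none
  ! One generic name; the specific is chosen by the rank of the actual.
  interface f
    module procedure f_scalar, f_rank1
  end interface
contains
  ! The real work is done on a single scalar element.
  function f_scalar(x) result(r)
    real, intent(in) :: x
    real :: r
    r = x * x
  end function f_scalar

  ! Rank-1 overload: applies the scalar version to each element in turn.
  function f_rank1(x) result(r)
    real, intent(in) :: x(:)
    real :: r(size(x))
    integer :: i
    do i = 1, size(x)
      r(i) = f_scalar(x(i))
    end do
  end function f_rank1
end module overload_demo

program use_overloads
  use overload_demo
  implicit none
  real :: a(3)
  a = (/ 1.0, 2.0, 3.0 /)
  print *, f(2.0)    ! resolves to f_scalar
  print *, f(a)      ! resolves to f_rank1
end program use_overloads
```

The invoking code reads exactly as it would with an elemental 'f'; what 
is lost is the guarantee to the compiler that the per-element calls are 
independent.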

On all of the above points, I think John and I agree on the facts.  We differ
a little on potential benefits, but not much.  I like elemental functions
as a mechanism for SPMD programming.

The following are some points of disagreement:

Page:
*    REAL v(3,10,10)
*    INTERFACE
*      ELEMENTAL FUNCTION f(x)
*        REAL, DIMENSION(3) :: f,x
*      END FUNCTION f
*    END INTERFACE
* ...
* The question of the shape of the object f(v) needs careful attention.
* John Merlin's proposal matches the first subscript of the actual and
* dummy arguments and makes f elemental on the last two subscripts in
* this invocation, so the shape of the result would be [10,10].
* 
* However, other matchings are possible.  Why not match the last subscripts?
* The result shape would then be [3,10]?  The decision seems arbitrary; it
* will be easy for programmers to forget which decision the language takes
* -- a likely source of errors.

Merlin's reply:
> Here you're making a case against the generalisation of allowing the 
> 'elements' of elemental procedures to be compound data objects as well as
> scalars.
> ...
> In full F90 implementations of HPF, the 'elements' of elemental
> procedures could be data structures as well as arrays, so these procedures 
> could be applied to arrays of structures as well as arrays of arrays.  

No, I'm not opposed to compound elements.  As John points out, Fortran 90
already permits this, and I think it is an essential concept.  It is
essentially the same idea as structure-valued expressions, without which
Fortran 90 programmers would be forced back to the ways of the ancients. 

An array of arrays is a bit clumsy in Fortran 90, since it must be
expressed as an array of structures (the structure has only one component,
which is an array).  But the facility has important uses, as John pointed
out.

Merlin's reply, continued:
> However, if HPF were to adopt this generalisation, there's definitely no 
> arbitrariness about which indices of an actual argument match those of the 
> corresponding dummy argument, and which are elemental.  Viewing the actual 
> argument as an array of compound data objects, its leading indices must be 
> matched with the dummy argument, as dictated by Fortran's column-major 
> ordering.

Fortran 90 defines an array element ordering only so that a whole array
in an i/o list will imply a particular sequence of i/o elements.  F90
uses this notion in a storage association sense only where it needs this
concept for compatibility with Fortran 77.  When one assumes that the
leading indices are the ones that must match, one elevates the concept
of array element ordering and drifts back towards the storage association
regime.  HPF could define a particular matching, such as first indices,
but should not try to justify the idea through appeals to array element
ordering.

I tried to sell the idea of arrays of arrays (without going through
structures) several times to the Fortran 8x committee, but came up short.
One of the arguments against it was the arbitrary choice of indices to form
elements of a lower-rank array from one of higher rank.  The array-of-arrays
concept would be a major addition to the language.  The issue is whether
it adds enough expressiveness to make it worthwhile.

Merlin:
* > I would propose that only elemental functions (both user-defined and
* > intrinsic) are allowed in FORALL, just as in array expressions.  

Page's reply:
* I don't understand what this means.  Array expressions, even in FORALL
* assignment, may refer to non-elemental functions.  John, will you clarify
* this for me?

Part of Merlin's clarification:
> I would dispute [your statement that array expressions in FORALL may
> refer to non-elemental functions]---you need the constraints 
> I listed to ensure that the function may be evaluated concurrently over 
> all elements of the FORALL assignment.

Fortran 90 (and 77 and 66) includes constraints against side-effects that
make the value of an expression depend on the order of evaluation of
subexpressions.  I would like HPF to expand these constraints to permit
concurrent evaluation of subexpressions.  (I think that John's constraints
are adequate and that they add to Fortran's existing constraints only the
requirement that functions within the same statement must avoid sharing
temporary resources.)  Then forall statements could call non-elemental
functions with impunity, expanding their usefulness for SPMD programming.

If we add these constraints under all circumstances, we improve the
language, but invalidate certain standard-conforming programs.  (I think
the improvement would be worth the cost in this case, but I don't expect
to convince many others.)  We will need to add the constraint in the case
of FORALL regardless of the decision on elemental functions.  Otherwise,
forall won't be a parallel construct.

(Actually, if we define forall to permit parallelism, then we inherit the
constraints we need from Fortran's prohibition of side-effects in functions
when they may change the meaning of an assignment statement.)

Merlin:
* > Elemental subroutine calls could also be allowed in FORALL, with a very 
* > similar interpretation to elemental functions.  The main reason for
* > using them would be to allow the return of multiple results.  

Page's reply: 
* Structure-valued functions provide a way to return multiple results, so
* this doesn't seem to be a strong reason for including subroutine calls
* in FORALL.

Merlin's response:
> But it's a nuisance to have to clump together logically distinct
> data objects into a single structure simply to achieve this effect.
> Also, only one grouping of data objects can be achieved this way statically
> ---other groupings must be created by assignment to a structure,
> which will be expensive in time and storage.  (Also, early HPF 
> implementations may not support structures, though that's not a 
> strong argument!).

If a subroutine is returning multiple data objects that are logically
distinct, then the subroutine is ill-conceived.  If it is worth defining
a subroutine, it is worth defining a datatype corresponding to the
result it delivers.

The Fortran 8x version of elementals permitted elemental subroutines
only for subroutines used to overload the assignment operation.  I'd
like to restrict the elemental notion to functions only.  Subroutines
(unlike functions) are an outmoded concept in Fortran 90.
Accomplished programmers will not use them -- except to package i/o
or other side-effects, or possibly for efficiency reasons to cope with
poor compilers.

 - Rex Page

From jim@meiko.co.uk  Thu Jul 16 05:17:30 1992
Received: from marge.meiko.com by cs.rice.edu (AA04789); Thu, 16 Jul 92 05:17:30 CDT
Received: from hub.meiko.co.uk by marge.meiko.com (4.1/SMI-4.1)
	id AA08822; Thu, 16 Jul 92 06:13:32 EDT
Received: from spica.co.uk (spica.meiko.co.uk) by hub.meiko.co.uk (4.1/SMI-4.1)
	id AA01896; Thu, 16 Jul 92 11:14:47 BST
Date: Thu, 16 Jul 92 11:14:47 BST
From: jim@meiko.co.uk (James Cownie)
Message-Id: <9207161014.AA01896@hub.meiko.co.uk>
Received: by spica.co.uk (4.1/SMI-4.1)
	id AA18836; Thu, 16 Jul 92 11:14:44 BST
To: jhm@ecs.soton.ac.uk
Cc: hpff-forall@cs.rice.edu
In-Reply-To: John Merlin's message of Tue, 14 Jul 92 18:14:02 BST <257.9207141714@bacchus.ecs.soton.ac.uk>
Subject: User-defined elemental procedures

> * A solution which would meet John's objective (but is not permissible
> * under his restrictions) would be to insist that the global data
> * accessed by the elemental function was replicated, thus ensuring that
> * it is resident on all processors.
> * 
> * (maybe resident isn't so poor !)
> * 
> * --Jim
> * James Cownie 
> 
> Exactly --- I was thinking that the global data accessed by elemental 
> procedures must be replicated (at least on the architectures I'm 
> considering).  Why isn't that permissible under my restrictions?

Because I need to use mapping directives on the global data objects to
ensure that they are replicated. But your restriction number 5 :

> 	5. It can only reference a global variable if it is not 
> subject to data mapping directives.
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

explicitly forbids me from doing this...

-- Jim
James Cownie 
Meiko Limited
650 Aztec West
Bristol BS12 4SD
England

Phone : +44 454 616171
FAX   : +44 454 618188
E-Mail: jim@meiko.co.uk or jim@meiko.com


From jim@meiko.co.uk  Thu Jul 16 05:55:25 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA04976); Thu, 16 Jul 92 05:55:25 CDT
Received: from marge.meiko.com by erato.cs.rice.edu (AA16376); Thu, 16 Jul 92 05:55:21 CDT
Received: from hub.meiko.co.uk by marge.meiko.com (4.1/SMI-4.1)
	id AA09322; Thu, 16 Jul 92 06:51:26 EDT
Received: from spica.co.uk (spica.meiko.co.uk) by hub.meiko.co.uk (4.1/SMI-4.1)
	id AA02075; Thu, 16 Jul 92 11:52:40 BST
Date: Thu, 16 Jul 92 11:52:40 BST
From: jim@meiko.co.uk (James Cownie)
Message-Id: <9207161052.AA02075@hub.meiko.co.uk>
Received: by spica.co.uk (4.1/SMI-4.1)
	id AA21658; Thu, 16 Jul 92 11:52:36 BST
To: karp@hplms2.hpl.hp.com
Cc: hpff-forall@erato.cs.rice.edu
In-Reply-To: Alan Karp's message of Tue, 14 Jul 92 09:05:04 -0700 <9207141605.AA02037@hplahk.hpl.hp.com>
Subject: Back to square one

Alan Karp argues strongly against introducing FORALL on the grounds of
the semantic complications it introduces into the language, and in
particular the way it modifies the semantics of embedded statements.

I would also argue against its inclusion in HPF because it is the ONLY
place where HPF is going to add syntax to Fortran. All of the other
HPF additions are comment annotations of (potentially) standard
conforming source, and leave the source still as standard conforming
as it was before they were added. We really should think (and think
again) before taking this step, since we're about to encourage people
to write non-standard code, and also make valid HPF programs invalid
Fortran. (Are we even allowed to call it Fortran after doing this ?)

(I know there was a straw poll still in favour of it at the last
meeting, after Joel raised exactly this point, but I think a lot of
people are still nervous about it...)

I don't find the argument that 
"Lots of us already have an implementation" 
a convincing one for forcing everyone else to have one too. 

HPF is not about to outlaw other extensions to Fortran (this is
clearly impossible !), so if people already have FORALL in their code
they can continue to use it on the machines which support it.
(Presumably these users knew what they were getting into when they started to
use it !). 

At the very least we should NOT add FORALL to the HPF core standard. (I
have no objection to putting it in a subsidiary document of extensions
along the lines of "If you do this do it this way", so that at least
everyone does the same thing with FORALL; David Loveman's document on
FORALL seems a fine place to start for this).

Please can we think long and hard about this.

--Jim
James Cownie 
Meiko Limited
650 Aztec West
Bristol BS12 4SD
England

Phone : +44 454 616171
FAX   : +44 454 618188
E-Mail: jim@meiko.co.uk or jim@meiko.com


From shapiro@think.com  Thu Jul 16 08:49:34 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA06526); Thu, 16 Jul 92 08:49:34 CDT
Received: from mail.think.com by erato.cs.rice.edu (AA16418); Thu, 16 Jul 92 08:49:31 CDT
Return-Path: <shapiro@Think.COM>
Received: from Django.Think.COM by mail.think.com; Thu, 16 Jul 92 09:49:16 -0400
From: Richard Shapiro <shapiro@think.com>
Received: by django.think.com (4.1/Think-1.2)
	id AA20320; Thu, 16 Jul 92 09:49:15 EDT
Date: Thu, 16 Jul 92 09:49:15 EDT
Message-Id: <9207161349.AA20320@django.think.com>
To: jim@meiko.co.uk
Cc: karp@hplms2.hpl.hp.com, hpff-forall@erato.cs.rice.edu
In-Reply-To: James Cownie's message of Thu, 16 Jul 92 11:52:40 BST <9207161052.AA02075@hub.meiko.co.uk>
Subject: Back to square one? I hope not...

   Date: Thu, 16 Jul 92 11:52:40 BST
   From: jim@meiko.co.uk (James Cownie)

   Alan Karp argues strongly against introducing FORALL on the grounds of
   the semantic complications it introduces into the language, and in
   particular the way it modifies the semantics of embedded statements.


FORALL gives us something that we don't get with a DO loop--order
independent "loops". I claim that this cannot be obtained with a simple DO
loop with directives, because such a directive breaks a fundamental rule
about directives not changing program semantics. For example, consider the
following 3 programs which produce a pseudo-random permutation of an
array:

C
C     Permute an array semi-randomly with FORALL
C
      SUBROUTINE SHUFFLE(DECK,NCARDS,SEED1,SEED2)
      INTEGER DECK(NCARDS),NCARDS,SEED1,SEED2
C
C     Make sure we will do a permutation by asserting that SEED1
C     and NCARDS are relatively prime
C
      CALL ASSERT_RELATIVELY_PRIME(NCARDS,SEED1)
      FORALL(I=1:NCARDS) DECK(I) = DECK(MOD(SEED2+SEED1*I,NCARDS)+1)
      RETURN
      END

This example uses a FORALL to achieve the shuffle. (I am aware that the
same program could be re-written using a vector-valued subscript, but I'm
trying to keep the example as simple as possible. If the arrays involved
were multidimensional, VVS's wouldn't suffice).  The code is short and
exposes no temporary arrays to the user.  A translation into a correct DO loop
looks like:

C
C     Permute an array semi-randomly with DO, correctly
C
      SUBROUTINE SHUFFLE(DECK,NCARDS,SEED1,SEED2)
      INTEGER DECK(NCARDS),NCARDS,SEED1,SEED2,TEMP(NCARDS)
C
C     Make sure we will do a permutation by asserting that SEED1
C     and NCARDS are relatively prime
C
      CALL ASSERT_RELATIVELY_PRIME(NCARDS,SEED1)
      DO I = 1,NCARDS
         TEMP(I) = DECK(MOD(SEED2+SEED1*I,NCARDS)+1)
      ENDDO
      DECK = TEMP
      RETURN
      END

Note the extra variable TEMP.  Finally, here is an
example with a "directive".

C
C     Permute an array semi-randomly with DO, and directive
C
      SUBROUTINE SHUFFLE(DECK,NCARDS,SEED1,SEED2)
      INTEGER DECK(NCARDS),NCARDS,SEED1,SEED2
C
C     Make sure we will do a permutation by asserting that SEED1
C     and NCARDS are relatively prime
C
      CALL ASSERT_RELATIVELY_PRIME(NCARDS,SEED1)

CHPF$ ALL_ITERATIONS_SIMULTANEOUS
      DO I = 1,NCARDS
         DECK(I) = DECK(MOD(SEED2+SEED1*I,NCARDS)+1)
      ENDDO
      RETURN
      END


Essentially, this is a directive which specifies that I want each iteration
of the loop to happen "at the same time" (someone has already invented a
term for this, no doubt).  In other words, each iteration of the loop must
occur as if no other iteration has yet occurred.  But now if I take my
correct code (with the directive) and remove the directive, I get different
answers.  
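
As a minimal sketch of the difference between the two semantics (written in
Python purely for illustration; the function names and the sample deck and
seeds are assumptions, not taken from any real code), the FORALL version
evaluates every right-hand side before storing anything, while the plain DO
loop lets each iteration see the stores made by earlier ones:

```python
def shuffle_forall(deck, seed1, seed2):
    """FORALL-style: all right-hand sides are read from the original deck."""
    n = len(deck)
    # Fortran index I runs 1..n; DECK(k) corresponds to deck[k - 1] here,
    # so DECK(MOD(SEED2+SEED1*I,NCARDS)+1) is deck[(seed2 + seed1*i) % n].
    return [deck[(seed2 + seed1 * i) % n] for i in range(1, n + 1)]

def shuffle_do(deck, seed1, seed2):
    """Sequential DO-style: each iteration may read elements already overwritten."""
    deck = list(deck)  # work on a copy so the caller's list is untouched
    n = len(deck)
    for i in range(1, n + 1):
        deck[i - 1] = deck[(seed2 + seed1 * i) % n]
    return deck

cards = [10, 20, 30, 40, 50]          # 5 cards; SEED1 = 2 is relatively prime to 5
print(shuffle_forall(cards, 2, 1))    # a permutation of the input
print(shuffle_do(cards, 2, 1))        # some cards duplicated, others lost
```

With these inputs the first call returns a rearrangement of the five cards,
whereas the second overwrites cards before they are read and so no longer
returns a permutation.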

The FORALL statement adds a *semantics* which users have found very useful
in practical codes.  Of course, the same semantics can be achieved using DO
loops and other language features in an appropriate manner (in F90, *not*
in F77), but so what?  If we take this tack, why do we need the array
syntax? Or, to be very silly, why do we need DO at all?

I agree that we need to be *extremely* careful on multi-statement FORALL
(given the many possible meanings), but single-statement FORALL provides a
functionality which is very clumsily obtained any other way.

If anyone really wants an example (from a real applications code) of an
FORALL statement which cannot be re-written in any other F90 array-like
manner without the translation similar to that described in David Loveman's
document, I will be happy to provide one.

		Richard Shapiro,
		Thinking Machines Corporation
		(formerly of United Technologies)
		(shapiro@think.com)

From jim@meiko.co.uk  Thu Jul 16 09:43:56 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA10572); Thu, 16 Jul 92 09:43:56 CDT
Received: from marge.meiko.com by erato.cs.rice.edu (AA16434); Thu, 16 Jul 92 09:43:53 CDT
Received: from hub.meiko.co.uk by marge.meiko.com (4.1/SMI-4.1)
	id AA11932; Thu, 16 Jul 92 10:39:43 EDT
Received: from spica.co.uk (spica.meiko.co.uk) by hub.meiko.co.uk (4.1/SMI-4.1)
	id AA00520; Thu, 16 Jul 92 15:40:55 BST
Date: Thu, 16 Jul 92 15:40:55 BST
From: jim@meiko.co.uk (James Cownie)
Message-Id: <9207161440.AA00520@hub.meiko.co.uk>
Received: by spica.co.uk (4.1/SMI-4.1)
	id AA24200; Thu, 16 Jul 92 15:40:52 BST
To: shapiro@think.com
Cc: karp@hplms2.hpl.hp.com, hpff-forall@erato.cs.rice.edu
In-Reply-To: Richard Shapiro's message of Thu, 16 Jul 92 09:49:15 EDT <9207161349.AA20320@django.think.com>
Subject: Back to square one? I hope not...

> FORALL gives us something that we don't get with a DO loop--order
> independent "loops". I claim that this cannot be obtained with a simple DO
> loop with directives, because such a directive breaks a fundamental rule
> about directives not changing program semantics. 

I disagree (what a surprise !).

A directive provides information to the compiler. The compiler can
ignore the information if it feels fit, or generate a warning if it
can show that the information in the directive is false.

Your third example :
C     Permute an array semi-randomly with DO, and directive
C
      SUBROUTINE SHUFFLE(DECK,NCARDS,SEED1,SEED2)
      INTEGER DECK(NCARDS),NCARDS,SEED1,SEED2
C
C     Make sure we will do a permutation by asserting that SEED1
C     and NCARDS are relatively prime
C
      CALL ASSERT_RELATIVELY_PRIME(NCARDS,SEED1)

CHPF$ ALL_ITERATIONS_SIMULTANEOUS
      DO I = 1,NCARDS
         DECK(I) = DECK(MOD(SEED2+SEED1*I,NCARDS)+1)
      ENDDO
      RETURN
      END
is a well defined Fortran program, whether or not the directive is
present. The fact that it doesn't do what you want is neither
here nor there, and the directive doesn't (in my mind) make it do what
you want. (Either the directive is ignored, or it is ignored and you
get a warning).

If you want to achieve the permutation you should write it like
your second example :
C
C     Permute an array semi-randomly with DO, correctly
C
      SUBROUTINE SHUFFLE(DECK,NCARDS,SEED1,SEED2)
      INTEGER DECK(NCARDS),NCARDS,SEED1,SEED2,TEMP(NCARDS)
C
C     Make sure we will do a permutation by asserting that SEED1
C     and NCARDS are relatively prime
C
      CALL ASSERT_RELATIVELY_PRIME(NCARDS,SEED1)
      DO I = 1,NCARDS
         TEMP(I) = DECK(MOD(SEED2+SEED1*I,NCARDS)+1)
      ENDDO
      DECK = TEMP
      RETURN
      END

Surely then adding a
CHPF$ INDEPENDENT 
above the loop expresses exactly what you want (or have I
misunderstood the meaning of this directive ?)

In summary : 
1) Directives NEVER change the semantics.
2) If you assert that the loop iterations have no data dependence,
   then you must write your code so that this is true.
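
Point 2 can be checked mechanically.  The following is a minimal sketch (in
Python, for illustration only; the function names and the test data are
assumptions): a loop whose iterations are truly independent gives the same
answer whatever order the iterations run in, and the version with the
explicit temporary has that property while the in-place version does not.

```python
def shuffle_with_temp(deck, seed1, seed2, order):
    """Loop body writes only TEMP and reads only DECK: no cross-iteration dependence."""
    n = len(deck)
    temp = [None] * n
    for i in order:                      # order: any permutation of 1..n
        temp[i - 1] = deck[(seed2 + seed1 * i) % n]
    return temp

def shuffle_in_place(deck, seed1, seed2, order):
    """Loop body reads and writes DECK: iterations depend on one another."""
    deck = list(deck)                    # copy, so the caller's list is untouched
    n = len(deck)
    for i in order:
        deck[i - 1] = deck[(seed2 + seed1 * i) % n]
    return deck

deck = [10, 20, 30, 40, 50]
forward = list(range(1, 6))
backward = list(reversed(forward))

# Independent iterations: the execution order does not matter.
print(shuffle_with_temp(deck, 2, 1, forward) == shuffle_with_temp(deck, 2, 1, backward))
# Dependent iterations: the execution order changes the result.
print(shuffle_in_place(deck, 2, 1, forward) == shuffle_in_place(deck, 2, 1, backward))
```

Running iterations in two orders is only a spot check, not a proof of
independence, but it shows why the assertion is valid for the TEMP version
and false for the in-place one.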

--Jim
James Cownie 
Meiko Limited
650 Aztec West
Bristol BS12 4SD
England

Phone : +44 454 616171
FAX   : +44 454 618188
E-Mail: jim@meiko.co.uk or jim@meiko.com


From shapiro@think.com  Thu Jul 16 10:02:33 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA11275); Thu, 16 Jul 92 10:02:33 CDT
Received: from mail.think.com by erato.cs.rice.edu (AA16441); Thu, 16 Jul 92 10:02:30 CDT
Return-Path: <shapiro@Think.COM>
Received: from Django.Think.COM by mail.think.com; Thu, 16 Jul 92 11:02:13 -0400
From: Richard Shapiro <shapiro@think.com>
Received: by django.think.com (4.1/Think-1.2)
	id AA20968; Thu, 16 Jul 92 11:02:12 EDT
Date: Thu, 16 Jul 92 11:02:12 EDT
Message-Id: <9207161502.AA20968@django.think.com>
To: jim@meiko.co.uk
Cc: karp@hplms2.hpl.hp.com, hpff-forall@erato.cs.rice.edu
In-Reply-To: James Cownie's message of Thu, 16 Jul 92 15:40:55 BST <9207161440.AA00520@hub.meiko.co.uk>
Subject: Back to square one? I hope not...

   Date: Thu, 16 Jul 92 15:40:55 BST
   From: jim@meiko.co.uk (James Cownie)

   > FORALL gives us something that we don't get with a DO loop--order
   > independent "loops". I claim that this cannot be obtained with a simple DO
   > loop with directives, because such a directive breaks a fundamental rule
   > about directives not changing program semantics. 

   I disagree (what a surprise !).

Actually, we probably don't disagree. You made the point that I was trying
to make; if I want to get the semantics of a FORALL, I can't get it with a
single DO loop + directives.  I am arguing that we *want* the semantics of
the FORALL, because it can express what I am trying to say in my
algorithm more directly.

Note that I also agree with Alan Karp that the DO loop in his figure 4 is
much more readable than his figure 3 Fortran 90.  This is not an argument
against FORALL, but rather an argument against compilers which don't
parallelize DO loops. FORALL is as useful on an IBM PC as it is on a
Connection Machine.


		Richard Shapiro,
		Thinking Machines Corporation
		(shapiro@think.com)

From jim@meiko.co.uk  Thu Jul 16 10:43:36 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA12892); Thu, 16 Jul 92 10:43:36 CDT
Received: from marge.meiko.com by erato.cs.rice.edu (AA16454); Thu, 16 Jul 92 10:43:31 CDT
Received: from hub.meiko.co.uk by marge.meiko.com (4.1/SMI-4.1)
	id AA12430; Thu, 16 Jul 92 11:39:32 EDT
Received: from spica.co.uk (spica.meiko.co.uk) by hub.meiko.co.uk (4.1/SMI-4.1)
	id AA00598; Thu, 16 Jul 92 16:40:36 BST
Date: Thu, 16 Jul 92 16:40:35 BST
From: jim@meiko.co.uk (James Cownie)
Message-Id: <9207161540.AA00598@hub.meiko.co.uk>
Received: by spica.co.uk (4.1/SMI-4.1)
	id AA26922; Thu, 16 Jul 92 16:40:33 BST
To: shapiro@think.com
Cc: karp@hplms2.hpl.hp.com, hpff-forall@erato.cs.rice.edu
In-Reply-To: Richard Shapiro's message of Thu, 16 Jul 92 11:02:12 EDT <9207161502.AA20968@django.think.com>
Subject: Back to square one? I hope not...

> Actually, we probably don't disagree. 
Good !
> You made the point that I was trying
> to make; if I want to get the semantics of a FORALL, I can't get it with a
> single DO loop + directives.  
... without explicitly adding the temporary arrays.

> I am arguing that we *want* the semantics of
> the FORALL, because it can express what I am trying to say in my
> algorithm more directly.
True, but then maybe APL could express it even better, or a language
which already had
	SOLVE_RICHES_PROBLEM 
as an intrinsic, so where do we stop ???

Stopping at the point of adding NOTHING to the base language is very
clean, clear, and simple. The statements "Any F90 program is an HPF
program", and "Any HPF program is an F90 program" are extremely
attractive ones to be able to make.

The attraction of FORALL seems to be that various people already have
implementations of it, but that's a very poor reason for putting it
in.

> Note that I also agree with Alan Karp that the DO loop in his figure 4 is
> much more readable than his figure 3 Fortran 90.  This is not an argument
> against FORALL, but rather an argument against compilers which don't
> parallelize DO loops. 
or against writing obscure code, or against "using the full power of
the language" :-)

> FORALL is as useful on an IBM PC as it is on a
> Connection Machine.
Sure, but this is saying that it should have been in F90. Since it
isn't (whatever the rights and wrongs of that may be), it seems wrong
to me to make this the one place where we extend the base language,
and thus make a "standards conforming HPF" program NOT be a "standards
conforming F90" program.

--Jim
James Cownie 
Meiko Limited
650 Aztec West
Bristol BS12 4SD
England

Phone : +44 454 616171
FAX   : +44 454 618188
E-Mail: jim@meiko.co.uk or jim@meiko.com


From joelw@mozart.convex.com  Thu Jul 16 13:27:18 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA18285); Thu, 16 Jul 92 13:27:18 CDT
Received: from convex.convex.com by erato.cs.rice.edu (AA16509); Thu, 16 Jul 92 13:27:13 CDT
Received: from mozart.convex.com by convex.convex.com (5.64/1.35)
	id AA04011; Thu, 16 Jul 92 13:27:08 -0500
Received: by mozart.convex.com (5.64/1.28)
	id AA00595; Thu, 16 Jul 92 13:27:07 -0500
From: joelw@mozart.convex.com (Joel Williamson)
Message-Id: <9207161827.AA00595@mozart.convex.com>
Subject: Re: Back to square one? I hope not...
To: jim@meiko.co.uk (James Cownie)
Date: Thu, 16 Jul 92 13:27:06 CDT
Cc: shapiro@think.com, karp@hplms2.hpl.hp.com, hpff-forall@erato.cs.rice.edu
In-Reply-To: <9207161540.AA00598@hub.meiko.co.uk>; from "James Cownie" at Jul 16, 92 4:40 pm
X-Mailer: ELM [version 2.3 PL11]

James Cownie writes:
> 
> 
> Stopping at the point of adding NOTHING to the base language is very
> clean, clear, and simple. The statements "Any F90 program is an HPF
> program", and "Any HPF program is an F90 program" are extremely
> attractive ones to be able to make.
> 

This is a very powerful argument.  Furthermore, SISD vendors have almost
no incentive to include FORALL in an otherwise compliant F90 compiler,
which will reduce portability of HPF programs.

Joel Williamson

From @eros.uknet.ac.uk,@camra.ecs.soton.ac.uk:jhm@ecs.southampton.ac.uk  Thu Jul 16 14:45:39 1992
Received: from sun2.nsfnet-relay.ac.uk by cs.rice.edu (AA20889); Thu, 16 Jul 92 14:45:39 CDT
Via: uk.ac.uknet-relay; Thu, 16 Jul 1992 20:44:52 +0100
Received: from ecs.soton.ac.uk by eros.uknet.ac.uk via JANET with NIFTP (PP) 
          id <3464-0@eros.uknet.ac.uk>; Thu, 16 Jul 1992 20:38:24 +0100
Via: camra.ecs.soton.ac.uk; Thu, 16 Jul 92 20:33:51 BST
From: John Merlin <jhm@ecs.southampton.ac.uk>
Received: from bacchus.ecs.soton.ac.uk by camra.ecs.soton.ac.uk;
          Thu, 16 Jul 92 20:39:35 BST
Date: Thu, 16 Jul 92 20:36:46 BST
Message-Id: <712.9207161936@bacchus.ecs.soton.ac.uk>
To: jim@meiko.co.uk, zrlp09@trc.amoco.com
Subject: Re: User-defined elemental procedures
Cc: hpff-forall@cs.rice.edu

Rex Page <zrlp09@com.amoco.trc Wed Jul 15 18:12:49 1992> writes:
> 
> In my comments on John's proposal for elemental procedures, I claimed
> that programmers could get the effect of elemental procedures through
> overloading.  I should have said "one of the effects." 
> .....
>
> John pointed out that the version of such a function that operates
> on array arguments (and delivers an array value) will probably be
> viewed by the compiler in the same way as other array-valued functions.
> That is, the compiler probably won't view the invocation as a
> collection of parallel invocations unless it does a tremendous amount
> of inter-procedural analysis.  Yet that view would be obvious if
> the invocation were elemental.  The point here is that the elemental
> facility allows programmers to add precision to their specifications,
> that is to raise their level of communication.
> 
> I agree that this is a powerful argument, and I was opposed to
> removing elementals from Fortran 90.  Nevertheless they were removed.
> Putting them back in raises the ante for compiler developers, which
> is a trade-off HPFF must consider if it is important for people to be
> able to run their code on both sequential workstations and high
> performance systems.


But FORALL was also removed from F8x and is being considered for HPF.  
On a more general point, I personally don't think that HPF should be 
hamstrung by a requirement not to introduce *any* new facilities to F90 
(this comment is also directed to Jim Cownie about FORALL), provided 
the new syntax is kept to a minimum and the new facilities are clear, 
expressive, clearly useful for our purpose, and implementable across 
the whole range of architectures.  (I have reservations about complex 
MIMD extensions, for these reasons).  


> The last version of Fortran 8x to contain a facility for user-defined
> elemental functions did so by defining elemental invocations.  The
> language included no syntax for designating a function as elemental.
> It required the interface to be visible, making it possible for the
> compiler to recognize an elemental invocation by matching arrays in
> the actual argument list with scalars in corresponding positions of
> the dummy argument list.


Question:  How did F8x propose to check that such procedures are free of 
side-effects  (i.e. local arrays are not SAVEd, no I/O, and no definition 
of global data), which is necessary to make elemental invocations 
deterministic?  

(I assume it didn't check these things.  If not, that's arguably
acceptable in F90, where the consequence of breaking these rules is
simply non-determinism, which can arise anyway from breaking other 
rules that traditionally aren't checked in Fortran.  I believe it's
unacceptable not to check the constraints in the context of HPF,
as there are additional constraints, namely on data distribution, 
which if broken would result in deadlock).
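
The non-determinism mentioned here can be illustrated with a minimal sketch
(in Python, for illustration only; every name below is hypothetical).  A
counter captured by the function stands in for a SAVEd local: once the
function keeps state between calls, the result of an elemental-style
invocation depends on the order in which the elements are visited, whereas
a side-effect-free function gives the same result in any order.

```python
def make_stateful_function():
    calls = 0                      # plays the role of a SAVEd local variable
    def f(x):
        nonlocal calls
        calls += 1                 # side effect: state survives between calls
        return x + calls
    return f

def pure_function(x):
    return x + 1                   # no state, no I/O, no global definitions

def apply_in_order(func, values, order):
    """Apply func to each element, visiting positions in the given order."""
    result = [None] * len(values)
    for idx in order:
        result[idx] = func(values[idx])
    return result

values = [10, 20, 30]
print(apply_in_order(make_stateful_function(), values, [0, 1, 2]))
print(apply_in_order(make_stateful_function(), values, [2, 1, 0]))
print(apply_in_order(pure_function, values, [0, 1, 2]) ==
      apply_in_order(pure_function, values, [2, 1, 0]))
```

The sketch covers only the side-effect constraints; the data-distribution
constraints and the deadlock risk are outside what it can model.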


> If we add elementals to HPF using the F8x approach, then we will be
> adding semantics to Fortran 90, but not syntax.  This simplifies the
> plight of someone trying to run an HPF program on a Fortran 90 system.
> People lacking an HPF compiler can link in a collection of overloads
> to simulate HPF elemental functions; they won't need to change any of
> the invoking code.


I'd be glad to avoid new syntax.  I just want some way of annotating
a procedure to assert that it obeys my constraints, so that the 
constraints can be checked.  I'd be happy with an HPF directive like:

	!HPF$ ELEMENTAL proc-name

at the end of the procedure statement or on the next line.  Is that 
an acceptable solution?


> Page:
> *    REAL v(3,10,10)
> *    INTERFACE
> *      ELEMENTAL FUNCTION f(x)
> *        REAL, DIMENSION(3) :: f,x
> *      END FUNCTION f
> *    END INTERFACE
> * ...
> * The question of the shape of the object f(v) needs careful attention.
> * John Merlin's proposal matches the first subscript of the actual and
> * dummy arguments and makes f elemental on the last two subscripts in
> * this invocation, so the shape of the result would be [10,10].
> * 
> * However, other matchings are possible...
>
> ...
>
> Merlin's reply, continued:
>
> > ... if HPF were to adopt this generalisation, there's definitely no 
> > arbitrariness about which indices of an actual argument match those of the 
> > corresponding dummy argument, and which are elemental.  Viewing the actual 
> > argument as an array of compound data objects, its leading indices must be 
> > matched with the dummy argument, as dictated by Fortran's column-major 
> > ordering.
> 
> Fortran 90 defines an array element ordering only so that a whole array
> in an i/o list will imply a particular sequence of i/o elements.  F90
> uses this notion in a storage association sense only where it needs this
> concept for compatibility with Fortran 77.  When one assumes that the
> leading indices are the ones that must match, one elevates the concept
> of array element ordering and drifts back towards the storage association
> regime.  HPF could define a particular matching, such as first indices,
> but should not try to justify the idea through appeals to array element
> ordering.
> 
> I tried to sell the idea of arrays of arrays (without going through
> structures) several times to the Fortran 8x committee, but came up short.
> One of the arguments against it was the arbitrary choice of indices to form
> elements of a lower-rank array from one of higher rank.  The array-of-arrays
> concept would be a major addition to the language.  The issue is whether
> it adds enough expressiveness to make it worthwhile.

Ok, I take your point.  My personal opinion is that, in this context, 
the 'arrays-of-arrays' concept would add considerably to the 
expressiveness of the language, and I don't think that one should be 
forced to use structures for the compound elements just for this purpose 
(i.e. if there's no other reason for using them).  
Also, I note that some other languages have the 'array-of-arrays' concept, 
at least to some extent (e.g. C, Pascal and occam---these being the only 
other languages I know well!), but I admit that this concept sits
very naturally on these languages because of their row-wise element
ordering, and not-at-all naturally on Fortran with its columnwise ordering.

However, I'm not adamant on this point.  As always, it's a matter to be 
decided by consensus.


> Part of Merlin's clarification:
> > I would dispute [your statement that array expressions in FORALL may
> > refer to non-elemental functions]---you need the constraints 
> > I listed to ensure that the function may be evaluated concurrently over 
> > all elements of the FORALL assignment.
> 
> Fortran 90 (and 77 and 66) includes constraints against side-effects that
> make the value of an expression depend on the order of evaluation of
> subexpressions.  I would like HPF to expand these constraints to permit
> concurrent evaluation of subexpressions.  (I think that John's constraints
> are adequate and that they add to Fortran's existing constraints only the
> requirement that functions within the same statement must avoid sharing
> temporary resources.)  Then forall statements could call non-elemental
> functions with impunity, expanding their usefulness for SPMD programming.
> 
> If we add these constraints under all circumstances, we improve the
> language, but invalidate certain standard-conforming programs.  (I think
> the improvement would be worth the cost in this case, but I don't expect
> to convince many others.)  We will need to add the constraint in the case
> of FORALL regardless of the decision on elemental functions.  Otherwise,
> forall won't be a parallel construct.

Are you proposing that all functions be side-effect free and perform
no communications?  If so it seems a bit harsh.  Sorry if I've misunderstood 
you.


> (Actually, if we define forall to permit parallelism, then we inherit the
> constraints we need from Fortran's prohibition of side-effects in functions
> when they may change the meaning of an assignment statement.)

True, but that leaves the whole responsibility for correctness up
to the programmer, with no guidelines and no possibility of checking
---my preference is to give some rules and be able to check them.
Also, I want to impose more constraints than strictly necessary to satisfy
this criterion, e.g. to prohibit access to distributed data (see earlier).


> Merlin:
> * > Elemental subroutine calls could also be allowed in FORALL, with a very 
> * > similar interpretation to elemental functions.  The main reason for
> * > using them would be to allow the return of multiple results.  
> 
> ... stuff omitted
>
> If a subroutine is returning multiple data objects that are logically
> distinct, then the subroutine is ill-conceived.  If it is worth defining
> a subroutine, it is worth defining a datatype corresponding to the
> result it delivers.
> 
> The Fortran 8x version of elementals permitted elemental subroutines
> only for subroutines used to overload the assignment operation.  I'd
> like to restrict the elemental notion to functions only.  Subroutines
> (unlike functions) are an outmoded concept in Fortran 90.
> Accomplished programmers will not use them -- except to package i/o
> or other side-effects, or possibly for efficiency reasons to cope with
> poor compilers.

Again, I'm not adamant about introducing elemental subroutines to HPF.
Actually, in suggesting them I took my lead from Fortran 90, which
has an intrinsic elemental subroutine 'MVBITS ()' (unless I'm out of
date and it's been dropped!).  Admittedly though, its reason for being 
a subroutine is not that it has multiple outputs, but that it has an 
INOUT argument.  Perhaps that's another justification for elemental 
subroutines in HPF? :-)

Incidentally, if subroutines are an outmoded concept in Fortran 90, 
why does it introduce new intrinsic subroutines (for date, time, 
clock, random numbers, etc)?  Please feel free to ignore this bait,
however, as I don't want to start a debate about subroutines vs. functions! :-)


Jim Cownie <jim@uk.co.meiko Thu Jul 16 11:29:02 1992> writes:
>
> John Merlin writes:
> > ... I was thinking that the global data accessed by elemental 
> > procedures must be replicated (at least on the architectures I'm 
> > considering).  Why isn't that permissible under my restrictions?
>
> Because I need to use mapping directives on the global data objects to
> ensure that they are replicated. But your restriction number 5 :
>
> > 	5. It can only reference a global variable if it is not 
> > subject to data mapping directives.
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> explicitly forbids me from doing this...

Ok.  We must clarify the significance of omitting data mapping
directives.  I'd like to interpret it (at least on distributed 
memory platforms without demand-driven communications) to mean 
'replication' (or 'private' storage for the dummy arguments and locals
of elemental procedures, which is in some senses equivalent to replication).
That seems logical, by analogy with the way one would typically 
store scalars, which also don't have mapping directives.  
I think I'm allowed to interpret it that way---is that correct?

However, that does conflict to some extent with its interpretation for 
dummy arguments, where it seems to mean 'inherited' or 'assumed' mapping.
I would support the introduction of a specific notation in the ALIGN 
directive for this case. That would correspond nicely with F90, 
which has a specific notation for assumed *shape* dummy arguments 
(e.g. 'DIMENSION (:,:)').  If and when HPF directives become actual 
Fortran syntax, it would be nice to have some form of 'ALIGN' and 
'DISTRIBUTE' attributes for all data objects that have non-trivial 
mapping, just as F90 requires some form of 'DIMENSION' attribute for 
all non-scalars.

                Cheers,
                     John Merlin.

From arthur@parcom.nl  Thu Jul 16 15:49:39 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA22520); Thu, 16 Jul 92 15:49:39 CDT
Received: from sun4nl.nluug.nl by erato.cs.rice.edu (AA16563); Thu, 16 Jul 92 15:49:30 CDT
Received: by sun4nl.nluug.nl via EUnet
	id AA03332 (5.65b/CWI-3.3); Thu, 16 Jul 1992 22:46:32 +0200
Message-Id: <9207162046.AA03332@sun4nl.nluug.nl>
Received: by prcmuu.parcom.nl; Thu, 16 Jul 92 22:32:53 +0200    
Date: Thu, 16 Jul 92 22:32:53 +0200
From: arthur@parcom.nl (Arthur Veen)
To: hpff-forall@erato.cs.rice.edu
Subject: Re: Back to square one

I would like to endorse Alan Karp's suggestion
(later endorsed by James Cownie and others) not to include
any FORALL construct in HPF at all.

As jim@meiko.co.uk said:

> Stopping at the point of adding NOTHING to the base language is
> very clean, clear, and simple.

Once you allow yourself to add extensions that affect
the semantics the sky is the limit. As a consequence
the discussions on a block-FORALL have so far been inconclusive,
and I don't see it converging in time for the next meeting.

The examples I have seen that support FORALL fall into two
classes depending on whether changing a DO into a FORALL does 
or does not change the semantics of the loop. 
In the latter case the FORALL simply asserts to the
compiler that there are no loop-carried dependencies. 
This information can be perfectly conveyed by a directive.

The discussion centers on those cases where changing a DO 
into a FORALL *does* change the 
semantics of the program. These examples can easily be rewritten
by making explicit copies of the arrays appearing in
the right-hand sides (or am I missing something ?).

Even for these cases the FORALL does not score high on my main criteria:

PORTABILITY
	As already pointed out by others FORALL is a real loser in this
	respect

CLARITY
	Of course, clarity is a matter of taste, but to me the
	explicit copies are much clearer than the implicit COPY-IN
	semantics of the FORALL.
	Compare for instance Richard Shapiro's example:

	> FORALL(I=1:NCARDS) DECK(I) = DECK(MOD(SEED2+SEED1*I,NCARDS)+1)

	with his own rewrite:

	>  DO I = 1,NCARDS
	>		TEMP(I) = DECK(MOD(SEED2+SEED1*I,NCARDS)+1)
	>  ENDDO
	>  DECK = TEMP

		 
	(By the way: the keyword "forall" seems ill-conceived to me: 
	 it does not convey anything about the COPY-IN semantics)

EFFICIENCY
	This seems to be the main motive of the FORALL advocates.
	It seems to me that the sophisticated compiler that is
	needed for any effective HPF implementation can easily 
	recognize (if that is advantageous) the arrays that the programmer
	uses to store temporary copies and thus generate the same
	code as if the FORALL were included.


--arthur

                 Arthur H. Veen
			Technical Director PREPARE

       Parallel Computing   ACE
       Postbus 16775        van Eeghenstraat 100
       1001 RG  Amsterdam   1071 GL  Amsterdam
       The Netherlands      The Netherlands
phone: +31-20-623 3274      +31-20-664 6416
Fax:                        +31-20-675 0389
E-mail:arthur@parcom.nl    arthur@ace.nl

From chk@cs.rice.edu  Thu Jul 16 16:21:25 1992
Received: from DialupEudora (charon.rice.edu) by cs.rice.edu (AA23682); Thu, 16 Jul 92 16:21:25 CDT
Message-Id: <9207162121.AA23682@cs.rice.edu>
Date: Thu, 16 Jul 1992 16:24:31 -0600
To: hpff-forall@cs.rice.edu
From: chk@cs.rice.edu
Subject: Back to business


OK, I think I've caught up on the FORALL mail.  Just in time, too, since 
we need to get a draft together for the meeting next week.  Fortunately, 
I think there is now consensus on at least some of the major issues.  
(Unfortunately, I am not sure one of those issues is "Should there be a 
FORALL at all?" More on this later...)

I'll edit the current documents together to produce a draft based on the 
points below.  If you see a problem with anything here, feel free to 
sound off now (or tomorrow, when the draft will hopefully appear).

Areas of general agreement:

1. FORALL should be a generalized assignment statement, basically not 
much more powerful than (masked) array assignment.  This is not the right 
mechanism for introducing general MIMD control.

2. Single-statement FORALLs should have the SIMD semantics of Fortran 8X 
FORALL, CM Fortran FORALL, MasPar FORALL, etc.  (See below for 
restrictions on what statements can be FORALLized.)

3. Block FORALLs should contain only series of generalized assignment 
statements, that is ordinary assignment, array assignment, FORALL 
constructs, and WHERE constructs.  

[Side note: I no longer advocate extending WHERE to allow scalar masks; 
the MERGE function performs (almost) everything I want without requiring 
extensions.  The only thing that is still inconvenient is something like 
(faulty syntax coming up!)
	FORALL ( i = 1:n )
	  IF ( a(i) .ne. 0.0 ) THEN
	    x(i) = 1.0
	    y(i) = 1.0
	  ELSE
	    x(i) = 2.0
	    y(i) = 2.0
	  ENDIF
	END FORALL
That is, multiple statements controlled by the same condition; writeable 
using MERGE, but lots of replicated conditions.]

4. Dave Loveman's scalarization seems to be correct.  Unless I missed 
something, Marc Snir's scalarization is equivalent to Dave's in the 
restricted cases described above.

5. Min-You's INDEPENDENT proposal has generated no comment, except for 
two renaming suggestions (BEGIN INDEPENDENT and BEGINDEPENDENT).  I've 
heard a few comments privately, however, that this is not sufficient, in 
particular that reductions are needed as well.


Major areas of disagreement:

I. Should FORALL be in HPF at all?  Alan Karp makes a good argument that 
it is not necessary.  I agree that anything expressible by FORALL can 
also be written using DO, array assignments, and (maybe) compiler 
directives.  However, comments on Fortran 8X and CM Fortran indicate 
that users strongly support the expressiveness of FORALL.

For purposes of the draft, FORALL will definitely be included (else it 
would be an awfully short document!).  I think that user preferences 
should sway us toward including FORALL in full HPF.  In view of the 
controversy, however, I will include language to the effect that FORALL 
should not be in the subset.

II. Should functions in FORALL be restricted, and if so how?  The 
candidates are
	A. (Steele, Loveman and Snir proposals) Any function can be called, but 
	it cannot have side effects on the subscripts in the assignment 
	statement.  It appears that other side effects are allowed, but the 
	order of evaluation is undefined.  Am I reading that right?
	B. (Merlin, Page, and Cownie discussion) Introduce ELEMENTAL functions, 
	and only allow those to be called in FORALLs.  The two restrictions of 
	ELEMENTALs are (roughly) to enforce no side effects and to ensure that 
	the functions can be called without communication.
I suspect that others could cloud the issue with new proposals if they so 
chose.

For purposes of the draft, I will take a minimalist approach.  I'll 
eschew adding ELEMENTAL functions, but place more restrictions on the 
behavior of functions called from FORALL (essentially, that they can't 
have any side effects visible in other iterations).

Don't let this stop the discussion of ELEMENTAL.  I can be outvoted!  In 
fact, I'd support adding ELEMENTAL as an assertion to provide compiler 
information.


Time permitting, I'll also circulate a separate proposal recommending 
several other assertions along the lines of INDEPENDENT.  We can't agree 
on these before the next meeting, I'm sure.  However, I think we can 
treat them as a preliminary proposal for the meeting after that.

						Chuck


From chk@cs.rice.edu  Fri Jul 17 18:05:49 1992
Received: from  by cs.rice.edu (AB25300); Fri, 17 Jul 92 18:05:49 CDT
Message-Id: <9207172305.AB25300@cs.rice.edu>
Date: Fri, 17 Jul 1992 18:08:27 -0600
To: hpff-forall@cs.rice.edu
From: chk@cs.rice.edu
Subject: FORALL proposal
X-Attachments: :Macintosh HD:3057:FORALL-chk.tex:


\documentstyle{article}

\oddsidemargin=0.25in
\textwidth=6.0in
\topmargin=-1in
\textheight=9in
\parindent=1em

\newdimen\bnfalign         \bnfalign=2in
\newdimen\bnfopwidth       \bnfopwidth=.3in
\newdimen\bnfindent        \bnfindent=.2in
\newdimen\bnfsep           \bnfsep=6pt
\newdimen\bnfmargin        \bnfmargin=0.5in
\newdimen\codemargin       \codemargin=0.5in
\newdimen\intrinsicmargin  \intrinsicmargin=3em
\newdimen\casemargin       \casemargin=0.75in
\newdimen\argumentmargin   \argumentmargin=1.8in

\def\IT{\it}
\def\RM{\rm}
\let\CHAR=\char
\let\CATCODE=\catcode
\let\DEF=\def
\let\GLOBAL=\global
\let\RELAX=\relax
\let\BEGIN=\begin
\let\END=\end


\def\FUNNYCHARACTIVE{\CATCODE`\a=13 \CATCODE`\b=13 \CATCODE`\c=13 \CATCODE`\d=13
		     \CATCODE`\e=13 \CATCODE`\f=13 \CATCODE`\g=13 \CATCODE`\h=13
		     \CATCODE`\i=13 \CATCODE`\j=13 \CATCODE`\k=13 \CATCODE`\l=13
		     \CATCODE`\m=13 \CATCODE`\n=13 \CATCODE`\o=13 \CATCODE`\p=13
		     \CATCODE`\q=13 \CATCODE`\r=13 \CATCODE`\s=13 \CATCODE`\t=13
		     \CATCODE`\u=13 \CATCODE`\v=13 \CATCODE`\w=13 \CATCODE`\x=13
		     \CATCODE`\y=13 \CATCODE`\z=13 \CATCODE`\[=13 \CATCODE`\]=13
                     \CATCODE`\-=13}

\def\RETURNACTIVE{\CATCODE`\
=13}

\makeatletter
\def\section{\@startsection {section}{1}{\z@}{-3.5ex plus -1ex minus 
 -.2ex}{2.3ex plus .2ex}{\large\sf}}
\def\subsection{\@startsection{subsection}{2}{\z@}{-3.25ex plus -1ex minus 
 -.2ex}{1.5ex plus .2ex}{\large\sf}}

\def\@ifpar#1#2{\let\@tempe\par \def\@tempa{#1}\def\@tempb{#2}\futurelet
    \@tempc\@ifnch}

\def\?#1.{\begingroup\def\@tempq{#1}\list{}{\leftmargin\intrinsicmargin}\relax
  \item[]{\bf\@tempq.} \@intrinsictest}
\def\@intrinsictest{\@ifpar{\@intrinsicpar\@intrinsicdesc}{\@intrinsicpar\relax}}
\long\def\@intrinsicdesc#1{\list{}{\relax
  \def\@tempb{ Arguments}\ifx\@tempq\@tempb
			  \leftmargin\argumentmargin
			  \else \leftmargin\casemargin \fi
  \labelwidth\leftmargin  \advance\labelwidth -\labelsep
  \parsep 4pt plus 2pt minus 1pt
  \let\makelabel\@intrinsiclabel}#1\endlist}
\long\def\@intrinsicpar#1#2\\{#1{#2}\@ifstar{\@intrinsictest}{\endlist\endgroup}}
\def\@intrinsiclabel#1{\setbox0=\hbox{\rm #1}\ifnum\wd0>\labelwidth
  \box0 \else \hbox to \labelwidth{\box0\hfill}\fi}
\def\Case(#1):{\item[{\it Case (#1):}]}
\def\ {\@ifnextchar({\def\@tempq{#1}\@intrinsicopt}{\item[#1]}}
\def\@intrinsicopt(#1){\item[{\@tempq} (#1)]}

\def\MATRIX#1{\relax
    \@ifnextchar,{\@MATRIXTABS{}#1,\@FOO, \hskip0pt plus 1filll\penalty-1\@gobble
  }{\@ifnextchar;{\@MATRIXTABS{}#1,\@FOO; \hskip0pt plus 1filll\penalty-1\@gobble
  }{\@ifnextchar:{\@MATRIXTABS{}#1,\@FOO: \hskip0pt plus 1filll\penalty-1\@gobble
  }{\@ifnextchar.{\hfill\penalty1\null\penalty10000\hskip0pt plus 1filll
		  \@MATRIXTABS{}#1,\@FOO.\penalty-50\@gobble
  }{\@MATRIXTABS{}#1,\@FOO{ }\hskip0pt plus 1filll\penalty-1}}}}}

\def\@MATRIXTABS#1#2,{\@ifnextchar\@FOO{\@MATRIX{#1#2}}{\@MATRIXTABS{#1#2&}}}
\def\@MATRIX#1\@FOO{\(\left[\begin{array}{rrrrrrrrrr}#1\end{array}\right]\)}

\def\@IFSPACEORRETURNNEXT#1#2{\def\@tempa{#1}\def\@tempb{#2}\futurelet\@tempc\@ifspnx}

{
\FUNNYCHARACTIVE
\GLOBAL\DEF\FUNNYCHARDEF{\RELAX
    \DEFa{{\IT\CHAR"61}}\DEFb{{\IT\CHAR"62}}\DEFc{{\IT\CHAR"63}}\RELAX
    \DEFd{{\IT\CHAR"64}}\DEFe{{\IT\CHAR"65}}\DEFf{{\IT\CHAR"66}}\RELAX
    \DEFg{{\IT\CHAR"67}}\DEFh{{\IT\CHAR"68}}\DEFi{{\IT\CHAR"69}}\RELAX
    \DEFj{{\IT\CHAR"6A}}\DEFk{{\IT\CHAR"6B}}\DEFl{{\IT\CHAR"6C}}\RELAX
    \DEFm{{\IT\CHAR"6D}}\DEFn{{\IT\CHAR"6E}}\DEFo{{\IT\CHAR"6F}}\RELAX
    \DEFp{{\IT\CHAR"70}}\DEFq{{\IT\CHAR"71}}\DEFr{{\IT\CHAR"72}}\RELAX
    \DEFs{{\IT\CHAR"73}}\DEFt{{\IT\CHAR"74}}\DEFu{{\IT\CHAR"75}}\RELAX
    \DEFv{{\IT\CHAR"76}}\DEFw{{\IT\CHAR"77}}\DEFx{{\IT\CHAR"78}}\RELAX
    \DEFy{{\IT\CHAR"79}}\DEFz{{\IT\CHAR"7A}}\DEF[{{\RM\CHAR"5B}}\RELAX
    \DEF]{{\RM\CHAR"5D}}\DEF-{\@IFSPACEORRETURNNEXT{{\CHAR"2D}}{{\IT\CHAR"2D}}}}
}

%%% Warning!  Devious return-character machinations in the next several lines!
%%%           Don't even *breathe* on these macros!
{\RETURNACTIVE\global\def\RETURNDEF{\def
{\@ifnextchar\FNB{}{\@stopline\@ifnextchar
{\@NEWBNFRULE}{\penalty\@M\@startline\ignorespaces}}}}\global\def\@NEWBNFRULE
{\vskip\bnfsep\@startline\ignorespaces}\global\def\@ifspnx{\ifx\@tempc\@sptoken \let\@tempd\@tempa \else \ifx\@tempc
\let\@tempd\@tempa \else \let\@tempd\@tempb \fi\fi \@tempd}}
%%% End of bizarro return-character machinations.

\def\IS{\@stopfield\global\setbox\@curfield\hbox\bgroup
  \hskip-\bnfindent \hskip-\bnfopwidth  \hskip-\bnfalign
  \hbox to \bnfalign{\unhbox\@curfield\hfill}\hbox to \bnfopwidth{\bf is \hfill}}
\def\OR{\@stopfield\global\setbox\@curfield\hbox\bgroup
  \hskip-\bnfindent \hskip-\bnfopwidth \hbox to \bnfopwidth{\bf or \hfill}}
\def\R#1 {\hbox to 0pt{\hskip-\bnfmargin R#1\hfill}}
\def\XBNF{\FUNNYCHARDEF\FUNNYCHARACTIVE\RETURNDEF\RETURNACTIVE
  \def\@underbarchar{{\char"5F}}\tt\frenchspacing
  \advance\@totalleftmargin\bnfmargin \tabbing
  \hskip\bnfalign\hskip\bnfopwidth\hskip\bnfindent\=\kill\>\+\@gobblecr}
\def\endXBNF{\-\endtabbing}

\def\BNF{\BEGIN{XBNF}}
\def\FNB{\END{XBNF}}

\begingroup \catcode `|=0 \catcode`\\=12
|gdef|@XCODE#1\EDOC{#1|endtrivlist|end{tt}}
|endgroup

\def\CODE{\begin{tt}\advance\@totalleftmargin\codemargin \@verbatim
   \def\@underbarchar{{\char"5F}}\frenchspacing \@vobeyspaces \@XCODE}
\def\ICODE{\begin{tt}\advance\@totalleftmargin\codemargin \@verbatim
   \def\@underbarchar{{\char"5F}}\frenchspacing \@vobeyspaces
   \FUNNYCHARDEF\FUNNYCHARACTIVE \UNDERBARACTIVE\UNDERBARDEF \@XCODE}

\def\@underbarsub#1{{\ifmmode _{#1}\else {$_{#1}$}\fi}}
\let\@underbarchar\_
\def\@underbar{\let\@tempq\@underbarsub\if\@tempz A\let\@tempq\@underbarchar\fi
  \if\@tempz B\let\@tempq\@underbarchar\fi\if\@tempz C\let\@tempq\@underbarchar\fi
  \if\@tempz D\let\@tempq\@underbarchar\fi\if\@tempz E\let\@tempq\@underbarchar\fi
  \if\@tempz F\let\@tempq\@underbarchar\fi\if\@tempz G\let\@tempq\@underbarchar\fi
  \if\@tempz H\let\@tempq\@underbarchar\fi\if\@tempz I\let\@tempq\@underbarchar\fi
  \if\@tempz J\let\@tempq\@underbarchar\fi\if\@tempz K\let\@tempq\@underbarchar\fi
  \if\@tempz L\let\@tempq\@underbarchar\fi\if\@tempz M\let\@tempq\@underbarchar\fi
  \if\@tempz N\let\@tempq\@underbarchar\fi\if\@tempz O\let\@tempq\@underbarchar\fi
  \if\@tempz P\let\@tempq\@underbarchar\fi\if\@tempz Q\let\@tempq\@underbarchar\fi
  \if\@tempz R\let\@tempq\@underbarchar\fi\if\@tempz S\let\@tempq\@underbarchar\fi
  \if\@tempz T\let\@tempq\@underbarchar\fi\if\@tempz U\let\@tempq\@underbarchar\fi
  \if\@tempz V\let\@tempq\@underbarchar\fi\if\@tempz W\let\@tempq\@underbarchar\fi
  \if\@tempz X\let\@tempq\@underbarchar\fi\if\@tempz Y\let\@tempq\@underbarchar\fi
  \if\@tempz Z\let\@tempq\@underbarchar\fi\@tempq}
\def\@under{\futurelet\@tempz\@underbar}

\def\UNDERBARACTIVE{\CATCODE`\_=13}
\UNDERBARACTIVE
\def\UNDERBARDEF{\def_{\protect\@under}}
\UNDERBARDEF

\catcode`\$=11  \catcode`\%=11

\makeatother


\begin{document}


\title{Proposal for FORALL in High Performance Fortran}
\author{David Loveman and Charles Koelbel}


\maketitle

\section{Overview}

The purpose of the FORALL construct is to provide a convenient syntax for 
simultaneous assignments to large groups of array elements.
In this respect it is very similar to the functionality provided by array 
assignments and WHERE constructs.
FORALL differs from these constructs primarily in its syntax, which is 
intended to be more suggestive of local operations on each element of an 
array.
It is also possible to specify slightly more general array regions than 
are allowed by the basic array triplet notation.
Both single-statement and block FORALLs are defined in this proposal.

Recent discussions within the FORALL subgroup have made it clear that 
some are opposed to the construct on the grounds that it is unnecessary 
and perhaps confusing.
It is, however, clear that others in the group strongly support the added 
expressiveness provided by FORALL.
Because of this disagreement, the committee recommends that FORALL, if 
accepted into HPF, not be included in the official subset.
We feel, however, that it is important to define the construct so that 
implementations with this functionality have consistent semantics.

The following proposal is designed as a modification of the Fortran 90 
standard; all references to rule numbers and section numbers pertain to 
that document unless otherwise noted.

\section{Element Array Assignment - FORALL}  

The element array
assignment statement is used to specify an array assignment in terms of
array elements or array sections.  The element array assignment may be
masked with a scalar logical expression.  Rule R215 for {\it
executable-construct} is extended to include the {\it forall-stmt}.

\subsection{General Form of Element Array Assignment}

                                                                       \BNF
forall-stmt          \IS FORALL (forall-triplet-spec-list [ ,scalar-mask-expr ]) forall-assignment

forall-triplet-spec  \IS subscript-name = subscript : subscript [ : stride]
                                                                       \FNB

\noindent
Constraint:  {\it subscript-name} must be a {\it scalar-name} of type integer.

\noindent
Constraint:  A {\it subscript} or a {\it stride} in a {\it
forall-triplet-spec} must not contain a reference to any {\it
subscript-name} in the {\it forall-triplet-spec-list}.

                                                                       \BNF
forall-assignment    \IS array-element = expr
                     \OR array-section = expr
                                                                       \FNB

\noindent
Constraint:  The {\it array-section} or {\it array-element} in a {\it
forall-assignment} must reference all of the {\it forall-triplet-spec
subscript-names}.

For each subscript name in the {\it forall-assignment}, the set of
permitted values is determined on entry to the statement and is
\[  m1 + (k-1) * m3, \mbox{ where } k = 1, 2, \ldots, INT((m2 - m1 + m3) / m3)  \]
and where {\it m1}, {\it m2}, and {\it m3} are the values of the first
subscript, the second subscript, and the stride respectively in the
{\it forall-triplet-spec}.  If {\it stride} is missing, it is as if it
were present with a value of the integer 1.  The expression {\it
stride} must not have the value 0.  If for some subscript name 
\(INT((m2 -m1 + m3) / m3) \leq 0\), the {\it forall-assignment} is not executed.
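
As a worked illustration of this rule (a sketch only; the triplet 
{\tt 1:10:3} and the program below are assumptions made for illustration 
and are not part of the proposal), take {\it m1} = 1, {\it m2} = 10 and 
{\it m3} = 3.
Then \(INT((10 - 1 + 3) / 3) = 4\), so the permitted values are 1, 4, 7 
and 10.
The same enumeration can be written in Fortran 90:
                                                                   \CODE
PROGRAM TRIPLET
  IMPLICIT NONE
  INTEGER :: M1, M2, M3, K, NVALS
  M1 = 1      ! first subscript
  M2 = 10     ! second subscript
  M3 = 3      ! stride (must not be zero)
  ! integer division truncates toward zero, so no explicit INT is needed
  NVALS = (M2 - M1 + M3) / M3
  ! if NVALS is zero or negative the loop body is never executed
  DO K = 1, NVALS
    PRINT *, M1 + (K-1)*M3
  END DO
END PROGRAM TRIPLET
                                                                   \EDOC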

Examples of element array assignments are:
                                                                  \CODE
FORALL (I=1:N, J=1:N) H(I,J) = 1.0 / REAL(I + J - 1)
FORALL (I=1:N, J=1:N, A(I,J) .NE. 0.0) B(I,J) = 1.0 / A(I,J)
                                                                  \EDOC 

\subsection{Interpretation of Element Array Assignments}  

Execution of an element array assignment consists of the following steps:
\begin{enumerate}
\item Evaluation in any order of the subscript and stride expressions in 
the {\it forall-triplet-spec-list}.
The set of valid combinations of {\it subscript-name} values is then the 
cartesian product of the sets defined by these triplets.
\item Evaluation of the {\it scalar-mask-expr} for all valid combinations 
of {\em subscript-name} values.
The set of active combinations of {\it subscript-name} values is the 
subset of the valid combinations for which the mask evaluates to true.
\item Evaluation in any order of the {\it expr} and all subscripts 
contained in the 
{\it array-element} or {\it array-section} in the {\it forall-assignment} 
for all active combinations of {\em subscript-name} values.
\item Assignment of the computed {\it expr} values to the corresponding 
elements specified by {\it array-element} or {\it array-section}.
\end{enumerate}
If the scalar mask expression is omitted, it is as if it were
present with the value true.

The {\it forall-assignment} must not cause any element of the array
being assigned to be assigned a value more than once.  The scope of a
{\it subscript-name} is the FORALL statement itself. 
The evaluation of the {\it expr} for a particular active combination of {\it 
subscript-name} values may neither affect nor be affected by 
the evaluation of {\it expr} for any other combination of {\it 
subscript-name} values.
In particular, functions cannot produce side effects that are visible in 
the FORALL.
The evaluation of the {\it expr} or any subscript on the left-hand side 
of the {\it forall-assignment} for any active combination of {\it 
subscript-name} values may neither affect nor be affected by 
the evaluation of any subscript in the {\it forall-assignment}, either 
for the same combination of {\it subscript-name} values or for a 
different active combination.
In particular, a function reference
appearing in any expression in the {\it forall-assignment} must not
redefine any subscript name.
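
Because every {\it expr} is evaluated before any assignment takes place, 
a FORALL can differ from the DO loop with the same body.
As a hedged illustration (the array names, size and data are invented for 
this sketch), the Fortran 90 program below emulates 
{\tt FORALL (I=2:N) A(I) = A(I-1)} with an explicit temporary T and 
contrasts it with the corresponding sequential loop applied to B.
                                                                   \CODE
PROGRAM FSHIFT
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 5
  INTEGER :: A(N), B(N), T(N), I
  A = (/ 1, 2, 3, 4, 5 /)
  B = A
  ! FORALL (I=2:N) A(I) = A(I-1), emulated in two phases:
  ! first evaluate every right-hand side ...
  DO I = 2, N
    T(I) = A(I-1)
  END DO
  ! ... then perform every assignment
  DO I = 2, N
    A(I) = T(I)
  END DO
  ! the sequential loop reads elements it has already overwritten
  DO I = 2, N
    B(I) = B(I-1)
  END DO
  PRINT *, A    ! values 1 1 2 3 4
  PRINT *, B    ! values 1 1 1 1 1
END PROGRAM FSHIFT
                                                                   \EDOC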


\subsection{Scalarization of the FORALL Statement}

A {\it forall-stmt} of the general form:
                                                                   \CODE
FORALL (v1=l1:u1:s1, v2=l2:u2:s2, ..., vn=ln:un:sn , mask ) &
      a(e1,...,em) = rhs
                                                                   \EDOC
is equivalent to the following standard Fortran 90 code:
                                                                    \CODE
!evaluate subscript and stride expressions in any order
templ1 = l1
tempu1 = u1
temps1 = s1
templ2 = l2
tempu2 = u2
temps2 = s2
  ...
templn = ln
tempun = un
tempsn = sn

!then evaluate the scalar mask expression
DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        tempmask(v1,v2,...,vn) = mask
      END DO
	  ...
  END DO
END DO

!then evaluate the expr in the forall-assignment 
!for all valid combinations of subscript names 
!for which the scalar mask expression is true
!(it is safe to avoid saving the subscript expressions
!because of the conditions on FORALL expressions)
DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        IF (tempmask(v1,v2,...,vn)) THEN
          temprhs(v1,v2,...,vn) = rhs
        END IF
      END DO
	  ...
  END DO
END DO

!then perform the assignment of these values to 
!the corresponding elements of the array being assigned to
DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        IF (tempmask(v1,v2,...,vn)) THEN
          a(e1,...,em) = temprhs(v1,v2,...,vn)
        END IF
      END DO
	  ...
  END DO
END DO
                                                                      \EDOC
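
To make the general scheme concrete, the masked example given earlier, 
{\tt FORALL (I=1:N, J=1:N, A(I,J) .NE. 0.0) B(I,J) = 1.0 / A(I,J)}, 
might be scalarized as in the following sketch.
The array size, the sample data, and the temporary names TMASK and TRHS 
are assumptions made for illustration; since the bounds here are 
constants, the temporaries for the bounds and strides are omitted.
                                                                   \CODE
PROGRAM MASKED
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 2
  REAL :: A(N,N), B(N,N), TRHS(N,N)
  LOGICAL :: TMASK(N,N)
  INTEGER :: I, J
  ! column-major fill: A(1,1)=2, A(2,1)=0, A(1,2)=4, A(2,2)=0
  A = RESHAPE( (/ 2.0, 0.0, 4.0, 0.0 /), (/ N, N /) )
  B = -1.0
  ! evaluate the scalar mask expression
  DO I = 1, N
    DO J = 1, N
      TMASK(I,J) = A(I,J) .NE. 0.0
    END DO
  END DO
  ! evaluate the right-hand side only where the mask is true,
  ! so the division by zero is never attempted
  DO I = 1, N
    DO J = 1, N
      IF (TMASK(I,J)) TRHS(I,J) = 1.0 / A(I,J)
    END DO
  END DO
  ! assign the saved values; unmasked elements of B keep -1.0
  DO I = 1, N
    DO J = 1, N
      IF (TMASK(I,J)) B(I,J) = TRHS(I,J)
    END DO
  END DO
  PRINT *, B
END PROGRAM MASKED
                                                                   \EDOC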

\subsubsection{Consequences of the Definition of the FORALL Statement}

\begin{itemize}

\item The {\it scalar-mask-expr} may depend on the {\it subscript-name}s.

\item Each of the {\it subscript-name}s must appear within the
subscript expression(s) on the left-hand-side.  (This is a syntactic
consequence of the semantic rule that no two execution instances of the
body may assign to the same array element.)

\item Subscripts on the left hand side of a {\it forall-assignment} are
evaluated only for valid combinations of subscript names for which the
scalar mask expression is true.

\item The evaluation of expressions within {\it array-element} or {\it
array-section} must neither affect nor be affected by the evaluation of 
{\it expr}.

\item The evaluations of right-hand-side expressions in different FORALL 
instances (iterations) must not affect each other through side-effects.

\end{itemize}


\section{FORALL Construct}

The FORALL construct is a generalization of the element array
assignment statement allowing multiple assignments, masked array 
assignments, and nested FORALL statements to be
controlled by a single {\it forall-triplet-spec-list}.  Rule R215 for
{\it executable-construct} is extended to include the {\it forall-construct}.

\subsection{General Form of the FORALL Construct}

                                                                    \BNF
forall-construct        \IS FORALL (forall-triplet-spec-list [ ,scalar-mask-expr ])
                          forall-body-stmt-list
                        END FORALL

forall-body-stmt     \IS forall-assignment
                     \OR where-stmt
					 \OR forall-stmt
					 \OR forall-construct
                                                                    \FNB

\noindent
Constraint:  {\it subscript-name} must be a {\it scalar-name} of type integer.

\noindent
Constraint:  A {\it subscript} or a {\it stride} in a {\it
forall-triplet-spec} must not contain a reference to any {\it
subscript-name} in the {\it forall-triplet-spec-list}.

\noindent
Constraint:  Any left-hand side {\it array-section} or {\it 
array-element} in any {\it forall-body-stmt}
must reference all of the {\it forall-triplet-spec
subscript-names}.

\noindent
Constraint: If a {\it forall-stmt} or {\it forall-construct} is nested 
within a {\it forall-construct}, then the inner FORALL may not redefine 
any {\it subscript-name} used in the outer {\it forall-construct}.
This rule applies recursively in the event of multiple nesting levels.

For each subscript name in the {\it forall-assignment}s, the set of
permitted values is determined on entry to the construct and is
\[  m1 + (k-1) * m3, \mbox{ where } k = 1, 2, \ldots, INT((m2 - m1 + m3) / m3)  \]
and where {\it m1}, {\it m2}, and {\it m3} are the values of the first
subscript, the second subscript, and the stride respectively in the
{\it forall-triplet-spec}.  If {\it stride} is missing, it is as if it
were present with a value of the integer 1.  The expression {\it
stride} must not have the value 0.  If for some subscript name 
\(INT((m2 -m1 + m3) / m3) \leq 0\), the {\it forall-assignment}s are not 
executed.

Examples of the FORALL construct are:
                                                                 \CODE
FORALL ( i = 2:n-1, j = 2:n-1 )
  a(i,j) = a(i,j-1) + a(i,j+1) + a(i-1,j) + a(i+1,j)
  b(i,j) = a(i,j)
END FORALL

FORALL ( i = 1:n-1 )
  FORALL ( j = i+1:n )
    a(i,j) = a(j,i)
  END FORALL
END FORALL

FORALL ( i = 1:n, j = 1:n )
  a(i,j) = MERGE( a(i,j), a(i,j)**2, i.eq.j )
  WHERE ( .not. done(i,j,1:m) )
    b(i,j,1:m) = b(i,j,1:m)*x
  END WHERE
END FORALL
																 \EDOC


\subsection{Interpretation of the FORALL Construct}  

Execution of a FORALL construct consists of the following steps:
\begin{enumerate}
\item Evaluation in any order of the subscript and stride expressions in 
the {\it forall-triplet-spec-list}.
The set of valid combinations of {\it subscript-name} values is then the 
cartesian product of the sets defined by these triplets.
\item Evaluation of the {\it scalar-mask-expr} for all valid combinations 
of {\em subscript-name} values.
The set of active combinations of {\it subscript-name} values is the 
subset of the valid combinations for which the mask evaluates to true.
\item Execution of the {\it forall-body-stmts} in the order they appear.
Each statement is executed completely (that is, for all active 
combinations of {\it subscript-name} values) according to the following 
interpretation:
\begin{enumerate}
\item Assignment statements and array assignment statements (i.e. 
statements in the {\it forall-assignment} category) evaluate the 
right-hand side {\it expr} and any left-hand side subscripts for all 
active {\it subscript-name} values,
then assign those results to the corresponding left-hand side references.
\item WHERE statements evaluate their {\it mask-expr} for all active 
combinations of {\it subscript-name} values.
The assignments within the WHERE branch of the statement are then 
executed in order using the above interpretation of array assignments 
within the FORALL, but the only array elements assigned are those 
selected by both the active {\it subscript-names} and the WHERE mask.
Finally, the assignments in the ELSEWHERE branch are executed if that 
branch is present.
The assignments here are also treated as array assignments, but elements 
are only assigned if they are selected by both the active combinations 
and by the negation of the WHERE mask.
\item FORALL statements and FORALL constructs first evaluate the 
subscript and stride expressions in 
the {\it forall-triplet-spec-list} for all active combinations of the 
outer FORALL constructs.
The set of valid combinations of {\it subscript-names} for the inner 
FORALL is then the union of the sets defined by these bounds and strides 
for each active combination of the outer {\it subscript-names}.
For example, the valid set of the inner FORALL in the second example in 
the last section is the upper triangle (not including the main diagonal) 
of the \(n \times n\) matrix a.
The scalar mask expression is then evaluated for all valid combinations 
of the inner FORALL's {\it subscript-names} to produce the set of active 
combinations.
If there is no scalar mask expression, it is assumed to be always true.
Each statement in the inner FORALL is then executed for each active 
combination (of the inner FORALL), recursively following the 
interpretations given in this section.
\end{enumerate}
\end{enumerate}
If the scalar mask expression is omitted, it is as if it were
present with the value true.
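
The rule for nested FORALLs can be illustrated with the second example 
above, in which the inner bounds {\tt j = i+1:n} depend on the outer 
subscript.
The following Fortran 90 program is only a sketch of that interpretation; 
the matrix size, the data and the temporary T are assumptions.
It visits exactly the valid set described above, the strict upper 
triangle, evaluating all right-hand sides before assigning.
                                                                   \CODE
PROGRAM UPPER
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 3
  INTEGER :: A(N,N), T(N,N), I, J
  ! column-major fill: column 1 is 1 2 3, column 2 is 4 5 6, ...
  A = RESHAPE( (/ 1, 2, 3, 4, 5, 6, 7, 8, 9 /), (/ N, N /) )
  ! valid set of the inner FORALL: all (I,J) with 1 <= I < J <= N
  DO I = 1, N-1
    DO J = I+1, N
      T(I,J) = A(J,I)
    END DO
  END DO
  DO I = 1, N-1
    DO J = I+1, N
      A(I,J) = T(I,J)
    END DO
  END DO
  ! A is now symmetric; its rows are 1 2 3, 2 5 6 and 3 6 9
  DO I = 1, N
    PRINT *, A(I,:)
  END DO
END PROGRAM UPPER
                                                                   \EDOC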

A single statement in a {\it forall-construct} must not cause any element 
of the array
being assigned to be assigned a value more than once.
It is, however, permitted that different statements may assign to the 
same array element.
The scope of a
{\it subscript-name} is the FORALL construct itself. 
The evaluation of any subexpression of an assignment or array 
assignment for a particular 
active combination of {\it subscript-name} values may neither affect nor be
affected by 
the evaluation of any subexpression in the same statement for any other 
combination of {\it subscript-name} values.
Similarly, evaluation of the mask or subscript bounds and stride 
expressions in an inner WHERE or FORALL for one active combination of 
{\it subscript-name} values may neither affect nor be affected by the 
evaluations of those subexpressions for any other active combination.
The evaluation of any subexpression in an assignment or array assignment 
for any active combination of {\it subscript-name} values may neither 
affect nor be affected by the evaluation of any subscript in the same 
statement, either for the same combination of {\it subscript-name} values 
or for a different active combination.
In particular, a function reference
appearing in any expression in the {\it forall-assignment} must not
redefine any subscript name.

\subsection{Scalarization of the FORALL Construct}

A {\it forall-construct} of the form:
                                                                \CODE
FORALL (... e1 ... e2 ... en ...)
    s1
    s2
     ...
    sn
END FORALL
                                                                \EDOC
where each si is an assignment, is equivalent to the following sequence of FORALL statements:
                                                                \CODE
temp1 = e1
temp2 = e2
 ...
tempn = en
FORALL (... temp1 ... temp2 ... tempn ...) s1
FORALL (... temp1 ... temp2 ... tempn ...) s2
   ...
FORALL (... temp1 ... temp2 ... tempn ...) sn
                                                                \EDOC
A similar statement can be made using FORALL constructs when some of the 
si are WHERE or FORALL constructs.

A {\it forall-construct} of the form:
                                                                   \CODE
FORALL ( v1=l1:u1:s1, mask )
  WHERE ( mask2(l2:u2) )
    a(v1,l2:u2) = rhs1
  ELSEWHERE
    a(v1,l2:u2) = rhs2
  END WHERE
END FORALL
                                                                   \EDOC
is equivalent to the following standard Fortran 90 code:
                                                                    \CODE
!evaluate subscript and stride expressions in any order
templ1 = l1
tempu1 = u1
temps1 = s1

!then evaluate the FORALL mask expression
DO v1=templ1,tempu1,temps1
 tempmask(v1) = mask
END DO

!then evaluate the bounds and masks for the WHERE
!(only for active values of v1)
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    tmpl2(v1) = l2
    tmpu2(v1) = u2
    tempmask2(v1,tmpl2(v1):tmpu2(v1)) = mask2(tmpl2(v1):tmpu2(v1))
  END IF
END DO

!then evaluate the WHERE branch
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    WHERE ( tempmask2(v1,tmpl2(v1):tmpu2(v1)) )
      temprhs1(v1,tmpl2(v1):tmpu2(v1)) = rhs1
    END WHERE
  END IF
END DO
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    WHERE ( tempmask2(v1,tmpl2(v1):tmpu2(v1)) )
      a(v1,tmpl2(v1):tmpu2(v1)) = temprhs1(v1,tmpl2(v1):tmpu2(v1))
    END WHERE
  END IF
END DO

!then evaluate the ELSEWHERE branch
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    WHERE ( .not. tempmask2(v1,tmpl2(v1):tmpu2(v1)) )
      temprhs2(v1,tmpl2(v1):tmpu2(v1)) = rhs2
    END WHERE
  END IF
END DO
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    WHERE ( .not. tempmask2(v1,tmpl2(v1):tmpu2(v1)) )
      a(v1,tmpl2(v1):tmpu2(v1)) = temprhs2(v1,tmpl2(v1):tmpu2(v1))
    END WHERE
  END IF
END DO
                                                                      \EDOC

A {\it forall-construct} of the form:
                                                                   \CODE
FORALL ( v1=l1:u1:s1, mask )
  FORALL ( v2=l2:u2:s2, mask2 )
    a(e1,e2) = rhs1
    b(e3,e4) = rhs2
  END FORALL
END FORALL
                                                                   \EDOC
is equivalent to the following standard Fortran 90 code:
                                                                    \CODE
!evaluate subscript and stride expressions in any order
templ1 = l1
tempu1 = u1
temps1 = s1
!then evaluate the FORALL mask expression
DO v1=templ1,tempu1,temps1
 tempmask(v1) = mask
END DO

!then evaluate the inner FORALL bounds, etc
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    templ2(v1) = l2
    tempu2(v1) = u2
    temps2(v1) = s2
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      tempmask2(v1,v2) = mask2
    END DO
  END IF
END DO

!first statement
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      IF ( tempmask2(v1,v2) ) THEN
        temprhs1(v1,v2) = rhs1
      END IF
    END DO
  END IF
END DO
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      IF ( tempmask2(v1,v2) ) THEN
        a(e1,e2) = temprhs1(v1,v2)
      END IF
    END DO
  END IF
END DO

!second statement
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      IF ( tempmask2(v1,v2) ) THEN
        temprhs2(v1,v2) = rhs2
      END IF
    END DO
  END IF
END DO
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      IF ( tempmask2(v1,v2) ) THEN
        b(e3,e4) = temprhs2(v1,v2)
      END IF
    END DO
  END IF
END DO
                                                                      \EDOC


\subsubsection{Consequences of the Definition of the FORALL Construct}

\begin{itemize}

\item A block FORALL means the same as replicating the FORALL
header in front of each array assignment statement in the block, except
that any expressions in the FORALL header are evaluated only once,
rather than being re-evaluated before each of the statements in the body.
(This statement needs some modification in the case of nesting.)

\item One may think of a block FORALL as synchronizing twice per
contained assignment statement: once after handling the rhs and other 
expressions
but before performing assignments, and once after all assignments have
been performed but before commencing the next statement.  (In practice,
appropriate dependence analysis will often permit the compiler to
eliminate unnecessary synchronizations.)

\item The {\it scalar-mask-expr} may depend on the {\it subscript-name}s.

\item Each of the {\it subscript-name}s must appear within the
subscript expression(s) on the left-hand side.  (This is a syntactic
consequence of the semantic rule that no two execution instances of the
body may assign to the same array element.)

\item In general, any expression in a FORALL is evaluated only for valid 
combinations of all surrounding subscript names for which all the
scalar mask expressions are true.

\item Nested FORALL bounds and strides can depend on outer FORALL {\it 
subscript-names}.  They cannot redefine those names, even temporarily (if 
they did, there would be no way to avoid multiple assignments to the same 
array element).

\item Dependences are allowed from one statement to later statements, but 
never from an assignment statement to itself.
Masks and subscript bounds could conceivably have side effects visible in 
the rest of the nested statement.

\end{itemize}


\end{document}
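
The two-phase interpretation given in the scalarization section above
(evaluate every right-hand side into a temporary, then perform the
assignments) can be modeled outside Fortran. The following is only a
minimal Python sketch under stated assumptions: arrays are modeled as
dictionaries, and the names forall_assign, do_loop_assign, shift and
initial are hypothetical and not part of the proposal. It contrasts one
FORALL assignment with an ordinary DO loop over the same body.

```python
def forall_assign(a, indices, mask, lhs, rhs):
    """Model one assignment statement inside a FORALL (illustrative only).

    a       -- dict mapping index -> value (stands in for a Fortran array)
    indices -- subscript values; bounds and stride are evaluated once
    mask    -- function i -> bool, the scalar mask expression
    lhs     -- function i -> index of the element being assigned
    rhs     -- function (a, i) -> value of the right-hand side
    """
    # Step 1: evaluate the mask for every valid subscript value.
    active = [i for i in indices if mask(i)]
    # Step 2: evaluate every right-hand side against the OLD contents of a.
    temps = {i: rhs(a, i) for i in active}
    # Step 3: only now perform the assignments.
    for i in active:
        a[lhs(i)] = temps[i]
    return a


def do_loop_assign(a, indices, mask, lhs, rhs):
    """Sequential DO loop for contrast: each iteration sees earlier stores."""
    for i in indices:
        if mask(i):
            a[lhs(i)] = rhs(a, i)
    return a


n = 5
initial = {i: float(i * i) for i in range(0, n + 2)}  # a(0:n+1) = 0, 1, 4, ...

# Body modeled here: a(i) = a(i-1), with no mask.
shift = dict(mask=lambda i: True, lhs=lambda i: i, rhs=lambda a, i: a[i - 1])

forall_result = forall_assign(dict(initial), range(1, n + 1), **shift)
do_result = do_loop_assign(dict(initial), range(1, n + 1), **shift)
```

Under this model the FORALL shifts the old values by one position,
while the DO loop propagates a(0) through the whole array, which is the
difference the temporaries temprhs1 and temprhs2 are there to capture.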


From gmap11@f1ibmsv2.gmd.de  Sat Jul 18 14:51:27 1992
Received: from gmdzi.gmd.de by cs.rice.edu (AA06637); Sat, 18 Jul 92 14:51:27 CDT
Received: from f1ibmsv2.gmd.de (f1ibmsv2) by gmdzi.gmd.de with SMTP id AA08168
  (5.65c/IDA-1.4.4 for <hpff-forall@cs.rice.edu>); Sat, 18 Jul 1992 21:50:28 +0200
Received: by f1ibmsv2.gmd.de id AA05018; Sat, 18 Jul 92 21:51:06 GMT
Date: Sat, 18 Jul 92 21:51:06 GMT
From: gmap11@f1ibmsv2.gmd.de (C.A. Thole)
Message-Id: <9207182151.AA05018@f1ibmsv2.gmd.de>
To: hpff-forall@cs.rice.edu

	          A Proposal for MIMD Support in HPF
         Clemens-August Thole, GMD I1.T, D-5205 St. Augustin
                        gmap11@gmdzi.gmd.de
 
                     Version of June 22, 1992

[1] Abstract

This proposal tries to supply sufficient language support to 
deal with loosely synchronous programs, some of which have been 
identified in my "A vote for explicit MIMD support".
It is a proposal for the support of MIMD parallelism that extends
the proposal for !HPF$ INDEPENDENT by Guy Steele. It is more oriented
towards the CRAY MPP Fortran Programming Model and the PCF proposal. 
The fine-grain synchronization of PCF is not proposed for implementation.
Instead of the CRAY mechanisms for assigning work to processors, an 
extension of the ON clause is used.
Because fine-grain synchronization is absent, the constructs can be executed
on SIMD or sequential architectures just by ignoring the additional information.


[2] Summary of the current situation of MIMD support as part of HPF

According to Chuck Koelbel's mail (CRPC) dated March 20th, "Working Group 4 -
Issues for discussion", MIMD support is a topic for discussion within working
group 4. 

Dave Loveman (DEC) has produced a document on FORALL statements (...) which
summarizes the discussion. Marc Snir proposed some extensions. These
constructs allow SIMD-style operations to be described more generally than
array assignments allow.

A topic for working papers is the interface of HPF Fortran to program units
which execute in SPMD mode. Proposals for "Local Subroutines" have been made
by Guy Steele (Proposal for local program units in HPF, 25.03.92) and
Marc Snir (Proposal for Local Routines, 16.06.92). Both proposals
define local subroutines as program units that are executed by all
processors independently of each other. Each processor has access only
to the data contained in its local memory. Parts of distributed data objects
can be accessed and updated by calls to a special library. Any message-passing
library might be used for synchronization and communication.
This approach does not really integrate MIMD-support into HPF programming.

The MPP Fortran proposal by Douglas M. Pase, Tom MacDonald, Andrew Meltzer (CRAY)
contained the following features in order to support integrated MIMD features:
   -  parallel directive
   -  shared loops 
   -  private variables
   -  barrier synchronization
   -  no-barrier directive for removing synchronization
   -  locks, events, critical sections and atomic update
   -  functions, to examine the mapping of data objects.

Steele's "Proposal for loops in HPF" (02.04.92) included a proposal for a 
directive "!HPF$ INDEPENDENT( integer_variable_list)", which specifies,
for the next set of nested loops, that the loops with the specified
loop variables can be executed independently of each other.

Chuck Koelbel gave an overview of different styles for parallel loops
in "Parallel Loops Position Paper". No specific proposal was made.

Min-You Wu "Proposal for FORALL, May 1992" extended Guy Steele's 
"!HPF$INDEPENDENT" proposal to use the directive in a block style.

Clemens-August Thole "A vote for explicit MIMD support" contains 3 examples
from different application areas, which seem to require MIMD support for
efficient execution. 

Summary:

In contrast to FORALL extensions, MIMD support is currently not well established
as part of HPF Fortran. The examples in "A vote for explicit MIMD support"
show clearly the need for such features. Local subroutines do not fulfill
the requirements because they force the use of a distributed-memory programming
model, which should not be necessary in most cases.

With the exception of parallel sections, all interesting features
are contained in the MPP proposal. I would like to split the discussion
on specifying parallelism, synchronization and mapping into three different
topics. Furthermore, I would like to see corresponding features expressed
in the style of the current X3H5 proposal, if possible, in order to
be in line with upcoming standards.


[3] Proposal for MIMD support

In order to support the specification of MIMD-type parallelism, the following
features are taken from the "Fortran 77 Binding of X3H5 Model for 
Parallel Programming Constructs": 

    -   PARALLEL DO construct/directive
    -   PARALLEL SECTIONS worksharing construct/directive
    -   NEW statement/directive

These constructs are not used with PCF-like options for mapping or 
synchronization but are combined with the ON clause for mapping operations
onto the parallel architecture. 

[3.1] PARALLEL DO

Explicit Syntax

The PARALLEL DO construct is used to specify parallelism among the 
iterations of a block of code. The PARALLEL DO construct has the same
syntax as a DO statement. For a directive approach, the directive
!HPF$ PARALLEL can be used in front of a DO statement.
After the PARALLEL DO statement a new-declaration may be inserted.

A PARALLEL DO construct might be nested with other parallel constructs. 

Interpretation

The PARALLEL DO is used to specify parallel execution of the iterations of
a block of code. Each iteration of a PARALLEL DO is an independent unit
of work. The iterations of PARALLEL DO must be data independent. Iterations
are data independent if the storage sequence associated with each variable
or array element that is assigned a value by each iteration is not referenced
by any other iteration. 

A program is not HPF conforming if, for any iteration, a statement is executed
that causes a transfer of control out of the block defined by the PARALLEL
DO construct. 

The value of the loop index of a PARALLEL DO is undefined outside the scope
of the PARALLEL DO construct. 
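
The data-independence condition above can be checked mechanically once
the locations touched by each iteration are known. The following Python
sketch is only an illustration under stated assumptions: the function
name data_independent and the representation of accesses as sets of
(array name, subscript) pairs are hypothetical, and "referenced" is
taken here to mean read or written.

```python
def data_independent(accesses):
    """Return the conflicts between iterations (empty list = independent).

    accesses -- dict mapping iteration value -> (reads, writes), where
                reads and writes are sets of (array_name, subscript) pairs.
    A conflict (i, j, loc) means iteration i assigns loc and a different
    iteration j references (reads or writes) the same loc.
    """
    conflicts = []
    for i, (_, writes_i) in accesses.items():
        for j, (reads_j, writes_j) in accesses.items():
            if i == j:
                continue
            for loc in sorted(writes_i & (reads_j | writes_j)):
                conflicts.append((i, j, loc))
    return conflicts


# Body modeled: x(i) = y(i) + y(i+1)   (no iteration touches another's x)
independent = {
    i: ({("y", i), ("y", i + 1)}, {("x", i)}) for i in range(1, 5)
}

# Body modeled: x(i) = x(i-1) + 1      (iteration i+1 reads what i writes)
carried = {
    i: ({("x", i - 1)}, {("x", i)}) for i in range(1, 5)
}

ok = data_independent(independent)
bad = data_independent(carried)
```

In this model the first loop body yields no conflicts, while the second
reports one conflict for each pair of neighbouring iterations.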


[3.2] PARALLEL SECTIONS

The parallel sections construct is used to specify parallelism among sections
of code.

Explicit Syntax

        !HPF$ PARALLEL SECTIONS
        !HPF$ SECTION
        !HPF$ END PARALLEL SECTIONS

structured as

        !HPF$ PARALLEL SECTIONS
        [new-declaration-stmt-list]
        [section-block]
        [section-block-list]
        !HPF$ END PARALLEL SECTIONS
   
where [section-block] is

        !HPF$ SECTION
        [execution-part]


Interpretation

The parallel sections construct is used to specify parallelism among sections
of code. Each section of the code is an independent unit of work. A program
is not standard conforming if, during the execution of any parallel sections
construct, a transfer of control out of the blocks defined by the parallel
sections construct is performed. 
In a standard conforming program the sections of code shall be data 
independent. Sections are data independent if the storage sequence associated 
with each variable or array element that is assigned a value by each section
is not referenced by any other section. 


[3.3] Data scoping

Data objects that are local to a subroutine are different between 
distinct units of work, even if they execute the same subroutine.


[3.4] NEW statement/directive

The NEW statement/directive allows the user to generate new instances of 
objects with the same name as an object that can currently be referenced.


Explicit Syntax

A [new-declaration-stmt] is

       !HPF$ NEW variable-name-list


Coding rules

A [variable-name] shall not be 
-    the name of an assumed-size array, dummy argument, common block, function
     or entry point
-    of type character with an assumed length
-    specified in a SAVE or DATA statement
-    associated with any object that is shared for this parallel construct.


Interpretation
 
Listing a variable on a NEW statement causes the object to be explicitly
private for the parallel construct. For each unit of work of the parallel 
construct a new instance of the object is created and referenced with the
specific name. 
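
The effect of NEW described above can be pictured with a small model.
This is a hedged Python sketch, not an implementation of the directive:
the names run_units, make_unit, shared and private are hypothetical, and
the units of work are run one after another, which the proposal permits
on sequential architectures.

```python
def run_units(units, shared, new_names):
    """Run each unit of work with its own instance of the NEW variables.

    units     -- list of callables taking (shared, private)
    shared    -- dict of objects visible to every unit of work
    new_names -- names listed on the (modeled) NEW directive
    """
    for unit in units:
        # A fresh, initially undefined instance per unit of work.
        private = {name: None for name in new_names}
        unit(shared, private)
    return shared


def make_unit(i):
    """Unit of work i: tmp = a(i) * 2 ; b(i) = tmp + 1, with tmp private."""
    def unit(shared, private):
        private["tmp"] = shared["a"][i] * 2
        shared["b"][i] = private["tmp"] + 1
    return unit


shared = {"a": [1, 2, 3], "b": [0, 0, 0], "tmp": 99}
result = run_units([make_unit(i) for i in range(3)], shared, ["tmp"])
```

The outer object named tmp keeps its value because every unit of work
only ever touches its own instance.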

From arthur@parcom.nl  Sun Jul 19 06:02:45 1992
Received: from sun4nl.nluug.nl by cs.rice.edu (AA12202); Sun, 19 Jul 92 06:02:45 CDT
Received: by sun4nl.nluug.nl via EUnet
	id AA12692 (5.65b/CWI-3.3); Sun, 19 Jul 1992 12:59:43 +0200
Message-Id: <9207191059.AA12692@sun4nl.nluug.nl>
Received: by prcmuu.parcom.nl; Sun, 19 Jul 92 12:23:12 +0200    
Date: Sun, 19 Jul 92 12:23:12 +0200
From: arthur@parcom.nl (Arthur Veen)
To: hpff-forall@cs.rice.edu
Subject: Re: FORALL proposal
Cc: hpfp-all@parcom.nl

A few remarks concerning Chuck's latest FORALL proposal
posted 17 Jul 1992 18:08:27

A - If HPF is going to extend the base language (I prefer that
    it doesn't), I would like to have one extension rather
    than two, i.e. EITHER the forall-stmt OR the forall-construct
	but not both.
	As far as I can tell, the only convenience the forall-stmt
	gives above the forall-construct is that you can leave out 
	"END FORALL".

B - If A is adopted and the choice is made to include
    forall-construct and to leave out forall-stmt, the proposal
	can be further simplified by leaving out the scalar-mask-expr.
	As far as I can tell masks do not give any expressiveness above
	the where-stmt.

C - There appears to be a contradiction in section 3.1
    "Interpretation of the FORALL Construct". It says:

	> A single statement in a {\it forall-construct} must not cause any element
	> of the array
	> being assigned to be assigned a value more than once.
	> It is, however, permitted that different statements may assign to the
	> same array element.

	However, "A single statement in a {\it forall-construct}" may be
	an inner forall-construct, which may assign to the same array
	element more than once.
	I think the intention is that for any level of a (possibly
	nested) forall-construct, *different* valid combinations of
	subscript-name values may not assign to the *same* array
	element.

	A separate question about these "must not"s and "may not"s that
	appear in the proposal. I hope these are meant as admonishments
	to the application programmer rather than to the compiler
	writers. In other words that they mean that programs that do
	not meet these requirements may produce arbitrary results.

D - A small but confusing typo. In section 3 FORALL Construct:

	> Examples of the FORALL construct are:
	>                                                    \CODE
	> FORALL ( i = 2:n-1, j = 2:i-1 )
                                ^
	should be

	  FORALL ( i = 2:n-1, j = 2:n-1 )

	I hope

--arthur

                 Arthur H. Veen
       Parallel Computing   ACE
       Postbus 16775        van Eeghenstraat 100
       1001 RG  Amsterdam   1071 GL  Amsterdam
       The Netherlands      The Netherlands
phone: +31-20-623 3274      +31-20-664 6416
Fax:                        +31-20-675 0389
E-mail:arthur@parcom.nl    arthur@ace.nl

From wu@cs.buffalo.edu  Sun Jul 19 10:26:21 1992
Received: from ruby.cs.Buffalo.EDU by cs.rice.edu (AA13137); Sun, 19 Jul 92 10:26:21 CDT
Received: by ruby.cs.Buffalo.EDU (4.1/1.01)
	id AA22282; Sun, 19 Jul 92 11:26:17 EDT
Date: Sun, 19 Jul 92 11:26:17 EDT
From: wu@cs.buffalo.edu (Min-You Wu)
Message-Id: <9207191526.AA22282@ruby.cs.Buffalo.EDU>
To: gmap11@f1ibmsv2.gmd.de, hpff-forall@cs.rice.edu
Subject: Re: Clemens-August Thole's MIMD proposal
Cc: wu@cs.buffalo.edu


> The PARALLEL DO is used to specify parallel execution of the iterations of
> a block of code. Each iteration of a PARALLEL DO is an independent unit
> of work. The iterations of PARALLEL DO must be data independent. Iterations
> are data independent if the storage sequence accociated with each variable
> are array element that is assigned a value by each iteration is not referenced
> by any other iteration. 
> 

How about anti-dependence and output dependence?  For example, the 
following loop is not a correct PARALLEL DO loop since there is an 
anti-dependence:

      PARALLEL DO (i = 1:N)
        ...  = x(i+1)
        x(i) = ...
      END DO
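
The effect of this anti-dependence can be seen by running the iterations
in two different orders. The Python sketch below is only an illustration:
the elided right-hand sides are replaced by assumed values (x(i) is set
to -1 and the read of x(i+1) is stored in a hypothetical array y), and
the name run_iterations is not part of any proposal.

```python
def run_iterations(order, n):
    """Execute the loop body once per iteration, in the given order.

    Body modeled (assumed fill-ins for the elided parts):
        y(i) = x(i+1)
        x(i) = -1
    """
    x = [0] + [10 * i for i in range(1, n + 2)]  # x(1:n+1) = 10, 20, ...
    y = [0] * (n + 1)                            # y(1:n); index 0 unused
    for i in order:
        y[i] = x[i + 1]
        x[i] = -1
    return y[1:]


n = 4
ascending = run_iterations(range(1, n + 1), n)   # i = 1, 2, 3, 4
descending = run_iterations(range(n, 0, -1), n)  # i = 4, 3, 2, 1
```

In ascending order every iteration still reads the original x(i+1); in
descending order all but the last element of y pick up the overwritten
value, so the result depends on the execution order.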

Min-You

From wu@cs.buffalo.edu  Sun Jul 19 14:34:20 1992
Received: from ruby.cs.Buffalo.EDU by cs.rice.edu (AA14609); Sun, 19 Jul 92 14:34:20 CDT
Received: by ruby.cs.Buffalo.EDU (4.1/1.01)
	id AA22625; Sun, 19 Jul 92 15:34:23 EDT
Date: Sun, 19 Jul 92 15:34:23 EDT
From: wu@cs.buffalo.edu (Min-You Wu)
Message-Id: <9207191934.AA22625@ruby.cs.Buffalo.EDU>
To: hpff-forall@cs.rice.edu
Subject: Revised proposal for FORALL with INDEPENDENT directives
Cc: gcf@npac.syr.edu, wu@cs.buffalo.edu


\documentstyle[11pt]{article}
\textwidth 6.4in
\textheight 8in
\parskip 0.15in
\begin{document}

\topmargin 0in
\oddsidemargin 0in
\baselineskip .25in 

\begin{center}
{\Large Proposal for FORALL with INDEPENDENT Directives}

(Revised July 19, 1992)

Min-You Wu  \\
SUNY at Buffalo \\
wu@cs.buffalo.edu \\

\end{center}

This proposal is an extension of Guy Steele's INDEPENDENT proposal.
We propose a block FORALL with the directives for independent 
execution of statements.  The INDEPENDENT directives are used
in a block style.  

The block FORALL is in the form of
\begin{verbatim}
      FORALL (...) [ON (...)]
        a block of statements
      END FORALL
\end{verbatim}
where the block can consist of a restricted class of statements 
and the following INDEPENDENT directives:
\begin{verbatim}
!HPF$BEGIN INDEPENDENT
!HPF$END INDEPENDENT
\end{verbatim}
The two directives must be used as a pair.  There is a synchronization 
at each of these directives.  A sub-block of statements enclosed 
by the two directives is called an {\em asynchronous} sub-block 
or {\em independent} sub-block.  The statements that are not in 
an asynchronous sub-block are in {\em synchronized} sub-blocks
or {\em non-independent} sub-blocks.  The synchronized sub-block is 
the same as Guy Steele's synchronized FORALL statement, and the 
asynchronous sub-block is the same as the FORALL with the INDEPENDENT 
directive.  Thus, the block FORALL
\begin{verbatim}
      FORALL (e)
        b1
!HPF$BEGIN INDEPENDENT
        b2
!HPF$END INDEPENDENT
        b3
      END FORALL
\end{verbatim}
means roughly the same as
\begin{verbatim}
      FORALL (e)
        b1
      END FORALL
!HPF$INDEPENDENT
      FORALL (e)
        b2
      END FORALL
      FORALL (e)
        b3
      END FORALL
\end{verbatim}

Statements in a synchronized sub-block are tightly synchronized.
Statements in an asynchronous sub-block are completely independent.
The INDEPENDENT directives do not change the semantics of FORALL.
They only indicate to the compiler that there is no dependence and that,
consequently, synchronizations are not necessary.
It is the user's responsibility to ensure that there is no dependence
between instances in an asynchronous sub-block.
A compiler can do dependence analysis for the asynchronous sub-blocks
and issue an error message when a dependence exists or a warning
when it finds a possible dependence.

% what is independent?
% use local data only
% access non-local data, but no dependence --- this one
% there are dependences, but can execute in any order

\noindent
{\bf What does ``no dependence between instances'' mean?}

It means that there is no true dependence, anti-dependence,
output dependence, or control dependence between instances.
Examples of these dependences are shown below:

\noindent
1) true dependence
\begin{verbatim}
      FORALL (i = 1:N)
        x(i) = ... 
        ...  = x(i+1)
      END FORALL
\end{verbatim}
Notice that dependences in a FORALL are different from those in a DO loop.
If the above example were a DO loop, it would be an anti-dependence.

\noindent
2) anti-dependence:

\begin{verbatim}
      FORALL (i = 1:N)
        ...  = x(i+1)
        x(i) = ...
      END FORALL
\end{verbatim}

\noindent
3) output dependence:
\begin{verbatim}
      FORALL (i = 1:N)
        x(i+1) = ... 
        x(i) = ...
      END FORALL
\end{verbatim}

\noindent
4) control dependence:

\begin{verbatim}
      FORALL (i = 1:N)
        IF (x(i+1) .EQ. 0) THEN
          x(i) = ...
        END IF
      END FORALL
\end{verbatim}
     
Independence does not imply the absence of communication.  One instance may 
access data belonging to other instances, as long as this does not cause a 
dependence.  The following example is an independent block:

\begin{verbatim}
      FORALL (i = 1:N)
!HPF$BEGIN INDEPENDENT
        x(i) = a(i-1)
        y(i-1) = a(i+1)
!HPF$END INDEPENDENT
      END FORALL
\end{verbatim}

\noindent
{\bf Statements that can appear in FORALL}

There is no restriction on the type of statements in an asynchronous 
sub-block.  That is, FORALL statements, DO loops, WHILE loops, 
WHERE-ELSEWHERE statements, IF-THEN-ELSE statements, CASE statements, 
subroutine and function calls, etc. can appear, as long as there is
no dependence.  On the other hand, the statements that can appear in a 
synchronized sub-block are restricted.  The degree of restrictions will 
be determined later.  At least the following statements are allowed:
assignment statements, FORALL statements, DO loops, WHERE statements 
(not WHERE-ELSEWHERE), IF statements (not IF-THEN-ELSE), reduction 
statements (see below), and some intrinsic functions (and elemental 
functions and subroutines).

Some examples are given below for the asynchronous sub-blocks:

\noindent
1) FORALL statement
\begin{verbatim}
      FORALL (I = 1 : N)
        A(I,0) = A(I-1,0)
!HPF$BEGIN INDEPENDENT
        FORALL (J = 1 : N)
          A(I,J) = A(I,0) + B(I-1,J-1)
          C(I,J) = A(I,J)
        END FORALL
!HPF$END INDEPENDENT
      END FORALL
\end{verbatim}

\noindent
2) DO loop
\begin{verbatim}
      FORALL (I = 1 : N)
!HPF$BEGIN INDEPENDENT
        DO J = 1, N 
          A(I) = A(I) * B(I)
        END DO 
!HPF$END INDEPENDENT
      END FORALL
\end{verbatim}

\noindent
3) WHILE loop
\begin{verbatim}
      FORALL (I = 1 : N)
!HPF$BEGIN INDEPENDENT
        DO WHILE (A(I) < BIG)
          A(I) = A(I) * B(I)
        END DO 
!HPF$END INDEPENDENT
      END FORALL
\end{verbatim}

\noindent
4) IF-THEN-ELSE
\begin{verbatim}
      FORALL ( I = 1 : N )
!HPF$BEGIN INDEPENDENT
        IF ( A(I) < EPS ) THEN                
          A(I) = 0.0                          
          B(I) = 0.0                          
        ELSE
          TMP(I) = B(I)
          B(I) = A(I)                       
          A(I) = TMP(I)
        END IF
!HPF$END INDEPENDENT
      END FORALL
\end{verbatim}

\noindent
5) WHERE
\begin{verbatim}
      FORALL(I = 1 : N)
!HPF$BEGIN INDEPENDENT
        WHERE(A(I,:) == B(I,:))
          A(I,:) = 0
        ELSEWHERE
          A(I,:) = B(I,:)
        END WHERE
!HPF$END INDEPENDENT
      END FORALL
\end{verbatim}

\noindent
6) subroutine CALL
\begin{verbatim}
      FORALL(I = 1 : N)
!HPF$BEGIN INDEPENDENT
        A(I) = C(I)
        CALL FOO(A(I))
!HPF$END INDEPENDENT
      END FORALL

      SUBROUTINE FOO(x)
      real :: x
      x = x * 10 + x 
      RETURN
      END
\end{verbatim}

\noindent
Another example for subroutine CALL:

\begin{verbatim}
      FORALL(I = 1 : N)
        A(I) = C(I)
!HPF$BEGIN INDEPENDENT
        CALL FOO(B(I), A(I-1), A(I+1))
!HPF$END INDEPENDENT
      END FORALL

      SUBROUTINE FOO(x, y, z)
      real :: x, y, z
      x = (y + z) / 2
      RETURN
      END
\end{verbatim}


\noindent
{\bf Reduction}

A reduction statement is synchronized and cannot appear in an
asynchronous sub-block.

We propose to use special operators for the reduction operations.
As an example, the following FORALL statement provides a sum 
reduction over a(i) and assigns the result to a scalar variable,
with `+=' as a sum reduction operator:
\begin{verbatim}
      FORALL (i=1:N) 
        x = (+= a(i))
      END FORALL
\end{verbatim}

We list some possible reduction operators as follows:
\begin{verbatim}
+=      Sum of values  
*=      Product of values 
&=      Logical AND 
|=      Logical OR 
^=      Logical XOR
<?=     Minimum of values 
>?=     Maximum of values 
\end{verbatim}

Here is an example for multiple reduction:

\begin{verbatim}
      FORALL (i=1:N)
        b(i/c) = (+= a(i))
      END FORALL
\end{verbatim}

Using the reduction operators, we can also provide scan functions.

\begin{verbatim}
      FORALL (i=1:N) 
        b(i) = (+= a(1:i)) 
      END FORALL
\end{verbatim}

Using reduction operators rather than reduction intrinsics allows reduction 
operations in the FORALL body without exiting from the FORALL.  See 
the following example:

(1) With intrinsic function:
\begin{verbatim}
      FORALL (i=1:N) 
!HPF$BEGIN INDEPENDENT
        a(i) = i
!HPF$END INDEPENDENT
      END FORALL
      b = SUM(a)
      FORALL (i=1:N) 
!HPF$BEGIN INDEPENDENT
        c(i) = a(i) + b
!HPF$END INDEPENDENT
      END FORALL
\end{verbatim}

(2) With reduction operator:
\begin{verbatim}
      FORALL (i=1:N) 
!HPF$BEGIN INDEPENDENT
        a(i) = i
!HPF$END INDEPENDENT
        b = (+= a(i)) 
!HPF$BEGIN INDEPENDENT
        c(i) = a(i) + b
!HPF$END INDEPENDENT
      END FORALL
\end{verbatim}

\noindent
Notes for reduction:

With the reduction operators defined above, only a limited set of operations 
can be expressed.  We do not have `MAXLOC', `MINLOC', etc.  Another solution 
is to use reduction functions similar to intrinsic functions.  
For example:

\begin{verbatim}
      FORALL (i=1:N) 
        x = SUM(a(i))
      END FORALL

      FORALL (i=1:N) 
        x = MAXLOC(a(i))
      END FORALL
\end{verbatim}

However, users may confuse these with the existing intrinsic functions SUM 
and MAXLOC.

\vspace{.1in}
\noindent
{\bf Rationale}

1. A FORALL with a single asynchronous sub-block as shown below is 
the same as a do independent (or doall, or doeach, or parallel do, etc.).
\begin{verbatim}
      FORALL (e)
!HPF$BEGIN INDEPENDENT
        b1
!HPF$END INDEPENDENT
      END FORALL
\end{verbatim}
A FORALL without any INDEPENDENT directive is the same as a tightly 
synchronized FORALL.  We only need to define one type of parallel 
construct, which includes both synchronized and asynchronous blocks.  
Furthermore, by combining asynchronous and synchronized sub-blocks, we 
obtain a loosely synchronized FORALL, which is more flexible for many 
loosely synchronous applications.

2. With INDEPENDENT directives, the user can indicate which blocks
need not be synchronized.  The INDEPENDENT directives can act 
as barrier synchronizations.  One may suggest a smart compiler 
that can recognize dependences and eliminate unnecessary 
synchronizations automatically.  However, it might be extremely 
difficult or impossible in some cases to identify all dependences.  
When the compiler cannot determine whether there is a dependence, 
it must assume so and use a synchronization for safety, which 
results in unnecessary synchronizations and consequently, high 
communication overhead.

\end{document}
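
The three uses of the proposed `+=' operator in the Reduction section
above (single reduction, multiple reduction into b(i/c), and scan over
a(1:i)) can be modeled as follows. This is a hedged Python sketch only:
the function names reduce_forall, grouped_reduce and scan_forall are
hypothetical, arrays are modeled as dictionaries indexed from 1, and
i/c is taken to be Fortran integer division for positive operands.

```python
from functools import reduce
from itertools import accumulate
import operator


def reduce_forall(op, a, indices):
    """Model  x = (op= a(i))  over all i: a single reduced value."""
    return reduce(op, (a[i] for i in indices))


def grouped_reduce(op, a, indices, key):
    """Model  b(key(i)) = (op= a(i)):  one reduction per distinct target."""
    out = {}
    for i in indices:
        k = key(i)
        out[k] = op(out[k], a[i]) if k in out else a[i]
    return out


def scan_forall(op, a, indices):
    """Model  b(i) = (op= a(1:i)):  a running (prefix) reduction."""
    idx = list(indices)
    return dict(zip(idx, accumulate((a[i] for i in idx), op)))


n, c = 6, 2
a = {i: i for i in range(1, n + 1)}                  # a(1:6) = 1, ..., 6

total = reduce_forall(operator.add, a, range(1, n + 1))
by_block = grouped_reduce(operator.add, a, range(1, n + 1), lambda i: i // c)
prefix = scan_forall(operator.add, a, range(1, n + 1))
largest = reduce_forall(max, a, range(1, n + 1))     # models `>?='
```

Passing a different binary operation (operator.mul, min, max) models
the other operators in the table.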


From chk@cs.rice.edu  Mon Jul 20 09:39:44 1992
Received: from  by cs.rice.edu (AB24689); Mon, 20 Jul 92 09:39:44 CDT
Message-Id: <9207201439.AB24689@cs.rice.edu>
Date: Mon, 20 Jul 1992 09:44:07 -0600
To: arthur@parcom.nl (Arthur Veen)
From: chk@cs.rice.edu
Subject: Re: FORALL proposal
Cc: hpff-forall@cs.rice.edu

>A few remarks concerning Chuck's latest FORALL proposal
>posted 17 Jul 1992 18:08:27
>
>A - If HPF is going to extend the base language (I prefer that
>    it doesn't), I would like to have one extension rather
>    than two, i.e. EITHER the forall-stmt OR the forall-construct
>	but not both.
>	As far as I can tell, the only convenience the forall-stmt
>	gives above the forall-construct is that you can leave out 
>	"END FORALL".

A reasonable view.  I suspect that TMC people will disagree with
you, since they already have the forall-stmt syntax implemented.
I'll go with the majority opinion that I hear on this list and at the
HPFF meeting in DC.

>B - If A is adopted and the choice is made to include
>    forall-construct and to leave out forall-stmt, the proposal
>	can be further simplified by leaving out the scalar-mask-expr.
>	As far as I can tell masks do not give any expressiveness above
>	the where-stmt.

FORALL ( i = 1:n, j =1:n,  i<j )
  a(i,j) = b(i,j)
END FORALL

This can be done using WHERE, but it requires allocating a 2D integer
array.  My feeling is that programmers still care about large data
structures.

FORALL ( i=1:n, a(i,i) .ne. 0.0 )
  a(i,i) = 1/a(i,i)
END FORALL

I don't think this can be done in place using WHERE at all.  (You can do
anything using WHERE if you copy into temporary arrays.)
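
A rough model of the two masked FORALLs above, as a Python sketch under
stated assumptions (arrays are dictionaries keyed by subscript tuples,
and the name masked_forall is hypothetical). The mask is computed from
the subscript values themselves, so the user declares no mask array.

```python
def masked_forall(a, combos, mask, lhs, rhs):
    """Model one assignment in a FORALL with a scalar mask expression.

    combos -- all valid combinations of subscript values (tuples)
    mask   -- function of the subscripts -> bool
    lhs    -- function of the subscripts -> key of the element assigned
    rhs    -- function of the subscripts -> value, read before any store
    """
    active = [c for c in combos if mask(*c)]
    pending = [(lhs(*c), rhs(*c)) for c in active]  # evaluate first ...
    for target, value in pending:                   # ... then assign
        a[target] = value
    return a


n = 3
pairs = [(i, j) for i in range(1, n + 1) for j in range(1, n + 1)]

# FORALL ( i = 1:n, j = 1:n, i<j )  a(i,j) = b(i,j)
a = {ij: 0.0 for ij in pairs}
b = {(i, j): float(10 * i + j) for (i, j) in pairs}
masked_forall(a, pairs,
              mask=lambda i, j: i < j,
              lhs=lambda i, j: (i, j),
              rhs=lambda i, j: b[(i, j)])

# FORALL ( i=1:n, a(i,i) .ne. 0.0 )  a(i,i) = 1/a(i,i)
d = {(1, 1): 2.0, (2, 2): 0.0, (3, 3): 4.0}
masked_forall(d, [(i,) for i in range(1, n + 1)],
              mask=lambda i: d[(i, i)] != 0.0,
              lhs=lambda i: (i, i),
              rhs=lambda i: 1.0 / d[(i, i)])
```

Only the strict upper triangle of a is overwritten, and the zero
diagonal element of d is skipped rather than inverted.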

>C - There appears to be a contradiction in section 3.1
>    "Interpretation of the FORALL Construct". It says:
>
>	> A single statement in a {\it forall-construct} must not cause any element
>	> of the array
>	> being assigned to be assigned a value more than once.
>	> It is, however, permitted that different statements may assign to the
>	> same array element.
>
>	However, "A single statement in a {\it forall-construct}" may be
>	a inner forall-construct, which may assign to the same array
>	element more than once.
>	I think the intention is that for any level of a (possibly
>	nested) forall-construct, *different* valid combinations of
>	subscript-name values may not assign to the *same* array
>	element.

I intended it to mean that no assignment (i.e. the bottom level of the
nesting structure) could assign more than once.  I think this is
equivalent to Arthur's interpretation, and will try to improve the wording.
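
One way to read that rule, sketched in Python under stated assumptions
(the name multiply_assigned is hypothetical): collect the element that
each active combination of subscript values assigns, across all nesting
levels, and report any element that appears more than once.

```python
from collections import Counter


def multiply_assigned(combos, lhs):
    """Return the elements assigned by more than one active combination.

    combos -- active combinations of all surrounding subscript values
    lhs    -- function of the subscripts -> element being assigned
    An empty result means the assignment obeys the rule.
    """
    counts = Counter(lhs(*c) for c in combos)
    return sorted(t for t, k in counts.items() if k > 1)


n = 3
# FORALL (i = 1:n)  FORALL (j = i+1:n)  ...  (strict upper triangle)
upper = [(i, j) for i in range(1, n + 1) for j in range(i + 1, n + 1)]

legal = multiply_assigned(upper, lambda i, j: (i, j))  # a(i,j) = ...
illegal = multiply_assigned(upper, lambda i, j: i)     # a(i)   = ...
```

The second case assigns a(1) for both j = 2 and j = 3, which is the
nested situation Arthur's wording is meant to rule out.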

>	A separate question about these "must not"s and "may not"s that
>	appear in the proposal. I hope these are meant as admonishments
>	to the application programmer rather than to the compiler
>	writers. In other words that they mean that programs that do
>	not meet these requirements may produce arbitrary results.

I've tried to follow the usage in the Fortran 90 standard, where "must"
and "may" refer to standard-conforming programs (at least, that's my
understanding).  Thus, they are admonitions to the programmer, or to
the compiler writer who is forbidden to implement any extensions.

>D - A small but confusing typo. In section 3 FORALL Construct:
>
>	> Examples of the FORALL construct are:
>	>                                                    \CODE
>	> FORALL ( i = 2:n-1, j = 2:i-1 )
>                                ^
>	should be
>
>	  FORALL ( i = 2:n-1, j = 2:n-1 )
>
>	I hope

You're right, thanks for spotting that.

>--arthur
>
>                 Arthur H. Veen
>E-mail:arthur@parcom.nl    arthur@ace.nl

                                                Chuck


From zrlp09@trc.amoco.com  Mon Jul 20 10:29:47 1992
Received: from noc.msc.edu by cs.rice.edu (AA26554); Mon, 20 Jul 92 10:29:47 CDT
Received: from uc.msc.edu by noc.msc.edu (5.65/MSC/v3.0.1(920324))
	id AA03805; Mon, 20 Jul 92 10:29:43 -0500
Received: from [129.230.11.2] by uc.msc.edu (5.65/MSC/v3.0z(901212))
	id AA04450; Mon, 20 Jul 92 10:29:42 -0500
Received: from trc.amoco.com (apctrc.trc.amoco.com) by netserv2 (4.1/SMI-4.0)
	id AA19701; Mon, 20 Jul 92 10:29:38 CDT
Received: from backus.trc.amoco.com by trc.amoco.com (4.1/SMI-4.1)
	id AA11825; Mon, 20 Jul 92 10:29:36 CDT
Received: from iverson.trc.amoco.com by backus.trc.amoco.com (4.1/SMI-4.1)
	id AA06482; Mon, 20 Jul 92 10:29:35 CDT
Date: Mon, 20 Jul 92 10:29:35 CDT
From: zrlp09@trc.amoco.com (Rector L. Page)
Message-Id: <9207201529.AA06482@backus.trc.amoco.com>
Subject: FORALL is not a loop (and neither is DO INDEPENDENT)
Apparently-To: <hpff-forall@cs.rice.edu>

--------
Some writings on forall refer to it as a "parallel loop."  This
unfortunate term leads to confusion.  The terminology may derive
from the fact that both forall constructs and do-loops specify an
index set that parameterizes a block of statements.  In effect,
both denote a collection of statement blocks.  The collection
contains one copy of the block of statements for each index in
the index set.

         forall (i=1:n)           do i=1,n               
           . block                   . block             
           .  of                     .  of               
           .   stmts                 .   stmts           
         end forall               end do                

Here the similarity ends.  The statement blocks in a forall are
carried out independently (and simultaneously, the programmer hopes),
but the statement blocks in a do-loop are carried out in sequence.

In a forall the index set is unordered, while it is an ordered
sequence in a do-loop.  The two constructs use different delimiters
(colon and comma) to highlight the difference in meaning:
  forall (i=1:n)  specifies an index set in the usual mathematical sense;
  do i=1,n        specifies a sequence of indices.

With the looping concept out of the picture, it is not at all surprising 
that
       forall (i=1:n)
         a(i)=a(i-1)+1  ! means increment a shifted array
       end forall
while
       do i=1,n
         a(i)=a(i-1)+1   ! means to compute a running sum
       end do            ! and save the partial sums.

If all of the assignments a(i)=a(i-1)+1 in the forall are to take
place simultaneously, what value could a(i-1) denote other than the
value a(i-1) had on entry to the forall construct?  With the parallel
computation view of forall (as distinguished from the sequence
view of looping), it is hard to imagine any meanings other than
increment-shifted-array in the forall example and running-sum
in the do-loop example.
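
The two readings can be checked with a small sketch, assuming a compiler that accepts FORALL as later standardized in Fortran 95. The array bounds and initial values are chosen only for illustration.

```fortran
program forall_vs_do
  implicit none
  integer, parameter :: n = 5
  integer :: a(0:n), c(0:n), i

  a = [(10*i, i = 0, n)]      ! 0, 10, 20, 30, 40, 50
  c = a

  ! FORALL: every right-hand side uses the values held on entry.
  forall (i = 1:n)
    a(i) = a(i-1) + 1
  end forall

  ! DO: each iteration sees the result of the previous one.
  do i = 1, n
    c(i) = c(i-1) + 1
  end do

  ! Shifted array incremented by one.
  if (any(a(1:n) /= [1, 11, 21, 31, 41])) error stop "forall result unexpected"
  ! Running sum starting from c(0) = 0.
  if (any(c(1:n) /= [1, 2, 3, 4, 5])) error stop "do result unexpected"

  print *, "forall:", a(1:n)
  print *, "do:    ", c(1:n)
end program forall_vs_do
```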

The do-independent notation, when taken as a programming construct
rather than as a hint to the compiler, expresses the same idea
as forall, but in a restricted form.  An example restriction:
   !hpf$ independent
         do i=1,n
           a(i)=a(i-1)+1  ! is illegal because the statements
         end do           ! contain interdependencies.

When viewed as a hint to the compiler, do-independent is not so bad.
Viewed as a programming construct, it encourages confusion
between parallel computation and looping.  I oppose it on these
grounds.  Programmers trying to get the last bit of efficiency out
of their computers will try to use do-independent rather than forall
whenever they can (so the compiler can omit a few synchronization
operations).  They will spend a lot of time trying to figure out
if there are any untoward interdependencies in their indexed,
parallel blocks of statements.  They will often get it wrong.
Even when they don't get it wrong, the next programmer to
fiddle the code will.  The resulting plethora of errors will
cost a great deal more than the savings produced by avoiding
a few synchronizations.

If HPF turns out to have a parallel construct for an indexed
collection of statement blocks, I hope it uses a notation that
doesn't piggy-back on the coincidence that the do-loops happen
to specify index sequences and statement blocks.

Rex Page

From chk@cs.rice.edu  Mon Jul 20 14:30:13 1992
Received: from rice.edu by cs.rice.edu (AA03679); Mon, 20 Jul 92 14:30:13 CDT
Received: from cs.rice.edu by rice.edu (AA02438); Mon, 20 Jul 92 14:29:26 CDT
Received: from  by cs.rice.edu (AB03636); Mon, 20 Jul 92 14:29:39 CDT
Message-Id: <9207201929.AB03636@cs.rice.edu>
Date: Mon, 20 Jul 1992 14:31:42 -0600
To: zrlp09@trc.amoco.com (Rector L. Page)
From: chk@cs.rice.edu
Subject: Re: FORALL is not a loop (and neither is DO INDEPENDENT)
Cc: hpff-forall@rice.edu


>Some writings on forall refer to it as a "parallel loop."  This
>unfortunate term leads to confusion.  The terminology may derive
>from the fact that both forall constructs and do-loops specify an
>index set that parameterizes a block of statements.  In effect,
>both denote a collection of statement blocks.  The collection
>contains one copy of the block of statements for each index in
>the index set.
>
>         forall (i=1:n)           do i=1,n               
>           . block                   . block             
>           .  of                     .  of               
>           .   stmts                 .   stmts           
>         end forall               end do                
>
>Here the similarity ends.  The statement blocks in a forall are
>carried out independently (and simultaneously, the programmer hopes),
>but the statement blocks in a do-loop are carried out in sequence.

... and Rex then goes on to say a lot of things I agree with.  I've tried to
stick to "FORALL construct" and similar circumlocutions in the current
draft proposal; I'd be grateful if anybody could point out instances
where I failed.

I think, however, that we are in mild disagreement over INDEPENDENT.
 
>The do-independent notation, when taken as a programming construct
>rather than as a hint to the compiler, expresses the same idea
>as forall, but in a restricted form.  An example restriction:
>   !hpf$ independent
>         do i=1,n
>           a(i)=a(i-1)+1  ! is illegal because the statements
>         end do           ! contain interdependencies.
>When viewed as a hint to the compiler, do-independent is not so bad.
>Viewed as a programming construct, it encourages confusion
>between parallel computation and looping.  I oppose it on these
>grounds.  Programmers trying to get the last bit of efficiency out
>of their computers will try to use do-independent rather than forall
>whenever they can (so the compiler can omit a few synchronization
>operations).  They will spend a lot of time trying to figure out
>if there are any untoward interdependencies in their indexed,
>parallel blocks of statements.  They will often get it wrong.
>Even when they don't get it wrong, the next programmer to
>fiddle the code will.  The resulting plethora of errors will
>cost a great deal more than the savings produced by avoiding
>a few synchronizations.
>
>If HPF turns out to have a parallel construct for an indexed
>collection of statement blocks, I hope it uses a notation that
>doesn't piggy-back on the coincidence that the do-loops happen
>to specify index sequences and statement blocks.
>
>Rex Page

I believe that INDEPENDENT should be considered a user assertion,
not a programming construct.  I favor having the assertion, because
the users who I talk to lack faith in their compilers' dependence
analysis.  (Whether faith would be justified, and whether users
make mistakes in their own analysis, are questions that I refuse to
debate.)  I agree that DO INDEPENDENT should not be given the status
of a new statement construct, as it is in (for example) PCF, for the
same reasons that Rex gives.

Note that defining DO INDEPENDENT as an assertion disallows some
important (or at least well-publicized) algorithms:
        Accumulations (the variable being accumulated has loop-carried
            dependences) and other recurrences
        Chaotic relaxation, and other algorithms with nondeterminate 
            execution (if the DO was really INDEPENDENT, then the 
            algorithm would be determinate)
        "Return any valid answer" (if the answer is not unique, the
            solutions create dependences)
        Explicit synchronization (how do you synchronize without
            changing shared state)
My hope is that Clemmens' group can provide a good mechanism for
expressing most of these in HPF version 2.  We have explicitly limited
FORALL and DO INDEPENDENT for the current version of HPF in order
to get faster agreement.
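
A minimal sketch of the first item in the list, assuming the directive spelling used earlier in this thread (a compiler without HPF support treats it as a comment). The variable names are illustrative. The accumulation loop carries a dependence through total, so asserting INDEPENDENT on it would be false; the second loop writes a distinct element in each iteration, so the assertion holds.

```fortran
program independent_demo
  implicit none
  integer, parameter :: n = 8
  integer :: a(n), i, total

  a = [(i, i = 1, n)]

  ! Accumulation: iteration i reads the value of total written by
  ! iteration i-1, so this loop must NOT be asserted independent.
  total = 0
  do i = 1, n
    total = total + a(i)
  end do

  ! Each iteration touches only its own element, so the assertion is true.
  !hpf$ independent
  do i = 1, n
    a(i) = 2 * a(i)
  end do

  if (total /= 36) error stop "unexpected accumulation"
  if (any(a /= [(2*i, i = 1, n)])) error stop "unexpected doubling"
  print *, "total =", total
end program independent_demo
```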

                                                Chuck 


From zrlp09@trc.amoco.com  Mon Jul 20 14:38:50 1992
Received: from noc.msc.edu by cs.rice.edu (AA03916); Mon, 20 Jul 92 14:38:50 CDT
Received: from uc.msc.edu by noc.msc.edu (5.65/MSC/v3.0.1(920324))
	id AA19284; Mon, 20 Jul 92 14:38:49 -0500
Received: from [129.230.11.2] by uc.msc.edu (5.65/MSC/v3.0z(901212))
	id AA11078; Mon, 20 Jul 92 14:38:47 -0500
Received: from trc.amoco.com (apctrc.trc.amoco.com) by netserv2 (4.1/SMI-4.0)
	id AA21604; Mon, 20 Jul 92 14:38:43 CDT
Received: from backus.trc.amoco.com by trc.amoco.com (4.1/SMI-4.1)
	id AA17178; Mon, 20 Jul 92 14:38:40 CDT
Received: from iverson.trc.amoco.com by backus.trc.amoco.com (4.1/SMI-4.1)
	id AA06972; Mon, 20 Jul 92 14:38:39 CDT
Date: Mon, 20 Jul 92 14:38:39 CDT
From: zrlp09@trc.amoco.com (Rector L. Page)
Message-Id: <9207201938.AA06972@backus.trc.amoco.com>
Subject: User-defined elemental procedures
         (Page/July20 responding to  >Merlin/July16)
Apparently-To: <hpff-forall@cs.rice.edu>

----------
> Question:  How did F8x propose to check that [elemental] procedures 
> are free of side-effects? 
> ...stuff omitted...  I assume it didn't check these things.

Essentially correct, no checking required.
No Fortran standard has required checking for errors.  A program that
violates constraints does not conform to the standard, and its
interpretation is entirely up to the processor.  Deadlock would, for
example, be a legitimate interpretation (as far as the F90 standard is
concerned) of any program failing to conform to the standard.

The sole restriction on elemental functions in the Fortran 8x proposal
was that the result of an elemental reference could not depend on the
order in which the elemental references were made.  HPF will have to
specify more restrictions than this to permit parallel evaluation of
elemental references.  (Defining a global variable, for example, might not
preclude any particular order of evaluation of elemental references as 
long as the evaluations aren't overlapped; the function might simply be
using the variable for temporary space.)  The Merlin 5 plus i/o would be
an adequate set of restrictions, I think.


> I'd be glad to avoid new syntax.  I just want some way of annotating
> a procedure to assert that it obeys my constraints, so that the
> constraints can be checked.  I'd be happy with an HPF directive like:
>
> 	!HPF$ ELEMENTAL proc-name
>
> at the end of the procedure statement or on the next line.  Is that
> an acceptable solution?

Constraints can be checked with or without an assertion of their validity.
When the compiler encounters an elemental reference (that is, a reference
containing array actual arguments where the function expects scalars),
the compiler can check to make sure the function is suitable.  A model for
implementation might be for the compiler to record with each procedure the
information needed for such checks.  The compiler could then use the
information upon encountering an elemental reference.

Such information would probably be recorded anyway if the compiler were
to check for conformance to constraints on elemental functions.  For example,
consider a function that calls a subroutine that defines a global variable.
Such a function could not be called elementally, but the compiler could not
check for conformance to the constraints unless it knew, while compiling the
function, that the subroutine defined a global variable.  Therefore, the
needed information would have to be recorded with every procedure whether or
not HPF requires an ELEMENTAL assertion in functions that the programmer 
wants to invoke elementally.

In summary, it seems to me that the ELEMENTAL assertion is superfluous.
It makes things no easier for the compiler that checks for conformance,
but it increases the size of the language and complicates life for
programmers.  I do like the idea of including elemental functions in HPF,
however, because they appear to provide a facility for SPMD programming 
that covers everything one might want to do in that mode except i/o.
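
For readers who want to see an elemental reference in running code, here is a sketch in later standard Fortran, which provides an ELEMENTAL prefix on the function statement. That spelling postdates this discussion and is not part of the draft being debated; the function name recip_or_zero is hypothetical. The function is written for scalars and referenced with an array actual argument.

```fortran
module elemental_demo_mod
  implicit none
contains
  ! Scalar definition with no side effects: no global variables, no I/O.
  elemental function recip_or_zero(x) result(y)
    real, intent(in) :: x
    real :: y
    if (x /= 0.0) then
      y = 1.0 / x
    else
      y = 0.0
    end if
  end function recip_or_zero
end module elemental_demo_mod

program elemental_demo
  use elemental_demo_mod
  implicit none
  real :: v(4), r(4)

  v = [1.0, 2.0, 0.0, 4.0]
  r = recip_or_zero(v)        ! elemental reference: array where a scalar is expected

  if (any(r /= [1.0, 0.5, 0.0, 0.25])) error stop "unexpected elemental result"
  print *, r
end program elemental_demo
```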


> Incidentally, if subroutines are an outmoded concept in Fortran 90,
> why does it introduce new intrinsic subroutines (for date, time,
> clock, random numbers, etc)?

Fortran 90 has five intrinsic subroutines: date_and_time, system_clock,
random_number, random_seed, and mvbits.  Except for mvbits, which would
have been a function had it not been inherited from a MIL STD, all of
these are properly subroutines.  The value of a function depends only
on its arguments.  None of these intrinsics have any (input) arguments.
Therefore, if they were functions, they would have to deliver the same
value at every invocation.  Time would appear to stand still.

Fortran 90's intrinsic subroutines are "packaging for side effects."
The time functions report a system state, random_number reports
an internal state (assuming it's pseudo-random--otherwise, it reports a
state of the universe), and random_seed sets an internal state.
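
A small sketch of the internal-state point, assuming a pseudo-random generator whose sequence is fixed by the seed (the usual case, though the generator itself is processor-dependent). The seed value 12345 is arbitrary. Restoring the same seed replays the same numbers, which shows that random_number reports, and random_seed sets, hidden state.

```fortran
program side_effect_demo
  implicit none
  integer :: k
  integer, allocatable :: seed(:)
  real :: x(3), y(3)

  call random_seed(size=k)     ! ask how many integers the seed holds
  allocate(seed(k))
  seed = 12345

  call random_seed(put=seed)   ! set the internal state
  call random_number(x)        ! report from (and advance) the state

  call random_seed(put=seed)   ! restore the same state
  call random_number(y)        ! replays the values delivered to x

  if (any(x /= y)) error stop "generator did not replay from the same seed"
  print *, x
end program side_effect_demo
```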

From wu@cs.buffalo.edu  Mon Jul 20 14:39:48 1992
Received: from ruby.cs.Buffalo.EDU by cs.rice.edu (AA03944); Mon, 20 Jul 92 14:39:48 CDT
Received: by ruby.cs.Buffalo.EDU (4.1/1.01)
	id AA23554; Mon, 20 Jul 92 15:39:44 EDT
Date: Mon, 20 Jul 92 15:39:44 EDT
From: wu@cs.buffalo.edu (Min-You Wu)
Message-Id: <9207201939.AA23554@ruby.cs.Buffalo.EDU>
To: arthur@parcom.nl, chk@cs.rice.edu
Subject: Re: FORALL proposal
Cc: hpff-forall@cs.rice.edu

> 
> >A few remarks concerning Chuck's latest FORALL proposal
> >posted 17 Jul 1992 18:08:27
> >
> >A - If HPF is going to extend the base language (I prefer that
> >    it doesn't), I would like to have one extension rather
> >    than two, i.e. EITHER the forall-stmt OR the forall-construct
> >	but not both.
> >	As far as I can tell, the only convenience the forall-stmt
> >	gives above the forall-construct is that you can leave out 
> >	"END FORALL".
> 
> A reasonable view.  I suspect that TMC people will disagree with
> you, since they already have the forall-stmt syntax implemented.
> I'll go with the majority opinion that I hear on this list and at the
> HPFF meeting in DC.
> 

I agree with Arthur.

> >B - If A is adopted and the choice is made to include
> >    forall-construct and to leave out forall-stmt, the proposal
> >	can be further simplified by leaving out the scalar-mask-expr.
> >	As far as I can tell masks do not give any expressiveness above
> >	the where-stmt.
> 
> FORALL ( i = 1:n, j =1:n,  i<j )
>   a(i,j) = b(i,j)
> END FORALL
> 
> This can be done using WHERE, but it requires allocating a 2D integer
> array.  My feeling is that programmers still care about large data
> structures.
> 
> FORALL ( i=1:n, a(i,i) .ne. 0.0 )
>   a(i,i) = 1/a(i,i)
> END FORALL
> 
> I don't think this can be done in place using WHERE at all.  (You can do
> anything using WHERE if you copy into temporary arrays.)
> 
>                                                 Chuck
> 

However, that can be done with IF:

 FORALL ( i = 1:n, j = 1:n )
   IF (i<j) THEN 
     a(i,j) = b(i,j)
   END IF
 END FORALL

 FORALL ( i = 1:n )
   IF (a(i,i) .ne. 0.0) THEN 
     a(i,i) = 1/a(i,i)
   END IF
 END FORALL


Min-You

From chk@cs.rice.edu  Tue Jul 21 07:44:22 1992
Received: from rice.edu by cs.rice.edu (AA16843); Tue, 21 Jul 92 07:44:22 CDT
Received: from cs.rice.edu by rice.edu (AA07692); Tue, 21 Jul 92 07:43:38 CDT
Received: from DialupEudora (charon.rice.edu) by cs.rice.edu (AA16829); Tue, 21 Jul 92 07:43:46 CDT
Message-Id: <9207211243.AA16829@cs.rice.edu>
Date: Tue, 21 Jul 1992 07:45:52 -0600
To: wu@cs.buffalo.edu (Min-You Wu)
From: chk@cs.rice.edu
Subject: Re: FORALL proposal
Cc: hpff-forall@rice.edu


>> >B - If A is adopted and the choice is made to include
>> >    forall-construct and to leave out forall-stmt, the proposal
>> >	can be further simplified by leaving out the scalar-mask-expr.
>> >	As far as I can tell masks do not give any expressiveness above
>> >	the where-stmt.
>> 
>> FORALL ( i = 1:n, j =1:n,  i<j )
>>   a(i,j) = b(i,j)
>> END FORALL
>> 
>> This can be done using WHERE, but it requires allocating a 2D integer
>> array.  My feeling is that programmers still care about large data
>> structures.
>> 
>> FORALL ( i=1:n, a(i,i) .ne. 0.0 )
>>   a(i,i) = 1/a(i,i)
>> END FORALL
>> 
>> I don't think this can be done in place using WHERE at all.  (You can do
>> anything using WHERE if you copy into temporary arrays.)
>> 
>>                                                 Chuck
>> 
>
>However, that can be done with IF:
>
> FORALL ( i = 1:n, j = 1:n )
>   IF (i<j) THEN 
>     a(i,j) = b(i,j)
>   END IF
> END FORALL
>
> FORALL ( i = 1:n )
>   IF (a(i,i) .ne. 0.0) THEN 
>     a(i,i) = 1/a(i,i)
>   END IF
> END FORALL
>
>
>Min-You

But IF is not permitted in FORALL in the current draft.  Are you
proposing adding it?

                                                Chuck


From wu@cs.buffalo.edu  Tue Jul 21 09:22:23 1992
Received: from rice.edu by cs.rice.edu (AA18462); Tue, 21 Jul 92 09:22:23 CDT
Received: from ruby.cs.Buffalo.EDU by rice.edu (AA08292); Tue, 21 Jul 92 09:21:27 CDT
Received: by ruby.cs.Buffalo.EDU (4.1/1.01)
	id AA23983; Tue, 21 Jul 92 10:22:00 EDT
Date: Tue, 21 Jul 92 10:22:00 EDT
From: wu@cs.buffalo.edu (Min-You Wu)
Message-Id: <9207211422.AA23983@ruby.cs.Buffalo.EDU>
To: chk@cs.rice.edu, wu@cs.buffalo.edu
Subject: Re: FORALL proposal
Cc: hpff-forall@rice.edu, wu@cs.buffalo.edu


> >> >B - If A is adopted and the choice is made to include
> >> >    forall-construct and to leave out forall-stmt, the proposal
> >> >	can be further simplified by leaving out the scalar-mask-expr.
> >> >	As far as I can tell masks do not give any expressiveness above
> >> >	the where-stmt.
> >> 
> >> FORALL ( i = 1:n, j =1:n,  i<j )
> >>   a(i,j) = b(i,j)
> >> END FORALL
> >> 
> >> This can be done using WHERE, but it requires allocating a 2D integer
> >> array.  My feeling is that programmers still care about large data
> >> structures.
> >> 
> >> FORALL ( i=1:n, a(i,i) .ne. 0.0 )
> >>   a(i,i) = 1/a(i,i)
> >> END FORALL
> >> 
> >> I don't think this can be done in place using WHERE at all.  (You can do
> >> anything using WHERE if you copy into temporary arrays.)
> >> 
> >>                                                 Chuck
> >> 
> >
> >However, that can be done with IF:
> >
> > FORALL ( i = 1:n, j = 1:n )
> >   IF (i<j) THEN 
> >     a(i,j) = b(i,j)
> >   END IF
> > END FORALL
> >
> > FORALL ( i = 1:n )
> >   IF (a(i,i) .ne. 0.0) THEN 
> >     a(i,i) = 1/a(i,i)
> >   END IF
> > END FORALL
> >
> >
> >Min-You
> 
> But IF is not permitted in FORALL in the current draft.  Are you
> proposing adding it?
> 
>                                                 Chuck
> 

Yes, I do.  In my INDEPENDENT proposal, I proposed allowing IF-THEN-ELSE
in the independent block, and IF-THEN in the non-independent block.
The reason for allowing only IF-THEN in the non-independent block is that
I do not want a dependence carried from the THEN part to the ELSE part.

Min-You

From chk@cs.rice.edu  Tue Jul 21 09:53:09 1992
Received: from rice.edu by cs.rice.edu (AA19533); Tue, 21 Jul 92 09:53:09 CDT
Received: from cs.rice.edu by rice.edu (AA08530); Tue, 21 Jul 92 09:50:56 CDT
Received: from DialupEudora (charon.rice.edu) by cs.rice.edu (AA19214); Tue, 21 Jul 92 09:40:52 CDT
Message-Id: <9207211440.AA19214@cs.rice.edu>
Date: Tue, 21 Jul 1992 09:44:26 -0600
To: wu@cs.buffalo.edu (Min-You Wu)
From: chk@cs.rice.edu
Subject: Re: FORALL proposal
Cc: hpff-forall@rice.edu

>> >
>> > FORALL ( i = 1:n )
>> >   IF (a(i,i) .ne. 0.0) THEN 
>> >     a(i,i) = 1/a(i,i)
>> >   END IF
>> > END FORALL
>> >
>> >
>> >Min-You
>> 
>> But IF is not permitted in FORALL in the current draft.  Are you
>> proposing adding it?
>> 
>>                                                 Chuck
>> 
>
>Yes, I do.  In my INDEPENDENT proposal, I proposed allowing IF-THEN-ELSE
>in the independent block, and IF-THEN in the non-independent block.
>The reason of allowing only IF-THEN in the non-independent block is that
>I do not want dependence carrying out from the THEN part to the ELSE part.
>
>Min-You

My impression was that consensus was to only allow assignment, array
assignment, WHERE, and FORALL nested within FORALL.  Certainly that was the
way the straw poll went a few weeks back (in fact, the poll might have
thrown out WHERE as well).  Does anybody want to support Min-You's
proposal?

I am firmly opposed to allowing different constructs inside independent
blocks as opposed to outside them.  This implies that a directive is
changing the syntax of the language (a rather more visible problem than
changing the semantics, which we're trying to avoid).

                                                Chuck


From shapiro@think.com  Tue Jul 21 10:14:42 1992
Received: from mail.think.com by cs.rice.edu (AA19842); Tue, 21 Jul 92 10:14:42 CDT
Return-Path: <shapiro@Think.COM>
Received: from Django.Think.COM by mail.think.com; Tue, 21 Jul 92 11:14:35 -0400
From: Richard Shapiro <shapiro@think.com>
Received: by django.think.com (4.1/Think-1.2)
	id AA17271; Tue, 21 Jul 92 11:14:35 EDT
Date: Tue, 21 Jul 92 11:14:35 EDT
Message-Id: <9207211514.AA17271@django.think.com>
To: karp@hplms2.hpl.hp.com
Cc: hpff-forall@cs.rice.edu
Subject: Your example

   Date: Mon, 20 Jul 92 16:22:58 -0700
   From: Alan Karp <karp@hplahk.hpl.hp.com>
   Reply-To: "Alan Karp" <karp@hplms2.hpl.hp.com>

   >From shapiro@Think.COM Mon Jul 20 15:19:25 1992
   Return-Path: <shapiro@Think.COM>
   From: Richard Shapiro <shapiro@Think.COM>
   Date: Mon, 20 Jul 92 16:55:58 EDT
   To: karp@hplms2.hpl.hp.com
   In-Reply-To: Alan Karp's message of Mon, 20 Jul 92 12:03:50 -0700 <9207201903.AA05843@hplahk.hpl.hp.com>
   Subject: FORALL example


      I would like to see your example of a FORALL loop that can not be
      expressed with a DO loop plus directives.

   Let me first state that since there is a way to rewrite ANY forall construct
   (it's NOT a loop!) into a set of DO loops and array sized temporaries, you
   could win this argument "by definition".  Since I assume that's not what
   you mean, let me propose some "rules" I consider reasonable (and we can
   discuss this as well).

	   1) Any rewrite can't introduce a temporary larger than a scalar.

   ----> I disagree. The compiler will generate large temporaries so I
   ----> can, too.

Not necessarily true. The compiler can stripmine out some temporaries I can't.

   This is a clarity/optimization issue. If I have to create a temporary or
   two for each FORALL construct, I might wind up with 30 or 40 temporaries
   cluttering up a FORALL-laden subroutine.

   ----> I agree, but it is a matter of balance. Introducing one or two
   ----> temporaries per loop isn't a problem; introducing 100 is. We
   ----> need to see what real code looks like and if it can be rewritten
   ----> conveniently. 

I showed you an example from real code. I'm an applications developer; not
a compiler writer, and I know what I'd have to do if I didn't have FORALL.

	   2) The directives can't change the semantics of the DO loop.
	      (i.e. the DO loop must have the desired result without the
	      directive present.)

   This is a property we'd like to preserve for directives. 

   ----> This goes without saying unless I am trying to do an
   ----> asynchronous calculation. If I tell the compiler to ignore a
   ----> true dependence, then I clearly don't care if the meaning
   ----> changes. This rule goes under the category of giving enough
   ----> rope. 

	   3) I'm not willing to generate more than one DO loop nest.

   If I'm going to generate a bunch of DO loops, then I might as well stick
   with F77.  

   ----> Why not, unless you need to generate dozens? My point is that it
   ----> is better to stick to F77 than add something dangerous like
   ----> FORALL.

Here is a major point of disagreement: I don't accept that single-statement
FORALL is dangerous. I can see your point about block FORALL, that unless I
see the FORALL part, the statement A(I) = A(I+1) has a different meaning.
However, I don't believe this argument applies to single statement FORALL,
since the FORALL is part of the same statement.

   That said, here is an example:

	   INTEGER,DIMENSION(M,N) :: INDEX1,INDEX2,INDEX3,INDEX4
	   REAL A(M,N)

	   FORALL (I=1:M,J=1:N) A(INDEX3(I,J),INDEX4(I,J)) =
	  & SUM(A,MASK=(I.EQ.INDEX1) .AND. (J.EQ.INDEX2))

   This example may look contrived, but it actually represents something I'd
   like to express in a 3-D finite element code I'm writing.  I want to do the
   equivalent of a histogramming (or SEND_WITH_ADD) operation on the array A.

   ----> I don't get the histogram analogy although I could generate one
   ----> by changing your tests. I do agree that this example is a good
   ----> one. 

   This can be re-written:

	   INTEGER INDEX1(M,N),INDEX2(M,N),INDEX3(M,N),INDEX4(M,N)
	   REAL A(M,N),TEMP(M,N)

	   DO I = 1,M
	     DO J = 1,N
	       TEMP(I,J) = SUM(A,MASK=(I.EQ.INDEX1) .AND. (J.EQ.INDEX2))
	     ENDDO
	   ENDDO
	   DO I = 1,M
	     DO J = 1,N
	       A(INDEX3(I,J),INDEX4(I,J)) = TEMP(I,J)
	     ENDDO
	   ENDDO


   I have to do this several times, and the "A" arrays are all different
   sizes. This means I need to explicitly allocate several TEMPs. If I'm tight on
   memory as it is, these extra temps are going to hurt me.

   ----> Sorry, but I disagree. The compiler will have to generate a temp
   ----> as large as the result, so you might still blow memory.

   Notice that I've taken 1 line of code and expanded it to 10. Notice also
   that I've now put an order of evaluation on TEMP. I could remove this with a
   directive, but what do I gain?  The advantage of the FORALL construct is
   that it provides a means to specify that there is no order, and the RHS is
   to be fully evaluated for all indices before I do the assignment.  It is a
   mistake to think of a FORALL as a loop, just because it can be implemented
   that way.  Does this help clarify anything?

   ----> I concede that you need more lines of code without FORALL. In
   ----> fact, I think the example I gave is more onerous to the
   ----> programmer. However, I don't see that you have imposed any
   ----> ordering on the evaluation of TEMP since the compiler is free to
   ----> execute the loop iterations in any order. I would also write a
   ----> slightly different routine.

   ----> INTEGER INDEX1(M,N),INDEX2(M,N),INDEX3(M,N),INDEX4(M,N)
   ----> REAL A(M,N)
   ----> ALLOCATABLE TEMP(:,:)
   ---->
   ----> ALLOCATE (TEMP(M,N))  ! Takes care of different shapes for each temp
   ----> TEMP = A
   ----> DO I = 1,M
   ---->    DO J = 1, N
   ---->       A(INDEX3(I,J),INDEX4(I,J)) =
   ---->*         SUM(TEMP,MASK=(I.EQ.INDEX1).AND.(J.EQ.INDEX2))
   ---->    ENDDO
   ----> ENDDO
   ----> DEALLOCATE (TEMP)     ! See, I only need 1 temp
   ---->
   ----> This code does manually what the compiler would do with FORALL.

Not true. You have only given one possible implementation of this. This
particular example is a scatter-add, and it can be done without creating
any temporaries if one has the appropriate hardware.
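
To make the comparison concrete, here is a sketch that runs the single-statement FORALL next to the two-loop-nest rewrite with an explicit TEMP and checks that they agree. It assumes a compiler accepting FORALL as later standardized in Fortran 95. The sizes and index data are hypothetical, chosen so that (INDEX3, INDEX4) is a permutation and no element is assigned twice; the copy a_do exists only so both versions start from the same values.

```fortran
program scatter_add_demo
  implicit none
  integer, parameter :: m = 2, n = 3
  integer :: index1(m,n), index2(m,n), index3(m,n), index4(m,n)
  real :: a(m,n), a_do(m,n), temp(m,n)
  integer :: i, j

  a = reshape([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [m, n])
  a_do = a

  ! Hypothetical index data: each column is summed into row 1, and the
  ! destination subscripts reverse both dimensions (a permutation).
  do j = 1, n
    do i = 1, m
      index1(i,j) = 1
      index2(i,j) = j
      index3(i,j) = m + 1 - i
      index4(i,j) = n + 1 - j
    end do
  end do

  ! Single-statement FORALL: all right-hand sides use the old values of a.
  forall (i = 1:m, j = 1:n) a(index3(i,j), index4(i,j)) = &
      sum(a, mask=(i == index1) .and. (j == index2))

  ! DO-loop rewrite: an array-sized temporary holds the right-hand sides.
  do i = 1, m
    do j = 1, n
      temp(i,j) = sum(a_do, mask=(i == index1) .and. (j == index2))
    end do
  end do
  do i = 1, m
    do j = 1, n
      a_do(index3(i,j), index4(i,j)) = temp(i,j)
    end do
  end do

  if (any(a /= a_do)) error stop "FORALL and DO rewrite disagree"
  print *, "FORALL and two-nest DO rewrite agree"
end program scatter_add_demo
```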

   ----> The difference is that I can see that I want the old values of
   ----> A. With FORALL I can interpret your statement as meaning "Use the
   ----> current values of the elements of A, not the value on loop entry".
   ----> 
   ----> What do you say? Should we forward this correspondence to the
   ----> Forum?


Yes, let's forward this conversation to the forum (which I have done).

	   Rich Shapiro


From wu@cs.buffalo.edu  Tue Jul 21 10:53:06 1992
Received: from rice.edu by cs.rice.edu (AA21047); Tue, 21 Jul 92 10:53:06 CDT
Received: from ruby.cs.Buffalo.EDU by rice.edu (AA09029); Tue, 21 Jul 92 10:52:21 CDT
Received: by ruby.cs.Buffalo.EDU (4.1/1.01)
	id AA24248; Tue, 21 Jul 92 11:52:56 EDT
Date: Tue, 21 Jul 92 11:52:56 EDT
From: wu@cs.buffalo.edu (Min-You Wu)
Message-Id: <9207211552.AA24248@ruby.cs.Buffalo.EDU>
To: chk@cs.rice.edu, wu@cs.buffalo.edu
Subject: Re: FORALL proposal
Cc: hpff-forall@rice.edu, wu@cs.buffalo.edu

> 
> >> >
> >> > FORALL ( i = 1:n )
> >> >   IF (a(i,i) .ne. 0.0) THEN 
> >> >     a(i,i) = 1/a(i,i)
> >> >   END IF
> >> > END FORALL
> >> >
> >> >
> >> >Min-You
> >> 
> >> But IF is not permitted in FORALL in the current draft.  Are you
> >> proposing adding it?
> >> 
> >>                                                 Chuck
> >> 
> >
> >Yes, I do.  In my INDEPENDENT proposal, I proposed allowing IF-THEN-ELSE
> >in the independent block, and IF-THEN in the non-independent block.
> >The reason of allowing only IF-THEN in the non-independent block is that
> >I do not want dependence carrying out from the THEN part to the ELSE part.
> >
> >Min-You
> 
> My impression was that consensus was to only allow assignment, array
> assignment, WHERE, and FORALL nested within FORALL.  Certainly that was the
> way the straw poll went a few weeks back (in fact, the poll might have
> thrown out WHERE as well).  Does anybody want to support Min-You's
> proposal?
> 
> I am firmly opposed to allowing different constructs inside independent
> blocks as opposed to outside them.  This implies that a directive is
> changing the syntax of the language (a rather more visible problem than
> changing the semantics, which we're trying to avoid).
> 
>                                                 Chuck


The INDEPENDENT directive could have two features:

1). Let the compiler know that there is no dependence in the independent
block and that no synchronization is needed.

2). Allow almost any Fortran construct in independent blocks.  That
will give users the flexibility to write whatever they need for their
applications.

I'd like to know whether we want both 1) and 2), or want to limit INDEPENDENT
directives to 1).  And if we want both features, should we still use a
directive, or change it to something else?

Min-You

From @eros.uknet.ac.uk,@camra.ecs.soton.ac.uk:jhm@ecs.southampton.ac.uk  Tue Jul 21 14:49:02 1992
Received: from sun2.nsfnet-relay.ac.uk by cs.rice.edu (AA07180); Tue, 21 Jul 92 14:49:02 CDT
Via: uk.ac.uknet-relay; Tue, 21 Jul 1992 20:48:23 +0100
Received: from ecs.soton.ac.uk by eros.uknet.ac.uk via JANET with NIFTP (PP) 
          id <11729-0@eros.uknet.ac.uk>; Tue, 21 Jul 1992 20:24:35 +0100
Via: camra.ecs.soton.ac.uk; Tue, 21 Jul 92 20:14:58 BST
From: John Merlin <jhm@ecs.southampton.ac.uk>
Received: from bacchus.ecs.soton.ac.uk by camra.ecs.soton.ac.uk;
          Tue, 21 Jul 92 20:20:48 BST
Date: Tue, 21 Jul 92 20:17:56 BST
Message-Id: <1783.9207211917@bacchus.ecs.soton.ac.uk>
To: chk@cs.rice.edu, loveman@ftn90.enet.dec.com
Subject: Comments on FORALL proposal
Cc: hpff-forall@cs.rice.edu

Here are some comments and questions on David and Chuck's 
FORALL proposal.

Let me start with an obvious and uncontroversial observation: 
that a FORALL statement without a mask is like a generalised array 
assignment, and a FORALL with a mask is like a generalised masked
array assignment (i.e. WHERE statement).  Every array assignment, 
masked or not, can be written as a FORALL, and (I would maintain) 
sufficiently restricted FORALL statements should be expressible as 
array assignments or WHERE statements.  I think it's a good idea to 
keep this duality in mind and try to maintain it wherever possible.  
I mention this as it underlies some of my comments.
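
As an illustration of this duality, the following small sketch writes the
same two operations first in array syntax and then as FORALL.  It assumes
the FORALL syntax as later standardised in Fortran 95, which may differ in
detail from the draft under discussion; the names a1, a2, b, c1, c2 and
the data are invented for the example.

```fortran
program forall_duality
  implicit none
  integer, parameter :: n = 5
  integer :: i
  real :: b(n), a1(n), a2(n), c1(n), c2(n)

  b = (/ -2.0, -1.0, 0.0, 1.0, 2.0 /)

  ! Unmasked case: array assignment and the equivalent FORALL.
  a1 = b + 1.0
  forall (i = 1:n) a2(i) = b(i) + 1.0

  ! Masked case: WHERE statement and the equivalent masked FORALL.
  c1 = 0.0
  c2 = 0.0
  where (b > 0.0) c1 = 1.0 / b
  forall (i = 1:n, b(i) > 0.0) c2(i) = 1.0 / b(i)

  print *, 'unmasked forms agree: ', all(a1 == a2)
  print *, 'masked forms agree:   ', all(c1 == c2)
end program forall_duality
```

Both comparisons should report T, since each pair performs the same
element-wise operations on the same data.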


(1)   With Chuck's limitations on the contents of a FORALL construct,
this correspondence extends to FORALL constructs, which are equivalent
to a sequence of generalised array assignments or a WHERE construct.
I think that's a good reason for the limitations!

Incidentally, this correspondence leads to my first observation:
since Fortran 90 has a WHERE - ELSEWHERE construct, perhaps the 
FORALL construct should be extended to a FORALL - ELSEFORALL construct.  
(Of course, the ELSE FORALL part is only relevant if the FORALL has a mask; 
if not, the ELSE FORALL isn't executed).
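
For comparison, here is a sketch of how the effect of an ELSE FORALL part
might be obtained without one, using two masked FORALL statements.  It is
only an illustration under assumptions: the syntax is that of Fortran 95,
and the arrays a, w and the saved mask m are hypothetical names.

```fortran
program else_forall_sketch
  implicit none
  integer, parameter :: n = 6
  integer :: i
  real :: a(n), w(n)
  logical :: m(n)

  a = (/ 1.0, -2.0, 3.0, -4.0, 5.0, -6.0 /)
  w = a

  ! Fortran 90 form: the mask is evaluated once for both branches.
  where (w > 0.0)
    w = -w          ! changes the sign, but the mask is already fixed
  elsewhere
    w = 0.0
  end where

  ! FORALL form: save the mask first, then use it and its complement.
  m = a > 0.0
  forall (i = 1:n, m(i))       a(i) = -a(i)
  forall (i = 1:n, .not. m(i)) a(i) = 0.0

  print *, 'forms agree: ', all(a == w)
end program else_forall_sketch
```

The mask has to be saved because the first assignment changes the values
it was computed from; re-evaluating "a(i) <= 0" in the second FORALL
would select every element.  An ELSE FORALL part would make this
bookkeeping unnecessary.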


(2)  The only restrictions I can see on the use of 'subscript-name's
in the LHS of a forall assignment are that every subscript name must 
be referenced, and that no array element may be assigned a value more 
than once.  Do you really want to keep it this general?

For example, can more than one subscript name appear in a single 
subscript expression?  E.g.:

	FORALL (i=1:n:2, j=0:1)  a(i+j) = ...

Can a subscript name appear in a subscript triplet?  E.g.:

	FORALL (i=1:n:2)  a(i:i+1) = ...

(Perhaps the correspondence with array assignments suggests that 
no more than one subscript name should appear in each subscript expression 
of the assignment variable, and that subscript names should not appear in a 
subscript triplet).

I assume that a subscript name can be used in more than one dimension
of the assignment variable (e.g. FORALL (i=1:n) b(i,i) ...), as you give 
an example of this, but perhaps it should be explicitly stated.


(3)  The use of the WHERE statement and construct within a FORALL
seems redundant, as the FORALL construct is already a generalised 
WHERE construct!  Everything you can express with an embedded WHERE 
can be expressed without it by scalarising the sectional dimensions 
and absorbing the mask in the forall-stmt.  My objection is that it
opens up multiple ways of expressing the same thing (which is already 
a big flaw with Fortran).  E.g. if A is 2 dimensional,

	WHERE (a > 0)  a = ...

can be written as:

	FORALL (i=1:n)
	  WHERE (a(i,:) > 0)  a(i,:) = ...
or:

	FORALL (i=1:n, j=1:n, a(i,j)>0)  a(i,j) = ...

or even as:

	FORALL (i=1:n, j=1:n)
	  WHERE (a(i:i, j:j) > 0)  a(i:i, j:j) = ...

etc.

In contrast to Arthur Veen, I'd advocate dropping the nested WHERE 
stmt and construct in FORALL in favour of the scalar mask expression in the 
forall-stmt, as the latter can express more general masks (as you've already
said in your reply).

I suppose an advantage with having a WHERE construct within a FORALL 
construct is that you can then have an ELSEWHERE part.  However, this
would be unnecessary if the FORALL construct had an optional ELSE FORALL, 
as I've already suggested.

A minor aesthetic argument against WHERE in FORALL is that Fortran 90 
doesn't allow nested WHERE constructs (for whatever reason), 
so it seems inconsistent to allow them to be nested within FORALL, 
which effectively amounts to the same thing.

Also, WHERE introduces a small inconsistency in FORALLs, namely, that
a normal 'forall-assignment' can be array valued and can have any shape,
but within the WHERE the shape is no longer free--it must conform 
with that of the WHERE mask expression.


(4) In a similar vein, nested FORALLs seem redundant.  It appears
that the main reason for using them is to obtain non-rectangular
index domains.  If so, why not just allow this to be achieved in a 
single 'forall-triplet-spec-list' and have done with it (i.e. allow
each forall-triplet to refer to previous subscript names, as
permitted in some other proposals).  Are there any advantages
in using nested FORALLs to achieve this effect?
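
To make the comparison concrete, the sketch below fills the upper triangle
of a matrix in two ways: with a nested FORALL whose inner triplet refers
to the outer subscript name, and with a single FORALL over a rectangular
domain restricted by a scalar mask.  It assumes Fortran 95 syntax; the
arrays a, b and the assigned values are invented for the example.  The
third alternative suggested above, a triplet that refers to an earlier
subscript name in the same list, is not shown because FORALL as
standardised in Fortran 95 does not accept it.

```fortran
program triangular_domain
  implicit none
  integer, parameter :: n = 4
  integer :: i, j
  integer :: a(n,n), b(n,n)

  a = 0
  b = 0

  ! Nested form: the inner triplet refers to the outer subscript name.
  forall (i = 1:n)
    forall (j = i:n) a(i,j) = i + j
  end forall

  ! Single-header form: rectangular triplets plus a scalar mask.
  forall (i = 1:n, j = 1:n, j >= i) b(i,j) = i + j

  print *, 'forms agree: ', all(a == b)
end program triangular_domain
```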

Note that I'm not strongly against nested WHERE and FORALL---it's
just my gut reaction that they're superfluous, and I suspect it may 
be the reaction of users too.


Far more important is the consideration of how functions are handled 
within FORALL.  Basically, your proposal imposes no more constraints 
than already exist in Fortran 90, namely that there should be no side 
effects that affect the evaluation of the rest of the assignment 
(here extended to cover forall-assignment).
I can see the obvious attraction of this approach, but I believe it's 
inadequate for a number of reasons:


(i) You allow arbitrary access (read and write) to distributed 
global (i.e. common block) data, which in the FORALL context requires 
demand-driven communication for its implementation.  This probably
poses no problems on shared memory architectures, but would require 
considerable software support on many distributed-memory message-passing 
platforms, with a big performance overhead.  I don't think this
requirement appears anywhere else in HPF, and I'm not convinced
that all vendors/implementors would want to support it (as it wouldn't
be High Performance).  If not, HPF programs would be *non-portable*.

(At Southampton someone has implemented such a system for transputers.
All read/write accesses to such data go via a server running on each
node, and all such data are stored in a special region accessible
to the server, not in the data segment of the user process.
Without sophisticated inter-procedural analysis the compiler may have to
assume that *all* common block data must be treated this way.
I'm sure demand-driven communication won't be a problem in the future, 
when hardware support for it will probably be the norm, but it may be 
a problem now.  Maybe this is a subject for a straw poll of vendors/implementors?)


(ii) Your constraints permit non-determinism (e.g. multiple writes 
to the same global memory location, provided it's not read within 
the same assignment; non-deterministic I/O).
However, it seems that Fortran 90 tries assiduously to avoid 
non-determinism in its array syntax (right now, I can't think of any 
way it can arise via array syntax in standard-conforming programs---but 
I may have overlooked something!).  Also, apart from function calls, you 
extend this principle to FORALLs by not allowing multiple writes to the 
same element of a forall-assignment variable.


(i) & (ii) raise the spectres of non-portability, deadlock and non-determinism,
which appear nowhere else in HPF (as far as I can see).
For these reasons I think functions in FORALL shouldn't be allowed
to perform I/O, and access to global data should be restricted to read-only
access of non-distributed data.  (In fact, perhaps they shouldn't access
global data at all---it can always be passed-in as an argument).


(iii) I'd like to stick to my guns on the proposal that such functions
should be denoted by a directive like:

	!HPF$ ELEMENTAL function-name

appearing in the function's interface and definition.
This would greatly simplify the job of checking these functions, as well
as making their characteristics and purpose obvious to programmers.

The simplification of checking resulting from this directive is fairly
obvious.  With the directive, checking can be done locally when the
function is compiled; if it calls other functions, they too must be 
elemental, which is established by the presence or absence of the directive 
in the function's interface body.  Without the directive, it appears 
that a preliminary pass through the whole input program must be performed 
to establish whether each function is or is not elemental, with backpatching 
required to fill in information when a function calls other functions, 
before the usage of any function in a FORALL can be checked.  Also, what 
about separate compilation?

I believe that checking is particularly desirable in the case of elemental
functions, for incorrect usage could result in deadlock, which can be 
very difficult for the user to track down.

(iv)  Finally, note that function calls in FORALL *are* elemental by their
very nature.  Since many simple FORALLs *without* function calls can
be transformed into array assignments, it seems inconsistent not to be
able to do the same when a function is involved, e.g.:

	REAL a(n), b(n)
	FORALL (i=1:n) a(i) = func (b(i))

should be transformable into:

	a = func (b)

Therefore I think that this usage should be allowed in HPF.
(Of course, an explicit interface must be provided to allow it).  
As Rex Page points out, this usage could be converted to standard 
Fortran 90 by overloading the function name with versions for every 
array rank 0-7 (indeed, it could easily be done automatically
by a source-source translator).  However, if this usage is allowed,
I think it would be less perplexing to the programmer if functions 
used in this way are distinguished by a directive, to indicate that 
they're not meant to be standard Fortran 90.
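
A sketch of the usage argued for here is given below.  Note that it does
not use the proposed !HPF$ ELEMENTAL directive: it relies instead on the
ELEMENTAL prefix that was later standardised in Fortran 95, and the body
of func is a hypothetical stand-in.  The explicit interface is supplied by
placing the function in a module.

```fortran
module elemental_sketch
  implicit none
contains
  ! Hypothetical stand-in for "func": scalar in, scalar out, no side effects.
  elemental function func(x) result(y)
    real, intent(in) :: x
    real :: y
    y = 2.0 * x + 1.0
  end function func
end module elemental_sketch

program elemental_demo
  use elemental_sketch
  implicit none
  integer, parameter :: n = 4
  integer :: i
  real :: a1(n), a2(n), b(n)

  b = (/ 1.0, 2.0, 3.0, 4.0 /)

  forall (i = 1:n) a1(i) = func(b(i))   ! element-by-element form
  a2 = func(b)                          ! array-valued form

  print *, 'forms agree: ', all(a1 == a2)
end program elemental_demo
```

With a function marked in this way, no overloading for each array rank is
needed to obtain the array-valued form.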


One final point -- I hope you have a good meeting later this week!

                    Regards,
                         John Merlin.

From karp@hplahk.hpl.hp.com  Tue Jul 21 16:18:58 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA10055); Tue, 21 Jul 92 16:18:58 CDT
Received: from hplms2.hpl.hp.com by erato.cs.rice.edu (AA19222); Tue, 21 Jul 92 16:18:47 CDT
Received: from hplahk.hpl.hp.com by hplms2.hpl.hp.com with SMTP
	(16.5/15.5+IOS 3.20) id AA26032; Tue, 21 Jul 92 14:18:36 -0700
Received: by hplahk.hpl.hp.com
	(16.6/15.5+IOS 3.14) id AA06301; Tue, 21 Jul 92 14:18:29 -0700
Date: Tue, 21 Jul 92 14:18:29 -0700
From: Alan Karp <karp@hplahk.hpl.hp.com>
Message-Id: <9207212118.AA06301@hplahk.hpl.hp.com>
To: hpff-forall@erato.cs.rice.edu
Reply-To: "Alan Karp" <karp@hplms2.hpl.hp.com>
Subject: [shapiro@Think.COM: Your example]

                      ...

	   1) Any rewrite can't introduce a temporary larger than a scalar.

   ----> I disagree. The compiler will generate large temporaries so I
   ----> can, too.

Not necessarily true. The compiler can stripmine out some temporaries I can't.

----> Probably, but it can optimize away some of them and stripmine
----> some others. It has been my experience that the programmer knows
----> how to stripmine and will when memory or performance becomes an
----> issue. Of course, it is always better to do it automatically.

   This is a clarity/optimization issue. If I have to create a temporary or
   two for each FORALL construct, I might wind up with 30 or 40 temporaries
   cluttering up a FORALL-laden subroutine.

   ----> I agree, but it is a matter of balance. Introducing one or two
   ----> temporaries per loop isn't a problem; introducing 100 is. We
   ----> need to see what real code looks like and if it can be rewritten
   ----> conveniently. 

I showed you an example from real code. I'm an applications developer; not
a compiler writer, and I know what I'd have to do if I didn't have FORALL.

----> I, too, am an applications guy. However, in my previous life in
----> IBM, I did a lot of code porting and benchmarking of other
----> people's codes. Perhaps this experience has made me more
----> sensitive to issues of semantic clarity.

                     ...

	   3) I'm not willing to generate more than one DO loop nest.

   If I'm going to generate a bunch of DO loops, then I might as well stick
   with F77.  

   ----> Why not, unless you need to generate dozens? My point is that it
   ----> is better to stick to F77 than add something dangerous like
   ----> FORALL.

Here is a major point of disagreement: I don't accept that single statement
FORALL is dangerous. I can see your point about block FORALL, that unless I
see the FORALL part, the statement A(I) = A(I+1) has a different meaning.
However, I don't believe this argument applies to single statement FORALL,
since the FORALL is part of the same statement.

----> Single statement FORALL is not as bad. Agreed. However, it is
----> still a little dangerous and, I believe, quite unnecessary
----> except for notational convenience. Whether or not its dangers
----> and its deviation from the F90 standard warrant its inclusion
----> will depend on how convenient. Certainly, the case you sent me
----> does not warrant it. Others, such as the one I gave in my first
----> note might.
---->
----> I still think that the one liner is dangerous. A programmer
----> might be surprised at the answers when changing
---->
----> FORALL(I=2:N)A(I)=A(I)+1.0 to FORALL(I=2:N)A(I)=A(I-1)+1.0
---->
----> Using conventional DO loops might not perform well, but it will
----> give the expected result. There might be no excuse for not
----> noticing that the statement is under a FORALL, but I guarantee
----> that I will do it more than once.
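
The difference at issue can be seen in a small self-contained sketch.
The array size and the zero initial values are arbitrary choices made for
illustration, and FORALL is written as later standardised in Fortran 95.

```fortran
program forall_vs_do
  implicit none
  integer, parameter :: n = 5
  integer :: i
  real :: a(n), d(n)

  a = 0.0
  d = 0.0

  ! FORALL: every right-hand side is evaluated before any element is stored.
  forall (i = 2:n) a(i) = a(i-1) + 1.0

  ! DO: each iteration sees the value stored by the previous one.
  do i = 2, n
    d(i) = d(i-1) + 1.0
  end do

  print *, 'FORALL:', a
  print *, 'DO:    ', d
end program forall_vs_do
```

Starting from zeros, the FORALL leaves elements 2 to N all equal to 1.0,
whereas the DO loop produces the running values 1.0, 2.0, 3.0, 4.0.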

   That said, here is an example:

                       ...

   ----> This code does manually what the compiler would do with FORALL.

Not true. You have only given one possible implementation of this. This
particular example is a scatter-add, and it can be done without creating
any temporaries if one has the appropriate hardware.

----> If one has the proper hardware, I would expect the compiler
----> optimization to eliminate the temp array. I could put in a
----> directive to tell the compiler that the array is a temp, but I
----> would hope the compiler could detect this use. (What about it,
----> compiler people?)

   ----> The difference is that I can see that I want the old values of
   ----> A. With FORALL I can interpret your statement as meaning "Use the
   ----> current values of the elements of A, not the value on loop entry".
   ----> 
   ----> What do you say? Should we forward this correspondence to the
   ----> Forum?


Yes, let's forward this conversation to the forum (which I have done).

----> I would like to get these issues in front of a wider audience.
----> What do you think? Someone on this forum will almost certainly
----> end up as referee, so let's hear it. Any objections?  I would
----> prefer to have a point/counterpoint. We could do it as two
----> separate papers or as one paper. Interested, Rich? The paper
----> could be expanded into one for a general audience by including
----> David Loveman's overview as an introduction. How about it,
----> David? Where do you think it should go - CACM, Fortran Journal,
----> or ???

	   Rich Shapiro

----> Alan Karp


From @eros.uknet.ac.uk,@camra.ecs.soton.ac.uk:jhm@ecs.southampton.ac.uk  Wed Jul 22 09:57:35 1992
Received: from sun2.nsfnet-relay.ac.uk by cs.rice.edu (AA21565); Wed, 22 Jul 92 09:57:35 CDT
Via: uk.ac.uknet-relay; Wed, 22 Jul 1992 15:56:53 +0100
Received: from ecs.soton.ac.uk by eros.uknet.ac.uk via JANET with NIFTP (PP) 
          id <4662-0@eros.uknet.ac.uk>; Wed, 22 Jul 1992 15:52:42 +0100
Via: camra.ecs.soton.ac.uk; Wed, 22 Jul 92 15:48:02 BST
From: John Merlin <jhm@ecs.southampton.ac.uk>
Received: from bacchus.ecs.soton.ac.uk by camra.ecs.soton.ac.uk;
          Wed, 22 Jul 92 15:53:56 BST
Date: Wed, 22 Jul 92 15:51:03 BST
Message-Id: <1931.9207221451@bacchus.ecs.soton.ac.uk>
To: arthur@parcom.nl, hpff-forall@cs.rice.edu
Subject: Another tiny comment on the FORALL proposal

A small point that I meant to mention in my message 'Comments on 
FORALL' yesterday, but forgot...

Arthur Veen (arthur@nl.parcom Sun Jul 19 14:15:11 1992) remarks that:
> 
> A - If HPF is going to extend the base language (I prefer that
>     it doesn't), I would like to have one extension rather
>     than two, i.e. EITHER the forall-stmt OR the forall-construct
> 	but not both.
> 	As far as I can tell, the only convenience the forall-stmt
> 	gives above the forall-construct is that you can leave out 
> 	"END FORALL".

I think that the introduction of both a FORALL statement and 
construct is justified (indeed mandated) by the fact that 
Fortran 90 already has both a WHERE statement and construct.
(This follows from my thesis that FORALL is a kind of generalised 
WHERE, and should be designed to correspond as closely as possible 
with existing F90 array syntax).

Note also that Fortran has both an IF statement and construct.  
Therefore, it would be un-Fortranlike not to have this kind of
redundancy in a new construct! :-))

As I said, just a small point...

                         John Merlin.

From wu@cs.buffalo.edu  Sat Jul 25 11:04:47 1992
Received: from ruby.cs.Buffalo.EDU by cs.rice.edu (AA23359); Sat, 25 Jul 92 11:04:47 CDT
Received: by ruby.cs.Buffalo.EDU (4.1/1.01)
	id AA25714; Sat, 25 Jul 92 12:04:54 EDT
Date: Sat, 25 Jul 92 12:04:54 EDT
From: wu@cs.buffalo.edu (Min-You Wu)
Message-Id: <9207251604.AA25714@ruby.cs.Buffalo.EDU>
To: hpff-forall@cs.rice.edu
Subject: MIMD and subroutine call from FORALL
Cc: wu@cs.buffalo.edu


1. A way to obtain MIMD parallelism and loosely synchronous constructs
is to call ELEMENTAL functions (subroutines) from block Forall:

      FORALL (I = 1:N)
        CALL SUBROUTINE1 
        CALL SUBROUTINE2 
        CALL SUBROUTINE3 
      END FORALL

This way, execution of subroutines is asynchronous and after each
call there is a synchronization.  As proposed by John Merlin, MIMD
parallelism can be obtained by means of branches within an elemental
function (subroutine).  However, 

2. calling functions or subroutines from FORALL itself is not well
justified yet.  As an example, there is a problem with subroutine 
(function) calls.  It can be illustrated by the following example:

      SUBROUTINE FOO (a, b, c, d)
        REAL a, b, c, d
        a = ...
        d = c + d
        RETURN
      END

This is a perfect elemental subroutine, and if it is called by

      CALL FOO (A(I),B(I),C(I),D(I))

there is no problem.  However, if it is called by

      CALL FOO (A(I),A(I-1),A(I+1),D(I))

the result becomes nondeterminate.  I asked some people this question
at the last HPFF meeting and didn't get a satisfactory
solution.  Using IN, OUT, or INOUT cannot solve the problem.
Maybe we have to enforce more restrictions.  What restrictions
are proper?

Min-You

From wu@cs.buffalo.edu  Sat Jul 25 11:21:17 1992
Received: from ruby.cs.Buffalo.EDU by cs.rice.edu (AA23459); Sat, 25 Jul 92 11:21:17 CDT
Received: by ruby.cs.Buffalo.EDU (4.1/1.01)
	id AA25776; Sat, 25 Jul 92 12:21:25 EDT
Date: Sat, 25 Jul 92 12:21:25 EDT
From: wu@cs.buffalo.edu (Min-You Wu)
Message-Id: <9207251621.AA25776@ruby.cs.Buffalo.EDU>
To: hpff-forall@cs.rice.edu
Subject: revised FORALL proposal with INDEPENDENT directives
Cc: wu@cs.buffalo.edu

\documentstyle[11pt]{article}
\textwidth 6.4in
\textheight 8in
\parskip 0.15in
\begin{document}

\topmargin 0in
\oddsidemargin 0in
\baselineskip .25in 

\begin{center}
{\Large Proposal for FORALL with INDEPENDENT Directives}

(Revised July 25, 1992)

Min-You Wu  \\
SUNY at Buffalo \\
wu@cs.buffalo.edu \\

\end{center}

This proposal is an extension of Guy Steele's INDEPENDENT proposal.
We propose a block FORALL with the directives for independent 
execution of statements.  The INDEPENDENT directives are used
in a block style.  

The block FORALL is in the form of
\begin{verbatim}
      FORALL (...) [ON (...)]
        a block of statements
      END FORALL
\end{verbatim}
where the block can consist of a restricted class of statements 
and the following INDEPENDENT directives:
\begin{verbatim}
!HPF$BEGIN INDEPENDENT
!HPF$END INDEPENDENT
\end{verbatim}
The two directives must be used as a pair.  There is a synchronization 
at each of these directives.  A sub-block of statements parenthesized 
in the two directives is called an {\em asynchronous} sub-block 
or {\em independent} sub-block.  The statements that are not in 
an asynchronous sub-block are in {\em synchronized} sub-blocks
or {\em non-independent} sub-blocks.  The synchronized sub-block is 
the same as Guy Steele's synchronized FORALL statement, and the 
asynchronous sub-block is the same as the FORALL with the INDEPENDENT 
directive.  Thus, the block FORALL
\begin{verbatim}
      FORALL (e)
        b1
!HPF$BEGIN INDEPENDENT
        b2
!HPF$END INDEPENDENT
        b3
      END FORALL
\end{verbatim}
means roughly the same as
\begin{verbatim}
      FORALL (e)
        b1
      END FORALL
!HPF$INDEPENDENT
      FORALL (e)
        b2
      END FORALL
      FORALL (e)
        b3
      END FORALL
\end{verbatim}

Statements in a synchronized sub-block are tightly synchronized.
Statements in an asynchronous sub-block are completely independent.
The INDEPENDENT directives indicate to the compiler that there is no 
dependence and that, consequently, synchronizations are not necessary.
It is the user's responsibility to ensure that there is no dependence
between instances in an asynchronous sub-block.
A compiler can do dependence analysis for the asynchronous sub-blocks
and issue an error message when a dependence exists, or a warning
when it finds a possible dependence.

\noindent
{\bf What does ``no dependence between instances'' mean?}

It means that there is no true dependence, anti-dependence,
or output dependence between instances.
Examples of these dependences are shown below:

\noindent
1) true dependence:
\begin{verbatim}
      FORALL (i = 1:N)
        x(i) = ... 
        ...  = x(i+1)
      END FORALL
\end{verbatim}
Notice that dependences in FORALL are different from those in a DO loop.
If the above example were a DO loop, it would be an anti-dependence.

\noindent
2) anti-dependence:

\begin{verbatim}
      FORALL (i = 1:N)
        ...  = x(i+1)
        x(i) = ...
      END FORALL
\end{verbatim}

\noindent
3) output dependence:
\begin{verbatim}
      FORALL (i = 1:N)
        x(i+1) = ... 
        x(i) = ...
      END FORALL
\end{verbatim}

Independence does not imply the absence of communication.  One instance may access 
data in other instances, as long as this does not cause a dependence.  
The following example is an independent block:

\begin{verbatim}
      FORALL (i = 1:N)
!HPF$BEGIN INDEPENDENT
        x(i) = a(i-1)
        y(i-1) = a(i+1)
!HPF$END INDEPENDENT
      END FORALL
\end{verbatim}

\noindent
{\bf Statements that can appear in FORALL}

FORALL statements, WHERE-ELSEWHERE statements, some intrinsic functions 
(and possibly elemental functions and subroutines) can appear in the FORALL:

\noindent
1) FORALL statement
\begin{verbatim}
      FORALL (I = 1 : N)
        A(I,0) = A(I-1,0)
        FORALL (J = 1 : N)
!HPF$BEGIN INDEPENDENT
          A(I,J) = A(I,0) + B(I-1,J-1)
          C(I,J) = A(I,J)
!HPF$END INDEPENDENT
        END FORALL
      END FORALL
\end{verbatim}

\noindent
2) WHERE
\begin{verbatim}
      FORALL(I = 1 : N)
!HPF$BEGIN INDEPENDENT
        WHERE(A(I,:)==B(I,:))
          A(I,:) = 0
        ELSEWHERE
          A(I,:) = B(I,:)
        END WHERE
!HPF$END INDEPENDENT
      END FORALL
\end{verbatim}

Moreover, I would like to propose allowing IF statements and DO loops 
in FORALL:

\noindent
3) IF-THEN-ELSE
\begin{verbatim}
      FORALL ( I = 1 : N )
!HPF$BEGIN INDEPENDENT
        IF ( A(I) < EPS ) THEN                
          A(I) = 0.0                          
          B(I) = 0.0                          
        ELSE
          TMP(I) = B(I)
          B(I) = A(I)                       
          A(I) = TMP(I)
        END IF
!HPF$END INDEPENDENT
      END FORALL
\end{verbatim}

\noindent
4) DO loop
\begin{verbatim}
      FORALL (I = 1 : N)
!HPF$BEGIN INDEPENDENT
        DO J = 1, C(I) 
          A(I) = A(I) * B(I)
        END DO 
!HPF$END INDEPENDENT
      END FORALL
\end{verbatim}

\vspace{.1in}
\noindent
{\bf Rationale}

1. A FORALL with a single asynchronous sub-block as shown below is 
the same as a do independent (or doall, or doeach, or parallel do, etc.).
\begin{verbatim}
      FORALL (e)
!HPF$BEGIN INDEPENDENT
        b1
!HPF$END INDEPENDENT
      END FORALL
\end{verbatim}
A FORALL without any INDEPENDENT directive is the same as a tightly 
synchronized FORALL.  We only need to define one type of parallel 
construct, which includes both synchronized and asynchronous blocks.  
Furthermore, combining asynchronous and synchronized FORALLs, we 
have a loosely synchronized FORALL which is more flexible for many 
loosely synchronous applications.

2. With INDEPENDENT directives, the user can indicate which blocks
need not be synchronized.  The INDEPENDENT directives can act 
as barrier synchronizations.  One may suggest a smart compiler 
that can recognize dependences and eliminate unnecessary 
synchronizations automatically.  However, it might be extremely 
difficult or impossible in some cases to identify all dependences.  
When the compiler cannot determine whether there is a dependence, 
it must assume so and use a synchronization for safety, which 
results in unnecessary synchronizations and consequently, high 
communication overhead.

\end{document}


From pm@icase.icase.edu  Sat Jul 25 17:26:08 1992
Received: from elc15 (elc15.icase.edu) by cs.rice.edu (AA26459); Sat, 25 Jul 92 17:26:08 CDT
Received: by elc15 (5.65.1/lanleaf2.4.9)
	id AA01284; Sat, 25 Jul 92 18:25:59 -0400
Message-Id: <9207252225.AA01284@elc15>
Date: Sat, 25 Jul 92 18:25:59 -0400
From: Piyush Mehrotra <pm@icase.icase.edu>
To: hpff-forall@cs.rice.edu
Subject: Re: MIMD and subroutine call from FORALL
In-Reply-To: Mail from 'wu@cs.buffalo.edu (Min-You Wu)'
      dated: Sat, 25 Jul 92 12:04:54 EDT

	Date: Sat, 25 Jul 92 12:04:54 EDT
	From: wu@cs.buffalo.edu (Min-You Wu)
	Status: R
	
	
	1. A way to obtain MIMD parallelism and loosely synchronous constructs
	is to call ELEMENTAL functions (subroutines) from block Forall:
	
	      FORALL (I = 1:N)
	        CALL SUBROUTINE1 
	        CALL SUBROUTINE2 
	        CALL SUBROUTINE3 
	      END FORALL
	
	This way, execution of subroutines is asynchronous and after each
	call there is a synchronization.  As proposed by John Merlin, MIMD
	parallelism can be obtained by means of branches within an elemental
	function (subroutine).  However, 

I am not sure how this allows MIMD parallelism. Could you please explain
again?
	
	2. calling functions or subroutines from FORALL itself is not well
	justified yet.  As an example, there is a problem with subroutine 
	(function) calls.  It can be illustrated by the following example:
	
	      FOO (a, b, c, d)
	        REAL a, b, c, d
	        a = ...
	        d = c + d
	        RETURN
	      END
	
	This is a perfect elemental subroutine, and if it is called by
	
	      CALL FOO (A(I),B(I),C(I),D(I))
	
	there is with no problem.  However, if it is called by
	
	      CALL FOO (A(I),A(I-1),A(I+1),D(I))
	
	the result becomes nondeterminate.  I have asked some people
	at last HPFF meeting the question and didn't get a satisfactory
	solution.  Using IN, OUT, or INOUT cannot solve the problem.
	Maybe we have to enforce more restrictions.  What restrictions
	are proper?
	
	Min-You

Calling FOO as defined above is not HPF-conforming since the execution
of one instance of the forall construct is interfering with other
instances.

	- Piyush


From @eros.uknet.ac.uk,@camra.ecs.soton.ac.uk:jhm@ecs.southampton.ac.uk  Mon Jul 27 09:38:35 1992
Received: from sun2.nsfnet-relay.ac.uk by cs.rice.edu (AA28996); Mon, 27 Jul 92 09:38:35 CDT
Via: uk.ac.uknet-relay; Mon, 27 Jul 1992 15:36:03 +0100
Received: from eros.uknet.ac.uk by ben.uknet.ac.uk via UKIP with SMTP (PP) 
          id <sg.17503-0@ben.uknet.ac.uk>; Mon, 27 Jul 1992 15:35:07 +0100
Received: from ecs.soton.ac.uk by eros.uknet.ac.uk via JANET with NIFTP (PP) 
          id <4603-0@eros.uknet.ac.uk>; Mon, 27 Jul 1992 15:34:40 +0100
Via: camra.ecs.soton.ac.uk; Mon, 27 Jul 92 15:26:04 BST
From: John Merlin <jhm@ecs.southampton.ac.uk>
Received: from bacchus.ecs.soton.ac.uk by camra.ecs.soton.ac.uk;
          Mon, 27 Jul 92 15:32:06 BST
Date: Mon, 27 Jul 92 15:29:09 BST
Message-Id: <2494.9207271429@bacchus.ecs.soton.ac.uk>
To: pm@icase.icase.edu, wu@cs.buffalo.edu
Subject: Re: MIMD and subroutine call from FORALL
Cc: hpff-forall@cs.rice.edu

Min-You asks (<wu@edu.buffalo.cs Sat Jul 25 17:17:44 1992>):
> 
> 2. calling functions or subroutines from FORALL itself is not well
> justified yet.  As an example, there is a problem with subroutine 
> (function) calls.  It can be illustrated by the following example:
> 
>       FOO (a, b, c, d)
>         REAL a, b, c, d
>         a = ...
>         d = c + d
>         RETURN
>       END
> 
> This is a perfect elemental subroutine, and if it is called by
> 
>       CALL FOO (A(I),B(I),C(I),D(I))
> 
> there is with no problem.  However, if it is called by
> 
>       CALL FOO (A(I),A(I-1),A(I+1),D(I))
> 
> the result becomes nondeterminate.  I have asked some people
> at last HPFF meeting the question and didn't get a satisfactory
> solution.  Using IN, OUT, or INOUT cannot solve the problem.
> Maybe we have to enforce more restrictions.  What restrictions
> are proper?

I think the use of IN, OUT and INOUT *does* solve the problem.

If FOO is elemental and the above call appears in a FORALL, e.g.:

	FORALL (I=1:N)  CALL FOO (A(I), A(I-1), A(I+1), D(I))

this is equivalent to the following (array-valued) subroutine call:

	CALL FOO (A(1:N), A(0:N-1), A(2:N+1), D(1:N))

Fortran 90 has a rule that if any part of an actual argument is defined 
through the dummy argument, the actual argument may be referenced only 
through that dummy argument.  Since the first two arguments of FOO are 
both defined  (assuming for the sake of argument that the last assignment 
in FOO was meant to be 'b = c + d'),  the fact that they overlap in the 
section A(1:N-1) conflicts with this rule, as does the fact that the third 
argument overlaps with the first two.

The specification of argument intent allows this rule to be checked 
(at least partially).  If a dummy argument has INTENT (OUT) or (INOUT), 
the corresponding actual argument shouldn't overlap with any other 
actual argument (whatever its intent).   If there is, or may be, 
an overlap, I would guess the compiler is justified in generating 
at least a warning message, and perhaps even an error message if the 
overlap can be proved statically (as in your example).
The first two arguments of FOO have INTENT (OUT) and the last two have 
INTENT (IN), so the compiler would generate a warning or error for the 
above call.

In short, the relevant restrictions follow from F90's present
restrictions on actual arguments, generalised to FORALL  (i.e. using
the correspondence between FORALL and an array-valued assignment or
subroutine call).


Piyush writes (<pm@edu.icase.icase Sat Jul 25 23:37:45 1992>):
> 
> 	Date: Sat, 25 Jul 92 12:04:54 EDT
> 	From: wu@cs.buffalo.edu (Min-You Wu)
> 	Status: R
> 	
> 	1. A way to obtain MIMD parallelism and loosely synchronous constructs
> 	is to call ELEMENTAL functions (subroutines) from block Forall:
> 	
> 	      FORALL (I = 1:N)
> 	        CALL SUBROUTINE1 
> 	        CALL SUBROUTINE2 
> 	        CALL SUBROUTINE3 
> 	      END FORALL
> 	
> ...
> I am not sure how this allows MIMD parallelism. Could you please explain
> again?


In the above example, MIMD parallelism would normally require the
elemental procedures to have arguments!:-)  Then, the functional
parallelism comes from argument-dependent branches in the procedure, e.g:

	!HPF$ ELEMENTAL f
	FUNCTION f (x,i)
	  IF (x > 0) THEN      ! content-based conditional
	    ...
	  ELSE IF (i == 1 .OR. i == n) THEN   ! index-based conditional
	    ...
	  ENDIF
	END FUNCTION f

	...
	FORALL (i=1:n) x(i) = f (x(i), i)
	...

(Another way of obtaining MIMD parallelism is via different elemental
function calls in the WHERE and ELSEWHERE parts of a WHERE construct 
-- assuming the compiler could avoid a synchronisation barrier between 
the two parts).
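
A minimal runnable sketch of the same idea in standard Fortran, assuming
the ELEMENTAL prefix of the later Fortran 95 standard in place of the
proposed !HPF$ ELEMENTAL directive.  The branch bodies, the extra
argument n, the module name and the test data are illustrative
assumptions, not part of the example above:

```fortran
module branch_demo
  implicit none
contains
  ! Elemental (hence side-effect free) function whose control flow
  ! depends on its arguments.  The branch bodies are made up.
  elemental function f(x, i, n) result(y)
    real, intent(in)    :: x
    integer, intent(in) :: i, n
    real :: y
    if (x > 0.0) then                    ! content-based conditional
      y = sqrt(x)
    else if (i == 1 .or. i == n) then    ! index-based conditional
      y = 0.0
    else
      y = -x
    end if
  end function f
end module branch_demo

program mimd_sketch
  use branch_demo
  implicit none
  integer, parameter :: n = 5
  real    :: x(n)
  integer :: i
  x = (/ -4.0, 9.0, -1.0, 16.0, -2.0 /)
  ! Each element may take a different branch of f.
  forall (i = 1:n) x(i) = f(x(i), i, n)
  print *, x
end program mimd_sketch
```

With this data the elements 1 and 5 take the index-based branch,
elements 2 and 4 the content-based branch, and element 3 the final
ELSE, so different elements follow different instruction paths.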

                    John Merlin

From gls@think.com  Tue Jul 28 14:24:41 1992
Received: from mail.think.com by cs.rice.edu (AA02375); Tue, 28 Jul 92 14:24:41 CDT
Return-Path: <gls@Think.COM>
Received: from Strident.Think.COM by mail.think.com; Tue, 28 Jul 92 15:24:22 -0400
From: Guy Steele <gls@think.com>
Received: by strident.think.com (4.1/Think-1.2)
	id AA04915; Tue, 28 Jul 92 15:24:20 EDT
Date: Tue, 28 Jul 92 15:24:20 EDT
Message-Id: <9207281924.AA04915@strident.think.com>
To: hpff-forall@cs.rice.edu
Cc: gls@think.com
Subject: Parallel pointer processing


Proposal A:  Pointer assignments may appear in the body of a FORALL.
	(It is not necessary to support this proposal until one
	supports derived types, as that is the only way of specifying
	assignment to more than one pointer at a time.)

Rationale: this is just another kind of assignment.

Example:

      TYPE MONARCH
        INTEGER, POINTER :: P
      END TYPE MONARCH
      TYPE(MONARCH) :: A(N)
      INTEGER B(N)
      ...
C  Set up a butterfly pattern
      FORALL (J=1:N)  A(J)%P => B(1+IEOR(J-1,2**K))
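
A self-contained sketch of the butterfly example, assuming a compiler
that accepts pointer assignment inside FORALL (the later Fortran 95
standard permits it).  The values of N and K and the contents of B are
arbitrary choices for illustration, and B is given the TARGET attribute,
which standard Fortran requires of a pointer target:

```fortran
program butterfly_sketch
  implicit none
  integer, parameter :: n = 8, k = 1    ! illustrative sizes
  type monarch
    integer, pointer :: p
  end type monarch
  type(monarch)   :: a(n)
  integer, target :: b(n)
  integer :: j
  b = (/ (j, j = 1, n) /)
  ! Element j points at its partner across bit k of (j-1).
  forall (j = 1:n) a(j)%p => b(1 + ieor(j-1, 2**k))
  ! Print the value each pointer refers to.
  print *, (a(j)%p, j = 1, n)
end program butterfly_sketch
```

For these values the partners are 3, 4, 1, 2, 7, 8, 5, 6: each element
is paired with the one whose zero-based index differs in bit 1.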


Proposal B:  ALLOCATE, DEALLOCATE, and NULLIFY statements may appear
	in the body of a FORALL.

Rationale: these are just another kind of assignment.  They may have
	a kind of side effect (storage management), but it is a
	benign side effect (even milder than random number generation).

Example:

      TYPE SCREEN
        INTEGER, POINTER :: P(:,:)
      END TYPE SCREEN
      TYPE(SCREEN) :: S(N)
      INTEGER IERR(N)
      ...
C  Lots of arrays with different aspect ratios
      FORALL (J=1:N)  ALLOCATE(S(J)%P(J,N/J),STAT=IERR(J))
      IF(ANY(IERR)) GO TO 99999


Proposal C:  Delete the constraint in section 6.1.2 of the Fortran 90
	standard (page 63, lines 7 and 8):

	Constraint: In a data-ref, there must not be more than one
		part-ref with nonzero rank.  A part-name to the right
		of a part-ref with nonzero rank must not have the
		POINTER attribute.

Rationale: further opportunities for parallelism.

Example:

      TYPE(MONARCH) :: C(N), W(N)
      ...
C  Munch that butterfly
      C = C + W * A%P		!Currently illegal in Fortran 90

--Guy

From zrlp09@trc.amoco.com  Tue Jul 28 15:51:42 1992
Received: from noc.msc.edu by cs.rice.edu (AA05142); Tue, 28 Jul 92 15:51:42 CDT
Received: from uc.msc.edu by noc.msc.edu (5.65/MSC/v3.0.1(920324))
	id AA12258; Tue, 28 Jul 92 15:51:41 -0500
Received: from [129.230.11.2] by uc.msc.edu (5.65/MSC/v3.0z(901212))
	id AA14521; Tue, 28 Jul 92 15:51:38 -0500
Received: from trc.amoco.com (apctrc.trc.amoco.com) by netserv2 (4.1/SMI-4.0)
	id AA18984; Tue, 28 Jul 92 15:51:34 CDT
Received: from backus.trc.amoco.com by trc.amoco.com (4.1/SMI-4.1)
	id AA22915; Tue, 28 Jul 92 15:51:31 CDT
Received: from localhost by backus.trc.amoco.com (4.1/SMI-4.1)
	id AA08071; Tue, 28 Jul 92 15:51:31 CDT
Message-Id: <9207282051.AA08071@backus.trc.amoco.com>
To: hpff-forall@cs.rice.edu
Subject: 1. CALL FOO(A(i), B(i), C(i), D(i)) is not elemental
         2. FORALL/HPF-1 should not admit procedures with side effects
Date: Tue, 28 Jul 92 15:51:30 -0500
From: "Rex Page" <zrlp09@trc.amoco.com>

Min-You says (<wu@edu.buffalo.cs Sat Jul 25 17:17:44 1992>):
> 
>  ... calling functions or subroutines from FORALL itself is not well
> justified yet.  As an example, there is a problem with subroutine 
> (function) calls.  It can be illustrated by the following example:
> 
>       SUBROUTINE FOO (a, b, c, d)
>         REAL a, b, c, d
>         a = ...
>         b = c + d    ! assuming that the original d=c+d was a typo
>       END
> 
> This is a perfect elemental subroutine, and if it is called by
> 
>       CALL FOO (A(I),B(I),C(I),D(I))
> 
> there is with no problem.  However, if it is called by
> 
>       CALL FOO (A(I),A(I-1),A(I+1),D(I))
> 
> the result becomes nondeterminate.

FOO could be invoked elementally, but
neither of the above invocations of FOO is an elemental invocation.
An elemental invocation is one with arrays as actual arguments in
places where the function definition has scalar dummy arguments.

Here is an elemental invocation of FOO:
    CALL FOO (A(1:N), A(0:N-1), A(2:N+1), D(1:N))  ! (Merlin)
However, this invocation is illegal (if N>1) because Fortran
prohibits defining a dummy argument whose associated actual
argument overlaps that of a different dummy argument
(as John Merlin has pointed out).

Consider the following non-elemental invocation in a FORALL:
   FORALL (I=1:N)  CALL FOO (A(I), A(I-1), A(I+1), D(I)) ! (Merlin)
What might this mean?

From the fragments of the discussion that I've seen, it appears
that some people want to base an interpretation on the intent
attribute of the arguments, in effect using IN, OUT, and INOUT to
decide whether to pass copies of the actual arguments or to pass
the actual arguments themselves.  Presumably, one would pass a
copy of an argument whose corresponding dummy had intent IN
(by analogy with the treatment of rhs values in FORALL), and
one would pass the actual argument itself if the dummy had
intent OUT (by analogy with the treatment of lhs variables).

Who knows what should be done with dummies of intent INOUT?
Which read-access occurrences use the value on entry to the
FORALL, and which use the "current" value?  And what about
a dummy argument of intent OUT for which the procedure contains
rhs-access?
     SUBROUTINE GOO(x,y)
     INTENT(IN) :: x
     INTENT(OUT):: y
     y = x
     y = y+1
     END
What values for y should be retrieved to compute y+1 to carry
out the invocations in the following FORALL?
     FORALL (i=1:n) CALL GOO(A(i), A(i-1))

It is interesting that when statements like those in FOO and GOO
occur in block FORALLs (with dummies replaced by actuals), the
constructs make sense in the "SIMD semantics" (synchronization
after each statement in the block) that have been discussed for
FORALL:

      expanding FORALL (i=1:n) CALL FOO(A(i), A(i-1), A(i+1), D(i))
   FORALL (i=1:n)            ! means   
     A(i) = ...              !   FORALL(i=1:n) A(i) = ...
     A(i-1) = A(i+1) + D(i)  !   FORALL(i=1:n) A(i-1) = A(i+1) + D(i)
   END FORALL                ! using "SIMD semantics" for block FORALL

      expanding FORALL (i=1:n) CALL GOO(A(i), A(i-1))
   FORALL (i=1:n)            ! means   
     A(i-1) = A(i)           !   FORALL(i=1:n) A(i-1)=A(i)     ! shift
     A(i-1) = A(i-1) + 1     !   FORALL(i=1:n) A(i-1)=A(i-1)+1 ! incr
   END FORALL                ! using "SIMD semantics" for block FORALL
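
The GOO expansion above can be tried out as two single-statement
FORALLs, which have the same statement-by-statement semantics.  This is
only a sketch: the array bounds, the integer element type and the
initial values are assumptions chosen to make the effect visible.

```fortran
program goo_expansion
  implicit none
  integer, parameter :: n = 4
  integer :: a(0:n), i
  a = (/ (10*i, i = 0, n) /)               ! 0 10 20 30 40
  ! Statement 1 of GOO (y = x): all right-hand sides are read
  ! before any element is stored, so this is a clean shift.
  forall (i = 1:n) a(i-1) = a(i)           ! 10 20 30 40 40
  ! Statement 2 of GOO (y = y+1): sees the shifted values.
  forall (i = 1:n) a(i-1) = a(i-1) + 1     ! 11 21 31 41 40
  print *, a
end program goo_expansion
```

Under this reading, y+1 is computed from the value stored by the
preceding statement for the same index, and a(n) is never written.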

I conclude that making sense of procedures with side effects
that are invoked in FORALL constructs is unlikely to be a
productive way for HPFF to invest its time in 1992.
The July 24 strawvote to preclude side-effects in procedures invoked
in FORALL constructs seems the prudent path to follow.  HPF-2
could extend the interpretation to procedures with side effects
if appropriate semantics were discovered.

Rex Page

From loveman@ftn90.enet.dec.com  Tue Jul 28 16:42:11 1992
Received: from enet-gw.pa.dec.com by cs.rice.edu (AA06792); Tue, 28 Jul 92 16:42:11 CDT
Received: by enet-gw.pa.dec.com; id AA05798; Mon, 27 Jul 92 08:43:40 -0700
Message-Id: <9207271543.AA05798@enet-gw.pa.dec.com>
Received: from ftn90.enet; by decwrl.enet; Mon, 27 Jul 92 08:43:43 PDT
Date: Mon, 27 Jul 92 08:43:43 PDT
From: David Loveman <loveman@ftn90.enet.dec.com>
To: hpff-forall@cs.rice.edu
Cc: loveman@ftn90.enet.dec.com
Apparently-To: hpff-forall@cs.rice.edu
Subject: FYI - on FORALL


I presented a technical note entitled "Element Array Assignment - the
FORALL Statement" at the Third Workshop on Compilers for Parallel
Computers, Vienna, Austria, July 6-9, 1992.  It contained two items
that might be of interest:  an "optimized" scalarization of the single
statement FORALL and a "compendium" of FORALL examples culled from a
variety of sources.  Enjoy!


Efficient Implementation of the FORALL Statement

A forall-stmt has the general form:

FORALL (v1=l1:u1:s1, v2=l2:u2:s2, ..., vn=ln:un:sn [, mask] ) &
      a(e1,...,em) = rhs

A careful analysis of the scalarized definition of FORALL, given the
language prohibitions against side-effects in statements, permits a
more efficient scalarized implementation of FORALL:

templ1 = l1
tempu1 = u1
temps1 = s1
templ2 = l2
tempu2 = u2
temps2 = s2
  ...
templn = ln
tempun = un
tempsn = sn
tempa = a   ! array assignment
DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        IF (mask) THEN
          tempa(e1,...,em) = rhs
        END IF
      END DO
    ...
  END DO
END DO
a = tempa   ! array assignment

In many cases, compiler optimizations such as copy propagation can
eliminate the requirement for temporaries for lower bounds, upper
bounds and strides.  Similarly, dependence analysis may eliminate the
requirement for the array temporary.  Thus a FORALL statement such as
(Eugene Albert, Joan D. Lukas, and Guy L. Steele, Jr., Data Parallel
Computers and the FORALL Statement, Journal of Parallel and Distributed
Computing, October 1991)

FORALL (I=1:N, J=1:M:2) A(I,J) = I * B(J)

could be implemented on a scalar machine as

DO I=1,N
  DO J=1,M,2
    A(I,J) = I * B(J)
  END DO
END DO

On a parallel SIMD or MIMD machine it can, of course, be implemented in parallel.
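
As a small check of why the array temporary matters, the sketch below
compares a FORALL that reads elements it also writes against the
scalarization with a temporary and against a naive DO loop.  The array
names, size and data are assumptions made for illustration.

```fortran
program forall_scalarized
  implicit none
  integer, parameter :: n = 5
  integer :: a(n), b(n), c(n), tempa(n), i
  a = (/ (i, i = 1, n) /)
  b = a
  c = a

  ! Reference: the right-hand side uses the old values of a.
  forall (i = 2:n) a(i) = a(i-1)        ! 1 1 2 3 4

  ! Scalarization with an array temporary, as in the scheme above.
  tempa = b
  do i = 2, n
    tempa(i) = b(i-1)
  end do
  b = tempa                             ! 1 1 2 3 4

  ! Naive scalarization without the temporary: each iteration
  ! reads the value written by the previous one.
  do i = 2, n
    c(i) = c(i-1)                       ! 1 1 1 1 1
  end do

  print *, all(a == b), all(a == c)     ! T F
end program forall_scalarized
```

Dependence analysis can drop the temporary only when no iteration reads
an element that another iteration assigns, as in the A(I,J) = I * B(J)
example.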


Examples of the FORALL Statement

This section presents examples of the FORALL statement from several
sources:  the Fortran 8x definition, a paper on the FORALL statement by
Albert, Lukas, and Steele, the Thinking Machines Language Reference
Manual, and other sources:

PROGRAM FORALL_EXAMPLES

! the examples assume the following declarations
! initial values are arbitrary
  INTEGER, PARAMETER  :: L = 100, M = 100, N = 100
  INTEGER :: ONE = 1, ZERO = 0
  INTEGER, DIMENSION (L) :: V = (/ 1:L /), U = (/ L:1:-1 /)
  INTEGER, DIMENSION (M,N) :: IND
  REAL, DIMENSION (L) :: R, S = 5.3
  REAL, DIMENSION (M,N) :: A, B, C=26.2
  REAL, DIMENSION (M,N,10) :: G
  IND = SPREAD((/(I,I=1,M)/),2,N)


! examples derived from the Fortran 8x (S8.104) definition of FORALL

  FORALL (I=1:M, J=1:N) A(I,J) = 1.0 / REAL(I + J - 1)
  FORALL (I=1:M, J=1:N, A(I,J) /= 0.0) B(I,J) = 1.0 / A(I,J)
  ! which is the same as
  WHERE (A /= 0.0) B = 1.0 / A


! examples derived from the paper by Albert, Lukas, and Steele
! with natural Fortran 90 equivalents

! right hand side not evaluated, division by 0 does not occur
  FORALL (I=1:L:-1) R(I) = ONE/ZERO
  FORALL (I=1:L, I > 200) R(I) = ONE/ZERO

! operations on rectangular array sections
  ! the FORALL in effect gives a name to a section triplet
  FORALL (I=1:L) R(I) = S(I) ! which is the same as
  R(1:L) = S(1:L) ! which is the same as 
  R = S ! since R and S are both of length L
  FORALL (I=1:10:2, J=10:1:-1) A(I,J) = B(I,J) * C(I,J) 
  ! which is equivalent to
  A(1:10:2, 10:1:-1) = B(1:10:2, 10:1:-1) * C(1:10:2, 10:1:-1)

! computational use of subscript values in one dimension
  FORALL (I=1:100) R(I) = I 
  ! which is equivalent to the use of an array constructor
  R = (/ 1:100 /)

! certain cases of spreading
  FORALL (I=1:10, J=1:20) A(I, J) = S(I) 
  ! which is equivalent to the implied spread
  A(1:10, 1:20) = SPREAD(S(1:10), DIM=2, NCOPIES=20) 

! vector valued subscripts
  FORALL (I=1:L) R(V(I)) = S(I)         ! which is equivalent to 
  R(V(1:L)) = S(1:100)                  ! which is equivalent to
  R(V) = S                       ! since V and S are of length 100

  FORALL (I=1:10, J=1:10)  &
     A(V(I), U(J)) = S(I)                
  ! which is equivalent to the implied spread
  A(V(1:10), U(1:10)) = SPREAD( S(1:10), DIM=2, NCOPIES=10)

! permutation of two axes
  FORALL (I=1:M, J=1:N)  A(I,J) = B(J, I) ! which is equivalent to
  A = TRANSPOSE(B)


! examples derived from the paper by Albert, Lukas, and Steele
! without natural Fortran 90 equivalents

! skewed sections of arrays
  FORALL (I=1:100) A(I, I) = B(I, I) ! diagonal
  FORALL (I=1:100) A(I, I) = B(I+1, I-1)
  FORALL (I=1:M, J=1:N, K=1:10, I+J+K .EQ. 3*(N+1)/2)   &
     A(I+J-K, J) = G(I, J, K)

! use of multiple subscript values in an expression
  FORALL (I=1:10, J=1:20, K=1:30)   &
     G(I, J, K) = I + J + K
  ! which is (not obviously) equivalent to
  G(1:10, 1:20, 1:30) = SPREAD(SPREAD((/ 1:10 /), 2, 20), 3, 30)  & 
                      + SPREAD(SPREAD((/ 1:20 /), 1, 10), 3, 30)  &
                      + SPREAD(SPREAD((/ 1:30 /), 1, 10), 2, 20) 

! multi-dimensional array-valued subscripts
  FORALL (I=1:100, J=1:100) R(IND(I, J)) = B(I, J)

! scatter addressing
  FORALL (I=1:100) A(U(I), V(I)) = S(I)
  FORALL (I=2:100) R(I) = R(I/2) ! array representing a binary tree

! axis transpositions with more than 2 axes
  FORALL (I=1:100, J=1:100, K=1:100) G(I, J, K) = G(J, K, I)
  FORALL (I=1:100, J=1:100, K=1:100) G(I, J, K) = G(J, 11-K, I+1)

! parallel prefix operations
  FORALL(I=1:100) R(I) = SUM(S(1:I))


! examples derived from Thinking Machines CM Fortran Reference Manual

! zeros the upper right triangle of C
  FORALL (I=1:M, J=1:N, I<J) C(I,J) = 0

! assigns each array element its index along the 2nd axis
  FORALL (I=1:N) G(:, I, :) = I

! assigns consecutive integers to all elements of array A
  FORALL (I=1:M, J=1:N) A(I,J) = (I-1)*N + J - 1

! 2 ways to replicate a vector along the 2nd dimension of A
  FORALL (J=1:N) A(:,J) = R
  ! which is equivalent to the implied spread
  A = SPREAD(R, 2, N)

! assign diagonal of A to R
  FORALL (I=1:M) R(I) = A(I,I)

! rotate square matrix A counter-clockwise
  FORALL (I=1:N, J=1:N) A(I,J) = A(J,N-I+1)

! 2 ways to transpose an array
  FORALL (I=1:N, J=1:N) A(I,J) = A(J,I)
  A = TRANSPOSE(A)


!  other examples

  FORALL (I=1:M, K=1:N) A(I,K) = SUM(A(I,:) * B(:,K))
  ! which is equivalent to
  A = MATMUL(A,B)

  FORALL (I=1:L) R(I) = SUM(S(:),MASK=V(:)==I)

END PROGRAM FORALL_EXAMPLES
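
A few of the equivalences claimed in the compendium can be checked
mechanically.  The following is a sketch in standard Fortran 95 syntax
with small, arbitrarily chosen array sizes and data; the variable names
are assumptions and do not come from the program above.

```fortran
program check_equivalences
  implicit none
  integer, parameter :: m = 4, n = 3
  real    :: a(m,n), b(m,n), b2(m,n), t(n,m), c(m,n), s(m)
  integer :: i, j

  ! Test data with zeros wherever i == j.
  forall (i=1:m, j=1:n) a(i,j) = real(i - j)

  ! Masked FORALL versus WHERE.
  b  = -1.0
  b2 = -1.0
  forall (i=1:m, j=1:n, a(i,j) /= 0.0) b(i,j) = 1.0 / a(i,j)
  where (a /= 0.0) b2 = 1.0 / a
  print *, all(b == b2)

  ! Permutation of two axes versus TRANSPOSE.
  forall (i=1:n, j=1:m) t(i,j) = a(j,i)
  print *, all(t == transpose(a))

  ! Spreading a vector along the second dimension versus SPREAD.
  s = (/ (real(i), i = 1, m) /)
  forall (i=1:m, j=1:n) c(i,j) = s(i)
  print *, all(c == spread(s, dim=2, ncopies=n))
end program check_equivalences
```

Each PRINT is expected to show T if the FORALL and the array-language
form agree.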

From presberg@theory.tc.cornell.edu  Tue Jul 28 17:06:03 1992
Received: from theory.TC.CORNELL.EDU by cs.rice.edu (AA07440); Tue, 28 Jul 92 17:06:03 CDT
Date: Tue, 28 Jul 92 18:06:01 EDT
From: presberg@theory.tc.cornell.edu (David Presberg)
Received: by theory.TC.CORNELL.EDU (4.1/1.6)
	id AA25389; Tue, 28 Jul 92 18:06:01 EDT
Message-Id: <9207282206.AA25389@theory.TC.CORNELL.EDU>
To: hpff-forall@cs.rice.edu
Cc: gls@think.com, presberg@theory.tc.cornell.edu
Subject: re: Parallel pointer processing
References: <9207281924.AA04915@strident.think.com>

A minor nit concerning an imprecision in the presentation.

I believe the Proposal A example should have the following declaration:

      INTEGER, TARGET :: B(N)

instead of the one given for B.  (Of course, you meant that the
POINTER attribute was asserted by a declaration in the ellipses
material.  ;-) )

-- Pres

From loveman@ftn90.enet.dec.com  Tue Jul 28 17:20:19 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA07985); Tue, 28 Jul 92 17:20:19 CDT
Received: from enet-gw.pa.dec.com by erato.cs.rice.edu (AA21068); Tue, 28 Jul 92 17:20:11 CDT
Received: by enet-gw.pa.dec.com; id AA07872; Mon, 27 Jul 92 09:12:39 -0700
Message-Id: <9207271612.AA07872@enet-gw.pa.dec.com>
Received: from ftn90.enet; by decwrl.enet; Mon, 27 Jul 92 09:12:44 PDT
Date: Mon, 27 Jul 92 09:12:44 PDT
From: David Loveman <loveman@ftn90.enet.dec.com>
To: karp@hplms2.hpl.hp.com
Cc: loveman@ftn90.enet.dec.com, hpff-forall@erato.cs.rice.edu
Apparently-To: karp@hplms2.hpl.hp.com, hpff-forall@erato.cs.rice.edu
Subject: FORALL discussion


in [shapiro@Think.COM: Your example]:

----> I would like to get these issues in front of a wider audience.
----> What do you think? Someone on this forum will almost certainly
----> end up as referee, so let's hear it. Any objections?  I would
----> prefer to have a point/counterpoint. We could do it as two
----> separate papers or as one paper. Interested, Rich? The paper
----> could be expanded into one for a general audience by including
----> David Loveman's overview as an introduction. How about it,
----> David? Where to you think it should go - CACM, Fortran Journal,
----> or ???

	   Rich Shapiro

----> Alan Karp


I would be interested in participating in such a paper.  I think there
is a lot of existing "grist" for a possibly interesting language design paper.

-David

From chk@cs.rice.edu  Thu Aug  6 11:30:37 1992
Received: from rice.edu by cs.rice.edu (AA28961); Thu, 6 Aug 92 11:30:37 CDT
Received: from cs.rice.edu by rice.edu (AA10748); Thu, 6 Aug 92 11:29:52 CDT
Received: from DialupEudora (charon.rice.edu) by cs.rice.edu (AA28957); Thu, 6 Aug 92 11:30:15 CDT
Message-Id: <9208061630.AA28957@cs.rice.edu>
Date: Thu, 6 Aug 1992 11:32:33 -0600
To: hpff-forall@rice.edu
From: chk@cs.rice.edu
Subject: Results from DC meeting
X-Attachments: :Macintosh HD:3057:Summary July 23:


It will take a couple of days for me to get the full meeting notes, but 
I wanted to get this group a summary of the decisions that affect FORALL 
and INDEPENDENT.  To wit:

I presented the draft as circulated before the meeting.  (I didn't see 
John Merlin's comments before the meeting, so I didn't attempt to 
present them.)  I also presented Guy Steele's original proposal for 
INDEPENDENT as applied to DO (hereafter, DO INDEPENDENT).

Lots of discussion ensued, starting with Alan Karp's comments 
(distributed at the meeting, along with most of Rich Shapiro's replies).
To make a long story short, the whole group decided that FORALL was worth 
considering.  There is enough controversy that three scenarios are still 
possible:
	Accept FORALL completely into HPF
	Accept FORALL into HPF but not into the HPF subset
	Not accept FORALL at the final reading next meeting

Straw poll results:
	Should HPF have at least the single-statement FORALL?  
	  Yes 28, No 4, Abstain 4
	If FORALL could be in HPF but not in the subset, should FORALL be...
	  ...totally out of HPF?  2
	  ...in HPF but not in the subset?  15
	  ...in the subset? 16
	Should HPF have the FORALL construct with at least assignment  & 
	  unmasked array assignment?  
	  Yes 18, No 7, Abstain 9 
	Should HPF have the FORALL construct with nested WHERE statements with 
	  the current restrictions and semantics?
	  Yes 16, No 6, Abstain 13
	Should HPF have the FORALL construct with nested FORALLs and WHEREs?
	  Yes 13, No 8, Abstain 15
	Should HPF allow no side effects in functions in FORALL (rather than the 
	  current draft, which allowed "benign" side effects)?
	  Yes 15, No 12, Abstain 8
	Should HPF allow CALL in FORALLs (assuming side effects were allowed)?
	  Yes 7, No 14, Abstain 12
	Should HPF have the current INDEPENDENT assertion for DO loops?
	  Yes 27, No 2, Abstain 5

Marc Snir presented his local subroutines proposal.  There was a good 
deal of discussion, and it was decided to give a proposal based on this 
one a first reading at the next meeting.  No straw votes were taken.


Some results from the FORALL subgroup meeting after the main group meeting:

We were all surprised that the FORALL draft survived essentially unscathed.

Due to the closeness of the side effect vote, it was decided to prepare 
two proposals; one based on ELEMENTAL functions (or some other name for 
no-side-effect functions), and the other a cleaned-up version of the 
current proposal.  Right now, there is still some ambiguity over certain 
side effects (notably, can a function keep a count of the number of times 
it is called).  I'll try to synthesize both in the near future from 
existing documents.  If someone (Hi, John!) wants to beat me to the punch 
on the ELEMENTAL proposal, I won't feel offended.

P. Sadayappan and J. Ramanujam volunteered to draft an extension of 
INDEPENDENT to FORALL constructs and possibly other statements.  Min-You 
Wu and Clemens Thole promised to provide input to that extension (or 
make their own proposals).

Guy Steele and Marc Snir will collaborate to combine their local 
subroutine proposals.  Based primarily on comments from Clemens Thole, 
they will probably avoid defining any communication or FETCH/STORE 
functions within the routines; these have too many semantic complexities 
to get timely agreement, and tend to be architecture-dependent.


From gcf@npac.syr.edu  Fri Aug 21 11:55:22 1992
Received: from spica.npac.syr.edu by cs.rice.edu (AA08129); Fri, 21 Aug 92 11:55:22 CDT
Received: from cosmos.npac.syr.edu by spica.npac.syr.edu (4.1/I-1.98K)
	id AA27101; Fri, 21 Aug 92 12:55:16 EDT
Date: Fri, 21 Aug 92 12:55:16 EDT
From: gcf@npac.syr.edu (Geoffrey Fox)
Message-Id: <9208211655.AA22852@cosmos.npac.syr.edu>
Received: by cosmos.npac.syr.edu (4.1/N-0.12)
	id AA22852; Fri, 21 Aug 92 12:55:11 EDT
To: gcf@npac.syr.edu, hpff-forall@cs.rice.edu
Subject: Independent do loops


"INDEPENDENT" DO LOOPS
This memo has evolved from discussions with Chuck Koelbel.  It describes two classes of
"independent" DO loops which are not easy to express in current HPF.  The first class is
actually the harder one, as it involves reductions.  The second consists solely of
independent calculations. 

This note describes a class of problems which is probably the largest 
academic use of parallel computers over the last 15 years, and which is also the 
class for which systems such as PVM, Linda and "network-express" are currently
attracting the greatest attention in industry.  My understanding is that High 
Performance Fortran will not support them.  This surprises me.

The simplest example is the Monte Carlo integral.  We have a set of points 
x(i).  We need to calculate function values f(x(i)) and global sums over i 
of quantities such as f(x(i)) and f(x(i))**2.  The parallel implementation is 
straightforward: the f(x(i)) are calculated independently in separate 
processors, and the global averages are formed by some suitable combine 
operation.  The original (high performance) Fortran code would be:

	av = 0; avsq = 0; cnt = 0
	do i = 1, forever
	  get x = x(i) (from tape or random number)
	  calculate f = f(x)
	  av = av + f
	  avsq = avsq + f * f
	  cnt = cnt + 1
	end do

You can argue that this is trivial in HPF by declaring x, f, av and avsq to be
vectors.  Unfortunately, this is not possible in many real examples, where
these variables generalize to data structures and the value of "forever" is such
that the vectors would take up terabytes of memory.  In a natural MIMD
implementation one would only be aware of N copies of x, f, av and avsq at a
time, where N is the number of processors.
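
For concreteness, here is a runnable scalar version of the loop above.
It is a sketch under stated assumptions: the integrand f(x) = x**2, the
uniform random source, the fixed sample count standing in for "forever",
and the error formula are all choices made for illustration.

```fortran
program monte_carlo_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nsamples = 100000   ! stands in for "forever"
  real(dp) :: x, f, av, avsq, mean, err
  integer  :: i, cnt

  av = 0.0_dp
  avsq = 0.0_dp
  cnt = 0
  do i = 1, nsamples
    call random_number(x)      ! "get x" from a random source
    f = x * x                  ! hypothetical integrand
    av = av + f                ! scalar accumulators are reused
    avsq = avsq + f * f        ! on every iteration
    cnt = cnt + 1
  end do

  mean = av / cnt
  ! Standard error of the mean from the two running sums.
  err = sqrt((avsq / cnt - mean * mean) / cnt)
  print *, mean, err
end program monte_carlo_sketch
```

Only a handful of scalars are live at any time, so storage does not
grow with the number of samples; this is the property that is lost if
x, f, av and avsq are turned into vectors.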

Real (and hence more complicated) examples would be:
1)	Calculate string theory of gravity
	x(i) represent mesh of points on a two dimensional surface
	which is generated randomly initially and evolved independently.
	f(x(i)) are quantum field values on surface
	av (and error avsq) are integrals of "action."
	One plots av as a function of coupling constant to look for phase 
	transitions.

2)	Analyze data from accelerators.

This application has, for the last 15 years, used MIMD "processor farms" in 
production.  Fermilab has been a leader recently.

Here x(i) are read from tape and specify an interaction with from 2 to many 
hundreds of particles.  There are a huge number of f(x(i)) specifying features 
of the physics and response of the apparatus.  The averages are now accumulated 
in a more sophisticated way, in histograms and scatterplots.

In this case, the code inside the DO i=1, forever could well be 100,000 lines 
long.  The size of histogram and scatterplot storage needed may or may not fit
on a single node.  The variation in processing time from event to event is 
many orders of magnitude.

3)	In 1) and 2) one has a simple source of "events" x(i).  In particle 
shower or cascade calculations, the independent x(i) are not naturally 
specified by a DO LOOP.  Examples are the calculation of the response of a 
uranium plus scintillator detector to an incident photon which produces a 
cascade shower; the rays in a classic ray tracing graphics rendering package 
(these may or may not spawn additional rays in their evolution through the 
image); alternatively, we have multiplication of neutrons meandering through 
a doomsday world.  
  Here the x(i) are particle (ray) parameters and f(x(i)) is the result of 
tracking the particle through the medium.  However, the result of the tracking 
is one or more additional particles which must recursively be tracked.

In 1) and 2), we have a rather simple master-slave formalism where the master 
just doles out x(i) from loop iterations.  In 3), the slaves will produce the 
x(i), which are either handed back to the master or distributed by the slave 
itself.

Other implementation issues include parallel random-numbers; distributed 
histograms; distributed data sets to specify the world in 3).  Initially, one 
could assume that each node had enough memory to store a full set of 
histograms.  One would accumulate independently into these during the loop 
over x(i) and then combine histograms at the end.
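
The accumulate-then-combine scheme can be sketched serially as follows.
The "nodes" are simulated by an extra array dimension, and the bin
count, event count and uniform random events are assumptions made for
illustration.

```fortran
program private_histograms
  implicit none
  integer, parameter :: nnodes = 4, nbins = 10, nevents = 1000
  integer :: hist(nbins, nnodes)   ! one private histogram per node
  integer :: total(nbins)
  integer :: node, i, bin
  real    :: x

  hist = 0
  do node = 1, nnodes              ! conceptually runs in parallel
    do i = 1, nevents
      call random_number(x)        ! 0 <= x < 1
      bin = min(int(x * nbins) + 1, nbins)
      ! Each node updates only its own column.
      hist(bin, node) = hist(bin, node) + 1
    end do
  end do

  ! Single combine step at the end of the loop over events.
  total = sum(hist, dim=2)
  print *, sum(total) == nnodes * nevents
end program private_histograms
```

No location is written by two different nodes during the event loop;
the only cross-node operation is the final SUM reduction.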

Language Issues:


Neither the FORALL construct nor the INDEPENDENT assertion as they
currently appear in HPF are suitable for these problems for the following
reasons.

1) FORALL - This is an array assignment, and no data arrays are stored. 
Rewriting the code in vector format would be very difficult, if not
impossible, due to the complexity of the f()'s used.  Even if this were
possible, the sizes of the arrays needed would be prohibitive.  Finally,
there is still the problem of the data reductions, not all of which can be
done directly with HPF intrinsics.

2) INDEPENDENT - This asserts that no memory location is written by one
iteration and accessed by another.  This is not the case with the original
Fortran, where all of the variables are reused from iteration to iteration.
 Even with a "private" declaration, the data reductions by their nature
must both read and write the same locations on many iterations.

Actually there appears to be a problem with an even simpler class of applications,
which are typified by:
a) Chemical calculations, where you first find matrix elements independently
and then manipulate the matrices.
b) Circuit simulations (SPICE, electrical grid, gas supply, phone, etc.).
First independently calculate very sparse matrix elements (e.g. model components),
then solve the sparse matrix.

In both a) and b) one can see serious load balance issues, as different matrix elements
have different calculational complexities.
The INDEPENDENT component is

	do i = 1, numrow
	  do j = 1, nonzero elements in column
	    Long complicated set of calculations with "private" reused
	    variables (different values on each node of parallel machine)
	    M(i,j) = answer
	  enddo
	enddo

We again need to declare variables used inside the loop as "private" (replicated
variables with the same name but a different value on each node of a MIMD parallel
machine).  It is perhaps controversial, but my experience is that the vast majority
of all nondecomposed arrays are best thought of as private, not global; i.e. private
should be the default for all variables not appearing in a decomposition statement.

From chk@cs.rice.edu  Sun Aug 23 23:31:56 1992
Received: from moe.rice.edu by cs.rice.edu (AA13521); Sun, 23 Aug 92 23:31:56 CDT
Received: from cs.rice.edu by moe.rice.edu (AA27350); Sun, 23 Aug 92 23:31:56 CDT
Received: from  by cs.rice.edu (AB13450); Sun, 23 Aug 92 23:31:47 CDT
Message-Id: <9208240431.AB13450@cs.rice.edu>
Date: Sun, 23 Aug 1992 23:34:22 -0600
To: hpff-forall@rice.edu
From: chk@cs.rice.edu
Subject: Proposals, proposals, proposals...

I've finally gotten all (I think) of the still-active proposals for FORALL
and INDEPENDENT together in one document, included in the next message. 
This includes minor updates to the FORALL draft from last meeting, Guy's
and Min-You's INDEPENDENT proposals, Clemens' MIMD support, and an
extensively revised ELEMENTAL function proposal.  I'll be posting my
comments on sections I didn't write or modify tomorrow or the next day. 
Let the debates begin!

                                                Chuck

PS If I've missed your proposal, my sincere apologies; please resend it and
I'll include it in the next draft.

PPS Ugly formatting in the LaTeX document is my fault; single-bit errors
may be my modem software; other problems are the authors' faults.


From chk@cs.rice.edu  Sun Aug 23 23:37:09 1992
Received: from moe.rice.edu by cs.rice.edu (AA13569); Sun, 23 Aug 92 23:37:09 CDT
Received: from cs.rice.edu by moe.rice.edu (AA27368); Sun, 23 Aug 92 23:37:07 CDT
Received: from  by cs.rice.edu (AB13450); Sun, 23 Aug 92 23:31:56 CDT
Message-Id: <9208240431.AB13450@cs.rice.edu>
Date: Sun, 23 Aug 1992 23:34:31 -0600
To: hpff-forall@rice.edu
From: chk@cs.rice.edu
Subject: new-draft.tex
X-Attachments: :Macintosh HD:3737:new-statements.tex:


%hpf-freestanding-chapter-header.tex

%Version of August 5, 1992 - David Loveman, Digital Equipment Corporation

\documentstyle[11pt]{report}
\pagestyle{plain}
\pagenumbering{arabic}
\marginparwidth 0pt
\oddsidemargin=.25in
\evensidemargin  .25in
\marginparsep 0pt
\topmargin=-.5in
\textwidth=6.0in
\textheight=9.0in
\parindent=2em

%the file syntax-macros.tex is physically included below

%syntax-macros.tex

%Version of July 29, 1992 - Guy Steele, Thinking Machines

\newdimen\bnfalign         \bnfalign=2in
\newdimen\bnfopwidth       \bnfopwidth=.3in
\newdimen\bnfindent        \bnfindent=.2in
\newdimen\bnfsep           \bnfsep=6pt
\newdimen\bnfmargin        \bnfmargin=0.5in
\newdimen\codemargin       \codemargin=0.5in
\newdimen\intrinsicmargin  \intrinsicmargin=3em
\newdimen\casemargin       \casemargin=0.75in
\newdimen\argumentmargin   \argumentmargin=1.8in

\def\IT{\it}
\def\RM{\rm}
\let\CHAR=\char
\let\CATCODE=\catcode
\let\DEF=\def
\let\GLOBAL=\global
\let\RELAX=\relax
\let\BEGIN=\begin
\let\END=\end


\def\FUNNYCHARACTIVE{\CATCODE`\a=13 \CATCODE`\b=13 \CATCODE`\c=13 \CATCODE`\d=13
		     \CATCODE`\e=13 \CATCODE`\f=13 \CATCODE`\g=13 \CATCODE`\h=13
		     \CATCODE`\i=13 \CATCODE`\j=13 \CATCODE`\k=13 \CATCODE`\l=13
		     \CATCODE`\m=13 \CATCODE`\n=13 \CATCODE`\o=13 \CATCODE`\p=13
		     \CATCODE`\q=13 \CATCODE`\r=13 \CATCODE`\s=13 \CATCODE`\t=13
		     \CATCODE`\u=13 \CATCODE`\v=13 \CATCODE`\w=13 \CATCODE`\x=13
		     \CATCODE`\y=13 \CATCODE`\z=13 \CATCODE`\[=13 \CATCODE`\]=13
                     \CATCODE`\-=13}

\def\RETURNACTIVE{\CATCODE`\
=13}

\makeatletter
\def\section{\@startsection {section}{1}{\z@}{-3.5ex plus -1ex minus 
 -.2ex}{2.3ex plus .2ex}{\large\sf}}
\def\subsection{\@startsection{subsection}{2}{\z@}{-3.25ex plus -1ex minus 
 -.2ex}{1.5ex plus .2ex}{\large\sf}}

\def\@ifpar#1#2{\let\@tempe\par \def\@tempa{#1}\def\@tempb{#2}\futurelet
    \@tempc\@ifnch}

\def\?#1.{\begingroup\def\@tempq{#1}\list{}{\leftmargin\intrinsicmargin}\relax
  \item[]{\bf\@tempq.} \@intrinsictest}
\def\@intrinsictest{\@ifpar{\@intrinsicpar\@intrinsicdesc}{\@intrinsicpar\relax}}
\long\def\@intrinsicdesc#1{\list{}{\relax
  \def\@tempb{ Arguments}\ifx\@tempq\@tempb
			  \leftmargin\argumentmargin
			  \else \leftmargin\casemargin \fi
  \labelwidth\leftmargin  \advance\labelwidth -\labelsep
  \parsep 4pt plus 2pt minus 1pt
  \let\makelabel\@intrinsiclabel}#1\endlist}
\long\def\@intrinsicpar#1#2\\{#1{#2}\@ifstar{\@intrinsictest}{\endlist\endgroup}}
\def\@intrinsiclabel#1{\setbox0=\hbox{\rm #1}\ifnum\wd0>\labelwidth
  \box0 \else \hbox to \labelwidth{\box0\hfill}\fi}
\def\Case(#1):{\item[{\it Case (#1):}]}
\def\ {\@ifnextchar({\def\@tempq{#1}\@intrinsicopt}{\item[#1]}}
\def\@intrinsicopt(#1){\item[{\@tempq} (#1)]}

\def\MATRIX#1{\relax
    \@ifnextchar,{\@MATRIXTABS{}#1,\@FOO, \hskip0pt plus
1filll\penalty-1\@gobble
  }{\@ifnextchar;{\@MATRIXTABS{}#1,\@FOO; \hskip0pt plus
1filll\penalty-1\@gobble
  }{\@ifnextchar:{\@MATRIXTABS{}#1,\@FOO: \hskip0pt plus
1filll\penalty-1\@gobble
  }{\@ifnextchar.{\hfill\penalty1\null\penalty10000\hskip0pt plus 1filll
		  \@MATRIXTABS{}#1,\@FOO.\penalty-50\@gobble
  }{\@MATRIXTABS{}#1,\@FOO{ }\hskip0pt plus 1filll\penalty-1}}}}}

\def\@MATRIXTABS#1#2,{\@ifnextchar\@FOO{\@MATRIX{#1#2}}{\@MATRIXTABS{#1#2&}}
}
\def\@MATRIX#1\@FOO{\(\left[\begin{array}{rrrrrrrrrr}#1\end{array}\right]\)}

\def\@IFSPACEORRETURNNEXT#1#2{\def\@tempa{#1}\def\@tempb{#2}\futurelet\@tempc\@ifspnx}

{
\FUNNYCHARACTIVE
\GLOBAL\DEF\FUNNYCHARDEF{\RELAX
    \DEFa{{\IT\CHAR"61}}\DEFb{{\IT\CHAR"62}}\DEFc{{\IT\CHAR"63}}\RELAX
    \DEFd{{\IT\CHAR"64}}\DEFe{{\IT\CHAR"65}}\DEFf{{\IT\CHAR"66}}\RELAX
    \DEFg{{\IT\CHAR"67}}\DEFh{{\IT\CHAR"68}}\DEFi{{\IT\CHAR"69}}\RELAX
    \DEFj{{\IT\CHAR"6A}}\DEFk{{\IT\CHAR"6B}}\DEFl{{\IT\CHAR"6C}}\RELAX
    \DEFm{{\IT\CHAR"6D}}\DEFn{{\IT\CHAR"6E}}\DEFo{{\IT\CHAR"6F}}\RELAX
    \DEFp{{\IT\CHAR"70}}\DEFq{{\IT\CHAR"71}}\DEFr{{\IT\CHAR"72}}\RELAX
    \DEFs{{\IT\CHAR"73}}\DEFt{{\IT\CHAR"74}}\DEFu{{\IT\CHAR"75}}\RELAX
    \DEFv{{\IT\CHAR"76}}\DEFw{{\IT\CHAR"77}}\DEFx{{\IT\CHAR"78}}\RELAX
    \DEFy{{\IT\CHAR"79}}\DEFz{{\IT\CHAR"7A}}\DEF[{{\RM\CHAR"5B}}\RELAX
    \DEF]{{\RM\CHAR"5D}}\DEF-{\@IFSPACEORRETURNNEXT{{\CHAR"2D}}{{\IT\CHAR"2D}}}}
}

%%% Warning!  Devious return-character machinations in the next several lines!
%%%           Don't even *breathe* on these macros!
{\RETURNACTIVE\global\def\RETURNDEF{\def
{\@ifnextchar\FNB{}{\@stopline\@ifnextchar
{\@NEWBNFRULE}{\penalty\@M\@startline\ignorespaces}}}}\global\def\@NEWBNFRULE
{\vskip\bnfsep\@startline\ignorespaces}\global\def\@ifspnx{\ifx\@tempc\@sptoken \let\@tempd\@tempa \else \ifx\@tempc
\let\@tempd\@tempa \else \let\@tempd\@tempb \fi\fi \@tempd}}
%%% End of bizarro return-character machinations.

\def\IS{\@stopfield\global\setbox\@curfield\hbox\bgroup
  \hskip-\bnfindent \hskip-\bnfopwidth  \hskip-\bnfalign
  \hbox to \bnfalign{\unhbox\@curfield\hfill}\hbox to \bnfopwidth{\bf is
\hfill}}
\def\OR{\@stopfield\global\setbox\@curfield\hbox\bgroup
  \hskip-\bnfindent \hskip-\bnfopwidth \hbox to \bnfopwidth{\bf or \hfill}}
\def\R#1 {\hbox to 0pt{\hskip-\bnfmargin R#1\hfill}}
\def\XBNF{\FUNNYCHARDEF\FUNNYCHARACTIVE\RETURNDEF\RETURNACTIVE
  \def\@underbarchar{{\char"5F}}\tt\frenchspacing
  \advance\@totalleftmargin\bnfmargin \tabbing
  \hskip\bnfalign\hskip\bnfopwidth\hskip\bnfindent\=\kill\>\+\@gobblecr}
\def\endXBNF{\-\endtabbing}

\def\BNF{\BEGIN{XBNF}}
\def\FNB{\END{XBNF}}

\begingroup \catcode `|=0 \catcode`\\=12
|gdef|@XCODE#1\EDOC{#1|endtrivlist|end{tt}}
|endgroup

\def\CODE{\begin{tt}\advance\@totalleftmargin\codemargin \@verbatim
   \def\@underbarchar{{\char"5F}}\frenchspacing \@vobeyspaces \@XCODE}
\def\ICODE{\begin{tt}\advance\@totalleftmargin\codemargin \@verbatim
   \def\@underbarchar{{\char"5F}}\frenchspacing \@vobeyspaces
   \FUNNYCHARDEF\FUNNYCHARACTIVE \UNDERBARACTIVE\UNDERBARDEF \@XCODE}

\def\@underbarsub#1{{\ifmmode _{#1}\else {$_{#1}$}\fi}}
\let\@underbarchar\_
\def\@underbar{\let\@tempq\@underbarsub\if\@tempz
A\let\@tempq\@underbarchar\fi
  \if\@tempz B\let\@tempq\@underbarchar\fi\if\@tempz
C\let\@tempq\@underbarchar\fi
  \if\@tempz D\let\@tempq\@underbarchar\fi\if\@tempz
E\let\@tempq\@underbarchar\fi
  \if\@tempz F\let\@tempq\@underbarchar\fi\if\@tempz
G\let\@tempq\@underbarchar\fi
  \if\@tempz H\let\@tempq\@underbarchar\fi\if\@tempz
I\let\@tempq\@underbarchar\fi
  \if\@tempz J\let\@tempq\@underbarchar\fi\if\@tempz
K\let\@tempq\@underbarchar\fi
  \if\@tempz L\let\@tempq\@underbarchar\fi\if\@tempz
M\let\@tempq\@underbarchar\fi
  \if\@tempz N\let\@tempq\@underbarchar\fi\if\@tempz
O\let\@tempq\@underbarchar\fi
  \if\@tempz P\let\@tempq\@underbarchar\fi\if\@tempz
Q\let\@tempq\@underbarchar\fi
  \if\@tempz R\let\@tempq\@underbarchar\fi\if\@tempz
S\let\@tempq\@underbarchar\fi
  \if\@tempz T\let\@tempq\@underbarchar\fi\if\@tempz
U\let\@tempq\@underbarchar\fi
  \if\@tempz V\let\@tempq\@underbarchar\fi\if\@tempz
W\let\@tempq\@underbarchar\fi
  \if\@tempz X\let\@tempq\@underbarchar\fi\if\@tempz
Y\let\@tempq\@underbarchar\fi
  \if\@tempz Z\let\@tempq\@underbarchar\fi\@tempq}
\def\@under{\futurelet\@tempz\@underbar}

\def\UNDERBARACTIVE{\CATCODE`\_=13}
\UNDERBARACTIVE
\def\UNDERBARDEF{\def_{\protect\@under}}
\UNDERBARDEF

\catcode`\$=11  

%the following line would allow derived-type component references 
%FOO%BAR in running text, but not allow LaTeX comments
%without this line, write FOO\%BAR
%\catcode`\%=11 

\makeatother

%end of file syntax-macros.tex


\title{High Performance Fortran \\ Language Specification}
\author{High Performance Fortran Forum}
%\date{ }

\hyphenation{RE-DIS-TRIB-UT-ABLE sub-script Wil-liam-son}

\begin{document}

\maketitle

\newpage

\pagenumbering{roman}

\vspace*{4.5in}

This is the result of a LaTeX run of a draft of a single chapter of 
the HPFF Final Report document.

\vspace*{3.0in}

\copyright\ 1992 Rice University, Houston, Texas.  Permission to copy 
without fee all or part of this material is granted, provided the 
Rice University copyright notice and the title of this document 
appear, and notice is given that copying is by permission of Rice 
University.

\tableofcontents

\newpage

\pagenumbering{arabic}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%put text of chapter here


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%put \end{document} here

%statements.tex

%Version of August 2, 1992 - David Loveman, Digital Equipment Corporation 
%	and Chuck Koelbel, Rice University

%Revision history:
%August 19, 1992 - chk - cleaned up discrepancies with Fortran 90 array 
%	expressions
%August 20, 1992 - chk - added DO INDEPENDENT section, Guy Steele's 
%	pointer proposals

\chapter{Statements\protect\footnote{Version of August 2, 1992 - David
Loveman, Digital Equipment Corporation and Chuck Koelbel, Rice University}}
\label{statements}

\section{Overview}

The purpose of the FORALL construct is to provide a convenient syntax for 
simultaneous assignments to large groups of array elements.
In this respect it is very similar to the functionality provided by array 
assignments and WHERE constructs.
FORALL differs from these constructs primarily in its syntax, which is 
intended to be more suggestive of local operations on each element of an 
array.
It is also possible to specify slightly more general array regions than 
are allowed by the basic array triplet notation.
Both single-statement and block FORALLs are defined in this proposal.
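
As a rough illustration of these points (a sketch only: the program name, 
array names, and sizes are assumptions made for this example, and the FORALL 
lines use the syntax proposed in Section~\ref{forall-stmt}, so they are not 
Fortran~90), the following program writes the same masked update first as a 
WHERE and then as a FORALL.
It then assigns the main diagonal of an array, a region that a single array 
section cannot name.

\CODE
PROGRAM OVERVIEW
  INTEGER, PARAMETER :: N = 4
  REAL :: A(N,N), B(N,N), C(N,N)
  INTEGER :: I, J

  A = 0.0
  A(1,2) = 4.0
  B = 0.0
  C = 0.0

  ! Fortran 90 masked array assignment
  WHERE ( A .NE. 0.0 ) B = 1.0 / A

  ! The same update in the proposed FORALL syntax
  FORALL ( I=1:N, J=1:N, A(I,J) .NE. 0.0 ) C(I,J) = 1.0 / A(I,J)

  ! A region that triplet notation cannot describe: the main diagonal
  FORALL ( I=1:N ) A(I,I) = 1.0

  ! B and C hold the same values at this point
  PRINT *, ALL( B .EQ. C )
END PROGRAM OVERVIEW
\EDOC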

Recent discussions within the FORALL subgroup have made it clear that 
some are opposed to the construct on the grounds that it is unnecessary 
and perhaps confusing.
It is, however, clear that others in the group strongly support the added 
expressiveness provided by FORALL.
Because of this disagreement, the committee recommends that, if FORALL is 
accepted into HPF, it not be included in the official subset.
We feel, however, that it is important to define the construct so that 
implementations with this functionality have consistent semantics.

The following proposal is designed as a modification of the Fortran 90 
standard; all references to rule numbers and section numbers pertain to 
that document unless otherwise noted.


\section{Element Array Assignment - FORALL\protect\footnote{Version of 
August 20, 1992 - David
Loveman, Digital Equipment Corporation and Chuck Koelbel, Rice University}}
 

\label{forall-stmt}

The element array
assignment statement is used to specify an array assignment in terms of
array elements or array sections.  The element array assignment may be
masked with a scalar logical expression.  Rule R215 for {\it
executable-construct} is extended to include the {\it forall-stmt}.

\subsection{General Form of Element Array Assignment}

                                                                       \BNF
forall-stmt          \IS FORALL (forall-triplet-spec-list 
                           [,scalar-mask-expr ]) forall-assignment

forall-triplet-spec  \IS subscript-name = subscript : subscript 
                           [ : stride]
                                                                       \FNB

\noindent
Constraint:  {\it subscript-name} must be a {\it scalar-name} of type
integer.

\noindent
Constraint:  A {\it subscript} or a {\it stride} in a {\it
forall-triplet-spec} must not contain a reference to any {\it
subscript-name} in the {\it forall-triplet-spec-list}.

                                                                       \BNF
forall-assignment    \IS array-element = expr
                     \OR array-section = expr
                                                                       \FNB

\noindent
Constraint:  The {\it array-section} or {\it array-element} in a {\it
forall-assignment} must reference all of the {\it forall-triplet-spec
subscript-names}.

For each subscript name in the {\it forall-assignment}, the set of
permitted values is determined on entry to the statement and is
\[  m1 + (k-1) \times m3, \quad \mbox{where}~k = 1, 2, \ldots, \mbox{INT}((m2 - m1 + m3) / m3)  \]
and where {\it m1}, {\it m2}, and {\it m3} are the values of the first
subscript, the second subscript, and the stride respectively in the
{\it forall-triplet-spec}.  If {\it stride} is missing, it is as if it
were present with a value of the integer 1.  The expression {\it
stride} must not have the value 0.  If for some subscript name 
\(\mbox{INT}((m2 - m1 + m3) / m3) \leq 0\), the {\it forall-assignment} is not
executed.
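
As a worked instance of this formula (the particular bounds are chosen only 
for illustration), the triplet 1:10:3 has \(m1 = 1\), \(m2 = 10\), and 
\(m3 = 3\), so
\[  \mbox{INT}((10 - 1 + 3) / 3) = \mbox{INT}(12 / 3) = 4  \]
and the permitted values are 1, 4, 7, and 10.
A negative stride is handled the same way: the triplet 10:1:-4 gives
\[  \mbox{INT}((1 - 10 - 4) / (-4)) = \mbox{INT}(3.25) = 3  \]
and the permitted values are 10, 6, and 2.
For the triplet 5:1 with the stride omitted, 
\(\mbox{INT}((1 - 5 + 1) / 1) = -3 \leq 0\), so the set of permitted values 
is empty and the {\it forall-assignment} is not executed.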

Examples of element array assignments are:

                                                                  \CODE
FORALL (I=1:N, J=1:N) H(I,J) = 1.0 / REAL(I + J - 1)

FORALL (I=1:N, J=1:N, A(I,J) .NE. 0.0) B(I,J) = 1.0 / A(I,J)
                                                                  \EDOC 

\subsection{Interpretation of Element Array Assignments}  

% Check that the following are consistent with array expressions:
% 3. Side effects may affect global variables, provided they are 
%    order-independent

Execution of an element array assignment consists of the following steps:

\begin{enumerate}

\item Evaluation in any order of the subscript and stride expressions in 
the {\it forall-triplet-spec-list}.
The set of valid combinations of {\it subscript-name} values is then the 
cartesian product of the sets defined by these triplets.

\item Evaluation of the {\it scalar-mask-expr} for all valid combinations 
of {\em subscript-name} values.
The mask elements may be evaluated in any order.
The set of active combinations of {\it subscript-name} values is the 
subset of the valid combinations for which the mask evaluates to true.

\item Evaluation in any order of the {\it expr} and all subscripts 
contained in the 
{\it array-element} or {\it array-section} in the {\it forall-assignment} 
for all active combinations of {\em subscript-name} values.

\item Assignment of the computed {\it expr} values to the corresponding 
elements specified by {\it array-element} or {\it array-section}.

\end{enumerate}
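
The separation between the last two steps is what distinguishes an element 
array assignment from a sequential DO loop.
The following Fortran~90 sketch (the program name, variable names, and array 
size are assumptions made for this illustration) carries out the two steps by 
hand for the statement FORALL (I=2:N) A(I) = A(I-1), and contrasts the 
result with a naive loop over the same assignment.

\CODE
PROGRAM SHIFT
  INTEGER, PARAMETER :: N = 5
  INTEGER :: A(N), B(N), T(N), I

  A = (/ (I, I=1,N) /)
  B = A

  ! Models FORALL (I=2:N) A(I) = A(I-1)
  ! First evaluate every right-hand side ...
  DO I = 2, N
    T(I) = A(I-1)
  END DO
  ! ... and only then perform the assignments
  DO I = 2, N
    A(I) = T(I)
  END DO

  ! A sequential loop instead propagates B(1) down the array
  DO I = 2, N
    B(I) = B(I-1)
  END DO

  PRINT *, A    ! 1 1 2 3 4
  PRINT *, B    ! 1 1 1 1 1
END PROGRAM SHIFT
\EDOC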

If the scalar mask expression is omitted, it is as if it were
present with the value true.

The scope of a
{\it subscript-name} is the FORALL statement itself. 

The {\it forall-assignment} must not cause any element of the array
being assigned to be assigned a value more than once.
Similarly, the evaluations of {\em expr}, {\it array-element} or {\it 
array-section} may not cause any scalar data object to be assigned a 
value more than once, nor may they cause an array element to be assigned 
which is also assigned directly by the {\it forall-assignment}.
Local variables within two instantiations of the same function do not 
refer to the same data object unless they have the SAVE attribute or are 
sequence or storage associated with the same object.
 
The evaluation of the {\it expr} for a particular active combination of
{\it subscript-name} values may neither affect nor be affected by 
the evaluation of {\it expr} for any other combination of {\it 
subscript-name} values.
In particular, functions cannot produce side effects that are visible in 
the FORALL; 
nor may global variables be updated by functions unless the results of 
those updates are independent of the order of execution of {\it 
subscript-name} values.
The evaluation of the {\it expr} or any subscript on the left-hand side 
of the {\it forall-assignment} for any active combination of {\it 
subscript-name} values may not affect 
nor be affected by 
the evaluation of any subscript in the {\it forall-assignment}, either 
for the same 
combination of {\it subscript-name} values or a different active
combination.
In particular, a function reference
appearing in any expression in the {\it forall-assignment} must not
redefine any subscript name.


\subsection{Scalarization of the FORALL Statement}

A {\it forall-stmt} of the general form:

                                                                   \CODE
FORALL (v1=l1:u1:s1, v2=l2:u2:s2, ..., vn=ln:un:sn , mask ) &
      a(e1,...,em) = rhs
                                                                   \EDOC

is equivalent to the following standard Fortran 90 code:

                                                                   \CODE
!evaluate subscript and stride expressions in any order

templ1 = l1
tempu1 = u1
temps1 = s1
templ2 = l2
tempu2 = u2
temps2 = s2
  ...
templn = ln
tempun = un
tempsn = sn

!then evaluate the scalar mask expression

DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        tempmask(v1,v2,...,vn) = mask
      END DO
    ...
  END DO
END DO

!then evaluate the expr in the forall-assignment for all valid 
!combinations of subscript names for which the scalar mask 
!expression is true (it is safe to avoid saving the subscript 
!expressions because of the conditions on FORALL expressions)

DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        IF (tempmask(v1,v2,...,vn)) THEN
          temprhs(v1,v2,...,vn) = rhs
        END IF
      END DO
    ...
  END DO
END DO

!then perform the assignment of these values to the corresponding 
!elements of the array being assigned to

DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        IF (tempmask(v1,v2,...,vn)) THEN
          a(e1,...,em) = temprhs(v1,v2,...,vn)
        END IF
      END DO
    ...
  END DO
END DO
                                                                      \EDOC
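
As a concrete instance of this scheme (a sketch under assumed declarations; 
the program name, array size, and initial values are invented for the 
example), the masked statement 
FORALL (I=1:N, J=1:N, A(I,J) .NE. 0.0) B(I,J) = 1.0 / A(I,J) 
from Section~\ref{forall-stmt} expands as follows.
The bounds are constants here, so the temporaries for them are omitted.

\CODE
PROGRAM RECIP
  INTEGER, PARAMETER :: N = 3
  REAL :: A(N,N), B(N,N), TEMPRHS(N,N)
  LOGICAL :: TEMPMASK(N,N)
  INTEGER :: I, J

  ! Assumed test data: nonzero entries on the diagonal only
  A = 0.0
  B = -1.0
  DO I = 1, N
    A(I,I) = REAL(2*I)
  END DO

  ! Evaluate the scalar mask expression for every valid combination
  DO I = 1, N
    DO J = 1, N
      TEMPMASK(I,J) = A(I,J) .NE. 0.0
    END DO
  END DO

  ! Evaluate the right-hand side for every active combination
  DO I = 1, N
    DO J = 1, N
      IF (TEMPMASK(I,J)) TEMPRHS(I,J) = 1.0 / A(I,J)
    END DO
  END DO

  ! Assign the saved values
  DO I = 1, N
    DO J = 1, N
      IF (TEMPMASK(I,J)) B(I,J) = TEMPRHS(I,J)
    END DO
  END DO

  ! Diagonal of B now holds the reciprocals; the rest is still -1.0
  PRINT *, (B(I,I), I=1,N)
END PROGRAM RECIP
\EDOC

Because the mask guards the division, no element with A(I,J) equal to zero 
is ever used as a divisor.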

\subsubsection{Consequences of the Definition of the FORALL Statement}

\begin{itemize}

\item The {\it scalar-mask-expr} may depend on the {\it subscript-name}s.

\item Each of the {\it subscript-name}s must appear within the
subscript expression(s) on the left-hand side.  
(This is a syntactic
consequence of the semantic rule that no two execution instances of the
body may assign to the same array element.)

\item Right-hand sides and subscripts on the left-hand side of a {\it 
forall-assignment} are
evaluated only for valid combinations of subscript names for which the
scalar mask expression is true.

\item Side-effects of function calls are allowed 
under the usual condition of Fortran statements that order of evaluation 
of subexpressions is undefined.
This principle has been extended so that side effects in computing one 
array element cannot affect other array elements.

\item Side-effects cannot assign to the same global array element or 
element of a function's actual argument twice.
This effectively disallows accumulations in function calls (including 
recording how many times a function is called).
It is still legal, however, to create side effects on {\it different} 
global array elements.

\item I am unable to find a clear answer to the following in the 
Fortran~90 standard: Can the evaluation of a function affect the values 
of global objects not referenced in the calling statement?
For example, is the following legal?
                                                                 \CODE
X = F(1) + F(2)
...
REAL FUNCTION F(K)
COMMON /HUH/ ICOUNT
ICOUNT = ICOUNT+1
F = K
END
																 \EDOC
The assignments to ICOUNT do not affect the values computed by F, and the 
final value of ICOUNT is mathematically independent of the order of 
evaluation here.
The constraints on FORALL evaluation above would not allow F to be called 
from a FORALL; if this is inconsistent with array expressions, the 
proposal should be amended or a note of the inconsistency made in the text.

\item Distinct function instantiations explicitly have distinct sets of 
local variables, to remove ambiguity about whether the following is legal:
                                                                  \CODE
FORALL ( I = 1:N ) A(I) = FOO( I )
...
INTEGER FUNCTION FOO( I )
INTEGER I, J, K
J = 1
K = I
DO WHILE ( K .GT. 1 )
  J = J+1
  IF (MOD(K,2) .EQ. 0) THEN
    K = K / 2
  ELSE
    K = K * 3 + 1
  END IF
END DO
FOO = J
END
                                                                 \EDOC 
Assuming distinct function calls have their own variables, there are no 
side effects to any global variable.
This is consistent with (some might argue implied by) Section~12.5.2.4 of the 
Fortran~90 standard.
I don't claim this is particularly easy to implement on all machines.

\item This proposal is silent on whether I/O is allowed in functions called 
from FORALL statements.
This could, and probably should, be added as a constraint in the 
interpretation section.

\end{itemize}


\section{FORALL Construct\protect\footnote{Version of August 20, 1992 - David
Loveman, Digital Equipment Corporation and Chuck Koelbel, Rice University}}

\label{forall-construct}

The FORALL construct is a generalization of the element array
assignment statement allowing multiple assignments, masked array 
assignments, and nested FORALL statements to be
controlled by a single {\it forall-triplet-spec-list}.  Rule R215 for
{\it executable-construct} is extended to include the {\it
forall-construct}.

\subsection{General Form of the FORALL Construct}

                                                                    \BNF
forall-construct        \IS FORALL (forall-triplet-spec-list 
                                  [,scalar-mask-expr ])
                               forall-body-stmt-list
                            END FORALL

forall-body-stmt     \IS forall-assignment
                     \OR where-stmt
                     \OR forall-stmt
                     \OR forall-construct
                                                                    \FNB

\noindent
Constraint:  {\it subscript-name} must be a {\it scalar-name} of type
integer.

\noindent
Constraint:  A {\it subscript} or a {\it stride} in a {\it
forall-triplet-spec} must not contain a reference to any {\it
subscript-name} in the {\it forall-triplet-spec-list}.

\noindent
Constraint:  Any left-hand side {\it array-section} or {\it 
array-element} in any {\it forall-body-stmt}
must reference all of the {\it forall-triplet-spec
subscript-names}.

\noindent
Constraint: If a {\it forall-stmt} or {\it forall-construct} is nested 
within a {\it forall-construct}, then the inner FORALL may not redefine 
any {\it subscript-name} used in the outer {\it forall-construct}.
This rule applies recursively in the event of multiple nesting levels.

For each subscript name in the {\it forall-assignment}s, the set of
permitted values is determined on entry to the construct and is
\[  m1 + (k-1) \times m3, \quad \mbox{where}~k = 1, 2, \ldots, \mbox{INT}((m2 - m1 + m3) / m3)  \]
and where {\it m1}, {\it m2}, and {\it m3} are the values of the first
subscript, the second subscript, and the stride respectively in the
{\it forall-triplet-spec}.  If {\it stride} is missing, it is as if it
were present with a value of the integer 1.  The expression {\it
stride} must not have the value 0.  If for some subscript name 
\(\mbox{INT}((m2 - m1 + m3) / m3) \leq 0\), the {\it forall-assignment}s are not 
executed.

Examples of the FORALL construct are:

                                                                 \CODE
FORALL ( i = 2:n-1, j = 2:n-1 )
  a(i,j) = a(i,j-1) + a(i,j+1) + a(i-1,j) + a(i+1,j)
  b(i,j) = a(i,j)
END FORALL

FORALL ( i = 1:n-1 )
  FORALL ( j = i+1:n )
    a(i,j) = a(j,i)
  END FORALL
END FORALL

FORALL ( i = 1:n, j = 1:n )
  a(i,j) = MERGE( a(i,j), a(i,j)**2, i.eq.j )
  WHERE ( .not. done(i,j,1:m) )
    b(i,j,1:m) = b(i,j,1:m)*x
  END WHERE
END FORALL
                                                                 \EDOC


\subsection{Interpretation of the FORALL Construct}  

Execution of a FORALL construct consists of the following steps:

\begin{enumerate}

\item Evaluation in any order of the subscript and stride expressions in 
the {\it forall-triplet-spec-list}.
The set of valid combinations of {\it subscript-name} values is then the 
cartesian product of the sets defined by these triplets.

\item Evaluation of the {\it scalar-mask-expr} for all valid combinations 
of {\em subscript-name} values.
The mask elements may be evaluated in any order.
The set of active combinations of {\it subscript-name} values is the 
subset of the valid combinations for which the mask evaluates to true.

\item Execution of the {\it forall-body-stmts} in the order they appear.
Each statement is executed completely (that is, for all active 
combinations of {\it subscript-name} values) according to the following 
interpretation:

\begin{enumerate}

\item Assignment statements and array assignment statements (i.e. 
statements in the {\it forall-assignment} category) evaluate the 
right-hand side {\it expr} and any left-hand side subscripts for all 
active {\it subscript-name} values,
then assign those results to the corresponding left-hand side references.

\item WHERE statements evaluate their {\it mask-expr} for all active 
combinations of values of {\it subscript-name}s.
All elements of all masks may be evaluated in any order. 
The assignments within the WHERE branch of the statement are then 
executed in order using the above interpretation of array assignments 
within the FORALL, but the only array elements assigned are those 
selected by both the active {\it subscript-names} and the WHERE mask.
Finally, the assignments in the ELSEWHERE branch are executed if that 
branch is present.
The assignments here are also treated as array assignments, but elements 
are only assigned if they are selected by both the active combinations 
and by the negation of the WHERE mask.

\item FORALL statements and FORALL constructs first evaluate the 
subscript and stride expressions in 
the {\it forall-triplet-spec-list} for all active combinations of the 
outer FORALL constructs.
The set of valid combinations of {\it subscript-names} for the inner 
FORALL is then the union of the sets defined by these bounds and strides 
for each active combination of the outer {\it subscript-names}.
For example, the valid set of the inner FORALL in the second example in 
the last section is the upper triangle (not including the main diagonal) 
of the \(n \times n\) matrix a.
The scalar mask expression is then evaluated for all valid combinations 
of the inner FORALL's {\it subscript-names} to produce the set of active 
combinations.
If there is no scalar mask expression, it is assumed to be always true.
Each statement in the inner FORALL is then executed for each active 
combination (of the inner FORALL), recursively following the 
interpretations given in this section.

\end{enumerate}

\end{enumerate}
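
To make the nested case concrete, the following Fortran~90 sketch (program 
name, array size, and initial values are assumptions made for this 
illustration) carries out by hand the nested example given earlier, in which 
an outer FORALL over i = 1:n-1 encloses an inner FORALL over j = i+1:n with 
the body a(i,j) = a(j,i).
The inner bounds are evaluated separately for each outer value, so the loops 
visit exactly the strict upper triangle.

\CODE
PROGRAM TRIANG
  INTEGER, PARAMETER :: N = 3
  INTEGER :: A(N,N), T(N,N), I, J

  ! Assumed test data: A(I,J) = 10*I + J, which is not symmetric
  DO I = 1, N
    DO J = 1, N
      A(I,J) = 10*I + J
    END DO
  END DO

  ! Evaluate the right-hand side for every active combination;
  ! the inner bounds depend on the outer subscript name I
  DO I = 1, N-1
    DO J = I+1, N
      T(I,J) = A(J,I)
    END DO
  END DO

  ! Then perform the assignments
  DO I = 1, N-1
    DO J = I+1, N
      A(I,J) = T(I,J)
    END DO
  END DO

  ! The lower triangle has been copied into the upper triangle,
  ! so A is now symmetric
  PRINT *, ALL( A .EQ. TRANSPOSE(A) )
END PROGRAM TRIANG
\EDOC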

If the scalar mask expression is omitted, it is as if it were
present with the value true.

The scope of a
{\it subscript-name} is the FORALL construct itself. 

A single assignment or array assignment statement in a {\it 
forall-construct} must obey the same restrictions as a {\it 
forall-assignment} in a simple {\it forall-stmt}.
(Note that the lowest level of nested statements must always be an 
assignment statement.)
For example, an assignment may not cause the same array element to be 
assigned more than once.
It is, however, permitted that different statements may assign to the 
same array element, or that the evaluation of subexpressions in one 
statement affect the execution of a later statement.
Evaluation of the mask or subscript bounds and stride 
expressions in an inner WHERE or FORALL for one active combination of 
{\it subscript-name} values may not affect nor be affected by the 
evaluations of those subexpressions for any other active combination.

\subsection{Scalarization of the FORALL Construct}

A {\it forall-construct} of the form:

                                                                \CODE
FORALL (... e1 ... e2 ... en ...)
    s1
    s2
     ...
    sn
END FORALL
                                                                \EDOC

where each si is an assignment, is equivalent to the following sequence of single-statement FORALLs:

                                                                \CODE
temp1 = e1
temp2 = e2
 ...
tempn = en
FORALL (... temp1 ... temp2 ... tempn ...) s1
FORALL (... temp1 ... temp2 ... tempn ...) s2
   ...
FORALL (... temp1 ... temp2 ... tempn ...) sn
                                                                \EDOC

A similar statement can be made using FORALL constructs when the 
si may be WHERE or FORALL constructs.

A {\it forall-construct} of the form:

                                                                \CODE
FORALL ( v1=l1:u1:s1, mask )
  WHERE ( mask2(l2:u2) )
    a(v1,l2:u2) = rhs1
  ELSEWHERE
    a(v1,l2:u2) = rhs2
  END WHERE
END FORALL
                                                                \EDOC

is equivalent to the following standard Fortran 90 code:

                                                                \CODE
!evaluate subscript and stride expressions in any order

templ1 = l1
tempu1 = u1
temps1 = s1

!then evaluate the FORALL mask expression

DO v1=templ1,tempu1,temps1
  tempmask(v1) = mask
END DO

!then evaluate the masks for the WHERE

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    tmpl2(v1) = l2
    tmpu2(v1) = u2
    tempmask2(v1,tmpl2(v1):tmpu2(v1)) = mask2(tmpl2(v1):tmpu2(v1))
  END IF
END DO

!then evaluate the WHERE branch

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    WHERE ( tempmask2(v1,tmpl2(v1):tmpu2(v1)) )
      temprhs1(v1,tmpl2(v1):tmpu2(v1)) = rhs1
    END WHERE
  END IF
END DO
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    WHERE ( tempmask2(v1,tmpl2(v1):tmpu2(v1)) )
      a(v1,tmpl2(v1):tmpu2(v1)) = temprhs1(v1,tmpl2(v1):tmpu2(v1))
    END WHERE
  END IF
END DO

!then evaluate the ELSEWHERE branch

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    WHERE ( .not. tempmask2(v1,tmpl2(v1):tmpu2(v1)) )
      temprhs2(v1,tmpl2(v1):tmpu2(v1)) = rhs2
    END WHERE
  END IF
END DO
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    WHERE ( .not. tempmask2(v1,tmpl2(v1):tmpu2(v1)) )
      a(v1,tmpl2(v1):tmpu2(v1)) = temprhs2(v1,tmpl2(v1):tmpu2(v1))
    END WHERE
  END IF
END DO
                                                                   \EDOC


A {\it forall-construct} of the form:

                                                                   \CODE
FORALL ( v1=l1:u1:s1, mask )
  FORALL ( v2=l2:u2:s2, mask2 )
    a(e1,e2) = rhs1
    b(e3,e4) = rhs2
  END FORALL
END FORALL
                                                                   \EDOC

is equivalent to the following standard Fortran 90 code:


                                                                   \CODE
!evaluate subscript and stride expressions in any order

templ1 = l1
tempu1 = u1
temps1 = s1

!then evaluate the FORALL mask expression

DO v1=templ1,tempu1,temps1
  tempmask(v1) = mask
END DO

!then evaluate the inner FORALL bounds, etc

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    templ2(v1) = l2
    tempu2(v1) = u2
    temps2(v1) = s2
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      tempmask2(v1,v2) = mask2
    END DO
  END IF
END DO

!first statement

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      IF ( tempmask2(v1,v2) ) THEN
        temprhs1(v1,v2) = rhs1
      END IF
    END DO
  END IF
END DO
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      IF ( tempmask2(v1,v2) ) THEN
        a(e1,e2) = temprhs1(v1,v2)
      END IF
    END DO
  END IF
END DO

!second statement

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      IF ( tempmask2(v1,v2) ) THEN
        temprhs2(v1,v2) = rhs2
      END IF
    END DO
  END IF
END DO
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      IF ( tempmask2(v1,v2) ) THEN
        b(e3,e4) = temprhs2(v1,v2)
      END IF
    END DO
  END IF
END DO
                                                                   \EDOC


\subsubsection{Consequences of the Definition of the FORALL Construct}

\begin{itemize}

\item A block FORALL means the same as replicating the FORALL
header in front of each array assignment statement in the block, except
that any expressions in the FORALL header are evaluated only once,
rather than being re-evaluated before each of the statements in the body.
(This statement needs some modification in the case of nesting.)

\item One may think of a block FORALL as synchronizing twice per
contained assignment statement: once after handling the rhs and other 
expressions
but before performing assignments, and once after all assignments have
been performed but before commencing the next statement.  (In practice,
appropriate dependence analysis will often permit the compiler to
eliminate unnecessary synchronizations.)

\item The {\it scalar-mask-expr} may depend on the {\it subscript-name}s.

\item In general, any expression in a FORALL is evaluated only for valid 
combinations of all surrounding subscript names for which all the
scalar mask expressions are true.

\item Nested FORALL bounds and strides can depend on outer FORALL {\it 
subscript-names}.  They cannot redefine those names, even temporarily (if 
they did there  would be no way to avoid multiple assignments to the same 
array element).

\item Dependences are allowed from one statement to later statements, but 
never from an assignment statement to itself.
Masks and subscript bounds could conceivably have side effects visible in 
the rest of the nested statement.

\end{itemize}
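
The two synchronization points per statement, and the dependence that is 
allowed from one statement to the next, can be seen in a small hand 
expansion.
The Fortran~90 sketch below (program name, array names, and data are 
assumptions made for this illustration) models a block FORALL over 
I = 2:N-1 whose body is A(I) = A(I-1) + A(I+1) followed by B(I) = A(I).

\CODE
PROGRAM TWOSTMT
  INTEGER, PARAMETER :: N = 5
  INTEGER :: A(N), B(N), T(N), I

  A = (/ (I, I=1,N) /)
  B = 0

  ! First statement: evaluate all right-hand sides ...
  DO I = 2, N-1
    T(I) = A(I-1) + A(I+1)
  END DO
  ! ... then, after the first synchronization, assign them
  DO I = 2, N-1
    A(I) = T(I)
  END DO

  ! Second statement begins only after the first has completed,
  ! so it reads the updated values of A
  DO I = 2, N-1
    B(I) = A(I)
  END DO

  PRINT *, A    ! 1 4 6 8 5
  PRINT *, B    ! 0 4 6 8 0
END PROGRAM TWOSTMT
\EDOC

Had the first statement been executed as an ordinary sequential loop, A(3) 
would have been computed from the already updated A(2), which the FORALL 
interpretation rules out.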


\section{The INDEPENDENT Directive\protect\footnote{Version of August 20,
1992 - Guy Steele, Thinking Machines Corporation, and Chuck Koelbel, Rice
University}}

\label{do-independent}


Let there be a directive
                                                  \CODE
!HPF$INDEPENDENT
                                                  \EDOC
that can precede a DO loop.
It asserts to the compiler that the iterations of the loop
may be executed independently--that is, in any order, or
interleaved, or concurrently--without changing the semantics
of the program.  (The compiler is justified in producing
a warning if it can prove otherwise.)
                                                  \CODE
!HPF$INDEPENDENT
      DO I=1,100
        A(P(I))=B(I)   !I happen to know that P is a permutation
      END DO
                                                  \EDOC
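
A compiler generally cannot verify an assertion such as the one in the 
comment above, so its truth is the programmer's responsibility.
As a debugging aid one might check it at run time before relying on the 
directive.
The function ISPERM below is a hypothetical helper written for this 
illustration; it is not part of the proposal, and its name and interface are 
assumptions.

\CODE
PROGRAM CHKPRM
  INTEGER :: P(4)
  LOGICAL :: ISPERM

  P = (/ 3, 1, 4, 2 /)
  PRINT *, ISPERM( P, 4 )    ! true: a permutation of 1..4
  P = (/ 3, 1, 3, 2 /)
  PRINT *, ISPERM( P, 4 )    ! false: 3 appears twice
END PROGRAM CHKPRM

LOGICAL FUNCTION ISPERM( P, N )
  INTEGER :: N, P(N), I
  LOGICAL :: SEEN(N)

  SEEN = .FALSE.
  ISPERM = .TRUE.
  DO I = 1, N
    IF ( P(I) .LT. 1 .OR. P(I) .GT. N ) THEN
      ISPERM = .FALSE.          ! out of range
    ELSE IF ( SEEN(P(I)) ) THEN
      ISPERM = .FALSE.          ! repeated target element
    ELSE
      SEEN(P(I)) = .TRUE.
    END IF
  END DO
END FUNCTION ISPERM
\EDOC

If P contains a repeated value, two iterations of the loop assign the same 
element of A and the result depends on their order, so the assertion of 
independence would be false.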

One may apply this directive to a nest of multiple loops
by listing all the loop variables of the loops in question;
the loops must be contiguous with the directive and in the
same order that the variables are listed:
                                                  \CODE
!HPF$INDEPENDENT (I1,I2,I3)
      DO I1 = ...
        DO I2 = ...
          DO I3 = ...
            DO I4 = ...    !The inner two loops are *not* independent!
              DO I5 = ...
                ...
              END DO
            END DO
          END DO
        END DO
      END DO
                                                  \EDOC

These directives are purely advisory and a compiler is free
to ignore them if it cannot make use of the information.

This directive is of course similar to the DOSHARED directive
of Cray MPP Fortran.  A different name is offered here to avoid
even the hint of commitment to execution by a shared memory machine.
Also, the "mechanism" syntax is omitted here, though we might want
to adopt it as further advice to the compiler about appropriate
implementation strategies, if we can agree on a desirable set
of options.


\section{Other Proposals}

The proposals in this section have not been approved, even as a first 
reading.
Sections~\ref{begin-independent}, \ref{forall-pointer}, 
\ref{forall-allocate}, and \ref{data-ref} 
extend parts of the previous sections and/or the Fortran~90 standard.
Section~\ref{forall-elemental} is an alternative to the treatment of 
function calls in Sections~\ref{forall-stmt} and~\ref{forall-construct}.

\subsection{FORALL with INDEPENDENT Directives\protect\footnote{Version 
of July 21, 1992 - Min-You Wu}}
\label{begin-independent}

This proposal is an extension of Guy Steele's INDEPENDENT proposal.
We propose a block FORALL with the directives for independent 
execution of statements.  The INDEPENDENT directives are used
in a block style.  

The block FORALL is in the form of
\begin{verbatim}
      FORALL (...) [ON (...)]
        a block of statements
      END FORALL
\end{verbatim}
where the block can consist of a restricted class of statements 
and the following INDEPENDENT directives:
\begin{verbatim}
!HPF$BEGIN INDEPENDENT
!HPF$END INDEPENDENT
\end{verbatim}
The two directives must be used in pairs.  There is a synchronization 
at each of these directives.  A sub-block of statements enclosed 
by the two directives is called an {\em asynchronous} sub-block 
or {\em independent} sub-block.  The statements that are not in 
an asynchronous sub-block are in {\em synchronized} 
or {\em non-independent} sub-blocks.  A synchronized sub-block is 
the same as Guy Steele's synchronized FORALL statement, and an 
asynchronous sub-block is the same as a FORALL with the INDEPENDENT 
directive.  Thus, the block FORALL
\begin{verbatim}
      FORALL (e)
        b1
!HPF$BEGIN INDEPENDENT
        b2
!HPF$END INDEPENDENT
        b3
      END FORALL
\end{verbatim}
means roughly the same as
\begin{verbatim}
      FORALL (e)
        b1
      END FORALL
!HPF$INDEPENDENT
      FORALL (e)
        b2
      END FORALL
      FORALL (e)
        b3
      END FORALL
\end{verbatim}

Statements in a synchronized sub-block are tightly synchronized.
Statements in an asynchronous sub-block are completely independent.
The INDEPENDENT directives indicate to the compiler that there is no 
dependence and that, consequently, synchronizations are not necessary.
It is the user's responsibility to ensure there is no dependence
between instances in an asynchronous sub-block.
A compiler can do dependence analysis for the asynchronous sub-blocks
and issue an error message when a dependence exists or a warning
when it finds a possible dependence.

\noindent
{\bf What does "no dependence between instances" mean?}

It means that there is no true dependence, anti-dependence,
or output dependence between instances.
Examples of these dependences are shown below:

\noindent
1) true dependence:
\begin{verbatim}
      FORALL (i = 1:N)
        x(i) = ... 
        ...  = x(i+1)
      END FORALL
\end{verbatim}
Notice that dependences in a FORALL differ from those in a DO loop.
If the above example were a DO loop, the dependence would be an anti-dependence.
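
The difference can be made concrete.  The sketch below is an
illustration only; the program name DEPEND and the arrays YF and YD
are assumptions introduced here.  It assumes that the statements of a
block FORALL are executed one after another, each for all values of
the subscript, as in the synchronized FORALL.
\begin{verbatim}
      PROGRAM DEPEND
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 4
      INTEGER :: I
      INTEGER :: X(N+1), YF(N), YD(N)
! FORALL: the first statement completes for every I before
! the second begins, so X(I+1) is read after it was written.
      X = 0
      FORALL (I = 1:N)
        X(I) = I
        YF(I) = X(I+1)
      END FORALL
! DO: iteration I reads X(I+1) before iteration I+1 writes it.
      X = 0
      DO I = 1, N
        X(I) = I
        YD(I) = X(I+1)
      END DO
      PRINT *, 'FORALL:', YF
      PRINT *, 'DO:    ', YD
      END PROGRAM DEPEND
\end{verbatim}
After execution YF holds 2, 3, 4, 0 and YD holds 0, 0, 0, 0.  In the
FORALL the read of X(I+1) sees the value written by instance I+1 (a
true dependence); in the DO loop it sees the old value, which is
overwritten only later (an anti-dependence).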

\noindent
2) anti-dependence:

\begin{verbatim}
      FORALL (i = 1:N)
        ...  = x(i+1)
        x(i) = ...
      END FORALL
\end{verbatim}

\noindent
3) output dependence:
\begin{verbatim}
      FORALL (i = 1:N)
        x(i+1) = ... 
        x(i) = ...
      END FORALL
\end{verbatim}

Independent does not imply no communication.  One instance may access 
data in the other instances, as long as it does not cause a dependence.  
The following example is an independent block:

\begin{verbatim}
      FORALL (i = 1:N)
!HPF$BEGIN INDEPENDENT
        x(i) = a(i-1)
        y(i-1) = a(i+1)
!HPF$END INDEPENDENT
      END FORALL
\end{verbatim}

\noindent
{\bf Statements that can appear in FORALL}

There is no restriction on the type of statements in an asynchronous 
sub-block.  That is, FORALL statements, DO loops, WHILE loops, 
WHERE-ELSEWHERE statements, IF-THEN-ELSE statements, CASE statements, 
subroutine and function calls, etc. can appear, as long as there is
no dependence.  On the other hand, the statements that can appear in a 
synchronized sub-block are restricted.  The degree of restriction will 
be determined later.  The following statements could be allowed:
assignment statements, FORALL statements, DO loops, WHERE statements 
(not WHERE-ELSEWHERE), IF statements (not IF-THEN-ELSE) and some 
intrinsic functions (and elemental functions and subroutines).

Some examples are given below for the asynchronous sub-blocks:

\noindent
1) FORALL statement
\begin{verbatim}
      FORALL (I = 1 : N)
        A(I,0) = A(I-1,0)
!HPF$BEGIN INDEPENDENT
        FORALL (J = 1 : N)
          A(I,J) = A(I,0) + B(I-1,J-1)
          C(I,J) = A(I,J)
        END FORALL
!HPF$END INDEPENDENT
      END FORALL
\end{verbatim}

\noindent
2) DO loop
\begin{verbatim}
      FORALL (I = 1 : N)
!HPF$BEGIN INDEPENDENT
        DO J = 1, N 
          A(I) = A(I) * B(I)
        END DO 
!HPF$END INDEPENDENT
      END FORALL
\end{verbatim}

\noindent
3) WHILE loop
\begin{verbatim}
      FORALL (I = 1 : N)
!HPF$BEGIN INDEPENDENT
        DO WHILE (A(I) < BIG)
          A(I) = A(I) * B(I)
        END DO 
!HPF$END INDEPENDENT
      END FORALL
\end{verbatim}

\noindent
4) IF-THEN-ELSE
\begin{verbatim}
      FORALL ( I = 1 : N )
!HPF$BEGIN INDEPENDENT
        IF ( A(I) < EPS ) THEN                
          A(I) = 0.0                          
          B(I) = 0.0                          
        ELSE
          TMP(I) = B(I)
          B(I) = A(I)                       
          A(I) = TMP(I)
        END IF
!HPF$END INDEPENDENT
      END FORALL
\end{verbatim}

\noindent
5) WHERE
\begin{verbatim}
      FORALL(I = 1 : N)
!HPF$BEGIN INDEPENDENT
        WHERE(A(I,:) == B(I,:))
          A(I,:) = 0
        ELSEWHERE
          A(I,:) = B(I,:)
        END WHERE
!HPF$END INDEPENDENT
      END FORALL
\end{verbatim}

\noindent
6) subroutine CALL
\begin{verbatim}
      FORALL(I = 1 : N)
!HPF$BEGIN INDEPENDENT
        A(I) = C(I)
        CALL FOO(A(I))
!HPF$END INDEPENDENT
      END FORALL

      SUBROUTINE FOO(x)
      real :: x
      x = x * 10 + x 
      RETURN
      END
\end{verbatim}

\noindent
Another example for subroutine CALL:

\begin{verbatim}
      FORALL(I = 1 : N)
        A(I) = C(I)
!HPF$BEGIN INDEPENDENT
        CALL FOO(B(I), A(I-1), A(I+1))
!HPF$END INDEPENDENT
      END FORALL

      SUBROUTINE FOO(x, y, z)
      real :: x, y, z
      x = (y + z) / 2
      RETURN
      END
\end{verbatim}

\vspace{.1in}
\noindent
{\bf Rationale}

1. A FORALL with a single asynchronous sub-block as shown below is 
the same as a do independent (or doall, or doeach, or parallel do, etc.).
\begin{verbatim}
      FORALL (e)
!HPF$BEGIN INDEPENDENT
        b1
!HPF$END INDEPENDENT
      END FORALL
\end{verbatim}
A FORALL without any INDEPENDENT directive is the same as a tightly 
synchronized FORALL.  We need to define only one type of parallel 
construct, which includes both synchronized and asynchronous blocks.  
Furthermore, by combining asynchronous and synchronized FORALLs, we 
obtain a loosely synchronized FORALL, which is more flexible for many 
loosely synchronous applications.

2. With INDEPENDENT directives, the user can indicate which blocks
need not be synchronized.  The INDEPENDENT directives can act 
as barrier synchronizations.  One may suggest a smart compiler 
that can recognize dependences and eliminate unnecessary 
synchronizations automatically.  However, it might be extremely 
difficult or impossible in some cases to identify all dependences.  
When the compiler cannot determine whether there is a dependence, 
it must assume that there is one and use a synchronization for safety, 
which results in unnecessary synchronizations and, consequently, high 
communication overhead.


\subsection{Pointer Assignments in FORALL\protect\footnote{Version of 
July 28, 1992 - Guy Steele, Thinking Machines Corporation}}
\label{forall-pointer}


Proposal: Pointer assignments may appear in the body of a FORALL.
	(It is not necessary to support this proposal until one
	supports derived types, as that is the only way of specifying
	assignment to more than one pointer at a time.)

Rationale: this is just another kind of assignment.

Example:
                                                    \CODE
      TYPE MONARCH
        INTEGER, POINTER :: P
      END TYPE MONARCH
      TYPE(MONARCH) :: A(N)
      INTEGER B(N)
      ...
C  Set up a butterfly pattern
      FORALL (J=1:N)  A(J)%P => B(1+IEOR(J-1,2**K))
                                                    \EDOC
\subsection{ALLOCATE in FORALL\protect\footnote{Version of July 28, 1992 
- Guy Steele, Thinking Machines Corporation}}

\label{forall-allocate}


Proposal:  ALLOCATE, DEALLOCATE, and NULLIFY statements may appear
	in the body of a FORALL.

Rationale: these are just another kind of assignment.  They may have
	a kind of side effect (storage management), but it is a
	benign side effect (even milder than random number generation).

Example:
                                                            \CODE
      TYPE SCREEN
        INTEGER, POINTER :: P(:,:)
      END TYPE SCREEN
      TYPE(SCREEN) :: S(N)
      INTEGER IERR(N)
      ...
!  Lots of arrays with different aspect ratios
      FORALL (J=1:N)  ALLOCATE(S(J)%P(J,N/J),STAT=IERR(J))
      IF(ANY(IERR /= 0)) GO TO 99999
                                                            \EDOC
\subsection{Generalized Data References\protect\footnote{Version of July
28, 1992 
- Guy Steele, Thinking Machines Corporation}}

\label{data-ref}

Proposal:  Delete the constraint in section 6.1.2 of the Fortran 90
	standard (page 63, lines 7 and 8):
\begin{quote}
	Constraint: In a data-ref, there must not be more than one
		part-ref with nonzero rank.  A part-name to the right
		of a part-ref with nonzero rank must not have the
		POINTER attribute.
\end{quote}

Rationale: further opportunities for parallelism.

Example:
                                                                     \CODE
      TYPE(MONARCH) :: C(N), W(N)
      ...
C  Munch that butterfly
      C = C + W * A%P		!Currently illegal in Fortran 90
                                                                      \EDOC
																	  
\subsection{ELEMENTAL Functions\protect\footnote{Version of August 20, 
1992 - John Merlin, University of Southampton, and Chuck Koelbel, Rice 
University}}

\label{forall-elemental}

The intent of this counter-proposal is to further restrict functions 
called from within FORALL so that they have no side effects.
This is more restrictive than the Fortran~90 constraints on function 
calls in array assignments; however, the definition is simpler and 
presumably clearer.

\subsubsection{General Form of Element Array Assignment}

To the definition of {\it forall-assignment}, add the following:

\noindent Constraint: If any subexpression in {\it expr}, {\it 
array-element}, or {\it array-section} is a {\it function-reference}, 
then the {\it function-name} must be an ELEMENTAL function as defined
below.


\subsubsection{Interpretation of Element Array Assignments}  

Change the paragraphs after the step-by-step interpretation to the 
following:

If the scalar mask expression is omitted, it is as if it were
present with the value true.

The scope of a
{\it subscript-name} is the FORALL statement itself. 

The {\it forall-assignment} must not cause any element of the array
being assigned to be assigned a value more than once.
By the nature of ELEMENTAL functions, no expression evaluation can 
have any effect on other expressions, either for the same combination of 
{\it subscript-name} values or for a different combination.
 
\subsubsection{ELEMENTAL Procedure}

An elemental procedure is one which produces no side effects except for 
returning a value or assigning to scalar dummy arguments with INTENT(OUT) or 
INTENT(INOUT).
It may be used in any way that a normal procedure of its type may 
be used.
In addition, elemental functions may be called from a FORALL statement.
An array expression may also apply an elemental function to all elements 
of an array.

A procedure may be asserted to be elemental at its definition or at its 
interface.
The form of the assertion is a directive
                                                                 \BNF
elemental-directive \IS !HPF$ ELEMENTAL
                                                                 \FNB
																 
To assert that a definition defines an elemental procedure, Rules~R1215 
and R1219 are 
changed to
                                                                 \BNF
function-subprogram \IS [elemental-directive]
                        function-stmt
						[specification-part]
						[execution-part]
						[internal-subprogram-part]
						end-function-stmt
                                                                \FNB

\noindent
Constraint: All dummy arguments to the function must have INTENT(IN).

\noindent
Constraint: No local variable in {\it specification-part} may have the 
SAVE attribute.

\noindent
Constraint: No executable statement in {\it execution-part} or {\it 
internal-subprogram-part} may assign to a global data object.

\noindent
Constraint: No executable statement in {\it execution-part} or {\it 
internal-subprogram-part} may be an I/O statement.

\noindent
Constraint: Any function or subroutine called from {\it execution-part} 
or {\it internal-subprogram-part} must be an elemental procedure.

                                                                 \BNF
subroutine-subprogram \IS [elemental-directive]
                        subroutine-stmt
						[specification-part]
						[execution-part]
						[internal-subprogram-part]
						end-subroutine-stmt
                                                                \FNB

\noindent
Constraint: All dummy arguments must have explicit INTENT.

\noindent
Constraint: All dummy arguments to the subroutine with INTENT(OUT) or 
INTENT(INOUT) must be of scalar type.

\noindent
Constraint: No local variable in {\it specification-part} may have the 
SAVE attribute.

\noindent
Constraint: No executable statement in {\it execution-part} or {\it 
internal-subprogram-part} may assign to a global data object.

\noindent
Constraint: No executable statement in {\it execution-part} or {\it 
internal-subprogram-part} may be an I/O statement.

\noindent
Constraint: Any function or subroutine called from {\it execution-part} 
or {\it internal-subprogram-part} must be an elemental procedure.


To define elemental interfaces, Rule~R1204 is changed to
                                                                \BNF
interface-body \IS [elemental-directive]
                   function-stmt
				   [specification-part]
				   end-function-stmt
			   \OR [elemental-directive]
                   subroutine-stmt
				   [specification-part]
				   end-subroutine-stmt
                                                                \FNB

\noindent
Constraint: All dummy arguments must have explicit INTENT.

\noindent
Constraint: All dummy arguments to a subroutine with INTENT(OUT) or 
INTENT(INOUT) must be of scalar type.
All dummy arguments to a function must have INTENT(IN).

When applied to a procedure interface, {\it elemental-directive} asserts 
that the procedure will make no assignment to any global data object
except for actual arguments corresponding to INTENT(OUT) or
INTENT(INOUT) arguments, and that the procedure will perform no I/O.
The compiler does not need to check this assertion, although if the 
definition of the function has an {\it elemental-directive} it will be 
true.
Only procedures with explicit interfaces may be asserted to be elemental.
It is allowed for the interface to a function to include ELEMENTAL even 
when its definition does not include it, or vice versa.
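
As an illustration of the constraints on function definitions, the
following sketch defines a function that satisfies them: its only
dummy argument has INTENT(IN), it has no SAVE variables, and it
performs no I/O, no assignment to global data, and no calls.  The
names ELEM and SCALE11 are assumptions introduced here.  Because
function calls in FORALL are themselves part of this proposal, the
sketch applies the function in an ordinary DO loop so that it stays
within Fortran~90; a compiler that does not recognize the directive
treats it as a comment.
\begin{verbatim}
      PROGRAM ELEM
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 4
      INTEGER :: I
      REAL :: A(N), C(N)
      DO I = 1, N
        C(I) = REAL(I)
      END DO
! Under the proposal this loop could instead be written as
!     FORALL (I = 1:N) A(I) = SCALE11(C(I))
      DO I = 1, N
        A(I) = SCALE11(C(I))
      END DO
      PRINT *, A
      CONTAINS
!HPF$ ELEMENTAL
      FUNCTION SCALE11(X)
      REAL :: SCALE11
      REAL, INTENT(IN) :: X
! No SAVE variables, no global assignment, no I/O, no calls
      SCALE11 = X * 10 + X
      END FUNCTION SCALE11
      END PROGRAM ELEM
\end{verbatim}
After execution A holds 11, 22, 33, 44.  Each call depends only on its
own argument, so the calls could be made in any order or concurrently
with the same result.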

\paragraph{Comments}

Fortran 90 introduces the concept of 'elemental procedures',
which are defined for scalar arguments but may also be applied to
conforming array-valued arguments.  
For an elemental function,
each element of the result, if any, is as would have been obtained by
applying the function to corresponding elements of the arguments.
Examples are the mathematical intrinsics, e.g., SIN(X).
For an elemental subroutine, the effect on each element of an INTENT(OUT)
or INTENT(INOUT) array argument is as would be obtained by calling the 
subroutine with the corresponding elements of the arguments.
An example is the intrinsic routine MVBITS.

Unfortunately, Fortran~90 restricts the application of elemental
procedures to a subset of the intrinsic procedures --- the programmer
cannot write his own.  We therefore propose the extension of 
allowing the programmer to define elemental procedures.

Detailed comments:
\begin{itemize}
\item The constraints are obviously designed so that the function can
be invoked concurrently at each element of an array.
The constraints under the procedure definition part are sufficient (but 
not necessary) for this.

\item An earlier draft proposed allowing the dummy arguments of elemental 
functions to be themselves arrays.
These provisions were dropped to avoid promoting storage order to a 
higher level in Fortran~90.

\item An earlier draft of this proposal contained constraints forbidding 
elemental functions from accessing global data objects, particularly 
distributed data objects.
These have been dropped as inessential to the side-effect freedom that 
the HPF committee requested.
However, it may well be that some machines will have great difficulty 
implementing FORALLs without these constraints.

\item These rules show that a compiler cannot deduce from a procedure's 
interface whether it can validly be used as an elemental procedure, 
as that depends on its local declarations and internal operations.  
Hence, it is necessary to use a specifier like 'ELEMENTAL' in the 
procedure interface to identify such procedures.  The compiler can 
check that the procedure satisfies all the necessary constraints 
when it compiles the procedure itself.

\end{itemize}

\paragraph{Uses and MIMD aspects}

\subparagraph{FORALL statements and constructs}

Elemental functions may be used in expressions 
in FORALL statements and 
constructs, unlike general functions.
This includes their use in array expressions in FORALL statements and 
constructs, using the features described below.

Because a {\it forall-assignment}
may be an {\it array-assignment} the elemental
function can have an array result.  
For example, if a certain problem
is data-parallel over a 2d grid, and the data structure at each grid
point is a vector of length 3 (2d QCD?), we could have:
                                                              \CODE
	REAL  v (3,10,10)
	INTERFACE
	  ELEMENTAL FUNCTION f (x)
	    REAL, DIMENSION(3) :: f, x
	  END FUNCTION f
	END INTERFACE
	...
	FORALL (i=1:10, j=1:10)  v(:,i,j) = f (v(:,i,j)) 
                                                              \EDOC


A natural way of obtaining some MIMD parallelism is by
means of branches within an elemental function which depend on argument 
values.  These branches can be governed by content-based or index-based 
conditionals (the latter in a FORALL context).  For example:
                                                              \CODE
	ELEMENTAL FUNCTION f (x, i)
	  IF (x > 0) THEN     ! content-based conditional
	    ...
	  ELSE IF (i==1 .OR. i==n) THEN    ! index-based conditional
	    ...
	  ENDIF
	END FUNCTION

	...
	FORALL (i=1:n)  x (i) = f(x(i), i)
	...
                                                              \EDOC
Content-based conditionals can be exploited generally, including in
array assignments, which may sometimes obviate the need for WHERE 
statements and constructs with their potential synchronisation overhead. 

\subparagraph{Array expressions}

Elemental functions returning a scalar result can be used in array 
expressions with the 
same interpretation as Fortran~90 elemental intrinsic functions.
This interpretation (Fortran~90 standard, Section~13.2.1) is as follows:
\begin{quote}
If a generic name or a specific name is used to reference an elemental 
intrinsic function, the shape of the result is the same as the shape of 
the argument with the greatest rank.
If the arguments are all scalar, the result is scalar.
For those elemental intrinsic functions that have more than one argument, 
all arguments must be conformable.
In the array-valued case, the values of the elements, if any, of the 
result are the same as would have been obtained if the scalar-valued 
function had been applied separately, in any order, to corresponding 
elements of each argument.  An argument called KIND must be specified as 
a scalar integer initialization expression and must specify a 
representation method for the function result that exists on the processor.
\end{quote}
We propose the following extensions to this interpretation:
\begin{itemize}
\item Array dummy arguments are allowed in elemental functions; the 
ranks of the actual arguments corresponding to these dummies must match 
the ranks of the dummies.
\item The shape of the result is the same as the shape of the highest-rank
actual argument matching a scalar dummy argument.
\item Actual arguments corresponding to scalar dummy arguments must be 
either conformable with other actuals or scalar.
\end{itemize}

\subparagraph{CALL statements}

Elemental subroutines can be called with array arguments as Fortran~90 
elemental subroutine intrinsics are, using the same interpretation.
This interpretation (Fortran~90 standard, Section~13.2.2) is as follows:
\begin{quote}
An elemental subroutine is one that is specified for scalar arguments, 
but may be applied to array arguments.  In a reference to an elemental 
intrinsic subroutine, either all actual arguments must be scalar or all 
INTENT(OUT) or INTENT(INOUT) arguments must be arrays of the same shape 
and the remaining arguments must be conformable with them. In the case 
that the INTENT(OUT) and INTENT(INOUT) arguments are arrays, the values 
of the elements, if any, of the results are the same as would be obtained 
if the subroutine with scalar arguments were applied separately, in any 
order, to corresponding elements of each argument.
\end{quote}
We propose the following extensions to this interpretation:
\begin{itemize}
\item The ranks of actual arguments corresponding to array dummy 
arguments must be equal to the ranks of those dummies.
\item Actual arguments corresponding to scalar dummy arguments with 
INTENT(IN) may be either scalar or arrays; if they are arrays, then they 
must be conformable with other array actuals.
\item All actual arguments with INTENT(OUT) or INTENT(INOUT) must be 
scalar or all must be arrays of the same shape; in the latter case, any 
other array actuals corresponding to scalar dummy arguments must be 
conformable with the INTENT(OUT) and INTENT(INOUT) actuals.
\end{itemize}


\subsection{A Proposal for MIMD Support in HPF\protect\footnote{Version 
of July 18, 1992 - Clemens-August Thole, GMD I1.T}}

\label{mimd-support}
	          

\subsubsection{Abstract}

This proposal tries to supply sufficient language support 
to deal with loosely synchronous programs, some of which have been 
identified in my "A vote for explicit MIMD support".
It is a proposal for the support of MIMD parallelism that extends
Section~\ref{do-independent}. 
It is oriented more
towards the CRAY MPP Fortran Programming Model and the PCF proposal. 
The fine-grain synchronization of PCF is not proposed for implementation.
Instead of the CRAY mechanisms for assigning work to processors, an 
extension of the ON clause is used.
Because there is no fine-grain synchronization, the constructs can be
executed on SIMD or sequential architectures simply by ignoring the
additional information.


\subsubsection{Summary of the current situation of MIMD support as part of
HPF}

According to Chuck Koelbel's (Rice) mail of March 20th, "Working
Group 4 - Issues for discussion", MIMD support is a topic for
discussion within working group 4. 

Dave Loveman (DEC) has produced a document on FORALL statements 
(incorporated in Sections~\ref{forall-stmt} and \ref{forall-construct})
which summarizes the discussion. Marc Snir proposed some extensions. These
constructs allow SIMD-style operations to be described more generally
than with array assignments. 

A topic for working papers is the interface of HPF Fortran to program units
which execute in SPMD mode. Proposals for "Local Subroutines" have been
made by Marc Snir and Guy Steele
(Chapter~\ref{foreign}). Both proposals
define local subroutines as program units which are executed by all
processors independently of each other. Each processor has access only
to the data contained in its local memory. Parts of distributed data
objects can be accessed and updated by calls to a special library. Any
message-passing library might be used for synchronization and
communication.
This approach does not really integrate MIMD support into HPF programming.

The MPP Fortran proposal by Douglas M. Pase, Tom MacDonald, and Andrew
Meltzer (CRAY)
contained the following features for integrated MIMD support:
\begin{itemize}
   \item  parallel directive
   \item  shared loops 
   \item  private variables
   \item  barrier synchronization
   \item  no-barrier directive for removing synchronization
   \item  locks, events, critical sections and atomic update
   \item  functions, to examine the mapping of data objects.
\end{itemize}

Steele's "Proposal for loops in HPF" (02.04.92) included a proposal for a 
directive {\tt !HPF\$ INDEPENDENT(integer\_variable\_list)}, which specifies
for the next set of nested loops that the loops with the specified
loop variables can be executed independently of each other.
(Section~\ref{do-independent} is a short version of this proposal.) 

Chuck Koelbel gave an overview on different styles for parallel loops
in "Parallel Loops Position Paper". No specific proposal was made.

Min-You Wu's "Proposal for FORALL, May 1992" extended Guy Steele's 
{\tt !HPF\$INDEPENDENT} proposal to use the directive in a block style.

Clemens-August Thole's "A vote for explicit MIMD support" contains three
examples from different application areas which seem to require MIMD
support for efficient execution. 

\paragraph{Summary}

In contrast to FORALL extensions, MIMD support is currently not
well established
as part of HPF Fortran. The examples in "A vote for explicit MIMD support"
show clearly the need for such features. Local subroutines do not fulfill
the requirements because they force the use of a distributed memory
programming model,
which should not be necessary in most cases.

With the exception of parallel sections, all interesting features
are contained in the MPP proposal. I would like to split the discussion
of specifying parallelism, synchronization, and mapping into three different
topics. Furthermore, I would like to see the corresponding features
expressed
in the style of the current X3H5 proposal, if possible, in order to
be in line with upcoming standards.


\subsubsection{Proposal for MIMD support}

In order to support the specification of MIMD-type parallelism, the
following
features are taken from the "Fortran 77 Binding of X3H5 Model for 
Parallel Programming Constructs": 
\begin{itemize}
    \item   PARALLEL DO construct/directive
    \item   PARALLEL SECTIONS worksharing construct/directive
    \item   NEW statement/directive
\end{itemize}

These constructs are not used with PCF-like options for mapping or 
synchronisation but are combined with the ON clause for mapping operations
onto the parallel architecture. 

\paragraph{PARALLEL DO}

\subparagraph{Explicit Syntax}

The PARALLEL DO construct is used to specify parallelism among the 
iterations of a block of code. The PARALLEL DO construct has the same
syntax as a DO statement. For a directive approach, the directive
{\tt !HPF\$ PARALLEL} can be used in front of a DO statement.
After the PARALLEL DO statement a new-declaration may be inserted.

A PARALLEL DO construct might be nested with other parallel constructs. 

\subparagraph{Interpretation}

The PARALLEL DO is used to specify parallel execution of the iterations of
a block of code. Each iteration of a PARALLEL DO is an independent unit
of work. The iterations of a PARALLEL DO must be data independent. Iterations
are data independent if the storage sequence associated with each variable
or array element that is assigned a value by an iteration is not
referenced by any other iteration. 

A program is not HPF conforming if, for any iteration, a statement is
executed that causes a transfer of control out of the block defined by the
PARALLEL DO construct. 

The value of the loop index of a PARALLEL DO is undefined outside the scope
of the PARALLEL DO construct. 


\paragraph{PARALLEL SECTIONS}

The parallel sections construct is used to specify parallelism among
sections of code.

\subparagraph{Explicit Syntax}


                                                              \CODE
        !HPF$ PARALLEL SECTIONS
        !HPF$ SECTION
        !HPF$ END PARALLEL SECTIONS
                                                              \EDOC
structured as
                                                              \CODE
        !HPF$ PARALLEL SECTIONS
        [new-declaration-stmt-list]
        [section-block]
        [section-block-list]
        !HPF$ END PARALLEL SECTIONS
                                                              \EDOC
where [section-block] is
                                                              \CODE
        !HPF$ SECTION
        [execution-part]
                                                              \EDOC

\subparagraph{Interpretation}

The parallel sections construct is used to specify parallelism among
sections of code. Each section of the code is an independent unit of work.
A program is not standard conforming if, during the execution of any
parallel sections construct, a transfer of control out of the blocks
defined by the parallel sections construct is performed. 
In a standard conforming program the sections of code shall be data 
independent. Sections are data independent if the storage sequence associated 
with each variable or array element that is assigned a value by a
section is not referenced by any other section. 


\paragraph{Data scoping}

Data objects that are local to a subroutine are different between 
distinct units of work, even if they execute the same subroutine.


\paragraph{NEW statement/directive}

The NEW statement/directive allows the user to generate new instances of 
objects with the same name as an object that can currently be referenced.


\subparagraph{Explicit Syntax}

A [new-declaration-stmt] is
                                                                \CODE
       !HPF$ NEW variable-name-list
                                                                \EDOC

\subparagraph{Coding rules}

A [variable-name] shall not be
\begin{itemize} 
\item    the name of an assumed size array, dummy argument, common block, 
function or entry point
\item    of type character with an assumed length
\item    specified in a SAVE or DATA statement
\item    associated with any object that is shared for this parallel
construct.
\end{itemize}

\subparagraph{Interpretation}
 
Listing a variable on a NEW statement causes the object to be explicitly
private for the parallel construct. For each unit of work of the parallel 
construct a new instance of the object is created and referenced with the
specific name. 
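
A minimal sketch of the intent, using the proposed directive syntax with
hypothetical names (NEWDEMO, T, A, B): without the NEW directive both
sections would assign the shared scalar T and so would not be data
independent; with it, each section works on its own instance of T.
The value of T after the construct is not used.
                                                                \CODE
PROGRAM NEWDEMO
  INTEGER A(2), B(2), T
  A = (/ 1, 2 /)
  B = (/ 3, 4 /)
!HPF$ PARALLEL SECTIONS
!HPF$ NEW T
!HPF$ SECTION
  T = A(1)                ! this section's own instance of T
  A(1) = A(2)
  A(2) = T
!HPF$ SECTION
  T = B(1)                ! a different instance of T
  B(1) = B(2)
  B(2) = T
!HPF$ END PARALLEL SECTIONS
  PRINT *, A, B           ! both pairs swapped: 2 1 4 3
END PROGRAM NEWDEMO
                                                                \EDOC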


\end{document}


From chk@cs.rice.edu  Mon Aug 24 15:47:53 1992
Received: from moe.rice.edu by cs.rice.edu (AA01245); Mon, 24 Aug 92 15:47:53 CDT
Received: from cs.rice.edu by moe.rice.edu (AA03934); Mon, 24 Aug 92 15:47:54 CDT
Received: from  by cs.rice.edu (AB01092); Mon, 24 Aug 92 15:47:46 CDT
Message-Id: <9208242047.AB01092@cs.rice.edu>
Date: Mon, 24 Aug 1992 15:50:20 -0600
To: hpff-forall@rice.edu
From: chk@cs.rice.edu
Subject: new-draft.tex, try again

David Presberg and Bert Halstead have pointed out some problems with the
document as it went out last night.  Some were due to lack of proofreading
(a misleading title), some to a mail program that thinks the world only
needs 80 characters per line.  A corrected draft is going out in a second. 
PLEASE use this one to print (and distribute if you give it to anyone
else).

                                                Chuck


From chk@erato.cs.rice.edu  Mon Aug 24 16:00:14 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA01838); Mon, 24 Aug 92 16:00:14 CDT
Received: from localhost.cs.rice.edu by erato.cs.rice.edu (AA04558); Mon, 24 Aug 92 16:00:06 CDT
Message-Id: <9208242100.AA04558@erato.cs.rice.edu>
To: hpff-forall@erato.cs.rice.edu
Word-Of-The-Day: paralogism : (n) a fallacious argument
Subject: new-draft.tex
Date: Mon, 24 Aug 92 16:00:02 -0500
From: chk@erato.cs.rice.edu

%hpf-freestanding-chapter-header.tex

%Version of August 5, 1992 - David Loveman, Digital Equipment Corporation

\documentstyle[11pt]{report}
\pagestyle{plain}
\pagenumbering{arabic}
\marginparwidth 0pt
\oddsidemargin=.25in
\evensidemargin  .25in
\marginparsep 0pt
\topmargin=-.5in
\textwidth=6.0in
\textheight=9.0in
\parindent=2em

%the file syntax-macros.tex is physically included below

%syntax-macros.tex

%Version of July 29, 1992 - Guy Steele, Thinking Machines

\newdimen\bnfalign         \bnfalign=2in
\newdimen\bnfopwidth       \bnfopwidth=.3in
\newdimen\bnfindent        \bnfindent=.2in
\newdimen\bnfsep           \bnfsep=6pt
\newdimen\bnfmargin        \bnfmargin=0.5in
\newdimen\codemargin       \codemargin=0.5in
\newdimen\intrinsicmargin  \intrinsicmargin=3em
\newdimen\casemargin       \casemargin=0.75in
\newdimen\argumentmargin   \argumentmargin=1.8in

\def\IT{\it}
\def\RM{\rm}
\let\CHAR=\char
\let\CATCODE=\catcode
\let\DEF=\def
\let\GLOBAL=\global
\let\RELAX=\relax
\let\BEGIN=\begin
\let\END=\end


\def\FUNNYCHARACTIVE{\CATCODE`\a=13 \CATCODE`\b=13 \CATCODE`\c=13 \CATCODE`\d=13
		     \CATCODE`\e=13 \CATCODE`\f=13 \CATCODE`\g=13 \CATCODE`\h=13
		     \CATCODE`\i=13 \CATCODE`\j=13 \CATCODE`\k=13 \CATCODE`\l=13
		     \CATCODE`\m=13 \CATCODE`\n=13 \CATCODE`\o=13 \CATCODE`\p=13
		     \CATCODE`\q=13 \CATCODE`\r=13 \CATCODE`\s=13 \CATCODE`\t=13
		     \CATCODE`\u=13 \CATCODE`\v=13 \CATCODE`\w=13 \CATCODE`\x=13
		     \CATCODE`\y=13 \CATCODE`\z=13 \CATCODE`\[=13 \CATCODE`\]=13
                     \CATCODE`\-=13}

\def\RETURNACTIVE{\CATCODE`\
=13}

\makeatletter
\def\section{\@startsection {section}{1}{\z@}{-3.5ex plus -1ex minus 
 -.2ex}{2.3ex plus .2ex}{\large\sf}}
\def\subsection{\@startsection{subsection}{2}{\z@}{-3.25ex plus -1ex minus 
 -.2ex}{1.5ex plus .2ex}{\large\sf}}

\def\@ifpar#1#2{\let\@tempe\par \def\@tempa{#1}\def\@tempb{#2}\futurelet
    \@tempc\@ifnch}

\def\?#1.{\begingroup\def\@tempq{#1}\list{}{\leftmargin\intrinsicmargin}\relax
  \item[]{\bf\@tempq.} \@intrinsictest}
\def\@intrinsictest{\@ifpar{\@intrinsicpar\@intrinsicdesc}{\@intrinsicpar\relax}}
\long\def\@intrinsicdesc#1{\list{}{\relax
  \def\@tempb{ Arguments}\ifx\@tempq\@tempb
			  \leftmargin\argumentmargin
			  \else \leftmargin\casemargin \fi
  \labelwidth\leftmargin  \advance\labelwidth -\labelsep
  \parsep 4pt plus 2pt minus 1pt
  \let\makelabel\@intrinsiclabel}#1\endlist}
\long\def\@intrinsicpar#1#2\\{#1{#2}\@ifstar{\@intrinsictest}{\endlist\endgroup}}
\def\@intrinsiclabel#1{\setbox0=\hbox{\rm #1}\ifnum\wd0>\labelwidth
  \box0 \else \hbox to \labelwidth{\box0\hfill}\fi}
\def\Case(#1):{\item[{\it Case (#1):}]}
\def\ {\@ifnextchar({\def\@tempq{#1}\@intrinsicopt}{\item[#1]}}
\def\@intrinsicopt(#1){\item[{\@tempq} (#1)]}

\def\MATRIX#1{\relax
    \@ifnextchar,{\@MATRIXTABS{}#1,\@FOO, \hskip0pt plus 1filll\penalty-1\@gobble
  }{\@ifnextchar;{\@MATRIXTABS{}#1,\@FOO; \hskip0pt plus 1filll\penalty-1\@gobble
  }{\@ifnextchar:{\@MATRIXTABS{}#1,\@FOO: \hskip0pt plus 1filll\penalty-1\@gobble
  }{\@ifnextchar.{\hfill\penalty1\null\penalty10000\hskip0pt plus 1filll
		  \@MATRIXTABS{}#1,\@FOO.\penalty-50\@gobble
  }{\@MATRIXTABS{}#1,\@FOO{ }\hskip0pt plus 1filll\penalty-1}}}}}

\def\@MATRIXTABS#1#2,{\@ifnextchar\@FOO{\@MATRIX{#1#2}}{\@MATRIXTABS{#1#2&}}}
\def\@MATRIX#1\@FOO{\(\left[\begin{array}{rrrrrrrrrr}#1\end{array}\right]\)}

\def\@IFSPACEORRETURNNEXT#1#2{\def\@tempa{#1}\def\@tempb{#2}\futurelet
    \@tempc\@ifspnx}

{
\FUNNYCHARACTIVE
\GLOBAL\DEF\FUNNYCHARDEF{\RELAX
    \DEFa{{\IT\CHAR"61}}\DEFb{{\IT\CHAR"62}}\DEFc{{\IT\CHAR"63}}\RELAX
    \DEFd{{\IT\CHAR"64}}\DEFe{{\IT\CHAR"65}}\DEFf{{\IT\CHAR"66}}\RELAX
    \DEFg{{\IT\CHAR"67}}\DEFh{{\IT\CHAR"68}}\DEFi{{\IT\CHAR"69}}\RELAX
    \DEFj{{\IT\CHAR"6A}}\DEFk{{\IT\CHAR"6B}}\DEFl{{\IT\CHAR"6C}}\RELAX
    \DEFm{{\IT\CHAR"6D}}\DEFn{{\IT\CHAR"6E}}\DEFo{{\IT\CHAR"6F}}\RELAX
    \DEFp{{\IT\CHAR"70}}\DEFq{{\IT\CHAR"71}}\DEFr{{\IT\CHAR"72}}\RELAX
    \DEFs{{\IT\CHAR"73}}\DEFt{{\IT\CHAR"74}}\DEFu{{\IT\CHAR"75}}\RELAX
    \DEFv{{\IT\CHAR"76}}\DEFw{{\IT\CHAR"77}}\DEFx{{\IT\CHAR"78}}\RELAX
    \DEFy{{\IT\CHAR"79}}\DEFz{{\IT\CHAR"7A}}\DEF[{{\RM\CHAR"5B}}\RELAX
    \DEF]{{\RM\CHAR"5D}}\DEF-{\@IFSPACEORRETURNNEXT{{\CHAR"2D}}{{\IT\CHAR"2D}}}}
}

%%% Warning!  Devious return-character machinations in the next several lines!
%%%           Don't even *breathe* on these macros!
{\RETURNACTIVE\global\def\RETURNDEF{\def
{\@ifnextchar\FNB{}{\@stopline\@ifnextchar
{\@NEWBNFRULE}{\penalty\@M\@startline\ignorespaces}}}}\global\def\@NEWBNFRULE
{\vskip\bnfsep\@startline\ignorespaces}\global\def\@ifspnx{\ifx\@tempc\@sptoken \let\@tempd\@tempa \else \ifx\@tempc
\let\@tempd\@tempa \else \let\@tempd\@tempb \fi\fi \@tempd}}
%%% End of bizarro return-character machinations.

\def\IS{\@stopfield\global\setbox\@curfield\hbox\bgroup
  \hskip-\bnfindent \hskip-\bnfopwidth  \hskip-\bnfalign
  \hbox to \bnfalign{\unhbox\@curfield\hfill}\hbox to \bnfopwidth{\bf is \hfill}}
\def\OR{\@stopfield\global\setbox\@curfield\hbox\bgroup
  \hskip-\bnfindent \hskip-\bnfopwidth \hbox to \bnfopwidth{\bf or \hfill}}
\def\R#1 {\hbox to 0pt{\hskip-\bnfmargin R#1\hfill}}
\def\XBNF{\FUNNYCHARDEF\FUNNYCHARACTIVE\RETURNDEF\RETURNACTIVE
  \def\@underbarchar{{\char"5F}}\tt\frenchspacing
  \advance\@totalleftmargin\bnfmargin \tabbing
  \hskip\bnfalign\hskip\bnfopwidth\hskip\bnfindent\=\kill\>\+\@gobblecr}
\def\endXBNF{\-\endtabbing}

\def\BNF{\BEGIN{XBNF}}
\def\FNB{\END{XBNF}}

\begingroup \catcode `|=0 \catcode`\\=12
|gdef|@XCODE#1\EDOC{#1|endtrivlist|end{tt}}
|endgroup

\def\CODE{\begin{tt}\advance\@totalleftmargin\codemargin \@verbatim
   \def\@underbarchar{{\char"5F}}\frenchspacing \@vobeyspaces \@XCODE}
\def\ICODE{\begin{tt}\advance\@totalleftmargin\codemargin \@verbatim
   \def\@underbarchar{{\char"5F}}\frenchspacing \@vobeyspaces
   \FUNNYCHARDEF\FUNNYCHARACTIVE \UNDERBARACTIVE\UNDERBARDEF \@XCODE}

\def\@underbarsub#1{{\ifmmode _{#1}\else {$_{#1}$}\fi}}
\let\@underbarchar\_
\def\@underbar{\let\@tempq\@underbarsub\if\@tempz A\let\@tempq\@underbarchar\fi
  \if\@tempz B\let\@tempq\@underbarchar\fi\if\@tempz C\let\@tempq\@underbarchar\fi
  \if\@tempz D\let\@tempq\@underbarchar\fi\if\@tempz E\let\@tempq\@underbarchar\fi
  \if\@tempz F\let\@tempq\@underbarchar\fi\if\@tempz G\let\@tempq\@underbarchar\fi
  \if\@tempz H\let\@tempq\@underbarchar\fi\if\@tempz I\let\@tempq\@underbarchar\fi
  \if\@tempz J\let\@tempq\@underbarchar\fi\if\@tempz K\let\@tempq\@underbarchar\fi
  \if\@tempz L\let\@tempq\@underbarchar\fi\if\@tempz M\let\@tempq\@underbarchar\fi
  \if\@tempz N\let\@tempq\@underbarchar\fi\if\@tempz O\let\@tempq\@underbarchar\fi
  \if\@tempz P\let\@tempq\@underbarchar\fi\if\@tempz Q\let\@tempq\@underbarchar\fi
  \if\@tempz R\let\@tempq\@underbarchar\fi\if\@tempz S\let\@tempq\@underbarchar\fi
  \if\@tempz T\let\@tempq\@underbarchar\fi\if\@tempz U\let\@tempq\@underbarchar\fi
  \if\@tempz V\let\@tempq\@underbarchar\fi\if\@tempz W\let\@tempq\@underbarchar\fi
  \if\@tempz X\let\@tempq\@underbarchar\fi\if\@tempz Y\let\@tempq\@underbarchar\fi
  \if\@tempz Z\let\@tempq\@underbarchar\fi\@tempq}
\def\@under{\futurelet\@tempz\@underbar}

\def\UNDERBARACTIVE{\CATCODE`\_=13}
\UNDERBARACTIVE
\def\UNDERBARDEF{\def_{\protect\@under}}
\UNDERBARDEF

\catcode`\$=11  

%the following line would allow derived-type component references 
%FOO%BAR in running text, but not allow LaTeX comments
%without this line, write FOO\%BAR
%\catcode`\%=11 

\makeatother

%end of file syntax-macros.tex

\title{{\em Tentative Proposal} \\ High Performance Fortran \\ 
FORALL and INDEPENDENT Proposal}
\author{FORALL Subgroup, High Performance Fortran Forum}
\date{August 24, 1992}

\hyphenation{RE-DIS-TRIB-UT-ABLE sub-script Wil-liam-son}

\begin{document}

\maketitle

\newpage

\pagenumbering{roman}

\vspace*{4.5in}

This is the result of a LaTeX run of a draft of a single chapter of 
the HPFF Language Specification document.

\vspace*{3.0in}

\copyright 1992 Rice University, Houston Texas.  Permission to copy 
without fee all or part of this material is granted, provided the 
Rice University copyright notice and the title of this document 
appear, and notice is given that copying is by permission of Rice 
University.

\tableofcontents

\newpage

\pagenumbering{arabic}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%put text of chapter here


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%put \end{document} here

%statements.tex

%Version of August 2, 1992 - David Loveman, Digital Equipment Corporation 
%	and Chuck Koelbel, Rice University

%Revision history:
%August 19, 1992 - chk - cleaned up discrepancies with Fortran 90 array 
%	expressions
%August 20, 1992 - chk - added DO INDEPENDENT section, Guy Steele's 
%	pointer proposals

\chapter{Statements\protect\footnote{Version of August 2, 1992 - David
Loveman, Digital Equipment Corporation and Chuck Koelbel, Rice University}}
\label{statements}

\section{Overview}

The purpose of the FORALL construct is to provide a convenient syntax for 
simultaneous assignments to large groups of array elements.
In this respect it is very similar to the functionality provided by array 
assignments and WHERE constructs.
FORALL differs from these constructs primarily in its syntax, which is 
intended to be more suggestive of local operations on each element of an 
array.
It is also possible to specify slightly more general array regions than 
are allowed by the basic array triplet notation.
Both single-statement and block FORALLs are defined in this proposal.

Recent discussions within the FORALL subgroup have made it clear that 
some are opposed to the construct on the grounds that it is unnecessary 
and perhaps confusing.
It is, however, clear that others in the group strongly support the added 
expressiveness provided by FORALL.
Because of this disagreement, the committee recommends that, if FORALL is 
accepted into HPF, it not be included in the official subset.
We feel, however, that it is important to define the construct so that 
implementations with this functionality have consistent semantics.

The following proposal is designed as a modification of the Fortran 90 
standard; all references to rule numbers and section numbers pertain to 
that document unless otherwise noted.


\section{Element Array Assignment - FORALL\protect\footnote{Version of 
August 20, 1992 - David
Loveman, Digital Equipment Corporation and Chuck Koelbel, Rice University}}  

\label{forall-stmt}

The element array
assignment statement is used to specify an array assignment in terms of
array elements or array sections.  The element array assignment may be
masked with a scalar logical expression.  Rule R215 for {\it
executable-construct} is extended to include the {\it forall-stmt}.

\subsection{General Form of Element Array Assignment}

                                                                       \BNF
forall-stmt          \IS FORALL (forall-triplet-spec-list 
                           [,scalar-mask-expr ]) forall-assignment

forall-triplet-spec  \IS subscript-name = subscript : subscript 
                           [ : stride]
                                                                       \FNB

\noindent
Constraint:  {\it subscript-name} must be a {\it scalar-name} of type integer.

\noindent
Constraint:  A {\it subscript} or a {\it stride} in a {\it
forall-triplet-spec} must not contain a reference to any {\it
subscript-name} in the {\it forall-triplet-spec-list}.

                                                                       \BNF
forall-assignment    \IS array-element = expr
                     \OR array-section = expr
                                                                       \FNB

\noindent
Constraint:  The {\it array-section} or {\it array-element} in a {\it
forall-assignment} must reference all of the {\it forall-triplet-spec
subscript-names}.

For each subscript name in the {\it forall-assignment}, the set of
permitted values is determined on entry to the statement and is
\[  m1 + (k-1) * m3, \mbox{ where } k = 1, 2, \ldots, INT((m2 - m1 + m3) / m3)  \]
and where {\it m1}, {\it m2}, and {\it m3} are the values of the first
subscript, the second subscript, and the stride respectively in the
{\it forall-triplet-spec}.  If {\it stride} is missing, it is as if it
were present with a value of the integer 1.  The expression {\it
stride} must not have the value 0.  If for some subscript name 
\(INT((m2 -m1 + m3) / m3) \leq 0\), the {\it forall-assignment} is not
executed.
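
The set of permitted values can be enumerated directly from this formula.
The following sketch is only an illustration; the program name TRIPLET and
the sample values of m1, m2 and m3 are assumptions chosen for the example.
With integer operands, Fortran integer division truncates toward zero, so
it agrees with the INT in the formula.
                                                                  \CODE
PROGRAM TRIPLET
  INTEGER M1, M2, M3, K, KMAX
  M1 = 10
  M2 = 1
  M3 = -4                          ! the stride must not be zero
  KMAX = (M2 - M1 + M3) / M3       ! INT((m2 - m1 + m3) / m3)
  IF (KMAX .LE. 0) THEN
    PRINT *, 'empty set: the forall-assignment is not executed'
  ELSE
    DO K = 1, KMAX
      PRINT *, M1 + (K-1) * M3     ! prints 10, 6, 2 for the values above
    END DO
  END IF
END PROGRAM TRIPLET
                                                                  \EDOC
For the triplet 1:10:3 the same code gives KMAX = 4 and the values
1, 4, 7, 10; for 5:1 with the default stride of 1 it gives KMAX = -3,
so the set is empty.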

Examples of element array assignments are:

                                                                  \CODE
FORALL (I=1:N, J=1:N) H(I,J) = 1.0 / REAL(I + J - 1)

FORALL (I=1:N, J=1:N, A(I,J) .NE. 0.0) B(I,J) = 1.0 / A(I,J)
                                                                  \EDOC 

\subsection{Interpretation of Element Array Assignments}  

% Check that the following are consistent with array expressions:
% 3. Side effects may affect global variables, provided they are 
%    order-independent

Execution of an element array assignment consists of the following steps:

\begin{enumerate}

\item Evaluation in any order of the subscript and stride expressions in 
the {\it forall-triplet-spec-list}.
The set of valid combinations of {\it subscript-name} values is then the 
cartesian product of the sets defined by these triplets.

\item Evaluation of the {\it scalar-mask-expr} for all valid combinations 
of {\em subscript-name} values.
The mask elements may be evaluated in any order.
The set of active combinations of {\it subscript-name} values is the 
subset of the valid combinations for which the mask evaluates to true.

\item Evaluation in any order of the {\it expr} and all subscripts 
contained in the 
{\it array-element} or {\it array-section} in the {\it forall-assignment} 
for all active combinations of {\em subscript-name} values.

\item Assignment of the computed {\it expr} values to the corresponding 
elements specified by {\it array-element} or {\it array-section}.

\end{enumerate}

If the scalar mask expression is omitted, it is as if it were
present with the value true.
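
The practical effect of steps 3 and 4 is that every right-hand side is
computed from the old values before any element is assigned. The
following sketch contrasts this with a sequential DO loop; it assumes a
compiler that implements the FORALL statement as proposed here, and the
names SHIFT, A and B are hypothetical.
                                                                  \CODE
PROGRAM SHIFT
  INTEGER, PARAMETER :: N = 5
  INTEGER A(N), B(N), I
  A = (/ (I, I = 1, N) /)
  B = A
  ! FORALL: all of A(1:N-1) is read before any A(I) is assigned
  FORALL (I = 2:N) A(I) = A(I-1)
  ! DO loop: each iteration sees the result of the previous one
  DO I = 2, N
    B(I) = B(I-1)
  END DO
  PRINT *, A                ! 1 1 2 3 4
  PRINT *, B                ! 1 1 1 1 1
END PROGRAM SHIFT
                                                                  \EDOC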

The scope of a
{\it subscript-name} is the FORALL statement itself. 

The {\it forall-assignment} must not cause any element of the array
being assigned to be assigned a value more than once.
Similarly, the evaluations of {\em expr}, {\it array-element} or {\it 
array-section} may not cause any scalar data object to be assigned a 
value more than once, nor may they cause an array element to be assigned 
which is also assigned directly by the {\it forall-assignment}.
Local variables within two instantiations of the same function do not 
refer to the same data object unless they have the SAVE attribute or are 
sequence or storage associated with the same object.
 
The evaluation of the {\it expr} for a particular active combination of
{\it subscript-name} values may neither affect nor be affected by 
the evaluation of {\it expr} for any other combination of {\it 
subscript-name} values.
In particular, functions cannot produce side effects that are visible in 
the FORALL; 
nor may global variables be updated by functions unless the results of 
those updates are independent of the order of execution of {\it 
subscript-name} values.
The evaluation of the {\it expr} or any subscript on the left-hand side 
of the {\it forall-assignment} for any active combination of {\it 
subscript-name} values may not affect 
nor be affected by 
the evaluation of any subscript in the {\it forall-assignment}, either 
for the same 
combination of {\it subscript-name} values or a different active
combination.
In particular, a function reference
appearing in any expression in the {\it forall-assignment} must not
redefine any subscript name.


\subsection{Scalarization of the FORALL Statement}

A {\it forall-stmt} of the general form:

                                                                   \CODE
FORALL (v1=l1:u1:s1, v2=l2:u2:s2, ..., vn=ln:un:sn , mask ) &
      a(e1,...,em) = rhs
                                                                   \EDOC

is equivalent to the following standard Fortran 90 code:

                                                                   \CODE
!evaluate subscript and stride expressions in any order

templ1 = l1
tempu1 = u1
temps1 = s1
templ2 = l2
tempu2 = u2
temps2 = s2
  ...
templn = ln
tempun = un
tempsn = sn

!then evaluate the scalar mask expression

DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        tempmask(v1,v2,...,vn) = mask
      END DO
    ...
  END DO
END DO

!then evaluate the expr in the forall-assignment for all valid 
!combinations of subscript names for which the scalar mask 
!expression is true (it is safe to avoid saving the subscript 
!expressions because of the conditions on FORALL expressions)

DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        IF (tempmask(v1,v2,...,vn)) THEN
          temprhs(v1,v2,...,vn) = rhs
        END IF
      END DO
    ...
  END DO
END DO

!then perform the assignment of these values to the corresponding 
!elements of the array being assigned to

DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        IF (tempmask(v1,v2,...,vn)) THEN
          a(e1,...,em) = temprhs(v1,v2,...,vn)
        END IF
      END DO
    ...
  END DO
END DO
                                                                      \EDOC
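
As a concrete instance of this translation, the following sketch
scalarizes one masked FORALL statement, shown in the leading comment.
The program name SCALAR and the data are made up for the illustration.
                                                                   \CODE
PROGRAM SCALAR
  ! Scalarization of:
  !   FORALL (I=2:N-1, A(I) .GT. 0) A(I) = A(I-1) + A(I+1)
  INTEGER, PARAMETER :: N = 5
  INTEGER A(N), TEMPRHS(N), I
  LOGICAL TEMPMASK(N)
  A = (/ 3, -1, 2, 1, -4 /)
  DO I = 2, N-1                    ! evaluate the mask everywhere first
    TEMPMASK(I) = A(I) .GT. 0
  END DO
  DO I = 2, N-1                    ! then every active right-hand side
    IF (TEMPMASK(I)) TEMPRHS(I) = A(I-1) + A(I+1)
  END DO
  DO I = 2, N-1                    ! and only then assign
    IF (TEMPMASK(I)) A(I) = TEMPRHS(I)
  END DO
  PRINT *, A                       ! 3 -1 0 -2 -4
END PROGRAM SCALAR
                                                                   \EDOC
Fusing the three loops into one would change the answer: A(3) would
already be 0 when the right-hand side for I = 4 is computed, giving
A(4) = -4 instead of -2.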

\subsubsection{Consequences of the Definition of the FORALL Statement}

\begin{itemize}

\item The {\it scalar-mask-expr} may depend on the {\it subscript-name}s.

\item Each of the {\it subscript-name}s must appear within the
subscript expression(s) on the left-hand-side.  
(This is a syntactic
consequence of the semantic rule that no two execution instances of the
body may assign to the same array element.)

\item Right-hand sides and subscripts on the left hand side of a {\it 
forall-assignment} are
evaluated only for valid combinations of subscript names for which the
scalar mask expression is true.

\item Side-effects of function calls are allowed 
under the usual condition of Fortran statements that order of evaluation 
of subexpressions is undefined.
This principle has been extended so that side effects in computing one 
array element cannot affect other array elements.

\item Side-effects cannot assign to the same global array element or 
element of a function's actual argument twice.
This effectively disallows accumulations in function calls (including 
recording how many times a function is called).
It is still legal, however, to create side effects on {\it different} 
global array elements.

\item I am unable to find a clear answer to the following in the 
Fortran~90 standard: Can the evaluation of a function affect the values 
of global objects not referenced in the calling statement?
For example, is the following legal?
                                                                 \CODE
X = F(1) + F(2)
...
REAL FUNCTION F(K)
COMMON /HUH/ ICOUNT
ICOUNT = ICOUNT+1
F = K
END
                                                                 \EDOC
The assignments to ICOUNT do not affect the values computed by F, and the 
final value of ICOUNT is mathematically independent of the order of 
evaluation here.
The constraints on FORALL evaluation above would not allow F to be called 
from a FORALL; if this is inconsistent with array expressions, the 
proposal should be amended or a note of the inconsistency made in the text.

\item Distinct function instantiations explicitly have distinct sets of 
local variables, to remove ambiguity about whether the following is legal:
                                                                  \CODE
FORALL ( I = 1:N ) A(I) = FOO( I )
...
INTEGER FUNCTION FOO( I )
INTEGER I, J, K
J = 1
K = I
DO WHILE ( K .GT. 1 )
  J = J+1
  IF (MOD(K,2) .EQ. 0) THEN
    K = K / 2
  ELSE
    K = K * 3 + 1
  END IF
END DO
FOO = J
END
                                                                 \EDOC 
Assuming distinct function calls have their own variables, there are no 
side effects to any global variable.
This is consistent with (some might argue implied by) Section~12.5.2.4 of the 
Fortran~90 standard.
I don't claim this is particularly easy to implement on all machines.

\item This proposal is silent on whether I/O is allowed in functions called 
from FORALL statements.
This could, and probably should, be added as a constraint in the 
interpretation section.

\end{itemize}


\section{FORALL Construct\protect\footnote{Version of August 20, 1992 - David
Loveman, Digital Equipment Corporation and Chuck Koelbel, Rice University}}

\label{forall-construct}

The FORALL construct is a generalization of the element array
assignment statement allowing multiple assignments, masked array 
assignments, and nested FORALL statements to be
controlled by a single {\it forall-triplet-spec-list}.  Rule R215 for
{\it executable-construct} is extended to include the {\it
forall-construct}.

\subsection{General Form of the FORALL Construct}

                                                                    \BNF
forall-construct        \IS FORALL (forall-triplet-spec-list 
                                  [,scalar-mask-expr ])
                               forall-body-stmt-list
                            END FORALL

forall-body-stmt     \IS forall-assignment
                     \OR where-stmt
                     \OR forall-stmt
                     \OR forall-construct
                                                                    \FNB

\noindent
Constraint:  {\it subscript-name} must be a {\it scalar-name} of type
integer.

\noindent
Constraint:  A {\it subscript} or a {\it stride} in a {\it
forall-triplet-spec} must not contain a reference to any {\it
subscript-name} in the {\it forall-triplet-spec-list}.

\noindent
Constraint:  Any left-hand side {\it array-section} or {\it 
array-element} in any {\it forall-body-stmt}
must reference all of the {\it forall-triplet-spec
subscript-names}.

\noindent
Constraint: If a {\it forall-stmt} or {\it forall-construct} is nested 
within a {\it forall-construct}, then the inner FORALL may not redefine 
any {\it subscript-name} used in the outer {\it forall-construct}.
This rule applies recursively in the event of multiple nesting levels.

For each subscript name in the {\it forall-assignment}s, the set of
permitted values is determined on entry to the construct and is
\[  m1 + (k-1) * m3, \mbox{ where } k = 1, 2, \ldots, INT((m2 - m1 + m3) / m3)  \]
and where {\it m1}, {\it m2}, and {\it m3} are the values of the first
subscript, the second subscript, and the stride respectively in the
{\it forall-triplet-spec}.  If {\it stride} is missing, it is as if it
were present with a value of the integer 1.  The expression {\it
stride} must not have the value 0.  If for some subscript name 
\(INT((m2 -m1 + m3) / m3) \leq 0\), the {\it forall-assignment}s are not 
executed.

Examples of the FORALL construct are:

                                                                 \CODE
FORALL ( i = 2:n-1, j = 2:n-1 )
  a(i,j) = a(i,j-1) + a(i,j+1) + a(i-1,j) + a(i+1,j)
  b(i,j) = a(i,j)
END FORALL

FORALL ( i = 1:n-1 )
  FORALL ( j = i+1:n )
    a(i,j) = a(j,i)
  END FORALL
END FORALL

FORALL ( i = 1:n, j = 1:n )
  a(i,j) = MERGE( a(i,j), a(i,j)**2, i.eq.j )
  WHERE ( .not. done(i,j,1:m) )
    b(i,j,1:m) = b(i,j,1:m)*x
  END WHERE
END FORALL
                                                                 \EDOC


\subsection{Interpretation of the FORALL Construct}  

Execution of a FORALL construct consists of the following steps:

\begin{enumerate}

\item Evaluation in any order of the subscript and stride expressions in 
the {\it forall-triplet-spec-list}.
The set of valid combinations of {\it subscript-name} values is then the 
cartesian product of the sets defined by these triplets.

\item Evaluation of the {\it scalar-mask-expr} for all valid combinations 
of {\em subscript-name} values.
The mask elements may be evaluated in any order.
The set of active combinations of {\it subscript-name} values is the 
subset of the valid combinations for which the mask evaluates to true.

\item Execute the {\it forall-body-stmts} in the order they appear.
Each statement is executed completely (that is, for all active 
combinations of {\it subscript-name} values) according to the following 
interpretation:

\begin{enumerate}

\item Assignment statements and array assignment statements (i.e. 
statements in the {\it forall-assignment} category) evaluate the 
right-hand side {\it expr} and any left-hand side subscripts for all 
active {\it subscript-name} values,
then assign those results to the corresponding left-hand side references.

\item WHERE statements evaluate their {\it mask-expr} for all active 
combinations of values of {\it subscript-name}s.
All elements of all masks may be evaluated in any order. 
The assignments within the WHERE branch of the statement are then 
executed in order using the above interpretation of array assignments 
within the FORALL, but the only array elements assigned are those 
selected by both the active {\it subscript-names} and the WHERE mask.
Finally, the assignments in the ELSEWHERE branch are executed if that 
branch is present.
The assignments here are also treated as array assignments, but elements 
are only assigned if they are selected by both the active combinations 
and by the negation of the WHERE mask.

\item FORALL statements and FORALL constructs first evaluate the 
subscript and stride expressions in 
the {\it forall-triplet-spec-list} for all active combinations of the 
outer FORALL constructs.
The set of valid combinations of {\it subscript-names} for the inner 
FORALL is then the union of the sets defined by these bounds and strides 
for each active combination of the outer {\it subscript-names}.
For example, the valid set of the inner FORALL in the second example in 
the last section is the upper triangle (not including the main diagonal) 
of the \(n \times n\) matrix a.
The scalar mask expression is then evaluated for all valid combinations 
of the inner FORALL's {\it subscript-names} to produce the set of active 
combinations.
If there is no scalar mask expression, it is assumed to be always true.
Each statement in the inner FORALL is then executed for each active 
combination (of the inner FORALL), recursively following the 
interpretations given in this section.

\end{enumerate}

\end{enumerate}

If the scalar mask expression is omitted, it is as if it were
present with the value true.

The scope of a
{\it subscript-name} is the FORALL construct itself. 

A single assignment or array assignment statement in a {\it 
forall-construct} must obey the same restrictions as a {\it 
forall-assignment} in a simple {\it forall-stmt}.
(Note that the lowest level of nested statements must always be an 
assignment statement.)
For example, an assignment may not cause the same array element to be 
assigned more than once.
It is, however, permitted that different statements may assign to the 
same array element, or that the evaluation of subexpressions in one 
statement affect the execution of a later statement.
Evaluation of the mask or subscript bounds and stride 
expressions in an inner WHERE or FORALL for one active combination of 
{\it subscript-name} values may not affect nor be affected by the 
evaluations of those subexpressions for any other active combination.
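
The statement-by-statement rule can be seen in a small sketch, which
assumes a compiler that implements the FORALL construct as proposed here;
the names BLOCKF, A and B are hypothetical. The first statement is
completed for every active index before the second begins, so B receives
the updated values of A.
                                                                \CODE
PROGRAM BLOCKF
  INTEGER, PARAMETER :: N = 5
  INTEGER A(N), B(N), I
  A = (/ 1, 2, 3, 4, 5 /)
  B = 0
  FORALL (I = 2:N-1)
    A(I) = A(I-1) + A(I+1)   ! all right-hand sides use the old A
    B(I) = A(I)              ! executed afterwards: sees the new A
  END FORALL
  PRINT *, A                 ! 1 4 6 8 5
  PRINT *, B                 ! 0 4 6 8 0
END PROGRAM BLOCKF
                                                                \EDOC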

\subsection{Scalarization of the FORALL Construct}

A {\it forall-construct} of the form:

                                                                \CODE
FORALL (... e1 ... e2 ... en ...)
    s1
    s2
     ...
    sn
END FORALL
                                                                \EDOC

where each si is an assignment, is equivalent to the following sequence of
FORALL statements:

                                                                \CODE
temp1 = e1
temp2 = e2
 ...
tempn = en
FORALL (... temp1 ... temp2 ... tempn ...) s1
FORALL (... temp1 ... temp2 ... tempn ...) s2
   ...
FORALL (... temp1 ... temp2 ... tempn ...) sn
                                                                \EDOC

A similar statement can be made using FORALL constructs when the 
si may be WHERE or FORALL constructs.

A {\it forall-construct} of the form:

                                                                \CODE
FORALL ( v1=l1:u1:s1, mask )
  WHERE ( mask2(l2:u2) )
    a(v1,l2:u2) = rhs1
  ELSEWHERE
    a(v1,l2:u2) = rhs2
  END WHERE
END FORALL
                                                                \EDOC

is equivalent to the following standard Fortran 90 code:

                                                                \CODE
!evaluate subscript and stride expressions in any order

templ1 = l1
tempu1 = u1
temps1 = s1

!then evaluate the FORALL mask expression

DO v1=templ1,tempu1,temps1
 tempmask(v1) = mask
END DO

!then evaluate the masks for the WHERE

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    tmpl2(v1) = l2
    tmpu2(v1) = u2
    tempmask2(v1,tmpl2(v1):tmpu2(v1)) = mask2(tmpl2(v1):tmpu2(v1))
  END IF
END DO

!then evaluate the WHERE branch

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    WHERE ( tempmask2(v1,tmpl2(v1):tmpu2(v1)) )
      temprhs1(v1,tmpl2(v1):tmpu2(v1)) = rhs1
    END WHERE
  END IF
END DO
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    WHERE ( tempmask2(v1,tmpl2(v1):tmpu2(v1)) )
      a(v1,tmpl2(v1):tmpu2(v1)) = temprhs1(v1,tmpl2(v1):tmpu2(v1))
    END WHERE
  END IF
END DO

!then evaluate the ELSEWHERE branch

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    WHERE ( .not. tempmask2(v1,tmpl2(v1):tmpu2(v1)) )
      temprhs2(v1,tmpl2(v1):tmpu2(v1)) = rhs2
    END WHERE
  END IF
END DO
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    WHERE ( .not. tempmask2(v1,tmpl2(v1):tmpu2(v1)) )
      a(v1,tmpl2(v1):tmpu2(v1)) = temprhs2(v1,tmpl2(v1):tmpu2(v1))
    END WHERE
  END IF
END DO
                                                                   \EDOC


A {\it forall-construct} of the form:

                                                                   \CODE
FORALL ( v1=l1:u1:s1, mask )
  FORALL ( v2=l2:u2:s2, mask2 )
    a(e1,e2) = rhs1
    b(e3,e4) = rhs2
  END FORALL
END FORALL
                                                                   \EDOC

is equivalent to the following standard Fortran 90 code:


                                                                   \CODE
!evaluate subscript and stride expressions in any order

templ1 = l1
tempu1 = u1
temps1 = s1

!then evaluate the FORALL mask expression

DO v1=templ1,tempu1,temps1
 tempmask(v1) = mask
END DO

!then evaluate the inner FORALL bounds, etc

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    templ2(v1) = l2
    tempu2(v1) = u2
    temps2(v1) = s2
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      tempmask2(v1,v2) = mask2
    END DO
  END IF
END DO

!first statement

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      IF ( tempmask2(v1,v2) ) THEN
        temprhs1(v1,v2) = rhs1
      END IF
    END DO
  END IF
END DO
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      IF ( tempmask2(v1,v2) ) THEN
        a(e1,e2) = temprhs1(v1,v2)
      END IF
    END DO
  END IF
END DO

!second statement

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      IF ( tempmask2(v1,v2) ) THEN
        temprhs2(v1,v2) = rhs2
      END IF
    END DO
  END IF
END DO
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      IF ( tempmask2(v1,v2) ) THEN
        b(e3,e4) = temprhs2(v1,v2)
      END IF
    END DO
  END IF
END DO
                                                                   \EDOC


\subsubsection{Consequences of the Definition of the FORALL Construct}

\begin{itemize}

\item A block FORALL means the same as replicating the FORALL
header in front of each array assignment statement in the block, except
that any expressions in the FORALL header are evaluated only once,
rather than being re-evaluated before each of the statements in the body.
(This statement needs some modification in the case of nesting.)

\item One may think of a block FORALL as synchronizing twice per
contained assignment statement: once after handling the rhs and other 
expressions
but before performing assignments, and once after all assignments have
been performed but before commencing the next statement.  (In practice,
appropriate dependence analysis will often permit the compiler to
eliminate unnecessary synchronizations.)

\item The {\it scalar-mask-expr} may depend on the {\it subscript-name}s.

\item In general, any expression in a FORALL is evaluated only for valid 
combinations of all surrounding subscript names for which all the
scalar mask expressions are true.

\item Nested FORALL bounds and strides can depend on outer FORALL {\it 
subscript-name}s.  They cannot redefine those names, even temporarily (if 
they did, there would be no way to avoid multiple assignments to the same 
array element).

\item Dependences are allowed from one statement to later statements, but 
never from an assignment statement to itself.
Masks and subscript bounds could conceivably have side effects visible in 
the rest of the nested statement.

\end{itemize}
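
As a small illustration of the first two consequences, the following
sketch runs a two-statement block FORALL and then the DO-loop expansion
with explicit temporaries, and checks that the two agree.  The program
name, array names, and sizes are assumptions chosen for this example,
and the sketch assumes a compiler that accepts the FORALL construct.

\begin{verbatim}
      PROGRAM BLKFA
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 8
      INTEGER :: A(N), B(N), A2(N), B2(N), T(N)
      INTEGER :: I

      A = (/ (I, I=1,N) /)
      B = (/ (10*I, I=1,N) /)
      A2 = A
      B2 = B

!     Block FORALL: the first statement completes for every I
!     before the second statement starts.
      FORALL (I = 2:N-1)
        A(I) = A(I-1) + A(I+1)
        B(I) = A(I-1)
      END FORALL

!     Expansion following the definition: evaluate every rhs
!     into a temporary, then assign, one statement at a time.
      DO I = 2, N-1
        T(I) = A2(I-1) + A2(I+1)
      END DO
      DO I = 2, N-1
        A2(I) = T(I)
      END DO
      DO I = 2, N-1
        T(I) = A2(I-1)
      END DO
      DO I = 2, N-1
        B2(I) = T(I)
      END DO

      IF (ALL(A == A2) .AND. ALL(B == B2)) THEN
        PRINT *, 'FORALL and expansion agree'
      ELSE
        PRINT *, 'MISMATCH'
      END IF
      END PROGRAM BLKFA
\end{verbatim}

The second statement reads the values of A written by the first one,
which is the ``synchronize twice per statement'' view described above.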


\section{The INDEPENDENT Directive\protect\footnote{Version of August 20, 1992
 - Guy Steele, Thinking Machines Corporation, and Chuck Koelbel, Rice University}}

\label{do-independent}


Let there be a directive
                                                  \CODE
!HPF$INDEPENDENT
                                                  \EDOC
that can precede a DO loop.
It asserts to the compiler that the iterations of the loop
may be executed independently--that is, in any order, or
interleaved, or concurrently--without changing the semantics
of the program.  (The compiler is justified in producing
a warning if it can prove otherwise.)
                                                  \CODE
!HPF$INDEPENDENT
      DO I=1,100
        A(P(I))=B(I)   !I happen to know that P is a permutation
      END DO
                                                  \EDOC

One may apply this directive to a nest of multiple loops
by listing all the loop variables of the loops in question;
the loops must be contiguous with the directive and in the
same order that the variables are listed:
                                                  \CODE
!HPF$INDEPENDENT (I1,I2,I3)
      DO I1 = ...
        DO I2 = ...
          DO I3 = ...
            DO I4 = ...    !The inner two loops are *not* independent!
              DO I5 = ...
                ...
              END DO
            END DO
          END DO
        END DO
      END DO
                                                  \EDOC

These directives are purely advisory and a compiler is free
to ignore them if it cannot make use of the information.
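
The permutation example can be made concrete with the following sketch.
It first checks the programmer's claim that P is a permutation, then
runs the marked loop and the same iterations in the opposite order, and
compares the results.  The data values and the names SEEN, A1, and A2
are assumptions made for this illustration; the directive is an ordinary
comment to a compiler that does not recognize it.

\begin{verbatim}
      PROGRAM INDEP
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 6
      INTEGER :: P(N), B(N), A1(N), A2(N)
      INTEGER :: I
      LOGICAL :: SEEN(N)

      P = (/ 3, 1, 6, 2, 5, 4 /)
      B = (/ (100+I, I=1,N) /)

!     Check the claim that P is a permutation of 1..N.
      SEEN = .FALSE.
      DO I = 1, N
        IF (P(I) .GE. 1 .AND. P(I) .LE. N) SEEN(P(I)) = .TRUE.
      END DO
      IF (.NOT. ALL(SEEN)) STOP 'P is not a permutation'

!HPF$INDEPENDENT
      DO I = 1, N
        A1(P(I)) = B(I)
      END DO

!     The same iterations in the opposite order.
      DO I = N, 1, -1
        A2(P(I)) = B(I)
      END DO

      IF (ALL(A1 == A2)) THEN
        PRINT *, 'iteration order does not matter'
      ELSE
        PRINT *, 'MISMATCH'
      END IF
      END PROGRAM INDEP
\end{verbatim}

If P contained a repeated value, two iterations would assign the same
element of A and the two orders could give different results, so the
assertion would be false.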

This directive is of course similar to the DOSHARED directive
of Cray MPP Fortran.  A different name is offered here to avoid
even the hint of commitment to execution by a shared memory machine.
Also, the ``mechanism'' syntax is omitted here, though we might want
to adopt it as further advice to the compiler about appropriate
implementation strategies, if we can agree on a desirable set
of options.


\section{Other Proposals}

The proposals in this section have not been approved, even as a first 
reading.
Sections~\ref{begin-independent}, \ref{forall-pointer}, 
\ref{forall-allocate}, and \ref{data-ref} 
extend parts of the previous sections and/or the Fortran~90 standard.
Section~\ref{forall-elemental} is an alternative to the treatment of 
function calls in Sections~\ref{forall-stmt} and~\ref{forall-construct}.

\subsection{FORALL with INDEPENDENT Directives\protect\footnote{Version 
of July 21, 1992 - Min-You Wu}}
\label{begin-independent}

This proposal is an extension of Guy Steele's INDEPENDENT proposal.
We propose a block FORALL with the directives for independent 
execution of statements.  The INDEPENDENT directives are used
in a block style.  

The block FORALL has the form
\begin{verbatim}
      FORALL (...) [ON (...)]
        a block of statements
      END FORALL
\end{verbatim}
where the block can consist of a restricted class of statements 
and the following INDEPENDENT directives:
\begin{verbatim}
!HPF$BEGIN INDEPENDENT
!HPF$END INDEPENDENT
\end{verbatim}
The two directives must be used as a pair.  There is a synchronization 
at each of these directives.  A sub-block of statements enclosed 
by the two directives is called an {\em asynchronous} sub-block 
or {\em independent} sub-block.  The statements that are not in 
an asynchronous sub-block are in {\em synchronized} sub-blocks
or {\em non-independent} sub-blocks.  The synchronized sub-block is 
the same as Guy Steele's synchronized FORALL statement, and the 
asynchronous sub-block is the same as the FORALL with the INDEPENDENT 
directive.  Thus, the block FORALL
\begin{verbatim}
      FORALL (e)
        b1
!HPF$BEGIN INDEPENDENT
        b2
!HPF$END INDEPENDENT
        b3
      END FORALL
\end{verbatim}
means roughly the same as
\begin{verbatim}
      FORALL (e)
        b1
      END FORALL
!HPF$INDEPENDENT
      FORALL (e)
        b2
      END FORALL
      FORALL (e)
        b3
      END FORALL
\end{verbatim}

Statements in a synchronized sub-block are tightly synchronized.
Statements in an asynchronous sub-block are completely independent.
The INDEPENDENT directives indicate to the compiler that there is no 
dependence and, consequently, that synchronizations are not necessary.
It is the user's responsibility to ensure there is no dependence
between instances in an asynchronous sub-block.
A compiler can do dependence analysis for the asynchronous sub-blocks
and issue an error message when a dependence exists, or a warning
when it finds a possible dependence.

\noindent
{\bf What does ``no dependence between instances'' mean?}

It means that there is no true dependence, anti-dependence,
or output dependence between instances.
Examples of these dependences are shown below:

\noindent
1) true dependence:
\begin{verbatim}
      FORALL (i = 1:N)
        x(i) = ... 
        ...  = x(i+1)
      END FORALL
\end{verbatim}
Notice that dependences in FORALL differ from those in a DO loop.
If the above example were a DO loop, it would be an anti-dependence.

\noindent
2) anti-dependence:

\begin{verbatim}
      FORALL (i = 1:N)
        ...  = x(i+1)
        x(i) = ...
      END FORALL
\end{verbatim}

\noindent
3) output dependence:
\begin{verbatim}
      FORALL (i = 1:N)
        x(i+1) = ... 
        x(i) = ...
      END FORALL
\end{verbatim}
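
The difference between the FORALL and DO classifications in example 1
can be observed directly.  The following sketch uses the same access
pattern in both forms; the array names, the size N, and the fill value
-1 are assumptions for the illustration, and a compiler that accepts the
FORALL construct is assumed.

\begin{verbatim}
      PROGRAM DEPS
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 5
      INTEGER :: X(N), YF(N), YD(N)
      INTEGER :: I

!     FORALL: statement 2 runs after statement 1 has finished
!     for every I, so it reads the NEW X(I+1): true dependence.
      X = (/ (I, I=1,N) /)
      YF = -1
      FORALL (I = 1:N-1)
        X(I) = 0
        YF(I) = X(I+1)
      END FORALL

!     DO: iteration I reads X(I+1) before iteration I+1 writes
!     it, so it reads the OLD value: anti-dependence.
      X = (/ (I, I=1,N) /)
      YD = -1
      DO I = 1, N-1
        X(I) = 0
        YD(I) = X(I+1)
      END DO

      PRINT *, 'FORALL:', YF
      PRINT *, 'DO:    ', YD
      END PROGRAM DEPS
\end{verbatim}

With N = 5 the FORALL version leaves the values 0, 0, 0, 5, -1 in YF,
while the DO version leaves 2, 3, 4, 5, -1 in YD.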

Independence does not imply the absence of communication.  One instance may 
access data in the other instances, as long as it does not cause a dependence.  
The following example is an independent block:

\begin{verbatim}
      FORALL (i = 1:N)
!HPF$BEGIN INDEPENDENT
        x(i) = a(i-1)
        y(i-1) = a(i+1)
!HPF$END INDEPENDENT
      END FORALL
\end{verbatim}

\noindent
{\bf Statements that can appear in FORALL}

There is no restriction on the type of statements in an asynchronous 
sub-block.  That is, FORALL statements, DO loops, WHILE loops, 
WHERE-ELSEWHERE statements, IF-THEN-ELSE statements, CASE statements, 
subroutine and function calls, etc. can appear, as long as there is
no dependence.  On the other hand, the statements that can appear in a 
synchronized sub-block are restricted.  The degree of restriction will 
be determined later.  The following statements could be allowed:
assignment statements, FORALL statements, DO loops, WHERE statements 
(not WHERE-ELSEWHERE), IF statements (not IF-THEN-ELSE) and some 
intrinsic functions (and elemental functions and subroutines).

Some examples are given below for the asynchronous sub-blocks:

\noindent
1) FORALL statement
\begin{verbatim}
      FORALL (I = 1 : N)
        A(I,0) = A(I-1,0)
!HPF$BEGIN INDEPENDENT
        FORALL (J = 1 : N)
          A(I,J) = A(I,0) + B(I-1,J-1)
          C(I,J) = A(I,J)
        END FORALL
!HPF$END INDEPENDENT
      END FORALL
\end{verbatim}

\noindent
2) DO loop
\begin{verbatim}
      FORALL (I = 1 : N)
!HPF$BEGIN INDEPENDENT
        DO J = 1, N 
          A(I) = A(I) * B(I)
        END DO 
!HPF$END INDEPENDENT
      END FORALL
\end{verbatim}

\noindent
3) WHILE loop
\begin{verbatim}
      FORALL (I = 1 : N)
!HPF$BEGIN INDEPENDENT
        DO WHILE (A(I) < BIG)
          A(I) = A(I) * B(I)
        END DO 
!HPF$END INDEPENDENT
      END FORALL
\end{verbatim}

\noindent
4) IF-THEN-ELSE
\begin{verbatim}
      FORALL ( I = 1 : N )
!HPF$BEGIN INDEPENDENT
        IF ( A(I) < EPS ) THEN                
          A(I) = 0.0                          
          B(I) = 0.0                          
        ELSE
          TMP(I) = B(I)
          B(I) = A(I)                       
          A(I) = TMP(I)
        END IF
!HPF$END INDEPENDENT
      END FORALL
\end{verbatim}

\noindent
5) WHERE
\begin{verbatim}
      FORALL(I = 1 : N)
!HPF$BEGIN INDEPENDENT
        WHERE(A(I,:)==B(I,:))
          A(I,:) = 0
        ELSEWHERE
          A(I,:) = B(I,:)
        END WHERE
!HPF$END INDEPENDENT
      END FORALL
\end{verbatim}

\noindent
6) subroutine CALL
\begin{verbatim}
      FORALL(I = 1 : N)
!HPF$BEGIN INDEPENDENT
        A(I) = C(I)
        CALL FOO(A(I))
!HPF$END INDEPENDENT
      END FORALL

      SUBROUTINE FOO(x)
      real :: x
      x = x * 10 + x 
      RETURN
      END
\end{verbatim}

\noindent
Another example for subroutine CALL:

\begin{verbatim}
      FORALL(I = 1 : N)
        A(I) = C(I)
!HPF$BEGIN INDEPENDENT
        CALL FOO(B(I), A(I-1), A(I+1))
!HPF$END INDEPENDENT
      END FORALL

      SUBROUTINE FOO(x, y, z)
      real :: x, y, z
      x = (y + z) / 2
      RETURN
      END
\end{verbatim}

\vspace{.1in}
\noindent
{\bf Rationale}

1. A FORALL with a single asynchronous sub-block as shown below is 
the same as a do independent (or doall, or doeach, or parallel do, etc.).
\begin{verbatim}
      FORALL (e)
!HPF$BEGIN INDEPENDENT
        b1
!HPF$END INDEPENDENT
      END FORALL
\end{verbatim}
A FORALL without any INDEPENDENT directive is the same as a tightly 
synchronized FORALL.  We only need to define one type of parallel 
construct, which includes both synchronized and asynchronous blocks.  
Furthermore, by combining asynchronous and synchronized FORALLs, we 
have a loosely synchronized FORALL, which is more flexible for many 
loosely synchronous applications.

2. With INDEPENDENT directives, the user can indicate which blocks
need not be synchronized.  The INDEPENDENT directives can act 
as barrier synchronizations.  One may suggest a smart compiler 
that can recognize dependences and eliminate unnecessary 
synchronizations automatically.  However, it might be extremely 
difficult or impossible in some cases to identify all dependences.  
When the compiler cannot determine whether there is a dependence, 
it must assume so and use a synchronization for safety, which 
results in unnecessary synchronizations and consequently, high 
communication overhead.


\subsection{Pointer Assignments in FORALL\protect\footnote{Version of 
July 28, 1992 - Guy Steele, Thinking Machines Corporation}}
\label{forall-pointer}


Proposal: Pointer assignments may appear in the body of a FORALL.
	(It is not necessary to support this proposal until one
	supports derived types, as that is the only way of specifying
	assignment to more than one pointer at a time.)

Rationale: this is just another kind of assignment.

Example:
                                                    \CODE
      TYPE MONARCH
        INTEGER, POINTER :: P
      END TYPE MONARCH
      TYPE(MONARCH) :: A(N)
      INTEGER, TARGET :: B(N)
      ...
C  Set up a butterfly pattern
      FORALL (J=1:N)  A(J)%P => B(1+IEOR(J-1,2**K))
                                                    \EDOC
\subsection{ALLOCATE in FORALL\protect\footnote{Version of July 28, 1992 
- Guy Steele, Thinking Machines Corporation}}

\label{forall-allocate}


Proposal:  ALLOCATE, DEALLOCATE, and NULLIFY statements may appear
	in the body of a FORALL.

Rationale: these are just another kind of assignment.  They may have
	a kind of side effect (storage management), but it is a
	benign side effect (even milder than random number generation).

Example:
                                                            \CODE
      TYPE SCREEN
        INTEGER, POINTER :: P(:,:)
      END TYPE SCREEN
      TYPE(SCREEN) :: S(N)
      INTEGER IERR(N)
      ...
!  Lots of arrays with different aspect ratios
      FORALL (J=1:N)  ALLOCATE(S(J)%P(J,N/J),STAT=IERR(J))
      IF(ANY(IERR /= 0)) GO TO 99999
                                                            \EDOC
\subsection{Generalized Data References\protect\footnote{Version of July 28, 1992 
- Guy Steele, Thinking Machines Corporation}}

\label{data-ref}

Proposal:  Delete the constraint in section 6.1.2 of the Fortran 90
	standard (page 63, lines 7 and 8):
\begin{quote}
	Constraint: In a data-ref, there must not be more than one
		part-ref with nonzero rank.  A part-name to the right
		of a part-ref with nonzero rank must not have the
		POINTER attribute.
\end{quote}

Rationale: further opportunities for parallelism.

Example:
                                                                     \CODE
      TYPE(MONARCH) :: C(N), W(N)
      ...
C  Munch that butterfly
      C = C + W * A%P		!Currently illegal in Fortran 90
                                                                      \EDOC
																	  
\subsection{ELEMENTAL Functions\protect\footnote{Version of August 20, 
1992 - John Merlin, University of Southampton, and Chuck Koelbel, Rice 
University}}

\label{forall-elemental}

The intent of this counter-proposal is to further restrict functions 
called from within FORALL so that they have no side effects.
This is more restrictive than the Fortran~90 constraints on function 
calls in array assignments; however, the definition is simpler and 
presumably clearer.

\subsubsection{General Form of Element Array Assignment}

To the definition of {\it forall-assignment}, add the following:

\noindent Constraint: If any subexpression in {\it expr}, {\it 
array-element}, or {\it array-section} is a {\it function-reference}, 
then the {\it function-name} must be an ELEMENTAL function as defined below.


\subsubsection{Interpretation of Element Array Assignments}  

Change the paragraphs after the step-by-step interpretation to the 
following:

If the scalar mask expression is omitted, it is as if it were
present with the value true.

The scope of a
{\it subscript-name} is the FORALL statement itself. 

The {\it forall-assignment} must not cause any element of the array
being assigned to be assigned a value more than once.
By the nature of ELEMENTAL functions, no expression evaluations can 
have any effect on other expressions, either for the same combination of 
{\it subscript-name} values or for a different combination.
 
\subsubsection{ELEMENTAL Procedure}

An elemental procedure is one which produces no side effects except for 
returning a value or assigning to scalar dummy arguments with INTENT(OUT) or 
INTENT(INOUT).
It may be used in any way that a normal procedure of its type may 
be used.
In addition, elemental functions may be called from a FORALL statement.
An array expression may also apply an elemental function to all elements 
of an array.

A procedure may be asserted to be elemental at its definition or at its 
interface.
The form of the assertion is a directive
                                                                 \BNF
elemental-directive \IS !HPF$ ELEMENTAL
                                                                 \FNB
																 
To assert that a definition defines an elemental procedure, Rules~R1215 
and R1219 are 
changed to
                                                                 \BNF
function-subprogram \IS [elemental-directive]
                        function-stmt
						[specification-part]
						[execution-part]
						[internal-subprogram-part]
						end-function-stmt
                                                                \FNB

\noindent
Constraint: All dummy arguments to the function must have INTENT(IN).

\noindent
Constraint: No local variable in {\it specification-part} may have the 
SAVE attribute.

\noindent
Constraint: No executable statement in {\it execution-part} or {\it 
internal-subprogram-part} may assign to a global data object.

\noindent
Constraint: No executable statement in {\it execution-part} or {\it 
internal-subprogram-part} may be an I/O statement.

\noindent
Constraint: Any function or subroutine called from {\it execution-part} 
or {\it internal-subprogram-part} must be an elemental function.

                                                                 \BNF
subroutine-subprogram \IS [elemental-directive]
                        subroutine-stmt
						[specification-part]
						[execution-part]
						[internal-subprogram-part]
						end-subroutine-stmt
                                                                \FNB

\noindent
Constraint: All dummy arguments must have explicit INTENT.

\noindent
Constraint: All dummy arguments to the subroutine with INTENT(OUT) or 
INTENT(INOUT) must be of scalar type.

\noindent
Constraint: No local variable in {\it specification-part} may have the 
SAVE attribute.

\noindent
Constraint: No executable statement in {\it execution-part} or {\it 
internal-subprogram-part} may assign to a global data object.

\noindent
Constraint: No executable statement in {\it execution-part} or {\it 
internal-subprogram-part} may be an I/O statement.

\noindent
Constraint: Any function or subroutine called from {\it execution-part} 
or {\it internal-subprogram-part} must be an elemental function.


To define elemental interfaces, Rule~R1204 is changed to
                                                                \BNF
interface-body \IS [elemental-directive]
                   function-stmt
				   [specification-part]
				   end-function-stmt
			   \OR [elemental-directive]
                   subroutine-stmt
				   [specification-part]
				   end-subroutine-stmt
                                                                \FNB

\noindent
Constraint: All dummy arguments must have explicit INTENT.

\noindent
Constraint: All dummy arguments to a subroutine with INTENT(OUT) or 
INTENT(INOUT) must be of scalar type.
All dummy arguments to a function must have INTENT(IN).

When applied to a procedure interface, {\it elemental-directive} asserts 
that the procedure will make no assignment to any global data object except for 
actual arguments corresponding to INTENT(OUT) or INTENT(INOUT) arguments, 
and that the procedure will perform no I/O.
The compiler does not need to check this assertion, although if the 
definition of the function has an {\it elemental-directive} it will be 
true.
Only procedures with explicit interfaces may be asserted to be elemental.
It is allowed for the interface to a function to include ELEMENTAL even 
when its definition does not include it, or vice versa.
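
As a minimal sketch of a function that meets the constraints above,
consider the following program.  The names CLAMPM, CLAMP, and ELEM and
the data values are hypothetical.  The ELEMENTAL prefix on the FUNCTION
statement is an assumption made so that a compiler supporting that
prefix will accept the references shown; under this proposal the
directive alone would carry the assertion.

\begin{verbatim}
      MODULE CLAMPM
      IMPLICIT NONE
      CONTAINS
!HPF$ ELEMENTAL
      ELEMENTAL FUNCTION CLAMP(X, LO, HI) RESULT(R)
!     All dummies are INTENT(IN); there is no SAVE, no I/O,
!     no assignment to global data, and only elemental
!     intrinsics are called.
      REAL, INTENT(IN) :: X, LO, HI
      REAL :: R
      R = MIN(MAX(X, LO), HI)
      END FUNCTION CLAMP
      END MODULE CLAMPM

      PROGRAM ELEM
      USE CLAMPM
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 5
      REAL :: V(N), W1(N), W2(N)
      INTEGER :: I

      V = (/ -2.0, -0.5, 0.0, 0.5, 2.0 /)

!     Reference from a FORALL statement.
      FORALL (I = 1:N) W1(I) = CLAMP(V(I), -1.0, 1.0)

!     Elementwise application in an array expression.
      W2 = CLAMP(V, -1.0, 1.0)

      IF (ALL(W1 == W2)) THEN
        PRINT *, W1
      ELSE
        PRINT *, 'MISMATCH'
      END IF
      END PROGRAM ELEM
\end{verbatim}

Both forms produce the values -1.0, -0.5, 0.0, 0.5, 1.0, since each
element is computed from the corresponding element of V alone.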

\paragraph{Comments}

Fortran 90 introduces the concept of ``elemental procedures'',
which are defined for scalar arguments but may also be applied to
conforming array-valued arguments.  
For an elemental function,
each element of the result, if any, is as would have been obtained by
applying the function to corresponding elements of the arguments.
Examples are the mathematical intrinsics, e.g., SIN(X).
For an elemental subroutine, the effect on each element of an INTENT(OUT) or 
INTENT(INOUT) array argument is as would be obtained by calling the 
subroutine with the corresponding elements of the arguments.
An example is the intrinsic routine MVBITS.

Unfortunately, Fortran~90 restricts the application of elemental
procedures to a subset of the intrinsic procedures --- the programmer
cannot write his own.  We therefore propose the extension of 
allowing the programmer to define elemental procedures.

Detailed comments:
\begin{itemize}
\item The constraints are obviously designed so that the function can
be invoked concurrently at each element of an array.
The constraints under the procedure definition part are sufficient (but 
not necessary) for this.

\item An earlier draft proposed allowing the dummy arguments of elemental 
functions to be themselves arrays.
These provisions were dropped to avoid promoting storage order to a 
higher level in Fortran~90.

\item An earlier draft of this proposal contained constraints disallowing 
elemental functions to access global data objects, particularly 
distributed data objects.
These have been dropped as inessential to the side-effect freedom that 
the HPF committee requested.
However, it may well be that some machines will have great difficulty 
implementing FORALLs without these constraints.

\item These rules show that a compiler cannot deduce from a procedure's 
interface whether it can validly be used as an elemental procedure, 
as that depends on its local declarations and internal operations.  
Hence, it is necessary to use a specifier like ``ELEMENTAL'' in the 
procedure interface to identify such procedures.  The compiler can 
check that the procedure satisfies all the necessary constraints 
when it compiles the procedure itself.

\end{itemize}

\paragraph{Uses and MIMD aspects}

\subparagraph{FORALL statements and constructs}

Elemental functions may be used in expressions 
in FORALL statements and 
constructs, unlike general functions.
This includes their use in array expressions in FORALL statements and 
constructs, using the features described below.

Because a {\it forall-assignment}
may be an {\it array-assignment} the elemental
function can have an array result.  
For example, if a certain problem
is data-parallel over a 2d grid, and the data structure at each grid
point is a vector of length 3 (2d QCD?), we could have:
                                                              \CODE
	REAL  v (3,10,10)
	INTERFACE
	  ELEMENTAL FUNCTION f (x)
	    REAL, DIMENSION(3) :: f, x
	  END FUNCTION f
	END INTERFACE
	...
	FORALL (i=1:10, j=1:10)  v(:,i,j) = f (v(:,i,j)) 
                                                              \EDOC


A natural way of obtaining some MIMD parallelism is by
means of branches within an elemental function which depend on argument 
values.  These branches can be governed by content-based or index-based 
conditionals (the latter in a FORALL context).  For example:
                                                              \CODE
	ELEMENTAL FUNCTION f (x, i)
	  IF (x > 0) THEN     ! content-based conditional
	    ...
	  ELSE IF (i==1 .OR. i==n) THEN    ! index-based conditional
	    ...
	  ENDIF
	END FUNCTION

	...
	FORALL (i=1:n)  x (i) = f(x(i), i)
	...
                                                              \EDOC
Content-based conditionals can be exploited generally, including in
array assignments, which may sometimes obviate the need for WHERE 
statements and constructs with their potential synchronisation overhead. 

\subparagraph{Array expressions}

Elemental functions returning a scalar result can be used in array 
expressions with the 
same interpretation as Fortran~90 elemental intrinsic functions.
This interpretation (Fortran~90 standard, Section~13.2.1) is as follows:
\begin{quote}
If a generic name or a specific name is used to reference an elemental 
intrinsic function, the shape of the result is the same as the shape of 
the argument with the greatest rank.
If the arguments are all scalar, the result is scalar.
For those elemental intrinsic functions that have more than one argument, 
all arguments must be conformable.
In the array-valued case, the values of the elements, if any, of the 
result are the same as would have been obtained if the scalar-valued 
function had been applied separately, in any order, to corresponding 
elements of each argument.  An argument called KIND must be specified as 
a scalar integer initialization expression and must specify a 
representation method for the function result that exists on the processor.
\end{quote}
We propose the following extensions to this interpretation:
\begin{itemize}
\item Array dummy arguments are allowed in elemental functions; the 
ranks of the actual arguments corresponding to these dummies must match 
the ranks of the dummies.
\item The shape of the result is the same as the shape of the highest-rank actual 
argument matching a scalar dummy argument.
\item Actual arguments corresponding to scalar dummy arguments must be 
either conformable with other actuals or scalar.
\end{itemize}

\subparagraph{CALL statements}

Elemental subroutines can be called with array arguments as Fortran~90 
elemental subroutine intrinsics are, using the same interpretation.
This interpretation (Fortran~90 standard, Section~13.2.2) is as follows:
\begin{quote}
An elemental subroutine is one that is specified for scalar arguments, 
but may be applied to array arguments.  In a reference to an elemental 
intrinsic subroutine, either all actual arguments must be scalar or all 
INTENT(OUT) or INTENT(INOUT) arguments must be arrays of the same shape 
and the remaining arguments must be conformable with them. In the case 
that the INTENT(OUT) and INTENT(INOUT) arguments are arrays, the values 
of the elements, if any, of the results are the same as would be obtained 
if the subroutine with scalar arguments were applied separately, in any 
order, to corresponding elements of each argument.
\end{quote}
We propose the following extensions to this interpretation:
\begin{itemize}
\item The ranks of actual arguments corresponding to array dummy 
arguments must be equal to the ranks of those dummies.
\item Actual arguments corresponding to scalar dummy arguments with 
INTENT(IN) may be either scalar or arrays; if they are arrays, then they 
must be conformable with other array actuals.
\item All actual arguments with INTENT(OUT) or INTENT(INOUT) must be 
scalar or all must be arrays of the same shape; in the latter case, any 
other array actuals corresponding to scalar dummy arguments must be 
conformable with the INTENT(OUT) and INTENT(INOUT) actuals.
\end{itemize}
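
The Fortran~90 interpretation quoted above can be observed with the
elemental intrinsic subroutine MVBITS, which the comments earlier in
this section cite as an example.  In the following sketch the program
name, array names, and data values are assumptions for the illustration.
The INTENT(INOUT) argument is an array, the source argument is a
conformable array, and the remaining arguments are scalars.

\begin{verbatim}
      PROGRAM MVB
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 4
      INTEGER :: SRC(N), D1(N), D2(N)
      INTEGER :: I

      SRC = (/ 1, 2, 3, 15 /)
      D1 = 0
      D2 = 0

!     One elemental call: copy the low 4 bits of each SRC
!     element into bits 4..7 of the matching D1 element.
      CALL MVBITS(SRC, 0, 4, D1, 4)

!     The same effect, element by element.
      DO I = 1, N
        CALL MVBITS(SRC(I), 0, 4, D2(I), 4)
      END DO

      IF (ALL(D1 == D2)) THEN
        PRINT *, D1
      ELSE
        PRINT *, 'MISMATCH'
      END IF
      END PROGRAM MVB
\end{verbatim}

The result is 16, 32, 48, 240 in both cases.  A user-defined elemental
subroutine under this proposal would be called in the same two ways.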


\subsection{A Proposal for MIMD Support in HPF\protect\footnote{Version 
of July 18, 1992 - Clemens-August Thole, GMD I1.T}}

\label{mimd-support}
	          

\subsubsection{Abstract}

This proposal tries to supply sufficient language support
to deal with loosely synchronous programs, some of which have been 
identified in my ``A vote for explicit MIMD support''.
This is a proposal for the support of MIMD parallelism, which extends
Section~\ref{do-independent}. 
It is more oriented
towards the CRAY MPP Fortran Programming Model and the PCF proposal. 
The fine-grain synchronization of PCF is not proposed for implementation.
Instead of the CRAY mechanisms for assigning work to processors, an 
extension of the ON clause is used.
Because there is no fine-grain synchronization, the constructs can be executed
on SIMD or sequential architectures just by ignoring the additional information.


\subsubsection{Summary of the current situation of MIMD support as part of HPF}

According to Chuck Koelbel's (Rice) mail of March 20th, ``Working Group 4 -
Issues for discussion'', MIMD support is a topic for discussion within working
group 4. 

Dave Loveman (DEC) has produced a document on FORALL statements 
(incorporated in Sections~\ref{forall-stmt} and \ref{forall-construct}) which
summarizes the discussion. Marc Snir proposed some extensions. These
constructs allow SIMD-style operations to be described in a more general
way than array assignments. 

A topic for working papers is the interface of HPF Fortran to program units
which execute in SPMD mode. Proposals for ``Local Subroutines'' have been made
by Marc Snir and Guy Steele
(Chapter~\ref{foreign}). Both proposals
define local subroutines as program units that are executed by all
processors independently of each other. Each processor has access only
to the data contained in its local memory. Parts of distributed data objects
can be accessed and updated by calls to a special library. Any message-passing
library might be used for synchronization and communication.
This approach does not really integrate MIMD support into HPF programming.

The MPP Fortran proposal by Douglas M. Pase, Tom MacDonald, Andrew Meltzer (CRAY)
contained the following features in order to support integrated MIMD features:
\begin{itemize}
   \item  parallel directive
   \item  shared loops 
   \item  private variables
   \item  barrier synchronization
   \item  no-barrier directive for removing synchronization
   \item  locks, events, critical sections and atomic update
   \item  functions to examine the mapping of data objects.
\end{itemize}

Steele's ``Proposal for loops in HPF'' (02.04.92) included a proposal for a 
directive {\tt !HPF\$ INDEPENDENT(integer\_variable\_list)}, which specifies,
for the next set of nested loops, that the loops with the specified
loop variables can be executed independently of each other.
(Section~\ref{do-independent} is a short version of this proposal.) 

Chuck Koelbel gave an overview of different styles for parallel loops
in ``Parallel Loops Position Paper''. No specific proposal was made.

Min-You Wu's ``Proposal for FORALL, May 1992'' extended Guy Steele's 
{\tt !HPF\$INDEPENDENT} proposal to use the directive in a block style.

Clemens-August Thole's ``A vote for explicit MIMD support'' contains three
examples from different application areas, which seem to require MIMD support
for efficient execution. 

\paragraph{Summary}

In contrast to FORALL extensions, MIMD support is currently not well established
as part of HPF Fortran. The examples in ``A vote for explicit MIMD support''
show clearly the need for such features. Local subroutines do not fulfill
the requirements because they force the use of a distributed memory programming
model, which should not be necessary in most cases.

With the exception of parallel sections, all interesting features
are contained in the MPP proposal. I would like to split the discussion
on specifying parallelism, synchronization and mapping into three different
topics. Furthermore, I would like to see corresponding features expressed
in the style of the current X3H5 proposal, if possible, in order to
be in line with upcoming standards.


\subsubsection{Proposal for MIMD support}

In order to support the specification of MIMD-type parallelism, the following
features are taken from the "Fortran 77 Binding of X3H5 Model for 
Parallel Programming Constructs": 
\begin{itemize}
    \item   PARALLEL DO construct/directive
    \item   PARALLEL SECTIONS worksharing construct/directive
    \item   NEW statement/directive
\end{itemize}

These constructs are not used with PCF-like options for mapping or 
synchronization but are combined with the ON clause for mapping operations
onto the parallel architecture. 

\paragraph{PARALLEL DO}

\subparagraph{Explicit Syntax}

The PARALLEL DO construct is used to specify parallelism among the 
iterations of a block of code. The PARALLEL DO construct has the same
syntax as a DO statement. For a directive approach, the directive
!HPF$ PARALLEL can be used in front of a DO statement.
After the PARALLEL DO statement a new-declaration may be inserted.

A PARALLEL DO construct might be nested with other parallel constructs. 

\subparagraph{Interpretation}

The PARALLEL DO is used to specify parallel execution of the iterations of
a block of code. Each iteration of a PARALLEL DO is an independent unit
of work. The iterations of a PARALLEL DO must be data independent. Iterations
are data independent if the storage sequence associated with each variable
or array element that is assigned a value by each iteration is not referenced
by any other iteration. 

A program is not HPF conforming if, for any iteration, a statement is executed
that causes a transfer of control out of the block defined by the PARALLEL
DO construct. 

The value of the loop index of a PARALLEL DO is undefined outside the scope
of the PARALLEL DO construct. 
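
As an illustration of the data-independence rule, the following minimal
sketch uses the directive form described above. The program name, array
names, and bounds are hypothetical and not part of the proposal. A compiler
that does not recognize the directive treats it as a comment and executes
the loop sequentially.
                                                              \CODE
        PROGRAM PDODEMO
        INTEGER, PARAMETER :: N = 8
        REAL A(N), B(N)
        INTEGER I
        B = 1.0
        ! Conforming: iteration I assigns only A(I) and reads only B(I).
        !HPF$ PARALLEL
        DO I = 1, N
           A(I) = 2.0 * B(I)
        END DO
        PRINT *, A
        END PROGRAM PDODEMO
                                                              \EDOC
By contrast, a loop body such as A(I) = A(I-1) + B(I) would not be data
independent under this definition, because iteration I references the
element A(I-1) that is assigned by iteration I-1.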


\paragraph{PARALLEL SECTIONS}

The parallel sections construct is used to specify parallelism among sections
of code.

\subparagraph{Explicit Syntax}


                                                              \CODE
        !HPF$ PARALLEL SECTIONS
        !HPF$ SECTION
        !HPF$ END PARALLEL SECTIONS
                                                              \EDOC
structured as
                                                              \CODE
        !HPF$ PARALLEL SECTIONS
        [new-declaration-stmt-list]
        [section-block]
        [section-block-list]
        !HPF$ END PARALLEL SECTIONS
                                                              \EDOC
where [section-block] is
                                                              \CODE
        !HPF$ SECTION
        [execution-part]
                                                              \EDOC

\subparagraph{Interpretation}

The parallel sections construct is used to specify parallelism among sections
of code. Each section of the code is an independent unit of work. A program
is not standard conforming if, during the execution of any parallel sections
construct, a transfer of control out of the blocks defined by the parallel
sections construct is performed. 
In a standard-conforming program the sections of code shall be data 
independent. Sections are data independent if the storage sequence associated 
with each variable or array element that is assigned a value by each section
is not referenced by any other section. 
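
A minimal sketch of two data-independent sections, assuming the directive
form given in the syntax above. All program and variable names are
hypothetical. Each section assigns a variable that the other section neither
assigns nor references, so the two sections satisfy the rule.
                                                              \CODE
        PROGRAM SECDEMO
        REAL X(100), Y(100), SX, SY
        X = 1.0
        Y = 2.0
        !HPF$ PARALLEL SECTIONS
        !HPF$ SECTION
        SX = SUM(X)            ! assigns only SX, reads only X
        !HPF$ SECTION
        SY = MAXVAL(Y)         ! assigns only SY, reads only Y
        !HPF$ END PARALLEL SECTIONS
        PRINT *, SX, SY
        END PROGRAM SECDEMO
                                                              \EDOC
If the second section read SX instead, the sections would no longer be data
independent, since SX is assigned by the first section.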


\paragraph{Data scoping}

Data objects that are local to a subroutine are different between 
distinct units of work, even if they execute the same subroutine.


\paragraph{NEW statement/directive}

The NEW statement/directive allows the user to generate new instances of 
objects with the same name as an object that can currently be referenced.


\subparagraph{Explicit Syntax}

A [new-declaration-stmt] is
                                                                \CODE
       !HPF$ NEW variable-name-list
                                                                \EDOC

\subparagraph{Coding rules}

A [variable-name] shall not be
\begin{itemize} 
\item    the name of an assumed size array, dummy argument, common block, 
function or entry point
\item    of type character with an assumed length
\item    specified in a SAVE or DATA statement
\item    associated with any object that is shared for this parallel construct.
\end{itemize}

\subparagraph{Interpretation}
 
Listing a variable on a NEW statement causes the object to be explicitly
private for the parallel construct. For each unit of work of the parallel 
construct a new instance of the object is created and referenced with the
specific name. 
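
The following minimal sketch shows how a NEW declaration might be used with
the directive form of PARALLEL DO. The names are hypothetical, and the
placement of the NEW directive directly after the DO statement follows the
PARALLEL DO syntax described earlier.
                                                              \CODE
        PROGRAM NEWDEMO
        INTEGER, PARAMETER :: N = 8
        REAL A(N), B(N), T
        INTEGER I
        B = 3.0
        !HPF$ PARALLEL
        DO I = 1, N
        !HPF$ NEW T
           T = B(I) * B(I)     ! T is assigned in every iteration
           A(I) = T + 1.0
        END DO
        PRINT *, A
        END PROGRAM NEWDEMO
                                                              \EDOC
Without the NEW declaration, every iteration would assign and reference the
same object T, so the iterations would not be data independent. With it,
each unit of work is intended to have its own instance of T.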


\end{document}

From chk@cs.rice.edu  Tue Aug 25 11:44:32 1992
Received: from moe.rice.edu by cs.rice.edu (AA21382); Tue, 25 Aug 92 11:44:32 CDT
Received: from cs.rice.edu by moe.rice.edu (AA10632); Tue, 25 Aug 92 11:44:31 CDT
Received: from DialupEudora (charon.rice.edu) by cs.rice.edu (AA21361); Tue, 25 Aug 92 11:43:45 CDT
Message-Id: <9208251643.AA21361@cs.rice.edu>
Date: Tue, 25 Aug 1992 11:46:20 -0600
To: hpff-forall@rice.edu
From: chk@cs.rice.edu
Subject: Comments on FORALL Tentative Proposal, version of Aug 24

As promised, here are my comments on the current FORALL proposal.

Section 1.2 FORALL Statement

My comments are in the "Consequences" subsections.


Section 1.3 FORALL Construct

My comments are in the "Consequences" subsections.


Section 1.4 INDEPENDENT

My contribution to this section consisted of deleting all references to
INDEPENDENT applied to FORALL (to keep consistent with the HPFF meeting,
where only the DO version was covered).  It was probably a mistake not to
move those to their own section, and I'll do that in the next draft unless
the group working on extending INDEPENDENT makes a proposal in the
meantime.

I agree in principle with Geoffrey Fox that INDEPENDENT is very restrictive
as it is now phrased; there's no way to do reductions, and local variables
are only possible by calling subroutines.  If we define INDEPENDENT in
terms of data dependences, however, I see no way around this.  If anyone
can suggest another definition of INDEPENDENT, it might shed some light.
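
To make the reduction problem concrete, here is a minimal sketch (the
program and variable names are made up for illustration).  Every iteration
both reads and assigns S, so under a definition based on data dependences
this loop could not carry the INDEPENDENT assertion, even though the order
of the additions is usually unimportant to the programmer:

        PROGRAM SUMRED
        REAL A(100), S
        INTEGER I
        A = 1.0
        S = 0.0
        DO I = 1, 100
           S = S + A(I)    ! each iteration reads and assigns S
        END DO
        PRINT *, S
        END PROGRAM SUMRED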


Section 1.5.1 FORALL with INDEPENDENT Directives

In general, this proposal should either stick to the syntax constraints of
sections 1.2 and 1.3, or explicitly propose changing FORALL to a new
construct.  I am firmly opposed to any statement that has one meaning (and
one set of syntactic constraints) when there is a comment before it, and
another meaning (and constraints) without the comment.  If Min-You wants to
fight for a new FORALL, he's welcome to; I've given up those crusades.

Guy Steele's proposal for FORALL INDEPENDENT is below:
>Let there be a directive
>
>!HPF$INDEPENDENT
>
>that can precede either a DO loop or a FORALL statement.
>It asserts to the compiler that the iterations of the loop
>may be executed independently--that is, in any order, or
>interleaved, or concurrently--without changing the semantics
>of the program.  (The compiler is justified in producing
>a warning if it can prove otherwise.)
>
>!HPF$INDEPENDENT
>      DO I=1,100
>        A(I)=B(P(I))   !I happen to know that P is a permutation
>      END DO
>
>!HPF$INDEPENDENT
>      FORALL (I=1:100) A(I)=A(F(I))
>!I happen to know that F(I) > 100, so synchronization is not
>!needed to delay assignments until every rhs has been computed.
>
>One may apply this directive to a nest of multiple loops
>by listing all the loop variables of the loops in question;
>the loops must be contiguous with the directive and in the
>same order that the variables are listed:
>
>!HPF$INDEPENDENT (I1,I2,I3)
>      DO I1 = ...
>        DO I2 = ...
>          DO I3 = ...
>            DO I4 = ...    !The inner two loops are *not* independent!
>              DO I5 = ...
>                ...
>              END DO
>            END DO
>          END DO
>        END DO
>      END DO
>
>In the case of a FORALL, any of the variables may be mentioned:
>
>!HPF$INDEPENDENT (I1,I3)
>      FORALL(I1=...,I2=...,I3=...) ...
>
>This means that for any given values for I1 and I3,
>all the right-hand sides for all values of I2 must
>be computed before any assignments are done for that
>specific pair of (I1,I3) values; but assignments for
>one pair of (I1,I3) values need not wait for rhs
>evaluation for a different pair of (I1,I3) values.

Thus, Guy's proposal was a pure (presumably unchecked) assertion that no
inter-instantiation (not inter-iteration, this isn't a sequential loop)
race conditions occurred.  This is slightly at odds with Min-You's
definition that "There is a synchronization at each of these directives." 
I would favor an assertion in Guy's style, and I think that Min-You's later
descriptions of data independence do this without relying on the
synchronization.


Section 1.5.2 Pointer Assignments

Sounds reasonable to me.  I'll include the appropriate changes in the next
draft unless controversy erupts.


Section 1.5.3 ALLOCATE in FORALL

Semantically, this is probably OK.

Occam's Razor warns us against putting too many new features in.

Are other implementors as happy about implementing parallel allocation as
Guy is?  And would anybody like to comment on how this meshes with
allocating distributed arrays / distributing allocatable arrays?

If no controversy erupts, I'll put this into the next draft too.  I'm
uneasy doing so, however.


Section 1.5.4 Generalized Data References

I'm not sure I understand the example; are + and * applied to TYPE(MONARCH)
supposed to be defined somewhere else?

I'd like to see some more examples of using these before I make a real
comment, if just to get a feel for how useful they are.

This is a direct extension to F90, not a FORALL/INDEPENDENT matter.  My
priorities are to handle the FORALL first (actually, it seems to be going
well), INDEPENDENT and other assertions next, and array syntax (or other
F90) changes last.  Yes, I went ahead and added elemental functions applied
to arrays to the section below.  Those parts are independent of the main
FORALL part, and if push comes to shove I'll drop the array syntax to get
the FORALL part passed.


Section 1.5.5 Elemental Functions

This is a little more general than Fortran 90 elementals were supposed to
be.  In particular these elementals can expand some of their arguments and
not others, for example
        INTERFACE 
          !HPF$ ELEMENTAL 
          REAL FUNCTION FOO( X, Y, Z )
            REAL, INTENT(IN) :: X, Y, Z
          END FUNCTION
        END INTERFACE

        REAL A(100), B(100), C(100)
        REAL P, Q, R

        A(1:N) = FOO( A(1:N), B(1:N), C(1:N) )  ! OK
        P = FOO( P, Q, R )      ! OK
        A(1:N) = FOO( A(1:N), Q, R )    ! OK
        A(1:5) = FOO( A(1:10), B(1:10), C(1:10) )    ! ERROR
The intent was to allow the kind of array - scalar matching possible in
array operations like
        A(1:N) = B(1:N) * P
I can't claim that this improved the proposal's readability quotient.  I
stand ready to go back to Fortran 8X elementals if that's the consensus.

Similarly, I think that allowing nonscalar arguments and returns is
valuable for elemental functions called from FORALL, but difficult to
define for array expressions.  I've made the corresponding restrictions,
and am ready to back off to everything scalar if that's what the group
wants.

Note that ELEMENTAL in INTERFACE blocks is an assertion about behavior, not
constraints about what can be in the function.  For example, by my reading 
        ! LOTS OF DECLARATIONS
        FORALL ( I = 1:N ) A(I) = COUNT_ON_ME(I, A(I))
        ...
        REAL FUNCTION COUNT_ON_ME( K, Y )
        INTEGER K
        REAL Y
        COMMON /ME/ ICOUNT
        IF ( Y > 0 ) ICOUNT = ICOUNT+1
        ...
        END
would be OK iff A(I) <= 0 for all I.  Yet another correctness condition
that Fortran can't necessarily check.


Section 1.5.6 A Proposal for MIMD Support in HPF

Generally, I view MIMD support as a problem for HPF round 2.  However, to
prime the pumps for that debate (and maybe bring some of this into HPF 1),
here goes...

When this proposal says "specify parallelism", which of the following does
it mean?
        1. Assert these operations are independent
        2. Command the compiler to run these sections in parallel.
        3. Both of the above.
        4. Something else. (If so, please explain...)
Claim: If the answer is 1., then PARALLEL DO is equivalent to DO with
INDEPENDENT.  I would support assertions of this type.
Claim: If the answer is 2., then lots of work has to be done to make this
work; for example, what does the command mean when all processors are busy?
I believe this answer will produce the stalemate that PCF did.
I don't know how I would react to answers of 3. or 4.  Probably I would
reject 3 as overspecifying the program (specifying which section of code
should be parallel is surely machine-dependent).

Seeing the full definitions of elemental functions, FORALLs, INDEPENDENT,
BEGIN/END INDEPENDENT, and these PARALLEL constructs, I'm beginning to
think we should make "data independent" a basic term.  At least then we
wouldn't have to keep repeating the definition.  (Whether it is a good idea
to define semantics based on data dependence is a different question.)

The NEW construct: Presuming that we have the rule that directives are
assertions, what does NEW assert?  Does it change the semantics of the
program (i.e. are there programs that will produce incorrect output if NEW
is inserted)?  What is the scope of NEW (one loop?  one procedure?)?

I assume that the intent is that INDEPENDENT (or PARALLEL) assertions would
take NEW into account; that is, dependences related to NEW variables do not
make a loop serial, for example.  I'd be grateful if someone would write up
a precise definition of "data independent" that took this (and procedure
local variables, and allocated pointers, etc.) into account.

I favor some form of NEW variables, but we have to realize that these will
require some backpatching to definitions we've already made.

                                                Chuck


From wu@cs.buffalo.edu  Tue Aug 25 22:52:22 1992
Received: from moe.rice.edu by cs.rice.edu (AA06363); Tue, 25 Aug 92 22:52:22 CDT
Received: from ruby.cs.Buffalo.EDU by moe.rice.edu (AA15567); Tue, 25 Aug 92 22:52:21 CDT
Received: by ruby.cs.Buffalo.EDU (4.1/1.01)
	id AA01380; Tue, 25 Aug 92 23:52:27 EDT
Date: Tue, 25 Aug 92 23:52:27 EDT
From: wu@cs.buffalo.edu (Min-You Wu)
Message-Id: <9208260352.AA01380@ruby.cs.Buffalo.EDU>
To: chk@cs.rice.edu, hpff-forall@rice.edu
Subject: Re:  Comments on FORALL Tentative Proposal, version of Aug 24
Cc: wu@cs.buffalo.edu


> Section 1.5.1 FORALL with INDEPENDENT Directives
> 
> In general, this proposal should either stick to the syntax constraints of
> sections 1.2 and 1.3, or explicitly propose changing FORALL to a new
> construct.  I am firmly opposed to any statement that has one meaning (and
> one set of syntactic constraints) when there is a comment before it, and
> another meaning (and constraints) without the comment.  If Min-You wants to
> fight for a new FORALL, he's welcome to; I've given up those crusades.
> 

My revised proposal (July 25) is consistent with the syntax of David Loveman 
and Chuck Koelbel's proposal.  Although I believe FORALL should be a new 
parallel construct instead of an extended array assignment, I don't want 
to propose the new FORALL now.  My current proposal for HPF simply 
extends Guy's INDEPENDENT to a pair of BEGIN INDEPENDENT and END INDEPENDENT.

> 
> Thus, Guy's proposal was a pure (presumably unchecked) assertion that no
> inter-instantiation (not inter-iteration, this isn't a sequential loop)
> race conditions occurred.  This is slightly at odds with Min-You's
> definition that "There is a synchronization at each of these directives." 
> I would favor an assertion in Guy's style, and I think that Min-You's later
> descriptions of data independence do this without relying on the
> synchronization.
> 

Since the statement before BEGIN INDEPENDENT and the one after END INDEPENDENT
already have synchronization, there is no problem with eliminating the sentence: 
"There is a synchronization at each of these directives." 


Min-You

From chk@cs.rice.edu  Wed Aug 26 11:04:38 1992
Received: from moe.rice.edu by cs.rice.edu (AA15960); Wed, 26 Aug 92 11:04:38 CDT
Received: from cs.rice.edu by moe.rice.edu (AA19476); Wed, 26 Aug 92 11:04:38 CDT
Received: from  by cs.rice.edu (AB15748); Wed, 26 Aug 92 11:04:25 CDT
Message-Id: <9208261604.AB15748@cs.rice.edu>
Date: Wed, 26 Aug 1992 11:07:03 -0600
To: wu@cs.buffalo.edu (Min-You Wu)
From: chk@cs.rice.edu
Subject: Re:  Comments on FORALL Tentative Proposal, version of Aug 24
Cc: hpff-forall@rice.edu

>> Section 1.5.1 FORALL with INDEPENDENT Directives
>> 
>My revised proposal (July 25) is consistent with the syntax of David Loveman 
>and Chuck Koelbel's proposal.  Although I believe FORALL should be a new 
>parallel construct instead of an extended array assignment, I don't want 
>to propose the new FORALL now.  My current proposal for HPF simply 
>extends Guy's INDEPENDENT to a pair of BEGIN INDEPENDENT and END INDEPENDENT.

I apparently was working from a stale copy.  I've asked Min-You to send me
a new one.

>
>Since the statement before BEGIN INDEPENDENT and the one after END INDEPENDENT
>already have synchronization, there is no problem with eliminating the sentence: 
>"There is a synchronization at each of these directives." 
>
>
>Min-You

OK, I think I agree with everything in that proposal now.  Does anybody
else have comments?  (I know somebody was working on an extended
INDEPENDENT proposal...)

                                                Chuck


From @ecs.southampton.ac.uk,@camra.ecs.soton.ac.uk:jhm@ecs.southampton.ac.uk  Mon Aug 31 09:11:20 1992
Received: from sun2.nsfnet-relay.ac.uk by cs.rice.edu (AA03811); Mon, 31 Aug 92 09:11:20 CDT
Via: uk.ac.southampton.ecs; Mon, 31 Aug 1992 14:54:52 +0100
Via: camra.ecs.soton.ac.uk; Mon, 31 Aug 92 14:49:35 BST
From: John Merlin <jhm@ecs.southampton.ac.uk>
Received: from bacchus.ecs.soton.ac.uk by camra.ecs.soton.ac.uk;
          Mon, 31 Aug 92 14:56:32 BST
Date: Mon, 31 Aug 92 14:53:13 BST
Message-Id: <6213.9208311353@bacchus.ecs.soton.ac.uk>
To: hpff-forall@cs.rice.edu
Subject: Revised 'Elemental functions' proposal

Here is a modified draft of the 'elemental functions' proposal
(now called 'pure procedures').  It should replace the section
on 'ELEMENTAL Functions' in the document 'Tentative Proposal: 
High Performance Fortran FORALL and INDEPENDENT Proposal', 
24 August (section 1.5.5), which we believe contained some technical
problems which this version hopefully cures.

The intent of this proposal is to define 'pure' (i.e. side-effect
free) functions which may be used freely and safely in FORALL - and
in any normal F90 context - and whose dummy arguments and results
can be array-valued.  A separate aspect of the proposal is that pure
procedures may be used *elementally* (i.e. like F90 elemental
intrinsic procedures) *provided* their arguments and result satisfy 
the additional constraint that they are *scalar*.  This avoids
introducing into Fortran the controversial concept of 'array-of-arrays'
(which Fortran seems to assiduously avoid) present in my original 
proposal, and still allows the flexibility of array-valued functions 
in FORALL.  Furthermore, it does not reduce the functionality
(at least for functions) -- if the programmer wants the effect of
elemental invocation of functions that have array-valued dummy args 
or result, he can obtain this with FORALL.

I commend this proposal to the house!

(BTW, I'm going on holiday now until Sept 21, so won't be able to 
respond to any questions or comments until after the next HPF meeting. 
I hope for good news about 'pure' procedures when I return! :-)
Have a good meeting!)

              John Merlin.
----------------------------------------------------------------------

\subsection{PURE Procedures and Elemental Invocation\protect\footnote{Version 
of August 28, 
1992 - John Merlin, University of Southampton, and Chuck Koelbel, Rice 
University}}

\label{forall-elemental}

The intent of this counter-proposal is to further restrict functions 
called from within FORALL so that they have no side effects.
This is more restrictive than the constraints on function calls in 
section \ref{forall-stmt}; however, the definition is simpler and 
presumably clearer, as well as providing complete security against 
non-deterministic behaviour.

A separate aspect of this proposal is to extend the concept of 
`elemental procedures', which in Fortran~90 are restricted to a subset 
of the intrinsic procedures, so that they can be user-defined.


\subsubsection{General Form of Element Array Assignment}

To the definition of {\it forall-assignment}, add the following:
\begin{quotation}

\noindent Constraint: If any subexpression in {\it expr}, {\it 
array-element}, or {\it array-section} is a {\it function-reference}, 
then the {\it function-name} must be a `pure' function as defined below.
\end{quotation}


\subsubsection{Interpretation of Element Array Assignments}  

Change the paragraphs after the step-by-step interpretation to the 
following:
\begin{quotation}

If the scalar mask expression is omitted, it is as if it were
present with the value true.

The scope of a
{\it subscript-name} is the FORALL statement itself. 

The {\it forall-assignment} must not cause any element of the array
being assigned to be assigned a value more than once.
By the nature of PURE functions, no expression evaluations can 
have any effect on other expressions, either for the same combination of 
{\it subscript-name} values or for a different combination.
\end{quotation}
 
\subsubsection{PURE Procedures}

A `pure' procedure is one which produces no side effects, except for 
assigning to dummy arguments of INTENT (OUT) or (INOUT) in the case of a
`pure' subroutine.
It may be used in any way that a normal procedure of its type may 
be used.
In addition, pure functions may be used in a FORALL statement or construct.
Also, a pure procedure whose dummy arguments (and, in the case of a function,
result) are all scalar may be used `elementally', that is, it may be
applied to conforming array arguments in a similar manner to the elemental
intrinsic procedures defined in Fortran~90.

If a procedure is used in a context that requires it to be pure
(namely, it is used elementally or in a FORALL statement or construct),
then its interface must be explicit, and it must be declared to be pure 
in both its definition and interface.  The form of this declaration is 
a directive preceding the {\it function-stmt\/} or {\it subroutine-stmt\/}:
                                                                 \BNF
pure-directive \IS !HPF$ PURE
                                                                 \FNB

To define pure functions, Rule~R1215 of the Fortran~90 standard is changed 
to:
                                                                 \BNF
function-subprogram \IS [pure-directive]
                        function-stmt
                        [specification-part]
                        [execution-part]
                        [internal-subprogram-part]
                        end-function-stmt
                                                                \FNB
with the following additional constraints:

\noindent
Constraint: The dummy arguments of a pure function must have INTENT(IN).

\noindent
Constraint: The local variables of a pure function must not have the SAVE 
attribute.

\noindent
Constraint: A pure function must not contain assignments to global 
data objects.

\noindent
Constraint: A pure function must not contain I/O statements.

\noindent
Constraint: Any procedure called from a pure function must be pure.

\noindent
Constraint: A pure function must not contain data mapping directives.

To assert that a function is pure, a {\it pure-directive\/} must be given.
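
As a minimal sketch of a function that satisfies these constraints (the
function name and its arguments are hypothetical, chosen only for
illustration): both dummy arguments have INTENT(IN), the local variable has
no SAVE attribute, and the body contains no I/O, no assignments to global
data, no calls to other procedures, and no data mapping directives.
                                                              \CODE
!HPF$ PURE
    REAL FUNCTION SUMSQ (X, Y)
      REAL, INTENT(IN) :: X, Y    ! dummy arguments have INTENT(IN)
      REAL T                      ! local variable without SAVE
      T = X*X + Y*Y
      SUMSQ = T
    END FUNCTION SUMSQ
                                                              \EDOC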

\vspace*{1ex}

To define pure subroutines, Rule~R1219 is changed to:
                                                                 \BNF
subroutine-subprogram \IS [pure-directive]
                        subroutine-stmt
                        [specification-part]
                        [execution-part]
                        [internal-subprogram-part]
                        end-subroutine-stmt
                                                                \FNB
with the following additional constraints:

\noindent
Constraint: The dummy arguments of a pure subroutine must have explicit
INTENT.

\noindent
Constraint: The local variables of a pure subroutine must not have the SAVE 
attribute.

\noindent
Constraint: A pure subroutine must not contain assignments to global 
data objects.

\noindent
Constraint: A pure subroutine must not contain I/O statements.

\noindent
Constraint: Any procedure called from a pure subroutine must be pure.

\noindent
Constraint: A pure subroutine must not contain data mapping directives.

To assert that a subroutine is pure, a {\it pure-directive\/} must be given.
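
A corresponding minimal sketch for a subroutine, again with hypothetical
names: every dummy argument has an explicit INTENT, and the only effect of
the subroutine is the assignment to its INTENT(INOUT) arguments.
                                                              \CODE
!HPF$ PURE
    SUBROUTINE SWAPIF (A, B)
      REAL, INTENT(INOUT) :: A, B   ! explicit INTENT on all dummy arguments
      REAL T                        ! local variable without SAVE
      IF (A > B) THEN               ! order the pair so that A <= B
        T = A
        A = B
        B = T
      END IF
    END SUBROUTINE SWAPIF
                                                              \EDOC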
\vspace*{1ex}

To define interface specifications for pure procedures, Rule~R1204 is 
changed to:
                                                                \BNF
interface-body \IS [pure-directive]
                   function-stmt
                   [specification-part]
                   end-function-stmt
               \OR [pure-directive]
                   subroutine-stmt
                   [specification-part]
                   end-subroutine-stmt
                                                                \FNB

\noindent
Constraint: An {\it interface-body\/} of a pure subroutine must specify
the intents of all dummy arguments.

When applied to a procedure interface body, the {\it pure-directive} asserts 
that the procedure satisfies the constraints required of pure procedures.
Because of the limited information provided by an interface specification,
this assertion can only be checked to a limited extent in this context
(i.e. with respect to argument intent and the absence of data mapping 
directives).

If a procedure is used in a context that requires it to be pure
(namely, it is used elementally or in a FORALL statement or construct),
then its interface must be explicit.  If the interface is provided by 
means of an interface block, the {\it interface-body\/} must contain a 
{\it pure-directive\/}.

If an interface body contains a {\it pure-directive\/}, then the 
corresponding procedure definition must also contain it, though the 
reverse is not true.  When a procedure definition with a {\it pure-directive\/}
is compiled, the compiler may check that it satisfies the necessary 
constraints.

\vspace*{1ex}

To define elemental invocations of pure procedures, the following 
extra constraint is added after Rules R1209 ({\it function-reference\/}) 
and R1210 ({\it call-stmt\/}):
\begin{quotation}

\noindent
Constraint: A non-intrinsic function (subroutine) that is invoked 
elementally must be a pure function (subroutine) with scalar dummy 
arguments (and result), and its interface must be explicit.
\end{quotation}

Additionally, the beginning of section 12.4.3 should be changed to:
\begin{quotation}

A reference to {\em a pure function or\/} an elemental intrinsic function
is an elemental reference if\ldots
\end{quotation}
and the beginning of section 12.4.5 to:
\begin{quotation}

A reference to {\em a pure subroutine or\/} an elemental intrinsic subroutine
is an elemental reference if\ldots
\end{quotation}
(where the additional words are italicised).


\paragraph{Comments}
Detailed comments on pure procedures:
\begin{itemize}
\item The constraints for a pure procedure guarantee freedom from 
side-effects, thus ensuring that it can be invoked concurrently at each 
`element' of an array (where an `element' may itself be a data-structure, 
including an array).

\item An earlier draft of this proposal contained a constraint disallowing 
pure procedures from accessing global data objects, particularly distributed 
data objects.
This constraint has been dropped as inessential to the side-effect freedom 
that the HPF committee requested.
However, it may well be that some machines will have great difficulty 
implementing FORALL without this constraint.\footnote{            %
%
One of us (JHM) is still in favour of this additional constraint for a 
number of reasons: 
(i) aesthetically, it is in keeping with the
nature of a `pure' function, i.e. a function in the mathematical
sense, and in practical terms it imposes no real restrictions on the 
programmer, as global data can be passed-in via the argument list; 
(ii) without this constraint HPF programs can no longer be implemented 
by pure message-passing, or at least not efficiently, i.e. without
sequentialising FORALL statements containing function calls and greatly
complicating their implementation; 
(iii) absence of this restriction may inhibit optimisation of FORALLs
and array assignments, as the optimisation of assigning the {\it expr\/}
directly to the assignment variable rather than to a temporary intermediate 
array may now require interprocedural analysis rather than just local 
analysis.  However, JHM does not want this to be a make-or-break point!}


\item The constraints are such that a compiler cannot deduce from a 
procedure's interface body whether it can validly be used as a pure 
procedure, as that depends partly on its local declarations and internal 
operations.  Hence, it is necessary to use a specifier like `PURE' in the 
interface body to identify such procedures.  The compiler can 
check that the procedure satisfies all the necessary constraints 
when it compiles the procedure itself (provided it also has the `PURE' 
specifier).

\end{itemize}

As well as using pure functions in FORALL, a pure procedure can also be 
used `elementally', provided it satisfies the additional constraint that 
its dummy arguments (and, in the case of a function, its result) are scalar.

Fortran 90 introduces the concept of `elemental procedures',
which are defined for scalar arguments but may also be applied to
conforming array-valued arguments.  
For an elemental function,
each element of the result, if any, is as would have been obtained by
applying the function to corresponding elements of the arguments.
Examples are the mathematical intrinsics, e.g.\ SIN(X).
For an elemental subroutine, the effect on each element of an INTENT(OUT) or 
INTENT(INOUT) array argument is as would be obtained by calling the 
subroutine with the corresponding elements of the arguments.
An example is the intrinsic subroutine MVBITS.

However, Fortran~90 restricts the application of elemental
procedures to a subset of the intrinsic procedures --- the programmer
cannot define his own.  Obviously, elemental invocation is equivalent to 
concurrent invocation, so extra constraints beyond those for normal
Fortran procedures are required to allow this to be done safely
(e.g. deterministically).  Appropriate constraints in this case are
the same as those for function calls in FORALL---indeed, the latter are 
virtually equivalent to elemental invocation in an array assignment, 
given the close correspondence between FORALL and array assignment.
Hence, we propose that pure procedures may also be invoked elementally,
subject to the additional constraint that their dummy arguments 
(and, for a function, result) are scalar.
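
The intended meaning of an elemental invocation can be sketched as follows.
The names are hypothetical. Because the elemental reference itself is only
proposed here and is not accepted by a Fortran~90 compiler, it is shown as a
comment; the executable loop spells out the element-by-element meaning that
the proposal assigns to it.
                                                              \CODE
    PROGRAM ELEMDEMO
      REAL A(4), B(4), C(4)
      INTEGER I
      A = 3.0
      B = 4.0
      ! Proposed elemental form (an assumption based on this proposal):
      !     C = SUMSQ (A, B)
      ! Element-by-element meaning of that reference:
      DO I = 1, 4
        C(I) = SUMSQ (A(I), B(I))
      END DO
      PRINT *, C
    CONTAINS
!HPF$ PURE
      REAL FUNCTION SUMSQ (X, Y)
        REAL, INTENT(IN) :: X, Y    ! scalar dummy arguments and result
        SUMSQ = X*X + Y*Y
      END FUNCTION SUMSQ
    END PROGRAM ELEMDEMO
                                                              \EDOC
Since the calls have no side effects, the order in which the elements are
processed does not matter, which is what allows them to be made
concurrently.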

Comment:
\begin{itemize}
	\item The original draft proposed allowing pure procedures 
to be invoked elementally even if their dummy arguments or results 
were array-valued.  These provisions have been dropped to avoid 
promoting storage order to a higher level in Fortran~90
(i.e.\ to avoid introducing the concept of `arrays-of-arrays', 
which Fortran~90 seems to strenuously avoid!)   In practical terms,
the current proposal provides the same functionality as the original 
one for functions, though not for subroutines.  If a programmer wants 
elemental function behaviour, but also wants the `elements' to be
array-valued, this can be achieved using FORALL.
	\item In typical FORALL or elemental usage, a pure procedure 
would be called independently in each process, and its dummy arguments 
would be associated with `elements' local to that process.  
This is the reason for disallowing data mapping directives within the 
bodies of such procedures.
Note that, particularly in elemental invocations, the actual arguments
can be distributed arrays which need not be `co-distributed'; if not,
a typical implementation would in general perform all data communications 
prior to calling the procedure, and would then pass-in the required 
elements locally via its argument list.
\end{itemize}


\paragraph{Uses and MIMD aspects}

\subparagraph{FORALL statements and constructs}

Pure functions may be used in expressions in FORALL statements and 
constructs, unlike general functions.  Because a {\it forall-assignment}
may be an {\it array-assignment}, the pure function can have an array result.  
For example, if a certain problem
is data-parallel over a 2d grid, and the data structure at each grid
point is a vector of length 3 (2d QCD?), we could have:
                                                              \CODE
    INTERFACE
!HPF$ PURE
      FUNCTION f (x)
        REAL, DIMENSION(3) :: f, x
      END FUNCTION f
    END INTERFACE
    REAL  v (3,10,10)
    ...
    FORALL (i=1:10, j=1:10)  v(:,i,j) = f (v(:,i,j)) 
                                                              \EDOC
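
A self-contained variant of this example might look as follows.  It is
only a sketch: it uses the PURE prefix later standardised in Fortran~95
in place of the directive proposed here, and the module name and the
body of f (which normalises its argument) are assumptions made purely
for illustration.
                                                              \CODE
    MODULE grid_ops
      IMPLICIT NONE
    CONTAINS
      PURE FUNCTION f (x)
        REAL, INTENT(IN) :: x(3)
        REAL :: f(3)
        f = x / SQRT (SUM (x*x))   ! hypothetical body: normalise the vector
      END FUNCTION f
    END MODULE grid_ops

    PROGRAM forall_vector
      USE grid_ops
      IMPLICIT NONE
      REAL    :: v(3,10,10)
      INTEGER :: i, j

      v = 1.0
      ! each grid point (i,j) gets the array result of one call to f
      FORALL (i=1:10, j=1:10)  v(:,i,j) = f (v(:,i,j))
      PRINT *, v(:,1,1)
    END PROGRAM forall_vector
                                                              \EDOC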


\subparagraph{MIMD parallelism}

A natural way of obtaining some MIMD parallelism is by
means of branches within a pure function which depend on argument 
values.  These branches can be governed by content-based or index-based 
conditionals (the latter in a FORALL context).  For example:
                                                              \CODE
!HPF$ PURE
    FUNCTION f (x, i)
      IF (x > 0) THEN     ! content-based conditional
        ...
      ELSE IF (i==1 .OR. i==n) THEN    ! index-based conditional
        ...
      ENDIF
    END FUNCTION

    ...
    FORALL (i=1:n)  x (i) = f(x(i), i)
    ...
                                                              \EDOC
Content-based conditionals can be exploited generally, including in
array assignments (see below), which may sometimes obviate the need for 
WHERE-ELSEWHERE constructs or sequences of masked FORALLs with their 
potential synchronisation overhead. 

\subparagraph{Elemental function references}

Pure functions with scalar dummy arguments and result can be invoked
{\em elementally\/} in array expressions with the 
same interpretation as Fortran~90 elemental intrinsic functions.
This interpretation (Fortran~90 standard, Section~13.2.1) is as follows:
\begin{quote}
If a generic name or a specific name is used to reference an elemental
intrinsic function, the shape of the result is the same as the shape of 
the argument with the greatest rank.
If the arguments are all scalar, the result is scalar.
For those elemental intrinsic functions that have more than one argument, 
all arguments must be conformable.
In the array-valued case, the values of the elements, if any, of the 
result are the same as would have been obtained if the scalar-valued 
function had been applied separately, in any order, to corresponding 
elements of each argument. [An argument called KIND must be specified as 
a scalar integer initialization expression and must specify a 
representation method for the function result that exists on the processor.]
\end{quote}
(The last sentence of this section, enclosed in square brackets,
does not apply to elemental references of user-defined pure functions.)

Examples of elemental usage are:
                                                              \CODE
    INTERFACE 
!HPF$ PURE
      REAL FUNCTION foo (x, y, z)
        REAL, INTENT(IN) :: x, y, z
      END FUNCTION
    END INTERFACE

    REAL a(100), b(100), c(100)
    REAL p, q, r

    p      = foo (p, q, r)         ! OK - scalar call
    a(1:n) = foo (a(1:n), b(1:n), c(1:n))    ! OK - elemental call
    a(1:n) = foo (a(1:n), q, r)    ! OK - scalar args 'promoted' to arrays
    a(1:n) = foo (p, q, r)         ! OK - scalar result assigned to array
                                                              \EDOC
An example involving a WHERE-ELSEWHERE construct is:
                                                              \CODE
    INTERFACE
!HPF$ PURE
      REAL FUNCTION f_edge (x)
        REAL x
      END FUNCTION f_edge
!HPF$ PURE
      REAL FUNCTION f_interior (x)
        REAL x
      END FUNCTION f_interior
    END INTERFACE

    REAL a (10,10)
    LOGICAL edges (10,10)
! ...  initialise mask array 'edges' ...

    WHERE (edges)
      a = f_edge (a)
    ELSE WHERE
      a = f_interior (a)
    END WHERE
                                                              \EDOC
(Incidentally, this example also presents the possibility of obtaining
MIMD parallelism, if the compiler can establish that the two assignments 
are independent and so does not force a synchronisation at the ELSEWHERE
statement.)

\subparagraph{Elemental subroutine references}

Pure subroutines with scalar dummy arguments can be invoked 
{\em elementally\/} with the same interpretation as Fortran~90 
elemental intrinsic subroutines (of which there is only one).
This interpretation (Fortran~90 standard, Section~13.2.2) is as follows:
\begin{quote}
An elemental subroutine is one that is specified for scalar arguments, 
but may be applied to array arguments.  In a reference to an elemental 
intrinsic subroutine, either all actual arguments must be scalar or all 
INTENT(OUT) and INTENT(INOUT) arguments must be arrays of the same shape 
and the remaining arguments must be conformable with them. In the case 
that the INTENT(OUT) and INTENT(INOUT) arguments are arrays, the values 
of the elements, if any, of the results are the same as would be obtained 
if the subroutine with scalar arguments were applied separately, in any 
order, to corresponding elements of each argument.
\end{quote}
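
For illustration only, a user-defined elemental subroutine could be
sketched as below, using the ELEMENTAL prefix later adopted by
Fortran~95 rather than the directive proposed here.  The subroutine
clamp and its arguments are hypothetical.
                                                              \CODE
    MODULE clamp_mod
      IMPLICIT NONE
    CONTAINS
      ELEMENTAL SUBROUTINE clamp (x, lo, hi)
        REAL, INTENT(INOUT) :: x
        REAL, INTENT(IN)    :: lo, hi
        x = MAX (lo, MIN (hi, x))    ! restrict x to the range lo..hi
      END SUBROUTINE clamp
    END MODULE clamp_mod

    PROGRAM elemental_sub
      USE clamp_mod
      IMPLICIT NONE
      REAL :: a(4), s

      a = (/ -2.0, 0.25, 0.75, 3.0 /)
      s = 5.0
      CALL clamp (s, 0.0, 1.0)     ! all actual arguments scalar
      CALL clamp (a, 0.0, 1.0)     ! INTENT(INOUT) argument is an array;
                                   ! the scalar bounds are conformable
      PRINT *, s
      PRINT *, a
    END PROGRAM elemental_sub
                                                              \EDOC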

\subparagraph{Advantages of elemental usage}

User-defined elemental procedures have several potential advantages.
They would be a very convenient programming tool, as the same procedure 
can be applied to actual arguments of any rank.

In addition, the implementation of an elemental function returning an
array-valued result in an array expression is likely to be more 
efficient than that of an equivalent array function.  One reason is 
that it requires less temporary storage for the result (i.e.\ storage 
for a single result versus storage for the entire array of results).  
Another is that it saves on looping if an array expression is 
implemented by sequential iteration over the component elemental 
expressions (as may be done for the `segment' of the array expression 
local to each process).  This is because, in the sequential version, 
the elemental function can be invoked elementally in situ within the 
expression.  The array function, on the other hand, must be executed 
before the expression is evaluated, storing its result in a temporary 
array for use within the expression.  Looping is then required during 
the execution of the array function body as well as the expression 
evaluation.
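
The contrast can be sketched as follows, again in Fortran~95 syntax
(an assumption; the proposal itself uses a directive).  Both functions
are hypothetical.  The elemental version can be applied to arguments of
any rank and may be evaluated in situ, whereas the array function is
written for rank one only and returns a whole array, for which an
implementation will typically need a temporary.
                                                              \CODE
    MODULE sq_mod
      IMPLICIT NONE
    CONTAINS
      ELEMENTAL REAL FUNCTION sq_elem (x)
        REAL, INTENT(IN) :: x
        sq_elem = x * x
      END FUNCTION sq_elem

      PURE FUNCTION sq_array (x) RESULT (r)
        REAL, INTENT(IN) :: x(:)
        REAL :: r(SIZE (x))
        r = x * x
      END FUNCTION sq_array
    END MODULE sq_mod

    PROGRAM compare
      USE sq_mod
      IMPLICIT NONE
      REAL    :: a(5), b(5), c(5), m(2,2)
      INTEGER :: i

      a = (/ (REAL (i), i = 1, 5) /)
      b = 1.0 + sq_elem (a)      ! one scalar call per element
      c = 1.0 + sq_array (a)     ! one call returning the whole array
      m = 2.0
      m = sq_elem (m)            ! same function, rank-two argument
      PRINT *, ALL (b == c)
      PRINT *, m
    END PROGRAM compare
                                                              \EDOC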

From @ecs.southampton.ac.uk,@camra.ecs.soton.ac.uk:jhm@ecs.southampton.ac.uk  Mon Aug 31 09:46:26 1992
Received: from sun2.nsfnet-relay.ac.uk by cs.rice.edu (AA04698); Mon, 31 Aug 92 09:46:26 CDT
Via: uk.ac.southampton.ecs; Mon, 31 Aug 1992 15:45:51 +0100
Via: camra.ecs.soton.ac.uk; Mon, 31 Aug 92 15:40:35 BST
From: John Merlin <jhm@ecs.southampton.ac.uk>
Received: from bacchus.ecs.soton.ac.uk by camra.ecs.soton.ac.uk;
          Mon, 31 Aug 92 15:47:33 BST
Date: Mon, 31 Aug 92 15:44:14 BST
Message-Id: <6253.9208311444@bacchus.ecs.soton.ac.uk>
To: hpff-forall@cs.rice.edu
Subject: Comments on last-but-one FORALL proposal!

I originally posted this (comments on the previous FORALL proposal
by David Loveman and Chuck Koelbel) on July 21, which by dint
of bad timing was too late for people to read before the last HPF
meeting.  

While some of the comments have become obsolete with the new proposal, 
many still apply, so I've taken the liberty of posting it again in 
advance of the next meeting.  If you've recently read it and can 
remember its contents, or don't want to remember its contents, 
please ignore it.

-- John Merlin.
---------------------------------------------------------------------
Let me start with an obvious and uncontroversial observation: 
that a FORALL statement without a mask is like a generalised array 
assignment, and a FORALL with a mask is like a generalised masked
array assignment (i.e. WHERE statement).  Every array assignment, 
masked or not, can be written as a FORALL, and (I would maintain) 
sufficiently restricted FORALL statements should be expressible as 
array assignments or WHERE statements.  I think it's a good idea to 
keep this duality in mind and try to maintain it wherever possible.  
I mention this as it underlies some of my comments.


(1)   With Chuck's limitations on the contents of a FORALL construct,
this correspondence extends to FORALL constructs, which are equivalent
to a sequence of generalised array assignments or a WHERE construct.
I think that's a good reason for the limitations!

Incidentally, this correspondence leads to my first observation:
since Fortran 90 has a WHERE - ELSEWHERE construct, perhaps the 
FORALL construct should be extended to a FORALL - ELSE FORALL construct.  
(Of course, the ELSE FORALL part is only relevant if the FORALL has a mask; 
if not, the ELSE FORALL isn't executed).


(2)  The only restrictions I can see on the use of 'subscript-name's
in the LHS of a forall assignment are that every subscript name must 
be referenced, and that no array element may be assigned a value more 
than once.  Do you really want to keep it this general?

For example, can more than one subscript name appear in a single 
subscript expression?  E.g.:

	FORALL (i=1:n:2, j=0:1)  a(i+j) = ...

Can a subscript name appear in a subscript triplet?  E.g.:

	FORALL (i=1:n:2)  a(i:i+1) = ...

(Perhaps the correspondence with array assignments suggests that 
no more than one subscript name should appear in each subscript expression 
of the assignment variable, and that subscript names should not appear in a 
subscript triplet).

I assume that a subscript name can be used in more than one dimension
of the assignment variable (e.g. FORALL (i=1:n) b(i,i) ...), as you give 
an example of this, but perhaps it should be explicitly stated.
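
A minimal sketch of the three cases in question, assuming the FORALL
syntax later standardised in Fortran 95 (the proposal under discussion
may differ).  The program and array names are made up for illustration;
n is taken to be even so that every element is assigned exactly once.

    PROGRAM forall_subscripts
      IMPLICIT NONE
      INTEGER, PARAMETER :: n = 8    ! even, so i+j and i:i+1 stay in 1..n
      INTEGER :: a(n), c(n), b(n,n), i, j

      ! two subscript names in one subscript expression
      FORALL (i=1:n:2, j=0:1)  a(i+j) = 10*i + j

      ! a subscript name inside a subscript triplet
      FORALL (i=1:n:2)  c(i:i+1) = i

      ! one subscript name used in two dimensions (the diagonal)
      b = 0
      FORALL (i=1:n)  b(i,i) = i

      PRINT *, a
      PRINT *, c
      PRINT *, (b(i,i), i = 1, n)
    END PROGRAM forall_subscripts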


(3)  The use of the WHERE statement and construct within a FORALL
seems redundant, as the FORALL construct is already a generalised 
WHERE construct!  Everything you can express with an embedded WHERE 
can be expressed without it by scalarising the sectional dimensions 
and absorbing the mask in the forall-stmt.  My objection is that it
opens up multiple ways of expressing the same thing (which is already 
a big flaw with Fortran).  E.g. if A is 2 dimensional,

	WHERE (a > 0)  a = ...

can be written as:

	FORALL (i=1:n)
	  WHERE (a(i,:) > 0)  a(i,:) = ...
or:

	FORALL (i=1:n, j=1:n, a(i,j)>0)  a(i,j) = ...

or even as:

	FORALL (i=1:n, j=1:n)
	  WHERE (a(i:i, j:j) > 0)  a(i:i, j:j) = ...

etc.

In contrast to Arthur Veen, I'd advocate dropping the nested WHERE 
stmt and construct in FORALL in favour of the scalar mask expression in the 
forall-stmt, as the latter can express more general masks (as you've already
said in your reply).
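
The equivalence of these forms can be checked with a short program.
This is a sketch only, assuming Fortran 95 semantics for FORALL and for
WHERE nested in FORALL; the array names and the test data are invented.

    PROGRAM where_vs_forall
      IMPLICIT NONE
      INTEGER, PARAMETER :: n = 4
      REAL    :: a0(n,n), a1(n,n), a2(n,n), a3(n,n)
      INTEGER :: i, j

      ! test data with both negative and positive entries
      a0 = RESHAPE ((/ (REAL (i) - 8.0, i = 1, n*n) /), (/ n, n /))
      a1 = a0
      a2 = a0
      a3 = a0

      WHERE (a1 > 0)  a1 = -a1                  ! plain WHERE

      FORALL (i=1:n)                            ! WHERE nested in FORALL
        WHERE (a2(i,:) > 0)  a2(i,:) = -a2(i,:)
      END FORALL

      ! scalar mask expression in the FORALL header
      FORALL (i=1:n, j=1:n, a3(i,j) > 0)  a3(i,j) = -a3(i,j)

      ! .TRUE. if all three forms gave the same result
      PRINT *, ALL (a1 == a2) .AND. ALL (a2 == a3)
    END PROGRAM where_vs_forall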

I suppose an advantage with having a WHERE construct within a FORALL 
construct is that you can then have an ELSEWHERE part.  However, this
would be unnecessary if the FORALL construct had an optional ELSE FORALL, 
as I've already suggested.

A minor aesthetic argument against WHERE in FORALL is that Fortran 90 
doesn't allow nested WHERE constructs (for whatever reason), 
so it seems inconsistent to allow them to be nested within FORALL, 
which effectively amounts to the same thing.

Also, WHERE introduces a small inconsistency in FORALLs, namely, that
a normal 'forall-assignment' can be array valued and can have any shape,
but within the WHERE the shape is no longer free--it must conform 
with that of the WHERE mask expression.


(4) In a similar vein, nested FORALLs seem redundant.  It appears
that the main reason for using them is to obtain non-rectangular
index domains.  If so, why not just allow this to be achieved in a 
single 'forall-triplet-spec-list' and have done with it (i.e. allow
each forall-triplet to refer to previous subscript names, as
permitted in some other proposals).  Are there any advantages
in using nested FORALLs to achieve this effect?
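
For concreteness, here is a lower-triangular index domain written two
ways: with a nested FORALL whose inner bounds depend on the outer
subscript name, and with a rectangular header plus a scalar mask.  This
is a sketch assuming Fortran 95 FORALL semantics, with invented names;
it does not show the single-header form suggested above, in which a
triplet would refer to an earlier subscript name.

    PROGRAM triangular
      IMPLICIT NONE
      INTEGER, PARAMETER :: n = 5
      INTEGER :: t1(n,n), t2(n,n), i, j

      t1 = 0
      t2 = 0

      FORALL (i=1:n)                 ! nested form: inner bound depends on i
        FORALL (j=1:i)  t1(i,j) = i + j
      END FORALL

      ! rectangular domain cut down to the triangle by a mask
      FORALL (i=1:n, j=1:n, j <= i)  t2(i,j) = i + j

      PRINT *, ALL (t1 == t2)        ! .TRUE. if both forms agree
    END PROGRAM triangular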

Note that I'm not strongly against nested WHERE and FORALL---it's
just my gut reaction that they're superfluous, and I suspect it may 
be the reaction of users too.


<<< N.B. The following stuff in particular may not be so relevant
to the new proposal. >>>

Far more important is the consideration of how functions are handled 
within FORALL.  Basically, your proposal imposes no more constraints 
than already exist in Fortran 90, namely that there should be no side 
effects that affect the evaluation of the rest of the assignment 
(here extended to cover forall-assignment).
I can see the obvious attraction of this approach, but I believe it's 
inadequate for a number of reasons:

(i) You allow arbitrary access (read and write) to distributed 
global (i.e. common block) data, which in the FORALL context requires 
demand-driven communication for its implementation.  This probably
poses no problems on shared memory architectures, but would require 
considerable software support on many distributed-memory message-passing 
platforms, with a big performance overhead.  I don't think this
requirement appears anywhere else in HPF, and I'm not convinced
that all vendors/implementors would want to support it (as it wouldn't
be High Performance).  If not, HPF programs would be *non-portable*.

>>> Extra comment: This isn't the whole extent of the problem.
>>> What if a FORALL function is just invoked once on a single process,
>>> but internally it declares distributed local variables?  Using any
>>> type of distributed data within functions callable in FORALL seems
>>> to be a minefield.

(ii) Your constraints permit non-determinism (e.g. multiple writes 
to the same global memory location, provided it's not read within 
the same assignment; non-deterministic I/O).
However, it seems that Fortran 90 tries assiduously to avoid 
non-determinism in its array syntax (right now, I can't think of any 
way it can arise via array syntax in standard-conforming programs---but 
I may have overlooked something!).  Also, apart from function calls, you 
extend this principle to FORALLs by not allowing multiple writes to the 
same element of a forall-assignment variable.


(i) & (ii) raise the spectres of non-portability, deadlock and non-determinism,
which appear nowhere else in HPF (as far as I can see).
For these reasons I think functions in FORALL shouldn't be allowed
to perform I/O, and access to global data should be restricted to read-only
access of non-distributed data.  (In fact, perhaps they shouldn't access
global data at all---it can always be passed-in as an argument).


(iii) I'd like to stick to my guns on the proposal that such functions
should be denoted by a directive like:

>>> N.B. Refs to 'elemental' in the original message have been replaced
>>> by 'pure' here, in line with the new proposal.

	!HPF$ PURE function-name

appearing in the function's interface and definition.
This would greatly simplify the job of checking these functions, as well
as making their characteristics and purpose obvious to programmers.

The simplification of checking resulting from this directive is fairly
obvious.  With the directive, checking can be done locally when the
function is compiled; if it calls other functions, they too must be 
'pure', which is established by the presence or absence of the directive 
in the function's interface body.  Without the directive, it appears 
that a preliminary pass through the whole input program must be performed 
to establish whether each function is or is not 'pure', with backpatching 
required to fill in information when a function calls other functions, 
before the usage of any function in a FORALL can be checked.  Also, what 
about separate compilation?

I believe that checking is particularly desirable in the case of 'pure'
functions, for incorrect usage could result in deadlock, which can be 
very difficult for the user to track down.


                    Regards,
                         John Merlin.

From chk@cs.rice.edu  Mon Aug 31 14:17:24 1992
Received: from moe.rice.edu by cs.rice.edu (AA12573); Mon, 31 Aug 92 14:17:24 CDT
Received: from cs.rice.edu by moe.rice.edu (AA02261); Mon, 31 Aug 92 14:17:24 CDT
Received: from  by cs.rice.edu (AB12499); Mon, 31 Aug 92 14:16:56 CDT
Message-Id: <9208311916.AB12499@cs.rice.edu>
Date: Mon, 31 Aug 1992 14:19:38 -0600
To: hpff-forall@rice.edu
From: chk@cs.rice.edu
Subject: And now for something completely different (comments on July minutes!)


>From: John Merlin <jhm@ecs.southampton.ac.uk>
>Date: Mon, 31 Aug 92 13:47:25 BST
>To: hpff-distribute@cs.rice.edu
>Subject: And now for something completely different (comments on July minutes!)
>
>Comments and questions on July HPFF minutes.
>--------------------------------------------
>
>Sorry to interject this into the fascinating discussion on Templates, 
>but since I wasn't present at the July HPFF meeting, here are my
>belated comments and questions on matters arising from the minutes.
>(BTW, thanks, Chuck, for your excellent and detailed minutes.  
>I reckon I've got almost as much out of reading them as I would have 
>done from actually attending the meeting!)
>
>I've posted this just to the 'hpff-distribute' group - though that's 
>probably wrong as some of these items relate to the 'forall' group - 
>on the assumption that interested parties read both groups and wouldn't 
>want to see the same thing twice.  If bits of it don't reach who they 
>should I'd be grateful if someone would forward them for me.  
>(BTW -- where should I post this -- I assume I can't post replies 
>directly to 'hpff'?).
>
>Anyway, here goes.

Hi, John

I've edited out the distribution parts of your message, and am forwarding
the rest to hpff-forall.  The suggested address for replying to any message
is the appropriate subgroup list (there's nothing to prevent you from
sending to "hpff@rice.edu", but I don't recommend it unless the message is
really generally important!)

>> FORALL and INDEPENDENT
>> 
>>                                       ... There was some
>> discussion whether the evaluation of functions in right-hand
>> sides following these rules would be generally identical to
>> Fortran 90 array assignments. A consensus eventually
>> developed that there were two appropriate analogies: user
>> functions returning arrays, which are executed once, and
>> intrinsic elemental functions, which are executed once for
>> each element. The semantics given were consistent with
>> elemental functions, satisfying the group...
>
>I don't want to flog this issue to death, but the conditions on FORALL
>functions stated in the original proposals (viz, taken from later-on
>in the minutes):
>
>>      Function calls are allowed in FORALL subject to
>>        the following conditions:
>>        *  Side effects cannot affect values computed
>>           by the same statement in other
>>           instantiations
>>        *  Side effects cannot affect other
>>           subexpressions in the same statement on
>>           the same instantiation
>
>are *not* equivalent to elemental functions.  The latter have no
>side-effects whatever.  The above conditions allow side effects
>to variables not referenced by the statement in which the function
>is called, which can lead to non-determinism if the function is invoked 
>multiple times concurrently (e.g. multiple assignments, in any order, 
>to the same global variable).
>
>This is a convenient plug for Chuck's and my revised proposal for
>'pure procedures' for use in FORALL (which replaces the section on
>'ELEMENTAL Functions' in the draft document 'Tentative Proposal for 
>FORALL and INDEPENDENT', which had some technical problems), which 
>will be posted later today!
>
>
>>                                      ... David Loveman
>> pointed out that ...                    ...     ELEMENTAL
>> functions had been dropped from Fortran 90 for requiring too
>> much support from an already large language.
>
>I wasn't party to the F90 discussions so I don't know the
>basis for this claim, but I'm surprised.  As far as I can tell
>from considering the possibility of user-defined elemental functions
>in 'ADAPT', they require no extra mechanisms or support whatsoever -- 
>they appear to be simplicity itself to implement.
>
>
>>   Marina Chen began the discussion by asking, "What about
>> Alan Karp's comments?" ...
>>                   ... Karp's next criticism was that FORALL
>> is dangerous because statements in it have different
>> semantics. For example, the statement
>>           A(I) = A(I) + A(I-1)
>> has a markedly different effect inside DO and FORALL loops;
>> one is a prefix sum operation, while the other is a shift
>> and add.
>
>This seems to me to be a spurious argument.  The above statement can 
>have either meaning in a DO-loop, depending on whether the DO-loop stride 
>is positive or negative.
>
>If this argument has any validity at all, it seems to be an argument
>against DO-loops and in favour of FORALL, as the interpretation
>of the statement in FORALL is independent of the FORALL stride, and is
>the same as for the unadorned statement (i.e. not in any constructs), 
>viz., use the old values on the rhs.
> 
>

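A small program makes the three behaviours concrete.  It is a sketch
assuming Fortran 95 FORALL semantics; the array names are made up, and
element 0 is included only so that A(I-1) is defined for I = 1.

    PROGRAM do_vs_forall
      IMPLICIT NONE
      INTEGER, PARAMETER :: n = 5
      INTEGER :: up(0:n), down(0:n), fa(0:n), i

      up   = 1
      down = 1
      fa   = 1

      DO i = 1, n                    ! ascending DO: running (prefix) sum
        up(i) = up(i) + up(i-1)
      END DO

      DO i = n, 1, -1                ! descending DO: shift and add
        down(i) = down(i) + down(i-1)
      END DO

      ! FORALL: every right-hand side uses the old values
      FORALL (i=1:n)  fa(i) = fa(i) + fa(i-1)

      PRINT *, up
      PRINT *, down
      PRINT *, fa
    END PROGRAM do_vs_forall

With every element initially 1, the ascending DO produces a running
sum, while the descending DO and the FORALL both leave each of
elements 1 to n equal to 2.
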
>Best regards,
>       John.
>
>


From chk@erato.cs.rice.edu  Mon Aug 31 14:27:00 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA12881); Mon, 31 Aug 92 14:27:00 CDT
Received: from localhost.cs.rice.edu by erato.cs.rice.edu (AA09008); Mon, 31 Aug 92 14:26:52 CDT
Message-Id: <9208311926.AA09008@erato.cs.rice.edu>
To: hpff-forall@erato.cs.rice.edu
Word-Of-The-Day: contentious : (adj) showing a marked and wearisome
	tendency to argue
Subject: New proposal draft
Date: Mon, 31 Aug 92 14:26:47 -0500
From: chk@erato.cs.rice.edu


This is the latest draft I have.  It makes several cosmetic
changes, and has the PURE function proposal that John Merlin made
earlier today.  (Actually, it has three changes to John's text, made
with his permission - removing the constraints against explicitly
distributed data in PURE functions, and changing a footnote into a
list item.)

Any discussion, new proposals, etc. should reach me by Friday in order
to make it on the agenda for the next HPFF meeting.

						Chuck

%Version of August 5, 1992 - David Loveman, Digital Equipment Corporation

\documentstyle[11pt]{report}
\pagestyle{plain}
\pagenumbering{arabic}
\marginparwidth 0pt
\oddsidemargin=.25in
\evensidemargin  .25in
\marginparsep 0pt
\topmargin=-.5in
\textwidth=6.0in
\textheight=9.0in
\parindent=2em

%the file syntax-macros.tex is physically included below

%syntax-macros.tex

%Version of July 29, 1992 - Guy Steele, Thinking Machines

\newdimen\bnfalign         \bnfalign=2in
\newdimen\bnfopwidth       \bnfopwidth=.3in
\newdimen\bnfindent        \bnfindent=.2in
\newdimen\bnfsep           \bnfsep=6pt
\newdimen\bnfmargin        \bnfmargin=0.5in
\newdimen\codemargin       \codemargin=0.5in
\newdimen\intrinsicmargin  \intrinsicmargin=3em
\newdimen\casemargin       \casemargin=0.75in
\newdimen\argumentmargin   \argumentmargin=1.8in

\def\IT{\it}
\def\RM{\rm}
\let\CHAR=\char
\let\CATCODE=\catcode
\let\DEF=\def
\let\GLOBAL=\global
\let\RELAX=\relax
\let\BEGIN=\begin
\let\END=\end


\def\FUNNYCHARACTIVE{\CATCODE`\a=13 \CATCODE`\b=13 \CATCODE`\c=13 \CATCODE`\d=13
		     \CATCODE`\e=13 \CATCODE`\f=13 \CATCODE`\g=13 \CATCODE`\h=13
		     \CATCODE`\i=13 \CATCODE`\j=13 \CATCODE`\k=13 \CATCODE`\l=13
		     \CATCODE`\m=13 \CATCODE`\n=13 \CATCODE`\o=13 \CATCODE`\p=13
		     \CATCODE`\q=13 \CATCODE`\r=13 \CATCODE`\s=13 \CATCODE`\t=13
		     \CATCODE`\u=13 \CATCODE`\v=13 \CATCODE`\w=13 \CATCODE`\x=13
		     \CATCODE`\y=13 \CATCODE`\z=13 \CATCODE`\[=13 \CATCODE`\]=13
                     \CATCODE`\-=13}

\def\RETURNACTIVE{\CATCODE`\
=13}

\makeatletter
\def\section{\@startsection {section}{1}{\z@}{-3.5ex plus -1ex minus 
 -.2ex}{2.3ex plus .2ex}{\large\sf}}
\def\subsection{\@startsection{subsection}{2}{\z@}{-3.25ex plus -1ex minus 
 -.2ex}{1.5ex plus .2ex}{\large\sf}}

\def\@ifpar#1#2{\let\@tempe\par \def\@tempa{#1}\def\@tempb{#2}\futurelet
    \@tempc\@ifnch}

\def\?#1.{\begingroup\def\@tempq{#1}\list{}{\leftmargin\intrinsicmargin}\relax
  \item[]{\bf\@tempq.} \@intrinsictest}
\def\@intrinsictest{\@ifpar{\@intrinsicpar\@intrinsicdesc}{\@intrinsicpar\relax}}
\long\def\@intrinsicdesc#1{\list{}{\relax
  \def\@tempb{ Arguments}\ifx\@tempq\@tempb
			  \leftmargin\argumentmargin
			  \else \leftmargin\casemargin \fi
  \labelwidth\leftmargin  \advance\labelwidth -\labelsep
  \parsep 4pt plus 2pt minus 1pt
  \let\makelabel\@intrinsiclabel}#1\endlist}
\long\def\@intrinsicpar#1#2\\{#1{#2}\@ifstar{\@intrinsictest}{\endlist\endgroup}}
\def\@intrinsiclabel#1{\setbox0=\hbox{\rm #1}\ifnum\wd0>\labelwidth
  \box0 \else \hbox to \labelwidth{\box0\hfill}\fi}
\def\Case(#1):{\item[{\it Case (#1):}]}
\def\ {\@ifnextchar({\def\@tempq{#1}\@intrinsicopt}{\item[#1]}}
\def\@intrinsicopt(#1){\item[{\@tempq} (#1)]}

\def\MATRIX#1{\relax
    \@ifnextchar,{\@MATRIXTABS{}#1,\@FOO, \hskip0pt plus 1filll\penalty-1\@gobble
  }{\@ifnextchar;{\@MATRIXTABS{}#1,\@FOO; \hskip0pt plus 1filll\penalty-1\@gobble
  }{\@ifnextchar:{\@MATRIXTABS{}#1,\@FOO: \hskip0pt plus 1filll\penalty-1\@gobble
  }{\@ifnextchar.{\hfill\penalty1\null\penalty10000\hskip0pt plus 1filll
		  \@MATRIXTABS{}#1,\@FOO.\penalty-50\@gobble
  }{\@MATRIXTABS{}#1,\@FOO{ }\hskip0pt plus 1filll\penalty-1}}}}}

\def\@MATRIXTABS#1#2,{\@ifnextchar\@FOO{\@MATRIX{#1#2}}{\@MATRIXTABS{#1#2&}}}
\def\@MATRIX#1\@FOO{\(\left[\begin{array}{rrrrrrrrrr}#1\end{array}\right]\)}

\def\@IFSPACEORRETURNNEXT#1#2{\def\@tempa{#1}\def\@tempb{#2}\futurelet\@tempc\@ifspnx}

{
\FUNNYCHARACTIVE
\GLOBAL\DEF\FUNNYCHARDEF{\RELAX
    \DEFa{{\IT\CHAR"61}}\DEFb{{\IT\CHAR"62}}\DEFc{{\IT\CHAR"63}}\RELAX
    \DEFd{{\IT\CHAR"64}}\DEFe{{\IT\CHAR"65}}\DEFf{{\IT\CHAR"66}}\RELAX
    \DEFg{{\IT\CHAR"67}}\DEFh{{\IT\CHAR"68}}\DEFi{{\IT\CHAR"69}}\RELAX
    \DEFj{{\IT\CHAR"6A}}\DEFk{{\IT\CHAR"6B}}\DEFl{{\IT\CHAR"6C}}\RELAX
    \DEFm{{\IT\CHAR"6D}}\DEFn{{\IT\CHAR"6E}}\DEFo{{\IT\CHAR"6F}}\RELAX
    \DEFp{{\IT\CHAR"70}}\DEFq{{\IT\CHAR"71}}\DEFr{{\IT\CHAR"72}}\RELAX
    \DEFs{{\IT\CHAR"73}}\DEFt{{\IT\CHAR"74}}\DEFu{{\IT\CHAR"75}}\RELAX
    \DEFv{{\IT\CHAR"76}}\DEFw{{\IT\CHAR"77}}\DEFx{{\IT\CHAR"78}}\RELAX
    \DEFy{{\IT\CHAR"79}}\DEFz{{\IT\CHAR"7A}}\DEF[{{\RM\CHAR"5B}}\RELAX
    \DEF]{{\RM\CHAR"5D}}\DEF-{\@IFSPACEORRETURNNEXT{{\CHAR"2D}}{{\IT\CHAR"2D}}}}
}

%%% Warning!  Devious return-character machinations in the next several lines!
%%%           Don't even *breathe* on these macros!
{\RETURNACTIVE\global\def\RETURNDEF{\def
{\@ifnextchar\FNB{}{\@stopline\@ifnextchar
{\@NEWBNFRULE}{\penalty\@M\@startline\ignorespaces}}}}\global\def\@NEWBNFRULE
{\vskip\bnfsep\@startline\ignorespaces}\global\def\@ifspnx{\ifx\@tempc\@sptoken \let\@tempd\@tempa \else \ifx\@tempc
\let\@tempd\@tempa \else \let\@tempd\@tempb \fi\fi \@tempd}}
%%% End of bizarro return-character machinations.

\def\IS{\@stopfield\global\setbox\@curfield\hbox\bgroup
  \hskip-\bnfindent \hskip-\bnfopwidth  \hskip-\bnfalign
  \hbox to \bnfalign{\unhbox\@curfield\hfill}\hbox to \bnfopwidth{\bf is \hfill}}
\def\OR{\@stopfield\global\setbox\@curfield\hbox\bgroup
  \hskip-\bnfindent \hskip-\bnfopwidth \hbox to \bnfopwidth{\bf or \hfill}}
\def\R#1 {\hbox to 0pt{\hskip-\bnfmargin R#1\hfill}}
\def\XBNF{\FUNNYCHARDEF\FUNNYCHARACTIVE\RETURNDEF\RETURNACTIVE
  \def\@underbarchar{{\char"5F}}\tt\frenchspacing
  \advance\@totalleftmargin\bnfmargin \tabbing
  \hskip\bnfalign\hskip\bnfopwidth\hskip\bnfindent\=\kill\>\+\@gobblecr}
\def\endXBNF{\-\endtabbing}

\def\BNF{\BEGIN{XBNF}}
\def\FNB{\END{XBNF}}

\begingroup \catcode `|=0 \catcode`\\=12
|gdef|@XCODE#1\EDOC{#1|endtrivlist|end{tt}}
|endgroup

\def\CODE{\begin{tt}\advance\@totalleftmargin\codemargin \@verbatim
   \def\@underbarchar{{\char"5F}}\frenchspacing \@vobeyspaces \@XCODE}
\def\ICODE{\begin{tt}\advance\@totalleftmargin\codemargin \@verbatim
   \def\@underbarchar{{\char"5F}}\frenchspacing \@vobeyspaces
   \FUNNYCHARDEF\FUNNYCHARACTIVE \UNDERBARACTIVE\UNDERBARDEF \@XCODE}

\def\@underbarsub#1{{\ifmmode _{#1}\else {$_{#1}$}\fi}}
\let\@underbarchar\_
\def\@underbar{\let\@tempq\@underbarsub\if\@tempz A\let\@tempq\@underbarchar\fi
  \if\@tempz B\let\@tempq\@underbarchar\fi\if\@tempz C\let\@tempq\@underbarchar\fi
  \if\@tempz D\let\@tempq\@underbarchar\fi\if\@tempz E\let\@tempq\@underbarchar\fi
  \if\@tempz F\let\@tempq\@underbarchar\fi\if\@tempz G\let\@tempq\@underbarchar\fi
  \if\@tempz H\let\@tempq\@underbarchar\fi\if\@tempz I\let\@tempq\@underbarchar\fi
  \if\@tempz J\let\@tempq\@underbarchar\fi\if\@tempz K\let\@tempq\@underbarchar\fi
  \if\@tempz L\let\@tempq\@underbarchar\fi\if\@tempz M\let\@tempq\@underbarchar\fi
  \if\@tempz N\let\@tempq\@underbarchar\fi\if\@tempz O\let\@tempq\@underbarchar\fi
  \if\@tempz P\let\@tempq\@underbarchar\fi\if\@tempz Q\let\@tempq\@underbarchar\fi
  \if\@tempz R\let\@tempq\@underbarchar\fi\if\@tempz S\let\@tempq\@underbarchar\fi
  \if\@tempz T\let\@tempq\@underbarchar\fi\if\@tempz U\let\@tempq\@underbarchar\fi
  \if\@tempz V\let\@tempq\@underbarchar\fi\if\@tempz W\let\@tempq\@underbarchar\fi
  \if\@tempz X\let\@tempq\@underbarchar\fi\if\@tempz Y\let\@tempq\@underbarchar\fi
  \if\@tempz Z\let\@tempq\@underbarchar\fi\@tempq}
\def\@under{\futurelet\@tempz\@underbar}

\def\UNDERBARACTIVE{\CATCODE`\_=13}
\UNDERBARACTIVE
\def\UNDERBARDEF{\def_{\protect\@under}}
\UNDERBARDEF

\catcode`\$=11  

%the following line would allow derived-type component references 
%FOO%BAR in running text, but not allow LaTeX comments
%without this line, write FOO\%BAR
%\catcode`\%=11 

\makeatother

%end of file syntax-macros.tex

\title{{\em Tentative Proposal} \\ High Performance Fortran \\ 
FORALL and INDEPENDENT Proposal}
\author{FORALL Subgroup, High Performance Fortran Forum}
\date{August 31, 1992}

\hyphenation{RE-DIS-TRIB-UT-ABLE sub-script Wil-liam-son}

\begin{document}

\maketitle

\newpage

\pagenumbering{roman}

\vspace*{4.5in}

This is the result of a LaTeX run of a draft of a single chapter of 
the HPFF Language Specification document.

\vspace*{3.0in}

\copyright 1992 Rice University, Houston Texas.  Permission to copy 
without fee all or part of this material is granted, provided the 
Rice University copyright notice and the title of this document 
appear, and notice is given that copying is by permission of Rice 
University.

\tableofcontents

\newpage

\pagenumbering{arabic}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%put text of chapter here


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%put \end{document} here

%statements.tex

%Version of August 2, 1992 - David Loveman, Digital Equipment Corporation 
%	and Chuck Koelbel, Rice University

%Revision history:
%August 19, 1992 - chk - cleaned up discrepancies with Fortran 90 array 
%	expressions
%August 20, 1992 - chk - added DO INDEPENDENT section, Guy Steele's 
%	pointer proposals
%August 24, 1992 - chk - ELEMENTAL functions proposal
%August 31, 1992 - chk - PURE functions proposal

\chapter{Statements\protect\footnote{Version of August 2, 1992 - David
Loveman, Digital Equipment Corporation and Chuck Koelbel, Rice University}}
\label{statements}

\section{Overview}

The purpose of the FORALL construct is to provide a convenient syntax for 
simultaneous assignments to large groups of array elements.
In this respect it is very similar to the functionality provided by array 
assignments and WHERE constructs.
FORALL differs from these constructs primarily in its syntax, which is 
intended to be more suggestive of local operations on each element of an 
array.
It is also possible to specify slightly more general array regions than 
are allowed by the basic array triplet notation.
Both single-statement and block FORALLs are defined in this proposal.

Recent discussions within the FORALL subgroup have made it clear that 
some are opposed to the construct on the grounds that it is unnecessary 
and perhaps confusing.
It is, however, clear that others in the group strongly support the added 
expressiveness provided by FORALL.
Because of this disagreement, the committee recommends that FORALL, if 
accepted into HPF, not be included in the official subset.
We feel, however, that it is important to define the construct so that 
implementations with this functionality have consistent semantics.

The following proposal is designed as a modification of the Fortran 90 
standard; all references to rule numbers and section numbers pertain to 
that document unless otherwise noted.


\section{Element Array Assignment - FORALL\protect\footnote{Version of 
August 20, 1992 - David
Loveman, Digital Equipment Corporation and Chuck Koelbel, Rice University}}  

\label{forall-stmt}

The element array
assignment statement is used to specify an array assignment in terms of
array elements or array sections.  The element array assignment may be
masked with a scalar logical expression.  Rule R215 for {\it
executable-construct} is extended to include the {\it forall-stmt}.

\subsection{General Form of Element Array Assignment}

                                                                       \BNF
forall-stmt          \IS FORALL (forall-triplet-spec-list 
                           [,scalar-mask-expr ]) forall-assignment

forall-triplet-spec  \IS subscript-name = subscript : subscript 
                           [ : stride]
                                                                       \FNB

\noindent
Constraint:  {\it subscript-name} must be a {\it scalar-name} of type integer.

\noindent
Constraint:  A {\it subscript} or a {\it stride} in a {\it
forall-triplet-spec} must not contain a reference to any {\it
subscript-name} in the {\it forall-triplet-spec-list}.

                                                                       \BNF
forall-assignment    \IS array-element = expr
                     \OR array-element => target
                     \OR array-section = expr
                                                                       \FNB

\noindent
Constraint:  The {\it array-section} or {\it array-element} in a {\it
forall-assignment} must reference all of the {\it forall-triplet-spec
subscript-names}.

\noindent
Constraint: In the case of simple assignment, the {\it array-element} and 
{\it expr} have the same constraints as the {\it variable} and {\it expr} 
in an {\it assignment-stmt}.

\noindent
Constraint: In the case of pointer assignment, the {\it array-element} 
and {\it target} have the same constraints as the {\it pointer-object} 
and {\it target}, respectively, in a {\it pointer-assignment-stmt}.

\noindent
Constraint: In the case of array section assignment, the {\it 
array-section} and 
{\it expr} have the same constraints as the {\it variable} and {\it expr} 
in an {\it assignment-stmt}.


For each subscript name in the {\it forall-assignment}, the set of
permitted values is determined on entry to the statement and is
\[  m1 + (k-1) * m3, \mbox{ where } k = 1, 2, \ldots, INT((m2 - m1 + m3) / m3)  \]
and where {\it m1}, {\it m2}, and {\it m3} are the values of the first
subscript, the second subscript, and the stride respectively in the
{\it forall-triplet-spec}.  If {\it stride} is missing, it is as if it
were present with a value of the integer 1.  The expression {\it
stride} must not have the value 0.  If for some subscript name 
\(INT((m2 -m1 + m3) / m3) \leq 0\), the {\it forall-assignment} is not
executed.
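
As an illustration of the formula above, the following Fortran~90 sketch 
enumerates the permitted values for a single triplet.
The program name TRIPLET and the variable NVALS are introduced here for 
illustration only and are not part of the proposal; integer division is 
assumed to stand in for INT, which is equivalent when all three values 
are integers.

                                                                  \CODE
PROGRAM TRIPLET
  IMPLICIT NONE
  INTEGER :: M1, M2, M3, NVALS, K
  M1 = 1
  M2 = 10
  M3 = 3
  ! number of permitted values; integer division truncates like INT
  NVALS = (M2 - M1 + M3) / M3
  ! a count of zero or less means the forall-assignment is not executed
  DO K = 1, NVALS
    PRINT *, M1 + (K-1) * M3
  END DO
END PROGRAM TRIPLET
                                                                  \EDOC 

For the triplet 1:10:3 this gives NVALS = 4 and the values 1, 4, 7, 10.
For the triplet 5:1 with the default stride of 1, NVALS is \(-3\), so no 
assignment is executed.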

Examples of element array assignments are:

                                                                  \CODE
FORALL (I=1:N, J=1:N) H(I,J) = 1.0 / REAL(I + J - 1)

FORALL (I=1:N, J=1:N, A(I,J) .NE. 0.0) B(I,J) = 1.0 / A(I,J)

TYPE MONARCH
    INTEGER, POINTER :: P
END TYPE MONARCH
TYPE(MONARCH) :: A(N)
INTEGER, TARGET :: B(N)
      ...
! Set up a butterfly pattern
FORALL (J=1:N)  A(J)%P => B(1+IEOR(J-1,2**K))

                                                                  \EDOC 

\subsection{Interpretation of Element Array Assignments}  

Execution of an element array assignment consists of the following steps:

\begin{enumerate}

\item Evaluation in any order of the subscript and stride expressions in 
the {\it forall-triplet-spec-list}.
The set of valid combinations of {\it subscript-name} values is then the 
cartesian product of the sets defined by these triplets.

\item Evaluation of the {\it scalar-mask-expr} for all valid combinations 
of {\em subscript-name} values.
The mask elements may be evaluated in any order.
The set of active combinations of {\it subscript-name} values is the 
subset of the valid combinations for which the mask evaluates to true.

\item Evaluation in any order of the {\it expr} or {\it target} and all 
subscripts contained in the 
{\it array-element} or {\it array-section} in the {\it forall-assignment} 
for all active combinations of {\em subscript-name} values.
In the case of pointer assignment where the {\it target} is not a 
pointer, the evaluation consists of identifying the object referenced 
rather than computing its value.

\item Assignment of the computed {\it expr} values to the corresponding 
elements specified by {\it array-element} or {\it array-section}.
In the case of a pointer assignment where the {\it target} is not a 
pointer, this assignment consists of associating the {\it array-element} 
with the object referenced.

\end{enumerate}

If the scalar mask expression is omitted, it is as if it were
present with the value true.

The scope of a
{\it subscript-name} is the FORALL statement itself. 

The {\it forall-assignment} must not cause any element of the array
being assigned to be assigned a value more than once.
Similarly, the evaluations of {\em expr}, {\it array-element} or {\it 
array-section} may not cause any scalar data object to be assigned a 
value more than once, nor may they cause an array element to be assigned 
which is also assigned directly by the {\it forall-assignment}.
Local variables within two instantiations of the same function do not 
refer to the same data object unless they have the SAVE attribute or are 
sequence or storage associated with the same object.
 
The evaluation of the {\it expr} for a particular active combination of
{\it subscript-name} values may neither affect nor be affected by 
the evaluation of {\it expr} for any other combination of {\it 
subscript-name} values.
In particular, functions cannot produce side effects that are visible in 
the FORALL; 
nor may global variables be updated by functions unless the results of 
those updates are independent of the order of execution of {\it 
subscript-name} values.
The evaluation of the {\it expr} or any subscript on the left-hand side 
of the {\it forall-assignment} for any active combination of {\it 
subscript-name} values may not affect 
nor be affected by 
the evaluation of any subscript in the {\it forall-assignment}, either 
for the same 
combination of {\it subscript-name} values or a different active
combination.
In particular, a function reference
appearing in any expression in the {\it forall-assignment} must not
redefine any subscript name.
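
The practical effect of steps 3 and 4 can be seen by comparing a FORALL 
statement with a DO loop that has the same body.
The following is a minimal sketch, assuming a compiler that accepts the 
FORALL statement as proposed here; the program name SHIFTSUM and the 
arrays A and C are hypothetical.

                                                                  \CODE
PROGRAM SHIFTSUM
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 5
  INTEGER :: A(N), C(N), I
  A = (/ 1, 2, 3, 4, 5 /)
  C = A
  ! FORALL: every right-hand side is computed from the old A
  FORALL (I = 2:N-1) A(I) = A(I-1) + A(I+1)
  ! DO: iteration I sees the C(I-1) stored by iteration I-1
  DO I = 2, N-1
    C(I) = C(I-1) + C(I+1)
  END DO
  PRINT *, A    ! 1 4 6 8 5
  PRINT *, C    ! 1 4 8 13 5
END PROGRAM SHIFTSUM
                                                                  \EDOC 

The two results differ because the FORALL evaluates all right-hand sides 
before any element is assigned, whereas the DO loop carries each new 
value into the next iteration.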


\subsection{Scalarization of the FORALL Statement}

A {\it forall-stmt} of the general form:

                                                                   \CODE
FORALL (v1=l1:u1:s1, v2=l2:u2:s2, ..., vn=ln:un:sn , mask ) &
      a(e1,...,em) = rhs
                                                                   \EDOC

is equivalent to the following standard Fortran 90 code:

                                                                   \CODE
!evaluate subscript and stride expressions in any order

templ1 = l1
tempu1 = u1
temps1 = s1
templ2 = l2
tempu2 = u2
temps2 = s2
  ...
templn = ln
tempun = un
tempsn = sn

!then evaluate the scalar mask expression

DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        tempmask(v1,v2,...,vn) = mask
      END DO
    ...
  END DO
END DO

!then evaluate the expr in the forall-assignment for all valid 
!combinations of subscript names for which the scalar mask 
!expression is true (it is safe to avoid saving the subscript 
!expressions because of the conditions on FORALL expressions)

DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        IF (tempmask(v1,v2,...,vn)) THEN
          temprhs(v1,v2,...,vn) = rhs
        END IF
      END DO
    ...
  END DO
END DO

!then perform the assignment of these values to the corresponding 
!elements of the array being assigned to

DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        IF (tempmask(v1,v2,...,vn)) THEN
          a(e1,...,em) = temprhs(v1,v2,...,vn)
        END IF
      END DO
    ...
  END DO
END DO
                                                                      \EDOC

\subsubsection{Consequences of the Definition of the FORALL Statement}

\begin{itemize}

\item The {\it scalar-mask-expr} may depend on the {\it subscript-name}s.

\item Each of the {\it subscript-name}s must appear within the
subscript expression(s) on the left-hand side.  
(This is a syntactic
consequence of the semantic rule that no two execution instances of the
body may assign to the same array element.)

\item Right-hand sides and subscripts on the left-hand side of a {\it 
forall-assignment} are
evaluated only for valid combinations of subscript names for which the
scalar mask expression is true.

\item Side-effects of function calls are allowed 
under the usual condition of Fortran statements that order of evaluation 
of subexpressions is undefined.
This principle has been extended so that side effects in computing one 
array element cannot affect other array elements.

\item Side-effects cannot assign to the same global array element or 
element of a function's actual argument twice.
This effectively disallows accumulations in function calls (including 
recording how many times a function is called).
It is still legal, however, to create side effects on {\it different} 
global array elements.

\item I am unable to find a clear answer to the following in the 
Fortran~90 standard: Can the evaluation of a function affect the values 
of global objects not referenced in the calling statement?
For example, is the following legal?
                                                                 \CODE
X = F(1) + F(2)
...
REAL FUNCTION F(K)
COMMON /HUH/ ICOUNT
ICOUNT = ICOUNT+1
F = K
END
																 \EDOC
The assignments to ICOUNT do not affect the values computed by F, and the 
final value of ICOUNT is mathematically independent of the order of 
evaluation here.
The constraints on FORALL evaluation above would not allow F to be called 
from a FORALL; if this is inconsistent with array expressions, the 
proposal should be amended or a note of the inconsistency made in the text.

\item Distinct function instantiations explicitly have distinct sets of 
local variables, to remove ambiguity about whether the following is legal:
                                                                  \CODE
FORALL ( I = 1:N ) A(I) = FOO( I )
...
INTEGER FUNCTION FOO( I )
INTEGER I, J, K
J = 1
K = I
DO WHILE ( K .GT. 1 )
  J = J+1
  IF (MOD(K,2) .EQ. 0) THEN
    K = K / 2
  ELSE
    K = K * 3 + 1
  END IF
END DO
FOO = J
END
                                                                 \EDOC 
Assuming distinct function calls have their own variables, there are no 
side effects to any global variable.
This is consistent with (some might argue implied by) Section~12.5.2.4 of the 
Fortran~90 standard.
I don't claim this is particularly easy to implement on all machines.

\item This proposal is silent on whether I/O is allowed in functions called 
from FORALL statements.
This could, and probably should, be added as a constraint in the 
interpretation section.

\end{itemize}


\section{FORALL Construct\protect\footnote{Version of August 20, 1992 - David
Loveman, Digital Equipment Corporation and Chuck Koelbel, Rice University}}

\label{forall-construct}

The FORALL construct is a generalization of the element array
assignment statement allowing multiple assignments, masked array 
assignments, and nested FORALL statements to be
controlled by a single {\it forall-triplet-spec-list}.  Rule R215 for
{\it executable-construct} is extended to include the {\it
forall-construct}.

\subsection{General Form of the FORALL Construct}

                                                                    \BNF
forall-construct        \IS FORALL (forall-triplet-spec-list 
                                  [,scalar-mask-expr ])
                               forall-body-stmt-list
                            END FORALL

forall-body-stmt     \IS forall-assignment
                     \OR where-stmt
                     \OR forall-stmt
                     \OR forall-construct
                                                                    \FNB

\noindent
Constraint:  {\it subscript-name} must be a {\it scalar-name} of type
integer.

\noindent
Constraint:  A {\it subscript} or a {\it stride} in a {\it
forall-triplet-spec} must not contain a reference to any {\it
subscript-name} in the {\it forall-triplet-spec-list}.

\noindent
Constraint:  Any left-hand side {\it array-section} or {\it 
array-element} in any {\it forall-body-stmt}
must reference all of the {\it forall-triplet-spec
subscript-names}.

\noindent
Constraint: If a {\it forall-stmt} or {\it forall-construct} is nested 
within a {\it forall-construct}, then the inner FORALL may not redefine 
any {\it subscript-name} used in the outer {\it forall-construct}.
This rule applies recursively in the event of multiple nesting levels.

For each subscript name in the {\it forall-assignment}s, the set of
permitted values is determined on entry to the construct and is
\[  m1 + (k-1) * m3, \mbox{ where } k = 1, 2, \ldots, INT((m2 - m1 + m3) / m3)  \]
and where {\it m1}, {\it m2}, and {\it m3} are the values of the first
subscript, the second subscript, and the stride respectively in the
{\it forall-triplet-spec}.  If {\it stride} is missing, it is as if it
were present with a value of the integer 1.  The expression {\it
stride} must not have the value 0.  If for some subscript name 
\(INT((m2 -m1 + m3) / m3) \leq 0\), the {\it forall-assignment}s are not 
executed.

Examples of the FORALL construct are:

                                                                 \CODE
FORALL ( i = 2:n-1 )
  FORALL ( j = 2:i-1 )
    a(i,j) = a(i,j-1) + a(i,j+1) + a(i-1,j) + a(i+1,j)
    b(i,j) = a(i,j)
  END FORALL
END FORALL

FORALL ( i = 1:n-1 )
  FORALL ( j = i+1:n )
    a(i,j) = a(j,i)
  END FORALL
END FORALL

FORALL ( i = 1:n, j = 1:n )
  a(i,j) = MERGE( a(i,j), a(i,j)**2, i.eq.j )
  WHERE ( .not. done(i,j,1:m) )
    b(i,j,1:m) = b(i,j,1:m)*x
  END WHERE
END FORALL
                                                                 \EDOC


\subsection{Interpretation of the FORALL Construct}  

Execution of a FORALL construct consists of the following steps:

\begin{enumerate}

\item Evaluation in any order of the subscript and stride expressions in 
the {\it forall-triplet-spec-list}.
The set of valid combinations of {\it subscript-name} values is then the 
cartesian product of the sets defined by these triplets.

\item Evaluation of the {\it scalar-mask-expr} for all valid combinations 
of {\em subscript-name} values.
The mask elements may be evaluated in any order.
The set of active combinations of {\it subscript-name} values is the 
subset of the valid combinations for which the mask evaluates to true.

\item Execution of the {\it forall-body-stmts} in the order they appear.
Each statement is executed completely (that is, for all active 
combinations of {\it subscript-name} values) according to the following 
interpretation:

\begin{enumerate}

\item Assignment statements and array assignment statements (i.e., 
statements in the {\it forall-assignment} category) evaluate the 
right-hand side {\it expr} and any left-hand side subscripts for all 
active {\it subscript-name} values,
then assign those results to the corresponding left-hand side references.

\item WHERE statements evaluate their {\it mask-expr} for all active 
combinations of values of {\it subscript-name}s.
All elements of all masks may be evaluated in any order. 
The assignments within the WHERE branch of the statement are then 
executed in order using the above interpretation of array assignments 
within the FORALL, but the only array elements assigned are those 
selected by both the active {\it subscript-names} and the WHERE mask.
Finally, the assignments in the ELSEWHERE branch are executed if that 
branch is present.
The assignments here are also treated as array assignments, but elements 
are only assigned if they are selected by both the active combinations 
and by the negation of the WHERE mask.

\item FORALL statements and FORALL constructs first evaluate the 
subscript and stride expressions in 
the {\it forall-triplet-spec-list} for all active combinations of the 
outer FORALL constructs.
The set of valid combinations of {\it subscript-names} for the inner 
FORALL is then the union of the sets defined by these bounds and strides 
for each active combination of the outer {\it subscript-names}.
For example, the valid set of the inner FORALL in the second example in 
the last section is the upper triangle (not including the main diagonal) 
of the \(n \times n\) matrix a.
The scalar mask expression is then evaluated for all valid combinations 
of the inner FORALL's {\it subscript-names} to produce the set of active 
combinations.
If there is no scalar mask expression, it is assumed to be always true.
Each statement in the inner FORALL is then executed for each active 
combination (of the inner FORALL), recursively following the 
interpretations given in this section.

\end{enumerate}

\end{enumerate}

If the scalar mask expression is omitted, it is as if it were
present with the value true.

The scope of a
{\it subscript-name} is the FORALL construct itself. 

A single assignment or array assignment statement in a {\it 
forall-construct} must obey the same restrictions as a {\it 
forall-assignment} in a simple {\it forall-stmt}.
(Note that the lowest level of nested statements must always be an 
assignment statement.)
For example, an assignment may not cause the same array element to be 
assigned more than once.
It is, however, permitted that different statements may assign to the 
same array element, or that the evaluation of subexpressions in one 
statement affect the execution of a later statement.
Evaluation of the mask or subscript bounds and stride 
expressions in an inner WHERE or FORALL for one active combination of 
{\it subscript-name} values may not affect nor be affected by the 
evaluations of those subexpressions for any other active combination.
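
The statement-by-statement rule can be illustrated with a small example.
The following is a sketch only, assuming a compiler that accepts the 
FORALL construct as proposed here; the program name TWOSTMT and the 
arrays A and B are hypothetical.

                                                                 \CODE
PROGRAM TWOSTMT
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 5
  INTEGER :: A(N), B(N), I
  A = (/ 1, 2, 3, 4, 5 /)
  B = 0
  FORALL (I = 2:N-1)
    A(I) = A(I-1) + A(I+1)   ! completes for all I first
    B(I) = A(I+1)            ! then reads the updated A
  END FORALL
  PRINT *, A    ! 1 4 6 8 5
  PRINT *, B    ! 0 6 8 5 0
END PROGRAM TWOSTMT
                                                                 \EDOC

A DO loop over the same range with the same two statements in its body 
would instead leave the values 3, 4, 5 in B(2:4), because each iteration 
would read A(I+1) before the following iteration updated it.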

\subsection{Scalarization of the FORALL Construct}

A {\it forall-construct} of the form:

                                                                \CODE
FORALL (... e1 ... e2 ... en ...)
    s1
    s2
     ...
    sn
END FORALL
                                                                \EDOC

where each si is an assignment, is equivalent to the following scalar code:

                                                                \CODE
temp1 = e1
temp2 = e2
 ...
tempn = en
FORALL (... temp1 ... temp2 ... tempn ...) s1
FORALL (... temp1 ... temp2 ... tempn ...) s2
   ...
FORALL (... temp1 ... temp2 ... tempn ...) sn
                                                                \EDOC

A similar statement can be made using FORALL constructs when the 
si may be WHERE or FORALL constructs.

A {\it forall-construct} of the form:

                                                                \CODE
FORALL ( v1=l1:u1:s1, mask )
  WHERE ( mask2(l2:u2) )
    a(v1,l2:u2) = rhs1
  ELSEWHERE
    a(v1,l2:u2) = rhs2
  END WHERE
END FORALL
                                                                \EDOC

is equivalent to the following standard Fortran 90 code:

                                                                \CODE
!evaluate subscript and stride expressions in any order

templ1 = l1
tempu1 = u1
temps1 = s1

!then evaluate the FORALL mask expression

DO v1=templ1,tempu1,temps1
  tempmask(v1) = mask
END DO

!then evaluate the masks for the WHERE

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    tmpl2(v1) = l2
    tmpu2(v1) = u2
    tempmask2(v1,tmpl2(v1):tmpu2(v1)) = mask2(tmpl2(v1):tmpu2(v1))
  END IF
END DO

!then evaluate the WHERE branch

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    WHERE ( tempmask2(v1,tmpl2(v1):tmpu2(v1)) )
      temprhs1(v1,tmpl2(v1):tmpu2(v1)) = rhs1
    END WHERE
  END IF
END DO
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    WHERE ( tempmask2(v1,tmpl2(v1):tmpu2(v1)) )
      a(v1,tmpl2(v1):tmpu2(v1)) = temprhs1(v1,tmpl2(v1):tmpu2(v1))
    END WHERE
  END IF
END DO

!then evaluate the ELSEWHERE branch

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    WHERE ( .not. tempmask2(v1,tmpl2(v1):tmpu2(v1)) )
      temprhs2(v1,tmpl2(v1):tmpu2(v1)) = rhs2
    END WHERE
  END IF
END DO
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    WHERE ( .not. tempmask2(v1,tmpl2(v1):tmpu2(v1)) )
      a(v1,tmpl2(v1):tmpu2(v1)) = temprhs2(v1,tmpl2(v1):tmpu2(v1))
    END WHERE
  END IF
END DO
                                                                   \EDOC


A {\it forall-construct} of the form:

                                                                   \CODE
FORALL ( v1=l1:u1:s1, mask )
  FORALL ( v2=l2:u2:s2, mask2 )
    a(e1,e2) = rhs1
    b(e3,e4) = rhs2
  END FORALL
END FORALL
                                                                   \EDOC

is equivalent to the following standard Fortran 90 code:


                                                                   \CODE
!evaluate subscript and stride expressions in any order

templ1 = l1
tempu1 = u1
temps1 = s1

!then evaluate the FORALL mask expression

DO v1=templ1,tempu1,temps1
  tempmask(v1) = mask
END DO

!then evaluate the inner FORALL bounds, etc

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    templ2(v1) = l2
    tempu2(v1) = u2
    temps2(v1) = s2
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      tempmask2(v1,v2) = mask2
    END DO
  END IF
END DO

!first statement

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      IF ( tempmask2(v1,v2) ) THEN
        temprhs1(v1,v2) = rhs1
      END IF
    END DO
  END IF
END DO
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      IF ( tempmask2(v1,v2) ) THEN
        a(e1,e2) = temprhs1(v1,v2)
      END IF
    END DO
  END IF
END DO

!second statement

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      IF ( tempmask2(v1,v2) ) THEN
        temprhs2(v1,v2) = rhs2
      END IF
    END DO
  END IF
END DO
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      IF ( tempmask2(v1,v2) ) THEN
        b(e3,e4) = temprhs2(v1,v2)
      END IF
    END DO
  END IF
END DO
                                                                   \EDOC
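
As a concrete instance of the nested scheme, the following Fortran~90 
sketch scalarizes a nested FORALL that assigns a(i,j) = a(j,i) for 
i = 1:n-1 and j = i+1:n.
It assumes that both masks are omitted (and therefore true) and that the 
inner bounds have no side effects, so the mask and bound temporaries are 
dropped; the program name UPPER and the array TEMPRHS are hypothetical.

                                                                   \CODE
PROGRAM UPPER
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 3
  INTEGER :: A(N,N), TEMPRHS(N,N), I, J
  ! A(I,J) = 10*I + J, so A(2,1) = 21
  DO I = 1, N
    DO J = 1, N
      A(I,J) = 10*I + J
    END DO
  END DO
  ! evaluate all right-hand sides
  DO I = 1, N-1
    DO J = I+1, N
      TEMPRHS(I,J) = A(J,I)
    END DO
  END DO
  ! then perform all assignments
  DO I = 1, N-1
    DO J = I+1, N
      A(I,J) = TEMPRHS(I,J)
    END DO
  END DO
  PRINT *, A(1,2), A(1,3), A(2,3)   ! 21 31 32
END PROGRAM UPPER
                                                                   \EDOC

In this particular case the elements read (below the diagonal) and the 
elements written (above the diagonal) never overlap, so a compiler could 
fuse the two loop nests and drop TEMPRHS altogether.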


\subsubsection{Consequences of the Definition of the FORALL Construct}

\begin{itemize}

\item A block FORALL means the same as replicating the FORALL
header in front of each array assignment statement in the block, except
that any expressions in the FORALL header are evaluated only once,
rather than being re-evaluated before each of the statements in the body.
(This statement needs some modification in the case of nesting.)

\item One may think of a block FORALL as synchronizing twice per
contained assignment statement: once after handling the rhs and other 
expressions
but before performing assignments, and once after all assignments have
been performed but before commencing the next statement.  (In practice,
appropriate dependence analysis will often permit the compiler to
eliminate unnecessary synchronizations.)

\item The {\it scalar-mask-expr} may depend on the {\it subscript-name}s.

\item In general, any expression in a FORALL is evaluated only for valid 
combinations of all surrounding subscript names for which all the
scalar mask expressions are true.

\item Nested FORALL bounds and strides can depend on outer FORALL {\it 
subscript-names}.  They cannot redefine those names, even temporarily (if 
they did there  would be no way to avoid multiple assignments to the same 
array element).

\item Dependences are allowed from one statement to later statements, but 
never from an assignment statement to itself.
Masks and subscript bounds could conceivably have side effects visible in 
the rest of the nested statement.

\end{itemize}


\section{The INDEPENDENT Directive\protect\footnote{Version of August 20, 1992
 - Guy Steele, Thinking Machines Corporation, and Chuck Koelbel, Rice University}}

\label{do-independent}


Let there be a directive
                                                  \CODE
!HPF$INDEPENDENT
                                                  \EDOC
that can precede a DO loop.
It asserts to the compiler that the iterations of the loop
may be executed independently--that is, in any order, or
interleaved, or concurrently--without changing the semantics
of the program.  (The compiler is justified in producing
a warning if it can prove otherwise.)
                                                  \CODE
!HPF$INDEPENDENT
      DO I=1,100
        A(P(I))=B(I)   !I happen to know that P is a permutation
      END DO
                                                  \EDOC
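
The following sketch contrasts a loop for which the assertion holds with 
one for which it would be false.
The program name INDEP, the array S, and the data values are hypothetical; 
since the directive is a comment, the program is ordinary Fortran~90.

                                                  \CODE
PROGRAM INDEP
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 4
  INTEGER :: P(N), A(N), B(N), S(N), I
  P = (/ 3, 1, 4, 2 /)      ! a permutation of 1..N
  B = (/ 10, 20, 30, 40 /)
  S = 0
  S(1) = B(1)
  ! each iteration writes a distinct A(P(I)): any order gives the same A
!HPF$INDEPENDENT
  DO I = 1, N
    A(P(I)) = B(I)
  END DO
  ! iteration I reads S(I-1) written by iteration I-1: order matters,
  ! so the directive would be wrong here and is deliberately omitted
  DO I = 2, N
    S(I) = S(I-1) + B(I)
  END DO
  PRINT *, A    ! 20 40 10 30
  PRINT *, S    ! 10 30 60 100
END PROGRAM INDEP
                                                  \EDOC

If P contained a repeated value, two iterations of the first loop would 
write the same element of A and the assertion would no longer be valid.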

One may apply this directive to a nest of multiple loops
by listing all the loop variables of the loops in question;
the loops must be contiguous with the directive and in the
same order that the variables are listed:
                                                  \CODE
!HPF$INDEPENDENT (I1,I2,I3)
      DO I1 = ...
        DO I2 = ...
          DO I3 = ...
            DO I4 = ...    !The inner two loops are *not* independent!
              DO I5 = ...
                ...
              END DO
            END DO
          END DO
        END DO
      END DO
                                                  \EDOC

These directives are purely advisory and a compiler is free
to ignore them if it cannot make use of the information.

This directive is of course similar to the DOSHARED directive
of Cray MPP Fortran.  A different name is offered here to avoid
even the hint of commitment to execution by a shared memory machine.
Also, the ``mechanism'' syntax is omitted here, though we might want
to adopt it as further advice to the compiler about appropriate
implementation strategies, if we can agree on a desirable set
of options.


\section{Other Proposals}

The proposals in this section have not been approved, even as a first 
reading.
Sections~\ref{begin-independent}, \ref{forall-pointer}, 
\ref{forall-allocate}, and \ref{data-ref} 
extend parts of the previous sections and/or the Fortran~90 standard.
Section~\ref{forall-elemental} is an alternative to the treatment of 
function calls in Sections~\ref{forall-stmt} and~\ref{forall-construct}.

\subsection{FORALL with INDEPENDENT Directives\protect\footnote{Version 
of July 21, 1992 - Min-You Wu}}
\label{begin-independent}

This proposal is an extension of Guy Steele's INDEPENDENT proposal.
We propose a block FORALL with the directives for independent 
execution of statements.  The INDEPENDENT directives are used
in a block style.  

The block FORALL is in the form of
                                                         \CODE
      FORALL (...) [ON (...)]
        a block of statements
      END FORALL
                                                         \EDOC
where the block can consist of a restricted class of statements 
and the following INDEPENDENT directives:
                                                         \CODE
!HPF$BEGIN INDEPENDENT
!HPF$END INDEPENDENT
                                                         \EDOC
The two directives must be used in pairs.  
A sub-block of statements 
enclosed between the two directives is called an {\em asynchronous} 
sub-block or {\em independent} sub-block.  
The statements that are 
not in an asynchronous sub-block are in {\em synchronized} sub-blocks
or {\em non-independent} sub-blocks.  
The synchronized sub-block is 
the same as Guy Steele's synchronized FORALL statement, and the 
asynchronous sub-block is the same as the FORALL with the INDEPENDENT 
directive.  
Thus, the block FORALL
                                                          \CODE
      FORALL (e)
        b1
!HPF$BEGIN INDEPENDENT
        b2
!HPF$END INDEPENDENT
        b3
      END FORALL
                                                           \EDOC
means roughly the same as
                                                           \CODE
      FORALL (e)
        b1
      END FORALL
!HPF$INDEPENDENT
      FORALL (e)
        b2
      END FORALL
      FORALL (e)
        b3
      END FORALL
                                                          \EDOC
														  
Statements in a synchronized sub-block are tightly synchronized.
Statements in an asynchronous sub-block are completely independent.
The INDEPENDENT directives indicate to the compiler that there is no 
dependence and that, consequently, synchronizations are not necessary.
It is the user's responsibility to ensure there is no dependence
between instances in an asynchronous sub-block.
A compiler can do dependence analysis for the asynchronous sub-blocks
and issue an error message when there exists a dependence or a warning
when it finds a possible dependence.

\subsubsection{What does ``no dependence between instances'' mean?}

It means that there is no true dependence, anti-dependence,
or output dependence between instances.
Examples of these dependences are shown below:
\begin{enumerate}
\item True dependence:
                                                            \CODE
      FORALL (i = 1:N)
        x(i) = ... 
        ...  = x(i+1)
      END FORALL
                                                            \EDOC
Notice that dependences in a FORALL are different from those in a DO loop.
If the above example were a DO loop, the dependence would be an anti-dependence.

\item Anti-dependence:
                                                            \CODE
      FORALL (i = 1:N)
        ...  = x(i+1)
        x(i) = ...
      END FORALL
                                                            \EDOC

\item Output dependence:
                                                            \CODE
      FORALL (i = 1:N)
        x(i+1) = ... 
        x(i) = ...
      END FORALL
                                                            \EDOC
\end{enumerate}

Independence does not imply the absence of communication.  One instance may access 
data in the other instances, as long as it does not cause a dependence.  
The following example is an independent block:
                                                            \CODE
      FORALL (i = 1:N)
!HPF$BEGIN INDEPENDENT
        x(i) = a(i-1)
        y(i-1) = a(i+1)
!HPF$END INDEPENDENT
      END FORALL
                                                            \EDOC
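
One way to check such a block by hand is to run it once statement by 
statement and once instance by instance, and to compare the results.
The following Fortran~90 sketch does this for the block above, using DO 
loops to stand in for the two execution orders; the program name ASYNC, 
the array bounds, and the data values are assumptions made for 
illustration.

                                                            \CODE
PROGRAM ASYNC
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 4
  INTEGER :: A(0:N+1), X1(N), Y1(0:N-1), X2(N), Y2(0:N-1), I
  DO I = 0, N+1
    A(I) = I * I
  END DO
  ! statement by statement, as in a synchronized sub-block
  DO I = 1, N
    X1(I) = A(I-1)
  END DO
  DO I = 1, N
    Y1(I-1) = A(I+1)
  END DO
  ! instance by instance, here in reverse order
  DO I = N, 1, -1
    X2(I) = A(I-1)
    Y2(I-1) = A(I+1)
  END DO
  PRINT *, ALL(X1 .EQ. X2) .AND. ALL(Y1 .EQ. Y2)    ! T
END PROGRAM ASYNC
                                                            \EDOC

The two orders agree because a is only read, and each element of x and y 
is written by exactly one instance.
Agreement for one alternative order is of course only evidence, not a 
proof, of independence.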

\subsubsection{Statements that can appear in FORALL}

FORALL statements, WHERE-ELSEWHERE statements, some intrinsic functions 
(and possibly elemental functions and subroutines) can appear in the FORALL:
\begin{enumerate}
\item FORALL statement
                                                            \CODE
      FORALL (I = 1 : N)
        A(I,0) = A(I-1,0)
        FORALL (J = 1 : N)
!HPF$BEGIN INDEPENDENT
          A(I,J) = A(I,0) + B(I-1,J-1)
          C(I,J) = A(I,J)
!HPF$END INDEPENDENT
        END FORALL
      END FORALL
                                                            \EDOC

\item WHERE
                                                            \CODE
      FORALL(I = 1 : N)
!HPF$BEGIN INDEPENDENT
        WHERE(A(I,:)==B(I,:))
          A(I,:) = 0
        ELSEWHERE
          A(I,:) = B(I,:)
        END WHERE
!HPF$END INDEPENDENT
      END FORALL
                                                            \EDOC
\end{enumerate}


\subsubsection{Rationale}

\begin{enumerate}
\item A FORALL with a single asynchronous sub-block as shown below is 
the same as a do independent (or doall, or doeach, or parallel do, etc.).
                                                            \CODE
      FORALL (e)
!HPF$BEGIN INDEPENDENT
        b1
!HPF$END INDEPENDENT
      END FORALL
                                                            \EDOC
A FORALL without any INDEPENDENT directive is the same as a tightly 
synchronized FORALL.  We therefore need to define only one type of parallel 
construct, which includes both synchronized and asynchronous blocks.  
Furthermore, combining asynchronous and synchronized FORALLs, we 
have a loosely synchronized FORALL which is more flexible for many 
loosely synchronous applications.

\item With INDEPENDENT directives, the user can indicate which blocks
need not be synchronized.  The INDEPENDENT directives can act 
as barrier synchronizations.  One may suggest a smart compiler 
that can recognize dependences and eliminate unnecessary 
synchronizations automatically.  However, it might be extremely 
difficult or impossible in some cases to identify all dependences.  
When the compiler cannot determine whether there is a dependence, 
it must assume so and use a synchronization for safety, which 
results in unnecessary synchronizations and consequently, high 
communication overhead.
\end{enumerate}


\subsection{ALLOCATE in FORALL\protect\footnote{Version of July 28, 1992 
- Guy Steele, Thinking Machines Corporation}}

\label{forall-allocate}


Proposal:  ALLOCATE, DEALLOCATE, and NULLIFY statements may appear
	in the body of a FORALL.

Rationale: these are just another kind of assignment.  They may have
	a kind of side effect (storage management), but it is a
	benign side effect (even milder than random number generation).

Example:
                                                            \CODE
      TYPE SCREEN
        INTEGER, POINTER :: P(:,:)
      END TYPE SCREEN
      TYPE(SCREEN) :: S(N)
      INTEGER IERR(N)
      ...
!  Lots of arrays with different aspect ratios
      FORALL (J=1:N)  ALLOCATE(S(J)%P(J,N/J),STAT=IERR(J))
      IF (ANY(IERR /= 0)) GO TO 99999
                                                            \EDOC
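
For comparison, the following sketch shows how the same allocations have to
be written in standard Fortran~90, where ALLOCATE cannot appear in a
parallel construct.  The value of N and the use of STOP in place of the
branch to an error label are assumptions made here to keep the fragment
self-contained.
                                                            \CODE
      INTEGER, PARAMETER :: N = 8
      TYPE SCREEN
        INTEGER, POINTER :: P(:,:)
      END TYPE SCREEN
      TYPE(SCREEN) :: S(N)
      INTEGER IERR(N), J

!  Standard Fortran 90: the allocations are written as a
!  sequential loop, although no iteration depends on another.
      DO J = 1, N
        ALLOCATE(S(J)%P(J,N/J), STAT=IERR(J))
      END DO
      IF (ANY(IERR /= 0)) STOP
                                                            \EDOC
The loop imposes an ordering that the FORALL form would leave to the
implementation.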
\subsection{Generalized Data References\protect\footnote{Version of July 28, 1992 
- Guy Steele, Thinking Machines Corporation}}

\label{data-ref}

Proposal:  Delete the constraint in section 6.1.2 of the Fortran 90
	standard (page 63, lines 7 and 8):
\begin{quote}
	Constraint: In a data-ref, there must not be more than one
		part-ref with nonzero rank.  A part-name to the right
		of a part-ref with nonzero rank must not have the
		POINTER attribute.
\end{quote}

Rationale: further opportunities for parallelism.

Example:
                                                                     \CODE
      TYPE(MONARCH) :: C(N), W(N)
      ...
C  Munch that butterfly
      C = C + W * A%P		!Currently illegal in Fortran 90
                                                                      \EDOC
																	  
\subsection{ELEMENTAL Functions\protect\footnote{Version of August 31, 
1992 - John Merlin, University of Southampton, and Chuck Koelbel, Rice 
University}}

\label{forall-elemental}


The intent of this proposal is to define 'pure' (i.e. side-effect
free) functions which may be used freely and safely in FORALL - and
in any normal F90 context - and whose dummy arguments and results
can be array-valued.  A separate aspect of the proposal is that pure
procedures may be used *elementally* (i.e. like F90 elemental
intrinsic procedures) *provided* their arguments and result satisfy 
the additional constraint that they are *scalar*.  This avoids
introducing into Fortran the controversial concept of 'array-of-arrays'
(which Fortran seems to assiduously avoid) present in my original 
proposal, and still allows the flexibility of array-valued functions 
in FORALL.  Furthermore, it does not reduce the functionality
(at least for functions) -- if the programmer wants the effect of
elemental invocation of functions that have array-valued dummy args 
or result, he can obtain this with FORALL.

I commend this proposal to the house!

(BTW, I'm going on holiday now until Sept 21, so won't be able to 
respond to any questions or comments until after the next HPF meeting. 
I hope for good news about 'pure' procedures when I return! :-)
Have a good meeting!)

              John Merlin.
----------------------------------------------------------------------

\subsection{PURE Procedures and Elemental Invocation\protect\footnote{Version 
of August 28, 
1992 - John Merlin, University of Southampton, and Chuck Koelbel, Rice 
University}}

\label{forall-elemental}

The intent of this counter-proposal is to further restrict functions 
called from within FORALL so that they have no side effects.
This is more restrictive than the constraints on function calls in 
section \ref{forall-stmt}; however, the definition is simpler and 
presumably clearer, as well as providing complete security against 
non-deterministic behaviour.

A separate aspect of this proposal is to extend the concept of 
`elemental procedures', which in Fortran~90 are restricted to a subset 
of the intrinsic procedures, so that they can be user-defined.


\subsubsection{General Form of Element Array Assignment}

To the definition of {\it forall-assignment}, add the following:
\begin{quotation}

\noindent Constraint: If any subexpression in {\it expr}, {\it 
array-element}, or {\it array-section} is a {\it function-reference}, 
then the {\it function-name} must be a `pure' function as defined below.
\end{quotation}


\subsubsection{Interpretation of Element Array Assignments}  

Change the paragraphs after the step-by-step interpretation to the 
following:
\begin{quotation}

If the scalar mask expression is omitted, it is as if it were
present with the value true.

The scope of a
{\it subscript-name} is the FORALL statement itself. 

The {\it forall-assignment} must not cause any element of the array
being assigned to be assigned a value more than once.
By the nature of PURE functions, no expression evaluations can 
have any effect on other expressions, either for the same combination of 
{\it subscript-name} values or for a different combination.
\end{quotation}
 
\subsubsection{PURE Procedures}

A `pure' procedure is one which produces no side effects, except for 
assigning to dummy arguments of INTENT (OUT) or (INOUT) in the case of a
`pure' subroutine.
It may be used in any way that a normal procedure of its type may 
be used.
In addition, pure functions may be used in a FORALL statement or construct.
Also, a pure procedure whose dummy arguments (and, in the case of a function,
result) are all scalar may be used `elementally', that is, it may be
applied to conforming array arguments in a similar manner to the elemental
intrinsic procedures defined in Fortran~90.

If a procedure is used in a context that requires it to be pure
(namely, it is used elementally or in a FORALL statement or construct),
then its interface must be explicit, and it must be declared to be pure 
in both its definition and interface.  The form of this declaration is 
a directive preceding the {\it function-stmt\/} or {\it subroutine-stmt\/}:
                                                                 \BNF
pure-directive \IS !HPF$ PURE
                                                                 \FNB

To define pure functions, Rule~R1215 of the Fortran~90 standard is changed 
to:
                                                                 \BNF
function-subprogram \IS [pure-directive]
                        function-stmt
                        [specification-part]
                        [execution-part]
                        [internal-subprogram-part]
                        end-function-stmt
                                                                \FNB
with the following additional constraints:

\noindent
Constraint: The dummy arguments of a pure function must have INTENT(IN).

\noindent
Constraint: The local variables of a pure function must not have the SAVE 
attribute.

\noindent
Constraint: A pure function must not contain assignments to global 
data objects.

\noindent
Constraint: A pure function must not contain I/O statements.

\noindent
Constraint: Any procedure called from a pure function must be pure.

To assert that a function is pure, a {\it pure-directive\/} must be given.
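
As an illustration of these constraints, the following sketch shows one
function that satisfies them and one that does not.  The function names
and bodies are hypothetical, and it is assumed here that an intrinsic such
as SQRT counts as a pure procedure.
                                                                 \CODE
!  Satisfies the constraints on a pure function.
!HPF$ PURE
      REAL FUNCTION HYPOT2 (A, B)
        REAL, INTENT(IN) :: A, B      ! dummy arguments are INTENT(IN)
        REAL :: T                     ! local variable, no SAVE attribute
        T = A*A + B*B                 ! assigns only to a local variable
        HYPOT2 = SQRT(T)              ! calls only an intrinsic
      END FUNCTION HYPOT2

!  Could not be declared pure: it violates two constraints.
      REAL FUNCTION COUNTED (A)
        REAL, INTENT(IN) :: A
        INTEGER, SAVE :: NCALLS = 0   ! local variable with SAVE
        NCALLS = NCALLS + 1
        PRINT *, NCALLS               ! I/O statement
        COUNTED = A
      END FUNCTION COUNTED
                                                                 \EDOC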

\vspace*{1ex}

To define pure subroutines, Rule~R1219 is changed to:
                                                                 \BNF
subroutine-subprogram \IS [pure-directive]
                        subroutine-stmt
                        [specification-part]
                        [execution-part]
                        [internal-subprogram-part]
                        end-subroutine-stmt
                                                                \FNB
with the following additional constraints:

\noindent
Constraint: The dummy arguments of a pure subroutine must have explicit
INTENT.

\noindent
Constraint: The local variables of a pure subroutine must not have the SAVE 
attribute.

\noindent
Constraint: A pure subroutine must not contain assignments to global 
data objects.

\noindent
Constraint: A pure subroutine must not contain I/O statements.

\noindent
Constraint: Any procedure called from a pure subroutine must be pure.

To assert that a subroutine is pure, a {\it pure-directive\/} must be given.
\vspace*{1ex}
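
A minimal sketch of a subroutine that would satisfy these constraints is
given below.  The name CLAMP and its arguments are hypothetical, and the
intrinsics MAX and MIN are assumed to count as pure.
                                                                 \CODE
!HPF$ PURE
      SUBROUTINE CLAMP (X, LO, HI)
        REAL, INTENT(INOUT) :: X        ! the only object modified
        REAL, INTENT(IN)    :: LO, HI   ! every dummy has explicit INTENT
        X = MAX(LO, MIN(X, HI))         ! no SAVE, no global data, no I/O
      END SUBROUTINE CLAMP
                                                                 \EDOC
The only effect visible to the caller is the assignment to the
INTENT(INOUT) dummy argument, which is the side effect that the definition
of a pure subroutine permits.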

To define interface specifications for pure procedures, Rule~R1204 is 
changed to:
                                                                \BNF
interface-body \IS [pure-directive]
                   function-stmt
                   [specification-part]
                   end-function-stmt
               \OR [pure-directive]
                   subroutine-stmt
                   [specification-part]
                   end-subroutine-stmt
                                                                \FNB

\noindent
Constraint: An {\it interface-body\/} of a pure subroutine must specify
the intents of all dummy arguments.

When applied to a procedure interface body, the {\it pure-directive} asserts 
that the procedure satisfies the constraints required of pure procedures.
Because of the limited information provided by an interface specification,
this assertion can only be checked to a limited extent in this context
(i.e. with respect to argument intent and the absence of data mapping 
directives).

If a procedure is used in a context that requires it to be pure
(namely, it is used elementally or in a FORALL statement or construct),
then its interface must be explicit.  If the interface is provided by 
means of an interface block, the {\it interface-body\/} must contain a 
{\it pure-directive\/}.

If an interface body contains a {\it pure-directive\/}, then the 
corresponding procedure definition must also contain it, though the 
reverse is not true.  When a procedure definition with a {\it pure-directive\/}
is compiled, the compiler may check that it satisfies the necessary 
constraints.

\vspace*{1ex}

To define elemental invocations of pure procedures, the following 
extra constraint is added after Rules R1209 ({\it function-reference\/}) 
and R1210 ({\it call-stmt\/}):
\begin{quotation}

\noindent
Constraint: A non-intrinsic function (subroutine) that is invoked 
elementally must be a pure function (subroutine) with scalar dummy 
arguments (and result), and its interface must be explicit.
\end{quotation}

Additionally, the beginning of section 12.4.3 should be changed to:
\begin{quotation}

A reference to {\em a pure function or\/} an elemental intrinsic function
is an elemental reference if\ldots
\end{quotation}
and the beginning of section 12.4.5 to:
\begin{quotation}

A reference to {\em a pure subroutine or\/} an elemental intrinsic subroutine
is an elemental reference if\ldots
\end{quotation}
(where the additional words are italicised).


\paragraph{Comments}
Detailed comments on pure procedures:
\begin{itemize}
\item The constraints for a pure procedure guarantee freedom from 
side-effects, thus ensuring that it can be invoked concurrently at each 
`element' of an array (where an `element' may itself be a data-structure, 
including an array).

\item An earlier draft of this proposal contained a constraint disallowing 
pure procedures from accessing global data objects, particularly distributed 
data objects.
This constraint has been dropped as inessential to the side-effect freedom 
that the HPF committee requested.
However, it may well be that some machines will have great difficulty 
implementing FORALL without this constraint.

\item One of us (JHM) is still in favour of disallowing access to global 
variables for a number of reasons: 
\begin{enumerate}
\item Aesthetically, it is in keeping with the
nature of a `pure' function, i.e. a function in the mathematical
sense, and in practical terms it imposes no real restrictions on the 
programmer, as global data can be passed-in via the argument list; 
\item Without this constraint HPF programs can no longer be implemented 
by pure message-passing, or at least not efficiently, i.e. without
sequentialising FORALL statements containing function calls and greatly
complicating their implementation; 
\item Absence of this restriction may inhibit optimisation of FORALLs
and array assignments, as the optimisation of assigning the {\it expr\/}
directly to the assignment variable rather than to a temporary intermediate 
array may now require interprocedural analysis rather than just local 
analysis.  However, JHM does not want this to be a make-or-break point!
\end{enumerate}


\item The constraints are such that a compiler cannot deduce from a 
procedure's interface body whether it can validly be used as a pure 
procedure, as that depends partly on its local declarations and internal 
operations.  Hence, it is necessary to use a specifier like `PURE' in the 
interface body to identify such procedures.  The compiler can 
check that the procedure satisfies all the necessary constraints 
when it compiles the procedure itself (provided it also has the `PURE' 
specifier).

\end{itemize}

As well as using pure functions in FORALL, a pure procedure can also be 
used `elementally', provided it satisfies the additional constraint that 
its dummy arguments (and, in the case of a function, its result) are scalar.

Fortran 90 introduces the concept of `elemental procedures',
which are defined for scalar arguments but may also be applied to
conforming array-valued arguments.  
For an elemental function,
each element of the result, if any, is as would have been obtained by
applying the function to corresponding elements of the arguments.
Examples are the mathematical intrinsics, e.g.\ SIN(X).
For an elemental subroutine, the effect on each element of an INTENT(OUT) or 
INTENT(INOUT) array argument is as would be obtained by calling the 
subroutine with the corresponding elements of the arguments.
An example is the intrinsic subroutine MVBITS.

However, Fortran~90 restricts the application of elemental
procedures to a subset of the intrinsic procedures --- the programmer
cannot define his own.  Obviously, elemental invocation is equivalent to 
concurrent invocation, so extra constraints beyond those for normal
Fortran procedures are required to allow this to be done safely
(e.g. deterministically).  Appropriate constraints in this case are
the same as those for function calls in FORALL---indeed, the latter are 
virtually equivalent to elemental invocation in an array assignment, 
given the close correspondence between FORALL and array assignment.
Hence, we propose that pure procedures may also be invoked elementally,
subject to the additional constraint that their dummy arguments 
(and, for a function, result) are scalar.

Comment:
\begin{itemize}
	\item The original draft proposed allowing pure procedures 
to be invoked elementally even if their dummy arguments or results 
were array-valued.  These provisions have been dropped to avoid 
promoting storage order to a higher level in Fortran~90
(i.e.\ to avoid introducing the concept of `arrays-of-arrays', 
which Fortran~90 seems to strenuously avoid!)   In practical terms,
the current proposal provides the same functionality as the original 
one for functions, though not for subroutines.  If a programmer wants 
elemental function behaviour, but also wants the `elements' to be
array-valued, this can be achieved using FORALL.
	\item In typical FORALL or elemental usage, a pure procedure 
would be called independently in each process, and its dummy arguments 
would be associated with `elements' local to that process.  
This is the reason for disallowing data mapping directives within the 
bodies of such procedures.
Note that, particularly in elemental invocations, the actual arguments
can be distributed arrays which need not be `co-distributed'; if not,
a typical implementation would in general perform all data communications 
prior to calling the procedure, and would then pass-in the required 
elements locally via its argument list.
\end{itemize}


\paragraph{Uses and MIMD aspects}

\subparagraph{FORALL statements and constructs}

Pure functions may be used in expressions in FORALL statements and 
constructs, unlike general functions.  Because a {\it forall-assignment}
may be an {\it array-assignment} the pure function can have an array result.  
For example, if a certain problem
is data-parallel over a 2d grid, and the data structure at each grid
point is a vector of length 3 (2d QCD?), we could have:
                                                              \CODE
    INTERFACE
!HPF$ PURE
      FUNCTION f (x)
        REAL, DIMENSION(3) :: f
        REAL, DIMENSION(3), INTENT(IN) :: x
      END FUNCTION f
    END INTERFACE
    REAL  v (3,10,10)
    ...
    FORALL (i=1:10, j=1:10)  v(:,i,j) = f (v(:,i,j)) 
                                                              \EDOC


\subparagraph{MIMD parallelism}

A natural way of obtaining some MIMD parallelism is by
means of branches within a pure function which depend on argument 
values.  These branches can be governed by content-based or index-based 
conditionals (the latter in a FORALL context).  For example:
                                                              \CODE
!HPF$ PURE
    FUNCTION f (x, i)
      IF (x > 0) THEN     ! content-based conditional
        ...
      ELSE IF (i==1 .OR. i==n) THEN    ! index-based conditional
        ...
      ENDIF
    END FUNCTION

    ...
    FORALL (i=1:n)  x (i) = f(x(i), i)
    ...
                                                              \EDOC
Content-based conditionals can be exploited generally, including in
array assignments (see below), which may sometimes obviate the need for 
WHERE-ELSEWHERE constructs or sequences of masked FORALLs with their 
potential synchronisation overhead. 

\subparagraph{Elemental function references}

Pure functions with scalar dummy arguments and result can be invoked
{\em elementally\/} in array expressions with the 
same interpretation as Fortran~90 elemental intrinsic functions.
This interpretation (Fortran~90 standard, Section~13.2.1) is as follows:
\begin{quote}
If a generic name or a specific name is used to reference an elemental
intrinsic function, the shape of the result is the same as the shape of 
the argument with the greatest rank.
If the arguments are all scalar, the result is scalar.
For those elemental intrinsic functions that have more than one argument, 
all arguments must be conformable.
In the array-valued case, the values of the elements, if any, of the 
result are the same as would have been obtained if the scalar-valued 
function had been applied separately, in any order, to corresponding 
elements of each argument. [An argument called KIND must be specified as 
a scalar integer initialization expression and must specify a 
representation method for the function result that exists on the processor.]
\end{quote}
(The last sentence of this section, enclosed in square brackets,
does not apply to elemental references of user-defined pure functions.)

Examples of elemental usage are:
                                                              \CODE
INTERFACE 
!HPF$ PURE
  REAL FUNCTION foo (x, y, z)
    REAL, INTENT(IN) :: x, y, z
  END FUNCTION
END INTERFACE

REAL a(100), b(100), c(100)
REAL p, q, r

p      = foo (p, q, r)      ! OK - scalar call
a(1:n) = foo (a(1:n), b(1:n), c(1:n))    ! OK - elemental call
a(1:n) = foo (a(1:n), q, r) ! OK - scalar args 'promoted' to arrays
a(1:n) = foo (p, q, r)      ! OK - scalar result assigned to array
                                                              \EDOC
An example involving a WHERE-ELSEWHERE construct is:
                                                              \CODE
INTERFACE
!HPF$ PURE
  REAL FUNCTION f_edge (x)
    REAL, INTENT(IN) :: x
  END FUNCTION f_edge
!HPF$ PURE
  REAL FUNCTION f_interior (x)
    REAL, INTENT(IN) :: x
  END FUNCTION f_interior
END INTERFACE

REAL a (10,10)
LOGICAL edges (10,10)
! ...  initialise mask array 'edges' ...

WHERE (edges)
  a = f_edge (a)
ELSE WHERE
  a = f_interior (a)
END WHERE
                                                              \EDOC
(Incidentally, this example also presents the possibility of obtaining
MIMD parallelism, if the compiler can establish that the two assignments 
are independent and so does not force a synchronisation at the ELSEWHERE
statement.)

\subparagraph{Elemental subroutine references}

Pure subroutines with scalar dummy arguments can be invoked 
{\em elementally\/} with the same interpretation as Fortran~90 
elemental intrinsic subroutines (of which there is only one).
This interpretation (Fortran~90 standard, Section~13.2.2) is as follows:
\begin{quote}
An elemental subroutine is one that is specified for scalar arguments, 
but may be applied to array arguments.  In a reference to an elemental 
intrinsic subroutine, either all actual arguments must be scalar or all 
INTENT(OUT) and INTENT(INOUT) arguments must be arrays of the same shape 
and the remaining arguments must be conformable with them. In the case 
that the INTENT(OUT) and INTENT(INOUT) arguments are arrays, the values 
of the elements, if any, of the results are the same as would be obtained 
if the subroutine with scalar arguments were applied separately, in any 
order, to corresponding elements of each argument.
\end{quote}
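
A sketch of what elemental subroutine references could look like under this
proposal is given below.  The subroutine ORDER2, which is assumed to swap
its two arguments when they are out of order, is hypothetical, as are the
array names and sizes.
                                                              \CODE
INTERFACE
!HPF$ PURE
  SUBROUTINE order2 (lo, hi)
    REAL, INTENT(INOUT) :: lo, hi
  END SUBROUTINE order2
END INTERFACE

REAL a(100), b(100)
REAL p, q
INTEGER i

CALL order2 (p, q)                 ! OK - scalar call
CALL order2 (a, b)                 ! OK - elemental call
CALL order2 (a(1:50), b(51:100))   ! OK - sections of the same shape
! CALL order2 (a, q)               ! not allowed - the INTENT(INOUT)
!                                  !   arguments differ in shape

! The elemental call on a and b has the same effect as this loop,
! with the iterations taken in any order:
DO i = 1, 100
  CALL order2 (a(i), b(i))
END DO
                                                              \EDOC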

\subparagraph{Advantages of elemental usage}

User-defined elemental procedures have several potential advantages.
They would be a very convenient programming tool, as the same procedure 
can be applied to actual arguments of any rank.

In addition, the implementation of an elemental function returning an
array-valued result in an array expression is likely to be more 
efficient than that of an equivalent array function.  One reason is 
that it requires less temporary storage for the result (i.e.\ storage 
for a single result versus storage for the entire array of results).  
Another is that it saves on looping if an array expression is 
implemented by sequential iteration over the component elemental 
expressions (as may be done for the `segment' of the array expression 
local to each process).  This is because, in the sequential version, 
the elemental function can be invoked elementally in situ within the 
expression.  The array function, on the other hand, must be executed 
before the expression is evaluated, storing its result in a temporary 
array for use within the expression.  Looping is then required during 
the execution of the array function body as well as the expression 
evaluation.
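
The difference can be pictured by writing out, by hand, the loops that an
implementation might execute on one process for an expression such as
b + f(c).  The program below is only a sketch under stated assumptions:
f is a hypothetical scalar function, farr is the equivalent array-valued
function, and tmp stands for a temporary that a compiler would introduce
itself.
                                                              \CODE
PROGRAM temps
  INTEGER, PARAMETER :: n = 8
  REAL :: a(n), b(n), c(n), tmp(n)
  INTEGER :: i
  b = 1.0
  c = 2.0

! a = b + f(c), with f invoked elementally in situ:
! one loop and no array temporary.
  DO i = 1, n
    a(i) = b(i) + f(c(i))
  END DO

! a = b + farr(c), with farr an array-valued function:
! the function loops to fill a temporary, then the expression
! evaluation loops again.
  tmp = farr(c)
  DO i = 1, n
    a(i) = b(i) + tmp(i)
  END DO

CONTAINS

  REAL FUNCTION f (x)
    REAL, INTENT(IN) :: x
    f = 2.0*x
  END FUNCTION f

  FUNCTION farr (x)
    REAL, INTENT(IN) :: x(:)
    REAL :: farr(SIZE(x))
    INTEGER :: k
    DO k = 1, SIZE(x)
      farr(k) = 2.0*x(k)
    END DO
  END FUNCTION farr

END PROGRAM temps
                                                              \EDOC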


\subsection{A Proposal for MIMD Support in HPF\protect\footnote{Version 
of July 18, 1992 - Clemens-August Thole, GMD I1.T}}

\label{mimd-support}
	          

\subsubsection{Abstract}

This proposal tries to supply sufficient language support in order 
to deal with loosely synchronous programs, some of which have been 
identified in my ``A vote for explicit MIMD support''.
This is a proposal for the support of MIMD parallelism, which extends
Section~\ref{do-independent}. 
It is more oriented
towards the CRAY - MPP Fortran Programming Model and the PCF proposal. 
The fine-grain synchronization of PCF is not proposed for implementation.
Instead of the CRAY mechanisms for assigning work to processors, an 
extension of the ON-clause is used.
Due to the lack of fine-grain synchronization the constructs can be executed
on SIMD or sequential architectures just by ignoring the additional information.


\subsubsection{Summary of the current situation of MIMD support as part of HPF}

According to Chuck Koelbel's (Rice) mail dated March 20th, ``Working Group 4 -
Issues for discussion'', MIMD support is a topic for discussion within working
group 4. 

Dave Loveman (DEC) has produced a document on FORALL statements 
(incorporated in Sections~\ref{forall-stmt} and \ref{forall-construct}) which
summarizes the discussion. Marc Snir proposed some extensions. These
constructs allow SIMD-style parallelism to be described more generally
than is possible with array assignments. 

A topic for working papers is the interface of HPF Fortran to program units
which execute in SPMD mode. Proposals for ``Local Subroutines'' have been made
by Marc Snir and Guy Steele
(Chapter~\ref{foreign}). Both proposals
define local subroutines as program units which are executed by all
processors independently of each other. Each processor has access only
to the data contained in its local memory. Parts of distributed data objects
can be accessed and updated by calls to a special library. Any message-passing
library might be used for synchronization and communication.
This approach does not really integrate MIMD support into HPF programming.

The MPP Fortran proposal by Douglas M. Pase, Tom MacDonald, Andrew Meltzer (CRAY)
contained the following features in order to support integrated MIMD features:
\begin{itemize}
   \item  parallel directive
   \item  shared loops 
   \item  private variables
   \item  barrier synchronization
   \item  no-barrier directive for removing synchronization
   \item  locks, events, critical sections and atomic update
   \item  functions, to examine the mapping of data objects.
\end{itemize}

Steele's ``Proposal for loops in HPF'' (02.04.92) included a proposal for a 
directive ``!HPF\$ INDEPENDENT(integer\_variable\_list)'', which specifies
for the next set of nested loops that the loops with the specified
loop variables can be executed independently of each other.
(Section~\ref{do-independent} is a short version of this proposal.) 

Chuck Koelbel gave an overview of different styles for parallel loops
in ``Parallel Loops Position Paper''. No specific proposal was made.

Min-You Wu, in ``Proposal for FORALL, May 1992'', extended Guy Steele's 
``!HPF\$ INDEPENDENT'' proposal to use the directive in a block style.

Clemens-August Thole's ``A vote for explicit MIMD support'' contains 3 examples
from different application areas, which seem to require MIMD support for
efficient execution. 

\paragraph{Summary}

In contrast to FORALL extensions, MIMD support is currently not well established
as part of HPF Fortran. The examples in ``A vote for explicit MIMD support''
show clearly the need for such features. Local subroutines do not fulfill
the requirements because they force the use of a distributed-memory programming
model, which should not be necessary in most cases.

With the exception of parallel sections, all interesting features
are contained in the MPP proposal. I would like to split the discussion
on specifying parallelism, synchronization and mapping into three different
topics. Furthermore, I would like to see corresponding features expressed
in the style of the current X3H5 proposal, if possible, in order to
be in line with upcoming standards.


\subsubsection{Proposal for MIMD support}

In order to support the specification of MIMD-type parallelism, the following
features are taken from the ``Fortran 77 Binding of X3H5 Model for 
Parallel Programming Constructs'': 
\begin{itemize}
    \item   PARALLEL DO construct/directive
    \item   PARALLEL SECTIONS worksharing construct/directive
    \item   NEW statement/directive
\end{itemize}

These constructs are not used with PCF-like options for mapping or 
synchronisation, but are combined with the ON clause for mapping operations
onto the parallel architecture. 

\paragraph{PARALLEL DO}

\subparagraph{Explicit Syntax}

The PARALLEL DO construct is used to specify parallelism among the 
iterations of a block of code. The PARALLEL DO construct has the same
syntax as a DO statement. For a directive approach, the directive
!HPF\$ PARALLEL can be used in front of a DO statement.
After the PARALLEL DO statement a new-declaration may be inserted.

A PARALLEL DO construct might be nested with other parallel constructs. 

\subparagraph{Interpretation}

The PARALLEL DO is used to specify parallel execution of the iterations of
a block of code. Each iteration of a PARALLEL DO is an independent unit
of work. The iterations of a PARALLEL DO must be data independent. Iterations
are data independent if the storage sequence associated with each variable
or array element that is assigned a value by an iteration is not referenced
by any other iteration. 

A program is not HPF conforming, if for any iteration a statement is executed,
which causes a transfer of control out of the block defined by the PARALLEL
DO construct. 

The value of the loop index of a PARALLEL DO is undefined outside the scope
of the PARALLEL DO construct. 
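
The data independence rule can be illustrated with the directive form of
the construct.  The following is a sketch only; the array names and the
size n are hypothetical, and the placement of the directive follows the
syntax described above.
                                                              \CODE
      INTEGER, PARAMETER :: n = 100
      REAL :: a(n), b(n+1)
      INTEGER :: i

!  Conforming: iteration i assigns only a(i), and no other
!  iteration references a(i).
!HPF$ PARALLEL
      DO i = 1, n
        a(i) = b(i) + b(i+1)
      END DO

!  Not conforming: iteration i references a(i-1), which is
!  assigned by iteration i-1.
!HPF$ PARALLEL
      DO i = 2, n
        a(i) = a(i-1) + b(i)
      END DO
                                                              \EDOC
Because the directive is a comment to a compiler that does not recognize
it, the first loop gives the same result when executed sequentially.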


\paragraph{PARALLEL SECTIONS}

The parallel sections construct is used to specify parallelism among sections
of code.

\subparagraph{Explicit Syntax}


                                                              \CODE
        !HPF$ PARALLEL SECTIONS
        !HPF$ SECTION
        !HPF$ END PARALLEL SECTIONS
                                                              \EDOC
structured as
                                                              \CODE
        !HPF$ PARALLEL SECTIONS
        [new-declaration-stmt-list]
        [section-block]
        [section-block-list]
        !HPF$ END PARALLEL SECTIONS
                                                              \EDOC
where [section-block] is
                                                              \CODE
        !HPF$ SECTION
        [execution-part]
                                                              \EDOC

\subparagraph{Interpretation}

The parallel sections construct is used to specify parallelism among sections
of code. Each section of the code is an independent unit of work. A program
is not standard conforming if during the execution of any parallel sections
construct a transfer of control out of the blocks defined by the Parallel
Sections construct is performed. 
In a standard conforming program the sections of code shall be data 
independent. Sections are data independent if the storage sequence associated 
with each variable or array element that is assigned a value by a section
is not referenced by any other section. 
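
A minimal sketch of a conforming use of the construct, with hypothetical
arrays u and w, might look as follows:
                                                              \CODE
      REAL :: u(100), w(100)

!HPF$ PARALLEL SECTIONS
!HPF$ SECTION
      u = 2.0 * u             ! assigns and references only u
!HPF$ SECTION
      w = w + 1.0             ! assigns and references only w
!HPF$ END PARALLEL SECTIONS
                                                              \EDOC
If the second section also referenced u, the sections would no longer be
data independent and the program would not be standard conforming.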


\paragraph{Data scoping}

Data objects which are local to a subroutine are different between 
distinct units of work, even if they execute the same subroutine.


\paragraph{NEW statement/directive}

The NEW statement/directive allows the user to generate new instances of 
objects with the same name as an object that can currently be referenced.


\subparagraph{Explicit Syntax}

A [new-declaration-stmt] is
                                                                \CODE
       !HPF$ NEW variable-name-list
                                                                \EDOC

\subparagraph{Coding rules}

A [variable-name] shall not be
\begin{itemize} 
\item    the name of an assumed size array, dummy argument, common block, 
function or entry point
\item    of type character with an assumed length
\item    specified in a SAVE or DATA statement
\item    associated with any object that is shared for this parallel construct.
\end{itemize}

\subparagraph{Interpretation}
 
Listing a variable in a NEW statement makes the object explicitly
private to the parallel construct. For each unit of work of the parallel 
construct, a new instance of the object is created and is referenced by the
specified name. 


\end{document}

From ngai@hpltfn.hpl.hp.com  Tue Sep 15 14:17:09 1992
Received: from hplms2.hpl.hp.com by cs.rice.edu (AA29327); Tue, 15 Sep 92 14:17:09 CDT
Received: from hpltfn.hpl.hp.com by hplms2.hpl.hp.com with SMTP
	(16.5/15.5+IOS 3.20) id AA01375; Tue, 15 Sep 92 12:17:04 -0700
Received: by hpltfn.hpl.hp.com
	(16.6/15.5+IOS 3.14) id AA01347; Tue, 15 Sep 92 12:16:43 -0700
Date: Tue, 15 Sep 92 12:16:43 -0700
From: Tin-Fook Ngai <ngai@hpltfn.hpl.hp.com>
Message-Id: <9209151916.AA01347@hpltfn.hpl.hp.com>
To: hpff-forall@cs.rice.edu
Subject: A proposal for EXECUTE-ON directive


The current HPF provides adequate features to map data and to
specify data-parallel execution.  However, I find one important feature
still missing: a way to say, for each parallel execution, which data
accesses should be treated as local accesses and which cannot.  Explicit
data mapping (i.e. alignment and distribution) is not necessarily
sufficient to determine the best execution location (unless the
compiler is super-smart).  To help the compiler generate
high-performance code, a directive that hints at the execution location
would be very useful in HPF.  So here comes this proposal.  Although it
comes late (I didn't make it into first reading at the last meeting!), I
hope we can fully discuss this important feature and incorporate it into
HPF (next meeting?).


This proposal is partially motivated by my previous discussion with
Michael Wolfe of Oregon Graduate Institute.  He brought up the issue that
the compiler may not be able to infer the best data locality from current HPF 
data mapping directives.  (See Example 6 in the proposal.)


Tin-Fook Ngai

------------------------------------------------------------------------

A PROPOSAL FOR EXECUTE-ON DIRECTIVE IN HPF

September 14, 1992

Tin-Fook Ngai
Hewlett-Packard Laboratories
Email: ngai@hpl.hp.com


The proposed EXECUTE-ON directive is used to suggest where an iteration of
a DO construct or an indexed parallel assignment should be executed.  The
directive informs the compiler which data accesses should be local and
which data accesses may be remote.  


SYNTAX

!HPF$ EXECUTE (subscript-list) ON align-spec [; LOCAL array-name-list]


CONSTRAINT

Each point in the index space must be executed on only one template node.


USAGE

The EXECUTE-ON directive must immediately precede the corresponding DO
loop body, array assignment, FORALL statement, FORALL construct or
individual assignment statement in a FORALL construct.


INTERPRETATION

The subscript-list identifies a distinguished iteration index or an indexed
parallel assignment.  The align-spec identifies a template node.  The
EXECUTE-ON directive suggests that the iteration or parallel assignment
should be executed on the processor where the template node resides.  The
optional LOCAL directive informs the compiler that all data accesses to the
arrays in array-name-list can be handled as local data accesses if the
related HPF data mapping directives are honored.


EXAMPLES

Example 1

      REAL A(N), B(N), C(N)
      INTEGER P(N)
!HPF$ TEMPLATE T(N)
!HPF$ ALIGN WITH T:: A, B, C
!HPF$ DISTRIBUTE T(CYCLIC(2))

!HPF$ INDEPENDENT            
      DO I = 1, N/2 
!HPF$ EXECUTE (I) ON T(2*I); LOCAL A, B, C
      ! we know that P(2*I-1) and P(2*I) are a permutation of 2*I-1 and 2*I
        A(P(2*I - 1)) = B(2*I - 1) + C(2*I - 1)    
        A(P(2*I)) = B(2*I) + C(2*I)
      END DO
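
(A rough check of Example 1, offered as a sketch rather than as part of
the directive.  It assumes that CYCLIC(2) deals consecutive pairs of
template elements to processors round-robin, so that T(2*I-1) and T(2*I)
always land on the same processor.  Under that assumption the B and C
references, and the A references reached through the permutation P, can
all be local.  The processor count NP and the function OWNER are
hypothetical names used only here.)

      PROGRAM OWNERS
      IMPLICIT NONE
      ! N is the template size; NP is an assumed number of processors
      INTEGER, PARAMETER :: N = 12, NP = 3
      INTEGER :: I
      DO I = 1, N/2
         ! both elements touched by iteration I should have one owner
         IF (OWNER(2*I-1) .NE. OWNER(2*I)) STOP 1
         PRINT *, I, OWNER(2*I-1), OWNER(2*I)
      END DO
      CONTAINS
      INTEGER FUNCTION OWNER(J)
         ! assumed owner (numbered from 0) of template element J
         ! under CYCLIC(2): pairs of elements are dealt round-robin
         INTEGER, INTENT(IN) :: J
         OWNER = MOD((J-1)/2, NP)
      END FUNCTION OWNER
      END PROGRAM OWNERS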


Example 2

      REAL A(N,N), B(N,N)
!HPF$ TEMPLATE T(N,N)
!HPF$ ALIGN WITH T:: A, B
!HPF$ EXECUTE (I,J) ON T(I+1,J-1)
      FORALL (I=1:N-1, J=2:N)   A(I,J) = A(I+1,J-1) + B(I+1,J-1)


Example 3

      REAL A(N,N), B(N,N)
!HPF$ TEMPLATE T(N,N)
!HPF$ ALIGN WITH T:: A, B
!HPF$ EXECUTE (I,J) ON T(I,J)       ! applies to the entire FORALL construct
      FORALL (I=1:N-1, J=2:N) 
	A(I,J) = A(I+1,J-1) + B(I+1,J-1)
	B(I,J) = A(I,J) + B(I+1,J-1)
      END FORALL


Example 4

      REAL A(N,N), B(N,N)
!HPF$ TEMPLATE T(N,N)
!HPF$ ALIGN WITH T:: A, B
      FORALL (I=1:N-1, J=2:N) 
!HPF$ EXECUTE (I,J) ON T(I,J)       ! applies only to the following assignment
	A(I,J) = A(I+1,J-1) + B(I+1,J-1)
	B(I,J) = A(I,J) + B(I+1,J-1)
      END FORALL
   

Example 5 

      REAL A(N,N), B(N,N)
!HPF$ TEMPLATE T(N,N)
!HPF$ ALIGN WITH T:: A, B

!HPF$ EXECUTE (I,J) ON T(I+1,J-1)
      A(1:N-1,2:N) = A(2:N,1:N-1) + B(2:N,1:N-1)

   
Example 6 
 
      !* Original program due to Michael Wolfe of Oregon Graduate Institute

      !* This program performs matrix multiplication C = A x B
      !* In each step, array B is rotated by row-blocks, multiplied
      !* diagonal-block-wise in parallel with A, results are accumulated in C 

      !* Note: without the EXECUTE-ON and LOCAL directives, the compiler
      !* will have a hard time figuring out that all A, B and C accesses
      !* are actually local, and so cannot generate the most efficient code
      !* (i.e. communication-free and with no runtime checking in the
      !* parallel loop body).
 
      REAL A(N,N), B(N,N), C(N,N)

!HPF$ REALIGNABLE B
!HPF$ TEMPLATE T(N,N)
!HPF$ ALIGN (:,:) WITH T:: A, B, C      
!HPF$ DISTRIBUTE T(BLOCK,*)             !* A,B,C are distributed by row blocks

      NOP = NUMBER_OF_PROCESSORS()
      IB = N/NOP

      DO IT = 0, NOP-1

!HPF$ REALIGN B(I,J) WITH T(I-IT*IB,J)  !* assuming wrap-around realignment

!HPF$ INDEPENDENT                       !* data parallel loop
        DO IP = 0, NOP-1     
!HPF$ EXECUTE (IP) ON T(IP*IB+1,1); LOCAL A, B, C
          ITP = MOD( IT+IP, NOP )

          DO I = 1, IB
            DO J = 1, N
              DO K = 1, IB
                C(IP*IB+I,J) = C(IP*IB+I,J) + A(IP*IB+I,ITP*IB+K)*B(ITP*IB+K,J)
              ENDDO  !* K
            ENDDO  !* J
          ENDDO  !* I
        ENDDO  !* IP

      ENDDO  !* IT


END OF PROPOSAL

------------------------------------------------------------------------------

From gls@think.com  Tue Sep 15 15:21:17 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA01619); Tue, 15 Sep 92 15:21:17 CDT
Received: from mail.think.com by erato.cs.rice.edu (AA17331); Tue, 15 Sep 92 15:21:13 CDT
Return-Path: <gls@Think.COM>
Received: from Strident.Think.COM by mail.think.com; Tue, 15 Sep 92 16:21:08 -0400
From: Guy Steele <gls@think.com>
Received: by strident.think.com (4.1/Think-1.2)
	id AA15028; Tue, 15 Sep 92 16:21:07 EDT
Date: Tue, 15 Sep 92 16:21:07 EDT
Message-Id: <9209152021.AA15028@strident.think.com>
To: hpff-forall@erato.cs.rice.edu
Cc: gls@think.com
Subject: Nested WHERE statements


At this juncture, now that we have discussed FORALL statements
and their nesting, I would like to re-raise an issue that appeared
in the "big yellow book" distributed in January but has disappeared
since then.  It might be appropriate to discuss this in October.


(g) Nested WHERE statements.

Here is the text of a proposal once sent to X3J3:

| Briefly put, the less WHERE is like IF, the more difficult it is to
| translate existing serial codes into array notation.  Such codes tend to
| have the general structure of one or more DO loops iterating over array
| indices and surrounding a body of code to be applied to array elements.
| Conversion to array notation frequently involves simply deleting the DO
| loops and changing array element references to array sections or whole
| array references.  If the loop body contains logical IF statements, these
| are easily converted to WHERE statements.  The same is true for translating
| IF-THEN constructs to WHERE constructs, except in two cases.  If the IF
| constructs are nested (or contain IF statements), or if ELSE IF is used,
| then conversion suddenly becomes disproportionately complex, requiring the
| user to create temporary variables or duplicate mask expressions and to use
| explicit .AND. operators to simulate the effects of nesting.
| 
| Users also find it confusing that ELSEWHERE is syntactically and
| semantically analogous to ELSE rather than to ELSE IF.
| 
| [We] ... propose that the syntax of WHERE constructs be extended and
| changed to have the form
| 
| 	where-construct       is  where-construct-stmt
| 				    [ where-body-construct ]...
| 				  [ elsewhere-stmt
| 				    [ where-body-construct ]... ]...
| 				  [ where-else-stmt
| 				    [ where-body-construct ]... ]
| 				  end-where-stmt
| 
| 	where-construct-stmt  is  WHERE ( mask-expr )
| 
| 	elsewhere-stmt        is  ELSE WHERE ( mask-expr )
| 
| 	where-else-stmt       is  ELSE
| 
| 	end-where-stmt        is  END WHERE
| 
| 	mask-expr             is  logical-expr
| 
| 	where-body-construct  is  assignment-stmt
| 			      or  where-stmt
| 			      or  where-construct
| 	
| Constraint: In each assignment-stmt, the mask-expr and the variable
| 	being defined must be arrays of the same shape.  If a
| 	where-construct contains a where-stmt, an elsewhere-stmt,
| 	or another where-construct, then the two mask-expr's must
| 	be arrays of the same shape.
| 
| The meaning of such statements may be understood by rewrite rules.  First
| one may eliminate all occurrences of ELSE WHERE:
| 
| 	WHERE (m1)				WHERE (m1)
| 	  xxx					  xxx
| 	ELSE WHERE (m2)		becomes		ELSE
| 	  yyy					  WHERE (m2)
| 	END WHERE				    yyy
| 						  END WHERE
| 						END WHERE
| 
| where xxx and yyy represent any sequences of statements, so long as the
| original WHERE, ELSE WHERE, and END WHERE match, and the ELSE WHERE is the
| first ELSE WHERE of the construct (that is, yyy may include additional ELSE
| WHERE or ELSE statements of the construct).  Next one eliminates ELSE:
| 
| 	WHERE (m)				temp = m
| 	  xxx					WHERE (temp)
| 	ELSE			becomes		  xxx
| 	  yyy					END WHERE
| 	END WHERE				WHERE (.NOT. temp)
| 						  yyy
| 						END WHERE
| 
| Finally one eliminates nested WHERE constructs:
| 
| 	WHERE (m1)				temp = m1
| 	  xxx					WHERE (temp)
| 	  WHERE (m2)				  xxx
| 	    yyy			becomes		END WHERE
| 	  END WHERE				WHERE (temp .AND. (m2))
| 	  zzz					  yyy
| 	END WHERE				END WHERE
| 						WHERE (temp)
| 						  zzz
| 						END WHERE
| 
| and similarly for nested WHERE statements.
| 
| The effects of these rules will surely be a familiar or obvious possibility
| to all the members of the committee; I enumerate them explicitly here only
| so that there can be no doubt as to the meaning I intend to support.
| 
| Such rewriting rules are simple for a compiler to apply, or the code may
| easily be compiled even more directly.  But such transformations are
| tedious for our users to make by hand and result in code that is
| unnecessarily clumsy and difficult to maintain.
| 
| One might propose to make WHERE and IF even more similar by making two
| other changes.  First, require the noise word THERE to appear in a WHERE
| and ELSE WHERE statement after the parenthesized mask-expr, in exactly the
| same way that the noise word THEN must appear in IF and ELSE IF statements.
| (Read aloud, the results might sound a trifle old-fashioned--"Where knights
| dare not go, there be dragons!"--but technically would be as grammatically
| correct English as the results of reading an IF construct aloud.)  Second,
| allow a WHERE construct to be named, and allow the name to appear in ELSE
| WHERE, ELSE, and END WHERE statements.  I do not feel very strongly one way
| or the other about these no doubt obvious points, but offer them for your
| consideration lest the possibilities be overlooked.

[End of proposal as sent to X3J3.]
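
To make the rewrite rules concrete, here is a small sketch in plain
Fortran 90.  The arrays A, B, C and their values are made up for the
illustration.  In the proposed syntax one would write

      WHERE (A .GT. 0)
        C = 1
        WHERE (B .GT. 0)
          C = 2
        END WHERE
      ELSE
        C = -1
      END WHERE

Eliminating the ELSE and then the nested WHERE, as described above,
should give the following program:

      PROGRAM NESTWH
      IMPLICIT NONE
      INTEGER :: A(4), B(4), C(4)
      LOGICAL :: TEMP(4)
      A = (/ 1, 1, -1, -1 /)
      B = (/ 1, -1, 1, -1 /)
      C = 0
      ! the outer mask is evaluated once and saved
      TEMP = A .GT. 0
      WHERE (TEMP)
        C = 1
      END WHERE
      ! the nested WHERE becomes a conjunction of the two masks
      WHERE (TEMP .AND. (B .GT. 0))
        C = 2
      END WHERE
      ! the ELSE branch uses the complement of the saved mask
      WHERE (.NOT. TEMP)
        C = -1
      END WHERE
      PRINT *, C          ! prints the values 2 1 -1 -1
      END PROGRAM NESTWH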

Now, for compatibility with Fortran 90, HPF should continue to
use ELSEWHERE instead of ELSE, but this causes no ambiguity:

      WHERE(...)
	...
      ELSE WHERE(...)
	...
      ELSEWHERE
	...
      END WHERE

is perfectly unambiguous, even when blanks are not significant.
Since X3J3 declined to adopt the keyword THERE, it should not be
used in HPF either (alas).

[End of excerpt from yellow book.]

I can't resist noting that the analogy to the IF statement
would be complete if there were also a single-statement form
of WHERE, as in

		WHERE (B .NE. 0.0)  A = A/B

And one could allow the noise-word THERE to be used optionally,
to complete the symmetry...

--Guy

From gls@think.com  Tue Sep 15 15:27:16 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA01884); Tue, 15 Sep 92 15:27:16 CDT
Received: from mail.think.com by erato.cs.rice.edu (AA17338); Tue, 15 Sep 92 15:27:11 CDT
Return-Path: <gls@Think.COM>
Received: from Strident.Think.COM by mail.think.com; Tue, 15 Sep 92 16:27:11 -0400
From: Guy Steele <gls@think.com>
Received: by strident.think.com (4.1/Think-1.2)
	id AA15082; Tue, 15 Sep 92 16:27:09 EDT
Date: Tue, 15 Sep 92 16:27:09 EDT
Message-Id: <9209152027.AA15082@strident.think.com>
To: gls@think.com
Cc: hpff-forall@erato.cs.rice.edu, gls@think.com
In-Reply-To: Guy Steele's message of Tue, 15 Sep 92 16:21:07 EDT <9209152021.AA15028@strident.think.com>
Subject: Nested WHERE statements: the se

   From: Guy Steele <gls@Think.COM>
   Date: Tue, 15 Sep 92 16:21:07 EDT

   ...

   I can't resist noting that the analogy to the IF statement
   would be complete if there were also a single-statement form
   of WHERE, as in

		   WHERE (B .NE. 0.0)  A = A/B

Forgive me; I don't know where my mind was.  Fortran 90
*of course* already has a single-statement form of WHERE.
D-uh.

I'm off to get another cup of coffee ...

--Guy

From chk@erato.cs.rice.edu  Wed Sep 23 16:45:35 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA06717); Wed, 23 Sep 92 16:45:35 CDT
Received: from localhost.cs.rice.edu by erato.cs.rice.edu (AA25953); Wed, 23 Sep 92 16:45:25 CDT
Message-Id: <9209232145.AA25953@erato.cs.rice.edu>
To: hpff-forall@erato.cs.rice.edu
Word-Of-The-Day: obstreperous : (adj) marked by unruly or aggressive
	noisiness; clamorous
Subject: New Draft FORALL Proposal
Date: Wed, 23 Sep 92 16:45:23 -0500
From: chk@erato.cs.rice.edu


I believe that this draft incorporates all changes recommended at the
September HPFF meeting.  (I haven't given it a final proofreading yet,
though.)  If there are questions or corrections, please send them to
the group where we can all see them.  I have not incorporated the
latest batch of proposals (Clemens Thole's ON clause, Guy Steele's
nested WHEREs, etc.) - hopefully, that will happen this weekend.

					Chuck


%chapter-head.tex

%Version of August 5, 1992 - David Loveman, Digital Equipment Corporation

\documentstyle[twoside,11pt]{report}
\pagestyle{headings}
\pagenumbering{arabic}
\marginparwidth 0pt
\oddsidemargin=.25in
\evensidemargin  .25in
\marginparsep 0pt
\topmargin=-.5in
\textwidth=6.0in
\textheight=9.0in
\parindent=2em

%the file syntax-macs.tex is physically included below

%syntax-macs.tex

%Version of July 29, 1992 - Guy Steele, Thinking Machines

\newdimen\bnfalign         \bnfalign=2in
\newdimen\bnfopwidth       \bnfopwidth=.3in
\newdimen\bnfindent        \bnfindent=.2in
\newdimen\bnfsep           \bnfsep=6pt
\newdimen\bnfmargin        \bnfmargin=0.5in
\newdimen\codemargin       \codemargin=0.5in
\newdimen\intrinsicmargin  \intrinsicmargin=3em
\newdimen\casemargin       \casemargin=0.75in
\newdimen\argumentmargin   \argumentmargin=1.8in

\def\IT{\it}
\def\RM{\rm}
\let\CHAR=\char
\let\CATCODE=\catcode
\let\DEF=\def
\let\GLOBAL=\global
\let\RELAX=\relax
\let\BEGIN=\begin
\let\END=\end


\def\FUNNYCHARACTIVE{\CATCODE`\a=13 \CATCODE`\b=13 \CATCODE`\c=13 \CATCODE`\d=13
		     \CATCODE`\e=13 \CATCODE`\f=13 \CATCODE`\g=13 \CATCODE`\h=13
		     \CATCODE`\i=13 \CATCODE`\j=13 \CATCODE`\k=13 \CATCODE`\l=13
		     \CATCODE`\m=13 \CATCODE`\n=13 \CATCODE`\o=13 \CATCODE`\p=13
		     \CATCODE`\q=13 \CATCODE`\r=13 \CATCODE`\s=13 \CATCODE`\t=13
		     \CATCODE`\u=13 \CATCODE`\v=13 \CATCODE`\w=13 \CATCODE`\x=13
		     \CATCODE`\y=13 \CATCODE`\z=13 \CATCODE`\[=13 \CATCODE`\]=13
                     \CATCODE`\-=13}

\def\RETURNACTIVE{\CATCODE`\
=13}

\makeatletter
\def\section{\@startsection {section}{1}{\z@}{-3.5ex plus -1ex minus 
 -.2ex}{2.3ex plus .2ex}{\large\sf}}
\def\subsection{\@startsection{subsection}{2}{\z@}{-3.25ex plus -1ex minus 
 -.2ex}{1.5ex plus .2ex}{\large\sf}}

\def\@ifpar#1#2{\let\@tempe\par \def\@tempa{#1}\def\@tempb{#2}\futurelet
    \@tempc\@ifnch}

\def\?#1.{\begingroup\def\@tempq{#1}\list{}{\leftmargin\intrinsicmargin}\relax
  \item[]{\bf\@tempq.} \@intrinsictest}
\def\@intrinsictest{\@ifpar{\@intrinsicpar\@intrinsicdesc}{\@intrinsicpar\relax}}
\long\def\@intrinsicdesc#1{\list{}{\relax
  \def\@tempb{ Arguments}\ifx\@tempq\@tempb
			  \leftmargin\argumentmargin
			  \else \leftmargin\casemargin \fi
  \labelwidth\leftmargin  \advance\labelwidth -\labelsep
  \parsep 4pt plus 2pt minus 1pt
  \let\makelabel\@intrinsiclabel}#1\endlist}
\long\def\@intrinsicpar#1#2\\{#1{#2}\@ifstar{\@intrinsictest}{\endlist\endgroup}}
\def\@intrinsiclabel#1{\setbox0=\hbox{\rm #1}\ifnum\wd0>\labelwidth
  \box0 \else \hbox to \labelwidth{\box0\hfill}\fi}
\def\Case(#1):{\item[{\it Case (#1):}]}
\def\ {\@ifnextchar({\def\@tempq{#1}\@intrinsicopt}{\item[#1]}}
\def\@intrinsicopt(#1){\item[{\@tempq} (#1)]}

\def\MATRIX#1{\relax
    \@ifnextchar,{\@MATRIXTABS{}#1,\@FOO, \hskip0pt plus 1filll\penalty-1\@gobble
  }{\@ifnextchar;{\@MATRIXTABS{}#1,\@FOO; \hskip0pt plus 1filll\penalty-1\@gobble
  }{\@ifnextchar:{\@MATRIXTABS{}#1,\@FOO: \hskip0pt plus 1filll\penalty-1\@gobble
  }{\@ifnextchar.{\hfill\penalty1\null\penalty10000\hskip0pt plus 1filll
		  \@MATRIXTABS{}#1,\@FOO.\penalty-50\@gobble
  }{\@MATRIXTABS{}#1,\@FOO{ }\hskip0pt plus 1filll\penalty-1}}}}}

\def\@MATRIXTABS#1#2,{\@ifnextchar\@FOO{\@MATRIX{#1#2}}{\@MATRIXTABS{#1#2&}}}
\def\@MATRIX#1\@FOO{\(\left[\begin{array}{rrrrrrrrrr}#1\end{array}\right]\)}

\def\@IFSPACEORRETURNNEXT#1#2{\def\@tempa{#1}\def\@tempb{#2}\futurelet\@tempc\@ifspnx}

{
\FUNNYCHARACTIVE
\GLOBAL\DEF\FUNNYCHARDEF{\RELAX
    \DEFa{{\IT\CHAR"61}}\DEFb{{\IT\CHAR"62}}\DEFc{{\IT\CHAR"63}}\RELAX
    \DEFd{{\IT\CHAR"64}}\DEFe{{\IT\CHAR"65}}\DEFf{{\IT\CHAR"66}}\RELAX
    \DEFg{{\IT\CHAR"67}}\DEFh{{\IT\CHAR"68}}\DEFi{{\IT\CHAR"69}}\RELAX
    \DEFj{{\IT\CHAR"6A}}\DEFk{{\IT\CHAR"6B}}\DEFl{{\IT\CHAR"6C}}\RELAX
    \DEFm{{\IT\CHAR"6D}}\DEFn{{\IT\CHAR"6E}}\DEFo{{\IT\CHAR"6F}}\RELAX
    \DEFp{{\IT\CHAR"70}}\DEFq{{\IT\CHAR"71}}\DEFr{{\IT\CHAR"72}}\RELAX
    \DEFs{{\IT\CHAR"73}}\DEFt{{\IT\CHAR"74}}\DEFu{{\IT\CHAR"75}}\RELAX
    \DEFv{{\IT\CHAR"76}}\DEFw{{\IT\CHAR"77}}\DEFx{{\IT\CHAR"78}}\RELAX
    \DEFy{{\IT\CHAR"79}}\DEFz{{\IT\CHAR"7A}}\DEF[{{\RM\CHAR"5B}}\RELAX
    \DEF]{{\RM\CHAR"5D}}\DEF-{\@IFSPACEORRETURNNEXT{{\CHAR"2D}}{{\IT\CHAR"2D}}}}
}

%%% Warning!  Devious return-character machinations in the next several lines!
%%%           Don't even *breathe* on these macros!
{\RETURNACTIVE\global\def\RETURNDEF{\def
{\@ifnextchar\FNB{}{\@stopline\@ifnextchar
{\@NEWBNFRULE}{\penalty\@M\@startline\ignorespaces}}}}\global\def\@NEWBNFRULE
{\vskip\bnfsep\@startline\ignorespaces}\global\def\@ifspnx{\ifx\@tempc\@sptoken \let\@tempd\@tempa \else \ifx\@tempc
\let\@tempd\@tempa \else \let\@tempd\@tempb \fi\fi \@tempd}}
%%% End of bizarro return-character machinations.

\def\IS{\@stopfield\global\setbox\@curfield\hbox\bgroup
  \hskip-\bnfindent \hskip-\bnfopwidth  \hskip-\bnfalign
  \hbox to \bnfalign{\unhbox\@curfield\hfill}\hbox to \bnfopwidth{\bf is \hfill}}
\def\OR{\@stopfield\global\setbox\@curfield\hbox\bgroup
  \hskip-\bnfindent \hskip-\bnfopwidth \hbox to \bnfopwidth{\bf or \hfill}}
\def\R#1 {\hbox to 0pt{\hskip-\bnfmargin R#1\hfill}}
\def\XBNF{\FUNNYCHARDEF\FUNNYCHARACTIVE\RETURNDEF\RETURNACTIVE
  \def\@underbarchar{{\char"5F}}\tt\frenchspacing
  \advance\@totalleftmargin\bnfmargin \tabbing
  \hskip\bnfalign\hskip\bnfopwidth\hskip\bnfindent\=\kill\>\+\@gobblecr}
\def\endXBNF{\-\endtabbing}

\def\BNF{\BEGIN{XBNF}}
\def\FNB{\END{XBNF}}

\begingroup \catcode `|=0 \catcode`\\=12
|gdef|@XCODE#1\EDOC{#1|endtrivlist|end{tt}}
|endgroup

\def\CODE{\begin{tt}\advance\@totalleftmargin\codemargin \@verbatim
   \def\@underbarchar{{\char"5F}}\frenchspacing \@vobeyspaces \@XCODE}
\def\ICODE{\begin{tt}\advance\@totalleftmargin\codemargin \@verbatim
   \def\@underbarchar{{\char"5F}}\frenchspacing \@vobeyspaces
   \FUNNYCHARDEF\FUNNYCHARACTIVE \UNDERBARACTIVE\UNDERBARDEF \@XCODE}

\def\@underbarsub#1{{\ifmmode _{#1}\else {$_{#1}$}\fi}}
\let\@underbarchar\_
\def\@underbar{\let\@tempq\@underbarsub\if\@tempz A\let\@tempq\@underbarchar\fi
  \if\@tempz B\let\@tempq\@underbarchar\fi\if\@tempz C\let\@tempq\@underbarchar\fi
  \if\@tempz D\let\@tempq\@underbarchar\fi\if\@tempz E\let\@tempq\@underbarchar\fi
  \if\@tempz F\let\@tempq\@underbarchar\fi\if\@tempz G\let\@tempq\@underbarchar\fi
  \if\@tempz H\let\@tempq\@underbarchar\fi\if\@tempz I\let\@tempq\@underbarchar\fi
  \if\@tempz J\let\@tempq\@underbarchar\fi\if\@tempz K\let\@tempq\@underbarchar\fi
  \if\@tempz L\let\@tempq\@underbarchar\fi\if\@tempz M\let\@tempq\@underbarchar\fi
  \if\@tempz N\let\@tempq\@underbarchar\fi\if\@tempz O\let\@tempq\@underbarchar\fi
  \if\@tempz P\let\@tempq\@underbarchar\fi\if\@tempz Q\let\@tempq\@underbarchar\fi
  \if\@tempz R\let\@tempq\@underbarchar\fi\if\@tempz S\let\@tempq\@underbarchar\fi
  \if\@tempz T\let\@tempq\@underbarchar\fi\if\@tempz U\let\@tempq\@underbarchar\fi
  \if\@tempz V\let\@tempq\@underbarchar\fi\if\@tempz W\let\@tempq\@underbarchar\fi
  \if\@tempz X\let\@tempq\@underbarchar\fi\if\@tempz Y\let\@tempq\@underbarchar\fi
  \if\@tempz Z\let\@tempq\@underbarchar\fi\@tempq}
\def\@under{\futurelet\@tempz\@underbar}

\def\UNDERBARACTIVE{\CATCODE`\_=13}
\UNDERBARACTIVE
\def\UNDERBARDEF{\def_{\protect\@under}}
\UNDERBARDEF

\catcode`\$=11  

%the following line would allow derived-type component references 
%FOO%BAR in running text, but not allow LaTeX comments
%without this line, write FOO\%BAR
%\catcode`\%=11 

\makeatother

%end of file syntax-macs.tex


\title{{\em D R A F T} \\High Performance Fortran \\ FORALL Proposal}
\author{High Performance Fortran Forum}
\date{September 23, 1992}

\hyphenation{RE-DIS-TRIB-UT-ABLE sub-script Wil-liam-son}

\begin{document}

\maketitle

\newpage

\pagenumbering{roman}

\vspace*{4.5in}

This is the result of a LaTeX run of a draft of a single chapter of 
the HPFF Final Report document.

\vspace*{3.0in}

\copyright 1992 Rice University, Houston Texas.  Permission to copy 
without fee all or part of this material is granted, provided the 
Rice University copyright notice and the title of this document 
appear, and notice is given that copying is by permission of Rice 
University.

\tableofcontents

\newpage

\pagenumbering{arabic}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%put text of chapter here


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%put \end{document} here
%statements.tex


%Revision history:
%August 2, 1992 - Original version of David Loveman, Digital Equipment
%	Corporation and Charles Koelbel, Rice University
%August 19, 1992 - chk - cleaned up discrepancies with Fortran 90 array 
%	expressions
%August 20, 1992 - chk - added DO INDEPENDENT section, Guy Steele's 
%	pointer proposals
%August 24, 1992 - chk - ELEMENTAL functions proposal
%August 31, 1992 - chk - PURE functions proposal
%September 3, 1992 - chk - reorganized sections
%September 21, 1992 - chk - began incorporating updates from Sept
%	10-11 meeting

\chapter{Statements}
\label{statements}

\section{Overview}

\footnote{Version of September 21, 1992 
- Charles Koelbel, Rice University.}
The purpose of the FORALL construct is to provide a convenient syntax for 
simultaneous assignments to large groups of array elements.
In this respect it is very similar to the functionality provided by array 
assignments and WHERE constructs.
FORALL differs from these constructs primarily in its syntax, which is 
intended to be more suggestive of local operations on each element of an 
array.
It is also possible to specify slightly more general array regions than 
are allowed by the basic array triplet notation.
Both single-statement and block FORALLs are defined in this proposal.

The FORALL statement, in both its single-statement and block forms, was
accepted by the High Performance Fortran Forum working group on its
second reading on September 10, 1992.
This vote was contingent on a more complete definition of PURE
functions.
The idea of PURE functions was accepted by the HPFF working group at
its first reading on September 10, 1992.
However, the definition presented then was not completely acceptable
because of technical errors; the errors identified at that meeting have been
corrected in this draft.
The single-statement form of FORALL was accepted by the HPFF working
group as part of the official HPF subset in a first reading on
September 11, 1992; the block FORALL was excluded from the subset at
the same time.

The purpose of the INDEPENDENT directive is to allow the programmer to
give additional information to the compiler.
The user can assert that no data object is defined by one iteration of
a loop and used (read or written) by another; similar information can
be provided about the combinations of index values in a FORALL
statement.
A compiler may rely on this information to make optimizations, such as
parallelization or reorganizing communication.
If the assertion is true, the semantics of the program are not
changed; if it is false, the program is not standard-conforming and
has no defined meaning.
The ``Other Proposals'' section contains a number of additional
assertions with this flavor.

The INDEPENDENT assertion was accepted by the High Performance Fortran
Forum working group on its second reading on September 10, 1992.
The group also directed the FORALL subgroup to further explore methods for
allowing reduction operations to be accomplished in INDEPENDENT loops.

The following proposals are designed as a modification of the Fortran 90 
standard; all references to rule numbers and section numbers pertain to 
that document unless otherwise noted.


\section{Element Array Assignment - FORALL}
 

\label{forall-stmt}

\footnote{Version of September 21, 1992 - David
Loveman, Digital Equipment Corporation and Charles Koelbel, Rice
University.
Approved at second reading on September 10, 1992.}
The element array
assignment statement (FORALL statement) is used to specify an array
assignment in terms of array elements or groups of array sections.
The element array assignment may be
masked with a scalar logical expression.  
In functionality, it is similar to array assignment statements;
however, more general array sections can be assigned in FORALL.

Rule R215 for {\it
executable-construct} is extended to include the {\it forall-stmt}.

\subsection{General Form of Element Array Assignment}

                                                                       \BNF
forall-stmt          \IS FORALL (forall-triplet-spec-list 
                           [,scalar-mask-expr ]) forall-assignment

forall-triplet-spec  \IS subscript-name = subscript : subscript 
                           [ : stride]
                                                                       \FNB

\noindent
Constraint:  {\it subscript-name} must be a {\it scalar-name} of type
integer.

\noindent
Constraint:  A {\it subscript} or a {\it stride} in a {\it
forall-triplet-spec} must not contain a reference to any {\it
subscript-name} in the {\it forall-triplet-spec-list}.

                                                                       \BNF
forall-assignment    \IS array-element = expr
                     \OR array-element => target
                     \OR array-section = expr
                                                                       \FNB

\noindent
Constraint:  The {\it array-section} or {\it array-element} in a {\it
forall-assignment} must reference all of the {\it forall-triplet-spec
subscript-names}.

\noindent
Constraint: In the case of simple assignment, the {\it array-element} and 
{\it expr} have the same constraints as the {\it variable} and {\it expr} 
in an {\it assignment-stmt}.

\noindent
Constraint: In the case of pointer assignment, the {\it array-element} 
and {\it target} have the same constraints as the {\it pointer-object} 
and {\it target}, respectively, in a {\it pointer-assignment-stmt}.

\noindent
Constraint: In the case of array section assignment, the {\it 
array-section} and 
{\it expr} have the same constraints as the {\it variable} and {\it expr} 
in an {\it assignment-stmt}.

\noindent Constraint: If any subexpression in {\it expr}, {\it 
array-element}, or {\it array-section} is a {\it function-reference}, 
then the {\it function-name} must be a ``pure'' function as defined in
Section~\ref{forall-pure}.


For each subscript name in the {\it forall-assignment}, the set of
permitted values is determined on entry to the statement and is
\[  m1 + (k-1) * m3, \mbox{ where } k = 1, 2, \ldots, \left\lfloor \frac{m2 - m1 +
m3}{m3} \right\rfloor  \]
and where {\it m1}, {\it m2}, and {\it m3} are the values of the first
subscript, the second subscript, and the stride respectively in the
{\it forall-triplet-spec}.  If {\it stride} is missing, it is as if it
were present with a value of the integer 1.  The expression {\it
stride} must not have the value 0.  If for some subscript name 
\(\lfloor (m2 - m1 + m3) / m3 \rfloor \leq 0\), the {\it
forall-assignment} is not executed.
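
As an illustration (the particular triplets are chosen here only as
examples), the {\it forall-triplet-spec} {\tt I = 1:9:3} has
\(m1 = 1\), \(m2 = 9\), and \(m3 = 3\), so the permitted values are
\[ 1 + (k-1) * 3 = 1,\ 4,\ 7 \qquad \mbox{for } k = 1, 2, 3; \]
the next candidate, 10, lies beyond the second subscript.
A triplet such as {\tt I = 5:1}, whose stride defaults to 1, has no
permitted values, so the {\it forall-assignment} is not executed.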

A ``pure'' function is defined in Section~\ref{forall-pure}; the
intuition is that a pure function cannot have side effects.
The PURE declaration places syntactic constraints on the function to
ensure this.

Examples of element array assignments are:

                                                                  \CODE
REAL H(N,N), X(N,N), Y(N,N)
TYPE MONARCH
    INTEGER, POINTER :: P
END TYPE MONARCH
TYPE(MONARCH) :: A(N)
INTEGER, TARGET :: B(N)
      ...
FORALL (I=1:N, J=1:N) H(I,J) = 1.0 / REAL(I + J - 1)

FORALL (I=1:N, J=1:N, Y(I,J) .NE. 0.0) X(I,J) = 1.0 / Y(I,J)

! Set up a butterfly pattern
FORALL (J=1:N)  A(J)%P => B(1+IEOR(J-1,2**K))
                                                                  \EDOC 
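
To see which elements the butterfly example pairs, the subscript
expression can be tabulated in ordinary Fortran 90.
The following is only a sketch: the program name, the array PARTNER, and
the values N = 8 and K = 1 are assumptions made for the illustration, and
N is taken to be a multiple of {\tt 2**(K+1)} so that every subscript
stays within bounds.

                                                                  \CODE
PROGRAM BUTTERFLY
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 8, K = 1
  INTEGER :: J, PARTNER(N)
  ! element J is paired with the element whose zero-based
  ! index differs from J-1 only in bit K
  DO J = 1, N
    PARTNER(J) = 1 + IEOR(J-1, 2**K)
  END DO
  PRINT *, PARTNER          ! prints the values 3 4 1 2 7 8 5 6
  ! the pairing is symmetric: the partner of the partner is J
  DO J = 1, N
    IF (PARTNER(PARTNER(J)) .NE. J) STOP 1
  END DO
END PROGRAM BUTTERFLY
                                                                  \EDOC 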

\subsection{Interpretation of Element Array Assignments}  

Execution of an element array assignment consists of the following steps:

\begin{enumerate}

\item Evaluation in any order of the subscript and stride expressions in 
the {\it forall-triplet-spec-list}.
The set of valid combinations of {\it subscript-name} values is then the 
Cartesian product of the sets defined by these triplets.

\item Evaluation of the {\it scalar-mask-expr} for all valid combinations 
of {\em subscript-name} values.
The mask elements may be evaluated in any order.
The set of active combinations of {\it subscript-name} values is the 
subset of the valid combinations for which the mask evaluates to true.

\item Evaluation in any order of the {\it expr} or {\it target} and all 
subscripts contained in the 
{\it array-element} or {\it array-section} in the {\it forall-assignment} 
for all active combinations of {\em subscript-name} values.
In the case of pointer assignment where the {\it target} is not a 
pointer, the evaluation consists of identifying the object referenced 
rather than computing its value.

\item Assignment of the computed {\it expr} values to the corresponding 
elements specified by {\it array-element} or {\it array-section}.
The assignments may be made in any order.
In the case of a pointer assignment where the {\it target} is not a 
pointer, this assignment consists of associating the {\it array-element} 
with the object referenced.

\end{enumerate}
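
As an illustration of the steps above, the following minimal sketch
contrasts a FORALL statement with a sequential DO loop that has the same
body.
The program, array names, and data values are chosen here for
illustration only.
Because every right-hand side is evaluated before any assignment is
made, the FORALL shifts the array by one place, whereas the DO loop
propagates the first element.

                                                                  \CODE
PROGRAM shift
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 5
  INTEGER :: a(n), b(n), i

  a = (/ 1, 2, 3, 4, 5 /)
  b = a

  ! FORALL: all right-hand sides use the old values of a
  FORALL (i = 2:n) a(i) = a(i-1)

  ! Sequential DO: each iteration sees the previous store
  DO i = 2, n
    b(i) = b(i-1)
  END DO

  PRINT *, a     ! values 1 1 2 3 4
  PRINT *, b     ! values 1 1 1 1 1
END PROGRAM shift
                                                                  \EDOC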

If the scalar mask expression is omitted, it is as if it were
present with the value true.

The scope of a
{\it subscript-name} is the FORALL statement itself. 


The {\it forall-assignment} must not cause any element of the array
being assigned to be assigned a value more than once.

Note that if a function called in a FORALL is declared PURE, then it 
is impossible for that function's evaluation to affect other expressions' 
evaluations, either for the same combination of 
{\it subscript-name} values or for a different combination.
In addition, it is possible that the compiler can perform 
more extensive optimizations when all functions are declared PURE.


\subsection{Scalarization of the FORALL Statement}

A {\it forall-stmt} of the general form:

                                                                   \CODE
FORALL (v1=l1:u1:s1, v2=l2:u2:s2, ..., vn=ln:un:sn , mask ) &
      a(e1,...,em) = rhs
                                                                   \EDOC

\noindent
is equivalent to the following standard Fortran 90 code:

\raggedbottom
                                                                   \CODE
!evaluate subscript and stride expressions in any order

templ1 = l1
tempu1 = u1
temps1 = s1
templ2 = l2
tempu2 = u2
temps2 = s2
  ...
templn = ln
tempun = un
tempsn = sn

!then evaluate the scalar mask expression

DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        tempmask(v1,v2,...,vn) = mask
      END DO
	  ...
  END DO
END DO

!then evaluate the expr in the forall-assignment for all valid 
!combinations of subscript names for which the scalar mask 
!expression is true (it is safe to avoid saving the subscript 
!expressions because of the conditions on FORALL expressions)

DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        IF (tempmask(v1,v2,...,vn)) THEN
          temprhs(v1,v2,...,vn) = rhs
        END IF
      END DO
	  ...
  END DO
END DO

!then perform the assignment of these values to the corresponding 
!elements of the array being assigned to

DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        IF (tempmask(v1,v2,...,vn)) THEN
          a(e1,...,em) = temprhs(v1,v2,...,vn)
        END IF
      END DO
	  ...
  END DO
END DO
                                                                      \EDOC
\flushbottom
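
As a concrete instance of this scheme, the sketch below expands a
one-dimensional masked reciprocal by hand.
It corresponds to the statement
FORALL (i=1:n, y(i) .NE. 0.0) x(i) = 1.0 / y(i);
the program name, array sizes, and data values are assumptions made for
illustration only.

                                                                   \CODE
PROGRAM scalarized
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 4
  REAL    :: x(n), y(n), temprhs(n)
  LOGICAL :: tempmask(n)
  INTEGER :: i, templ1, tempu1, temps1

  y = (/ 2.0, 0.0, 4.0, 0.0 /)
  x = -1.0

  ! evaluate subscript and stride expressions
  templ1 = 1
  tempu1 = n
  temps1 = 1

  ! then evaluate the scalar mask expression
  DO i = templ1, tempu1, temps1
    tempmask(i) = y(i) .NE. 0.0
  END DO

  ! then evaluate the right-hand side for the active values of i
  DO i = templ1, tempu1, temps1
    IF (tempmask(i)) temprhs(i) = 1.0 / y(i)
  END DO

  ! then perform the assignments
  DO i = templ1, tempu1, temps1
    IF (tempmask(i)) x(i) = temprhs(i)
  END DO

  PRINT *, x     ! values 0.5, -1.0, 0.25, -1.0
END PROGRAM scalarized
                                                                   \EDOC

Elements for which the mask is false keep their previous value, and the
division is never attempted for them.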

\subsection{Consequences of the Definition of the FORALL Statement}

This section should be moved to the comments chapter in the final
draft.

\begin{itemize}

\item The {\it scalar-mask-expr} may depend on the {\it subscript-name}s.

\item Each of the {\it subscript-name}s must appear within the
subscript expression(s) on the left-hand side.  
(This is a syntactic
consequence of the semantic rule that no two execution instances of the
body may assign to the same array element.)

\item Right-hand sides and subscripts on the left-hand side of a {\it 
forall-assignment} are
evaluated only for valid combinations of subscript names for which the
scalar mask expression is true.

\item The intent of ``pure'' functions is to provide a class of
functions without side-effects, and to allow this side-effect freedom
to be checked syntactically.

\end{itemize}


\section{FORALL Construct}

\label{forall-construct}

\footnote{Version of August 20, 1992 -
David Loveman, Digital Equipment Corporation and 
Charles Koelbel, Rice University.  Approved at second reading on
September 10, 1992.}
The FORALL construct is a generalization of the element array
assignment statement allowing multiple assignments, masked array 
assignments, and nested FORALL statements to be
controlled by a single {\it forall-triplet-spec-list}.  Rule R215 for
{\it executable-construct} is extended to include the {\it
forall-construct}.

\subsection{General Form of the FORALL Construct}

                                                                    \BNF
forall-construct   \IS FORALL (forall-triplet-spec-list [,scalar-mask-expr ])
                               forall-body-stmt-list
                            END FORALL

forall-body-stmt     \IS forall-assignment
                     \OR where-stmt
                     \OR forall-stmt
                     \OR forall-construct
                                                                    \FNB

\noindent
Constraint:  {\it subscript-name} must be a {\it scalar-name} of type
integer.

\noindent
Constraint:  A {\it subscript} or a {\it stride} in a {\it
forall-triplet-spec} must not contain a reference to any {\it
subscript-name} in the {\it forall-triplet-spec-list}.

\noindent
Constraint:  Any left-hand side {\it array-section} or {\it 
array-element} in any {\it forall-body-stmt}
must reference all of the {\it forall-triplet-spec
subscript-names}.

\noindent
Constraint: If a {\it forall-stmt} or {\it forall-construct} is nested 
within a {\it forall-construct}, then the inner FORALL may not redefine 
any {\it subscript-name} used in the outer {\it forall-construct}.
This rule applies recursively in the event of multiple nesting levels.

For each subscript name in the {\it forall-assignment}s, the set of
permitted values is determined on entry to the construct and is
\[  m1 + (k-1) * m3, \mbox{ where } k = 1, 2, \ldots, \lfloor \frac{m2 - m1 +
1}{m3} \rfloor  \]
and where {\it m1}, {\it m2}, and {\it m3} are the values of the first
subscript, the second subscript, and the stride respectively in the
{\it forall-triplet-spec}.  If {\it stride} is missing, it is as if it
were present with a value of the integer 1.  The expression {\it
stride} must not have the value 0.  If for some subscript name 
\(\lfloor (m2 -m1 + 1) / m3 \rfloor \leq 0\), the {\it
forall-assignment}s are not  executed.
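
As a worked instance (the triplet is chosen here only for illustration),
consider the {\it forall-triplet-spec} I = 2:10:3.
Here \(m1 = 2\), \(m2 = 10\), and \(m3 = 3\), so
\[  \lfloor \frac{m2 - m1 + 1}{m3} \rfloor = \lfloor \frac{10 - 2 + 1}{3}
\rfloor = 3  \]
and the permitted values of I are
\[  2 + (k-1) * 3, \mbox{ where } k = 1, 2, 3, \mbox{ that is, } 2, 5, 8.  \]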

Examples of the FORALL construct are:

                                                                 \CODE
FORALL ( I = 2:N-1, J = 2:N-1 )
  A(I,J) = A(I,J-1) + A(I,J+1) + A(I-1,J) + A(I+1,J)
  B(I,J) = A(I,J)
END FORALL

FORALL ( I = 1:N-1 )
  FORALL ( J = I+1:N )
    A(I,J) = A(J,I)
  END FORALL
END FORALL

FORALL ( I = 1:N, J = 1:N )
  A(I,J) = MERGE( A(I,J), A(I,J)**2, I.EQ.J )
  WHERE ( .NOT. DONE(I,J,1:M) )
    B(I,J,1:M) = B(I,J,1:M)*X
  END WHERE
END FORALL
                                                                \EDOC


\subsection{Interpretation of the FORALL Construct}

Execution of a FORALL construct consists of the following steps:

\begin{enumerate}

\item Evaluation in any order of the subscript and stride expressions in 
the {\it forall-triplet-spec-list}.
The set of valid combinations of {\it subscript-name} values is then the 
cartesian product of the sets defined by these triplets.

\item Evaluation of the {\it scalar-mask-expr} for all valid combinations 
of {\em subscript-name} values.
The mask elements may be evaluated in any order.
The set of active combinations of {\it subscript-name} values is the 
subset of the valid combinations for which the mask evaluates to true.

\item Execute the {\it forall-body-stmts} in the order they appear.
Each statement is executed completely (that is, for all active 
combinations of {\it subscript-name} values) according to the following 
interpretation:

\begin{enumerate}

\item Assignment statements, pointer assignment statements, and array
assignment statements (i.e.
statements in the {\it forall-assignment} category) evaluate the 
right-hand side {\it expr} and any left-hand side subscripts for all 
active {\it subscript-name} values,
then assign those results to the corresponding left-hand side references.

\item WHERE statements evaluate their {\it mask-expr} for all active 
combinations of values of {\it subscript-name}s.
All elements of all masks may be evaluated in any order. 
The assignments within the WHERE branch of the statement are then 
executed in order using the above interpretation of array assignments 
within the FORALL, but the only array elements assigned are those 
selected by both the active {\it subscript-names} and the WHERE mask.
Finally, the assignments in the ELSEWHERE branch are executed if that 
branch is present.
The assignments here are also treated as array assignments, but elements 
are only assigned if they are selected by both the active combinations 
and by the negation of the WHERE mask.

\item FORALL statements and FORALL constructs first evaluate the 
subscript and stride expressions in 
the {\it forall-triplet-spec-list} for all active combinations of the 
outer FORALL constructs.
The set of valid combinations of {\it subscript-names} for the inner 
FORALL is then the union of the sets defined by these bounds and strides 
for each active combination of the outer {\it subscript-names}.
For example, the valid set of the inner FORALL in the second example in 
the last section is the upper triangle (not including the main diagonal) 
of the \(n \times n\) matrix a.
The scalar mask expression is then evaluated for all valid combinations 
of the inner FORALL's {\it subscript-names} to produce the set of active 
combinations.
If there is no scalar mask expression, it is assumed to be always true.
Each statement in the inner FORALL is then executed for each active 
combination (of the inner FORALL), recursively following the 
interpretations given in this section.

\end{enumerate}

\end{enumerate}

If the scalar mask expression is omitted, it is as if it were
present with the value true.

The scope of a
{\it subscript-name} is the FORALL construct itself. 

A single assignment or array assignment statement in a {\it 
forall-construct} must obey the same restrictions as a {\it 
forall-assignment} in a simple {\it forall-stmt}.
(Note that the lowest level of nested statements must always be an 
assignment statement.)
For example, an assignment may not cause the same array element to be 
assigned more than once.
Different statements may, however, assign to the 
same array element, and assignments made in one
statement may affect the execution of a later statement.
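
The last point can be seen in a small sketch, a one-dimensional
variant of the first example above.
The program, array names, and data values are assumptions made for
illustration.
The first statement reads only old values of a, while the second
statement, executed afterwards, sees the values just assigned.

                                                                 \CODE
PROGRAM order
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 5
  INTEGER :: a(n), b(n), i

  a = (/ 10, 20, 30, 40, 50 /)
  b = 0

  FORALL (i = 2:n-1)
    a(i) = a(i-1) + a(i+1)   ! uses only the old values of a
    b(i) = a(i)              ! uses the new values of a
  END FORALL

  PRINT *, a     ! values 10 40 60 80 50
  PRINT *, b     ! values  0 40 60 80  0
END PROGRAM order
                                                                 \EDOC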

\subsection{Scalarization of the FORALL Construct}

A {\it forall-construct} of the form:

                                                                \CODE
FORALL (... e1 ... e2 ... en ...)
    s1
    s2
     ...
    sn
END FORALL
                                                                \EDOC

where each si is an assignment, is equivalent to the following sequence
of FORALL statements:

                                                                \CODE
temp1 = e1
temp2 = e2
 ...
tempn = en
FORALL (... temp1 ... temp2 ... tempn ...) s1
FORALL (... temp1 ... temp2 ... tempn ...) s2
   ...
FORALL (... temp1 ... temp2 ... tempn ...) sn
                                                                \EDOC

A similar statement can be made using FORALL constructs when the 
si may be WHERE or FORALL constructs.

A {\it forall-construct} of the form:

                                                                \CODE
FORALL ( v1=l1:u1:s1, mask )
  WHERE ( mask2(l2:u2) )
    a(v1,l2:u2) = rhs1
  ELSEWHERE
    a(v1,l2:u2) = rhs2
  END WHERE
END FORALL
                                                                \EDOC

is equivalent to the following standard Fortran 90 code:

                                                                \CODE
!evaluate subscript and stride expressions in any order

templ1 = l1
tempu1 = u1
temps1 = s1

!then evaluate the FORALL mask expression

DO v1=templ1,tempu1,temps1
 tempmask(v1) = mask
END DO

!then evaluate the masks for the WHERE

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    tmpl2(v1) = l2
    tmpu2(v1) = u2
    tempmask2(v1,tmpl2(v1):tmpu2(v1)) = mask2(tmpl2(v1):tmpu2(v1))
  END IF
END DO

!then evaluate the WHERE branch

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    WHERE ( tempmask2(v1,tmpl2(v1):tmpu2(v1)) )
      temprhs1(v1,tmpl2(v1):tmpu2(v1)) = rhs1
    END WHERE
  END IF
END DO
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    WHERE ( tempmask2(v1,tmpl2(v1):tmpu2(v1)) )
      a(v1,tmpl2(v1):tmpu2(v1)) = temprhs1(v1,tmpl2(v1):tmpu2(v1))
    END WHERE
  END IF
END DO

!then evaluate the ELSEWHERE branch

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    WHERE ( .not. tempmask2(v1,tmpl2(v1):tmpu2(v1)) )
      temprhs2(v1,tmpl2(v1):tmpu2(v1)) = rhs2
    END WHERE
  END IF
END DO
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    WHERE ( .not. tempmask2(v1,tmpl2(v1):tmpu2(v1)) )
      a(v1,tmpl2(v1):tmpu2(v1)) = temprhs2(v1,tmpl2(v1):tmpu2(v1))
    END WHERE
  END IF
END DO
                                                                   \EDOC
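
A small instance of a WHERE nested in a FORALL is sketched below.
The program, the array shape, and the mask on odd rows are assumptions
made for illustration.
Only rows selected by the FORALL mask are touched, and within each such
row the WHERE mask chooses between the two assignments.

                                                                \CODE
PROGRAM maskrows
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 3, m = 4
  INTEGER :: a(n,m), i, j

  ! every row starts as -1 0 1 2
  FORALL (i = 1:n, j = 1:m) a(i,j) = j - 2

  ! only the odd-numbered rows are active
  FORALL (i = 1:n, MOD(i,2) .EQ. 1)
    WHERE (a(i,1:m) .GT. 0)
      a(i,1:m) = 2 * a(i,1:m)
    ELSEWHERE
      a(i,1:m) = 0
    END WHERE
  END FORALL

  DO i = 1, n
    PRINT *, a(i,1:m)
  END DO
  ! row 1: values  0 0 2 4
  ! row 2: values -1 0 1 2   (inactive, unchanged)
  ! row 3: values  0 0 2 4
END PROGRAM maskrows
                                                                \EDOC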


A {\it forall-construct} of the form:

                                                                   \CODE
FORALL ( v1=l1:u1:s1, mask )
  FORALL ( v2=l2:u2:s2, mask2 )
    a(e1,e2) = rhs1
    b(e3,e4) = rhs2
  END FORALL
END FORALL
                                                                  \EDOC

is equivalent to the following standard Fortran 90 code:


                                                                   \CODE
!evaluate subscript and stride expressions in any order

templ1 = l1
tempu1 = u1
temps1 = s1

!then evaluate the FORALL mask expression

DO v1=templ1,tempu1,temps1
 tempmask(v1) = mask
END DO

!then evaluate the inner FORALL bounds, etc

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    templ2(v1) = l2
    tempu2(v1) = u2
    temps2(v1) = s2
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      tempmask2(v1,v2) = mask2
    END DO
  END IF
END DO

!first statement

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      IF ( tempmask2(v1,v2) ) THEN
        temprhs1(v1,v2) = rhs1
      END IF
    END DO
  END IF
END DO
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      IF ( tempmask2(v1,v2) ) THEN
        a(e1,e2) = temprhs1(v1,v2)
      END IF
    END DO
  END IF
END DO

!second statement

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      IF ( tempmask2(v1,v2) ) THEN
        temprhs2(v1,v2) = rhs2
      END IF
    END DO
  END IF
END DO
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      IF ( tempmask2(v1,v2) ) THEN
        b(e3,e4) = temprhs2(v1,v2)
      END IF
    END DO
  END IF
END DO
                                                                   \EDOC
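
The triangular example given earlier can serve as a concrete instance of
nested FORALLs whose inner bounds depend on the outer subscript name.
In the sketch below, the program name, the matrix size, and the initial
values are assumptions made for illustration.

                                                                   \CODE
PROGRAM triangle
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 3
  INTEGER :: a(n,n), i, j

  ! a(i,j) = 10*i + j, so every element is distinct
  FORALL (i = 1:n, j = 1:n) a(i,j) = 10*i + j

  ! copy the strict lower triangle onto the strict upper triangle
  FORALL (i = 1:n-1)
    FORALL (j = i+1:n)
      a(i,j) = a(j,i)
    END FORALL
  END FORALL

  DO i = 1, n
    PRINT *, a(i,1:n)
  END DO
  ! row 1: values 11 21 31
  ! row 2: values 21 22 32
  ! row 3: values 31 32 33
END PROGRAM triangle
                                                                   \EDOC

The active combinations of (i,j) are exactly those with i less than j,
so no element is assigned more than once and the diagonal is untouched.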


\subsection{Consequences of the Definition of the FORALL Construct}

This section should be moved to the comments chapter of the final
draft.

\begin{itemize}

\item A block FORALL means roughly the same as replicating the FORALL
header in front of each array assignment statement in the block, except
that any expressions in the FORALL header are evaluated only once,
rather than being re-evaluated before each of the statements in the body.
(The exceptions to this rule are nested FORALL statements and WHERE
statements, which introduce syntactic and functional complications
into the copying.)

\item One may think of a block FORALL as synchronizing twice per
contained assignment statement: once after handling the rhs and other 
expressions
but before performing assignments, and once after all assignments have
been performed but before commencing the next statement.  (In practice,
appropriate dependence analysis will often permit the compiler to
eliminate unnecessary synchronizations.)

\item The {\it scalar-mask-expr} may depend on the {\it subscript-name}s.

\item In general, any expression in a FORALL is evaluated only for valid 
combinations of all surrounding subscript names for which all the
scalar mask expressions are true.

\item Nested FORALL bounds and strides can depend on outer FORALL {\it 
subscript-names}.  They cannot redefine those names, even temporarily (if 
they did there  would be no way to avoid multiple assignments to the same 
array element).

\item Dependences are allowed from one statement to later statements, but 
never from an assignment statement to itself.

\end{itemize}


\section{PURE Procedures and Elemental Invocation}

\label{forall-pure}

\footnote{Version of September 22, 
1992 - John Merlin, University of Southampton, 
and Charles Koelbel, Rice University. Approved at first reading on
September 10, 1992, subject to technical revisions for correctness.
The suggestions made there have been incorporated in this draft.}
A {\it pure\/} procedure is one which produces no side effects, except for 
assigning to dummy arguments of INTENT (OUT) or (INOUT) in the case of a
pure subroutine and returning a result in the case of a function.
It may be used in any way that a normal procedure of its type may 
be used.
In addition, pure functions may be used in a FORALL statement or
construct, and a pure procedure whose dummy arguments and result
variable (in the case of a
function) are all scalar may be used elementally, as defined in the
Fortran~90 standard.

\subsection{Declaration of Pure Procedures}

If a procedure is used in a context that requires it to be pure,
then it must be declared to be pure.
Intrinsic functions are always pure and require
no explicit declaration of this fact; intrinsic subroutines are not pure.
A statement function is pure if and only if all functions which it
calls are pure.
Otherwise, a pure procedure must have an explicit interface in the
scope where it is called, and this interface must contain the PURE
declaration.
The form of this declaration is 
a directive immediately after the {\it function-stmt\/} or {\it
subroutine-stmt\/}:
                                                                 \BNF
pure-directive \IS !HPF$ PURE [procedure-name]
                                                                 \FNB
The same declaration is used in both procedure definitions and
interface blocks.

To define pure functions, Rule~R1215 of the Fortran~90 standard is changed 
to:
                                                                 \BNF
function-subprogram \IS		function-stmt
				[pure-directive]
				[specification-part]
				[execution-part]
				[internal-subprogram-part]
				end-function-stmt
                                                                \FNB

\noindent
Constraint: If present, the {\it procedure-name\/} in the {\it
pure-directive\/} must match the name in the {\it function-stmt\/}.

\noindent
Constraint: The dummy arguments of a pure function must have INTENT(IN).

\noindent
Constraint: The local variables of a pure function must not have the SAVE 
attribute.

\noindent
Constraint: The local variables of a pure function must not be used in
explicit ALIGN, DISTRIBUTE, REALIGN, or REDISTRIBUTE statements.

\noindent
Constraint: A pure function must not REALIGN or REDISTRIBUTE any
global object.

\noindent
Constraint: A pure function must not contain any of the following
statements:
\begin{enumerate}
\item DATA statements
\item PAUSE statements
\item STOP statements
\item Assignment (including array assignment and pointer assignment) to global
	variables
\item ALLOCATE, DEALLOCATE, and NULLIFY statements operating on global
	variables or dummy arguments
\item I/O statements
\end{enumerate}

\noindent
Constraint: Any procedure called from a pure function must be pure.

\noindent
Constraint: A pure function must not pass a global variable or a dummy
argument which is a pointer to a procedure argument with INTENT(OUT)
or INTENT(INOUT).

To assert that a function is pure, a {\it pure-directive\/} must be given.
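
A minimal sketch of a function written to satisfy the constraints above
follows.
The function clamp and its arguments are hypothetical and are not part
of this proposal; the comments name the constraint that each line is
meant to respect.

                                                                 \CODE
REAL FUNCTION clamp (x, lo, hi)
  !HPF$ PURE clamp
  IMPLICIT NONE
  REAL, INTENT(IN) :: x, lo, hi   ! all dummy arguments are INTENT(IN)
  REAL :: t                       ! local variable without SAVE

  t = MAX (x, lo)                 ! intrinsic functions are always pure
  clamp = MIN (t, hi)             ! only the function result is defined

  ! Any of the following would violate a constraint above:
  !   PRINT *, x                  (I/O statement)
  !   STOP                        (STOP statement)
  !   an assignment to a variable in a COMMON block or module
END FUNCTION clamp
                                                                 \EDOC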

To define pure subroutines, Rule~R1219 is changed to:
                                                                 \BNF
subroutine-subprogram \IS	subroutine-stmt
				[pure-directive]
				[specification-part]
				[execution-part]
				[internal-subprogram-part]
				end-subroutine-stmt
                                                                \FNB

\noindent
Constraint: If present, the {\it procedure-name\/} in the {\it
pure-directive\/} must match the name in the {\it subroutine-stmt\/}.

\noindent
Constraint: The dummy arguments of a pure subroutine must have explicit INTENT.

\noindent
Constraint: The local variables of a pure subroutine must not have the SAVE 
attribute.

\noindent
Constraint: The local variables of a pure subroutine must not be used in
explicit ALIGN, DISTRIBUTE, REALIGN, or REDISTRIBUTE statements.

\noindent
Constraint: A pure subroutine must not REALIGN or REDISTRIBUTE any
global object.

\noindent
Constraint: A pure subroutine must not contain any of the following
statements:
\begin{enumerate}
\item DATA statements
\item PAUSE statements
\item STOP statements
\item Assignment (including array assignment and pointer assignment) to global
	variables
\item ALLOCATE, DEALLOCATE, and NULLIFY statements operating on global
	variables or dummy arguments
\item I/O statements
\end{enumerate}

\noindent
Constraint: Any procedure called from a pure subroutine must be pure.

\noindent
Constraint: A pure subroutine must not pass a global variable or a dummy
argument which is a pointer to a procedure argument with INTENT(OUT)
or INTENT(INOUT).

To assert that a subroutine is pure, a {\it pure-directive\/} must be
given.
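
A corresponding sketch for a pure subroutine is given below.
The subroutine order2 is hypothetical and serves only to illustrate the
constraints: every dummy argument has an explicit INTENT, and the only
effects are on the INTENT(INOUT) arguments.

                                                                 \CODE
SUBROUTINE order2 (tol, y, z)
  !HPF$ PURE order2
  IMPLICIT NONE
  REAL, INTENT(IN)    :: tol
  REAL, INTENT(INOUT) :: y, z
  REAL :: t                       ! local variable without SAVE

  ! exchange y and z when y exceeds z by more than tol
  IF (y - z .GT. tol) THEN
    t = y
    y = z
    z = t
  END IF
END SUBROUTINE order2
                                                                 \EDOC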

To define interface specifications for pure procedures, Rule~R1204 is 
changed to:
                                                                \BNF
interface-body \IS	function-stmt
			[pure-directive]
			[specification-part]
			end-function-stmt
		\OR	subroutine-stmt
			[pure-directive]
			[specification-part]
			end-subroutine-stmt
                                                                \FNB
with the following constraints in addition to those in
Section~12.3.2.1 of the Fortran~90 standard:

\noindent
Constraint: An {\it interface-body\/} of a pure subroutine must specify
the intents of all dummy arguments.

The procedure characteristics defined by an interface body must be
consistent with the procedure's definition.
Regarding pure functions, this is interpreted as follows:
\begin{enumerate}
\item A function which is declared pure at its definition may be
declared pure in an interface block, but this is not required.
\item A function which is not declared pure at its definition must not
be declared pure in an interface block.
\end{enumerate}
That is, if an interface body contains a {\it pure-directive\/}, then the 
corresponding procedure definition must also contain it, though the 
reverse is not true.
When a procedure definition with a {\it pure-directive\/}
is compiled, the compiler may check that it satisfies the necessary 
constraints.


\subsection{Uses and MIMD Aspects of Pure Procedures}

\subsubsection{FORALL statements and constructs}

Pure functions may be used in expressions in FORALL statements and 
constructs, unlike general functions.  
Because a {\it forall-assignment}
may be an {\it array-assignment}, the pure function can have an array
result.  
For example:
                                                              \CODE
    INTERFACE
      FUNCTION f (x)
        !HPF$ PURE f
        REAL, DIMENSION(3) :: f
        REAL, DIMENSION(3), INTENT(IN) :: x
      END FUNCTION f
    END INTERFACE
    REAL  v (3,10,10)
    ...
    FORALL (i=1:10, j=1:10)  v(:,i,j) = f (v(:,i,j)) 
                                                              \EDOC


\subsubsection{Elemental function references}

A pure function with scalar dummy arguments and scalar result
can be invoked {\em elementally\/} in array expressions with the 
same interpretation as Fortran~90 elemental intrinsic functions.

To define elemental invocations of pure functions, the following 
extra constraint is added after Rule~R1209 ({\it function-reference\/}):
\begin{quotation}
\noindent
Constraint: A user-defined function that is invoked
elementally (12.4.3) must be a pure function with only scalar dummy 
arguments and result.
\end{quotation}
Additionally, the first sentence of section 12.4.3 should be changed to:
\begin{quotation}
\noindent
A reference to {\em a user-defined pure function or\/} an elemental
intrinsic function 
is an {\bf elemental reference} if one or more actual arguments are
arrays and all array arguments have the same shape.
\end{quotation}
(where the additional words are italicised).
The interpretation of an elemental function reference is as follows
(based on Section~13.2.1 of the Fortran~90 standard):
\begin{quote}
If a pure function with only scalar arguments and result is invoked
with array arguments, the shape of the result is the same as the shape of 
the argument with the greatest rank.
If the arguments are all scalar, the result is scalar.
For functions that have more than one argument, 
all arguments must be conformable.
In the array-valued case, the values of the elements, if any, of the 
result are the same as would have been obtained if the scalar-valued 
function had been applied separately, in any order, to corresponding 
elements of each argument. 
\end{quote}
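
The interpretation above can be sketched as an explicit loop.
In the program below, the function blend and the data values are
hypothetical; under the proposed rule the loop would have the same
effect as the single array assignment r = blend (a, b).

                                                              \CODE
PROGRAM elementwise
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 4
  REAL :: a(n), b(n), r(n)
  INTEGER :: i

  a = (/  1.0,  2.0,  3.0,  4.0 /)
  b = (/ 10.0, 20.0, 30.0, 40.0 /)

  ! the scalar function applied separately to corresponding elements
  DO i = 1, n
    r(i) = blend (a(i), b(i))
  END DO

  PRINT *, r     ! values 21.0, 42.0, 63.0, 84.0

CONTAINS

  REAL FUNCTION blend (x, y)
    !HPF$ PURE blend
    REAL, INTENT(IN) :: x, y
    blend = x + 2.0 * y
  END FUNCTION blend

END PROGRAM elementwise
                                                              \EDOC

Since each result element depends only on the corresponding argument
elements, the order of the loop iterations does not affect the result.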

Examples of elemental function usage are
                                                              \CODE
    INTERFACE 
      REAL FUNCTION foo (x, y, z)
        !HPF$ PURE foo
        REAL, INTENT(IN) :: x, y, z
      END FUNCTION
    END INTERFACE

    REAL a(100), b(100), c(100)
    REAL p, q, r

    a(1:n) = foo (a(1:n), b(1:n), c(1:n))
    a(1:n) = foo (a(1:n), q, r)
    a = sin(b)
                                                              \EDOC
An example involving a WHERE-ELSEWHERE construct is
                                                              \CODE
    INTERFACE
      REAL FUNCTION f_edge (x)
        !HPF$ PURE
        REAL, INTENT(IN) :: x
      END FUNCTION f_edge
      REAL FUNCTION f_interior (x)
        !HPF$ PURE
        REAL, INTENT(IN) :: x
      END FUNCTION f_interior
    END INTERFACE

    REAL a (10,10)
    LOGICAL edges (10,10)

    WHERE (edges)
      a = f_edge (a)
    ELSE WHERE
      a = f_interior (a)
    END WHERE
                                                              \EDOC

\subsubsection{Elemental subroutine references}

Pure subroutines with scalar dummy arguments can be invoked 
{\em elementally\/} with the same interpretation as Fortran~90 
elemental intrinsic subroutines.
To define this, an
extra constraint is added after Rule~R1210 ({\it call-stmt\/}):
\begin{quotation}
\noindent
Constraint: A non-intrinsic subroutine that is invoked
elementally must be a pure subroutine with scalar dummy 
arguments.
\end{quotation}
Additionally, the beginning of section 12.4.5 should be changed to:
\begin{quotation}
A reference to {\em a pure subroutine or\/} an elemental intrinsic
subroutine is an elemental reference if all actual arguments
corresponding to INTENT(OUT) and INTENT(INOUT) dummy arguments are
arrays that have the same shape and the remaining actual arguments are
conformable with them.
\end{quotation}
The interpretation of elemental subroutine invocation (based on
Section~13.2.2 of the Fortran~90 standard) is as follows:
\begin{quote}
An elemental subroutine is one that is specified for scalar arguments, 
but may be applied to array arguments.  In a reference to an elemental 
intrinsic subroutine, either all actual arguments must be scalar or all 
INTENT(OUT) and INTENT(INOUT) arguments must be arrays of the same shape 
and the remaining arguments must be conformable with them. In the case 
that the INTENT(OUT) and INTENT(INOUT) arguments are arrays, the values 
of the elements, if any, of the results are the same as would be obtained 
if the subroutine with scalar arguments were applied separately, in any 
order, to corresponding elements of each argument.
\end{quote}

Examples of elemental subroutine usage are
                                                                \CODE
    INTERFACE 
      SUBROUTINE solve_simul(tol, y, z)
        !HPF$ PURE solve_simul
        REAL, INTENT(IN) :: tol
        REAL, INTENT(INOUT) :: y, z
      END SUBROUTINE solve_simul
    END INTERFACE

    REAL a(100), b(100), c(100)
    INTEGER bits(10)

    CALL solve_simul(0.1, a, b)
    CALL solve_simul(c, a, b)
    CALL mvbits(bits, 0, 4, bits, 4)
                                                                \EDOC
\subsubsection{Advantages of elemental usage}

User-defined elemental procedures have several potential advantages.
They would be a very convenient programming tool, as the same procedure 
can be applied to actual arguments of any rank.

In addition, the implementation of an elemental function returning an
array-valued result in an array expression is likely to be more 
efficient than that of an equivalent array function.  One reason is 
that it requires less temporary storage for the result (i.e.\ storage 
for a single result versus storage for the entire array of results).  
Another is that it saves on looping if an array expression is 
implemented by sequential iteration over the component elemental 
expressions (as may be done for the `segment' of the array expression 
local to each process).  This is because, in the sequential version, 
the elemental function can be invoked elementally in situ within the 
expression.  The array function, on the other hand, must be executed 
before the expression is evaluated, storing its result in a temporary 
array for use within the expression.  Looping is then required during 
the execution of the array function body as well as the expression 
evaluation.


\subsubsection{MIMD parallelism}

A natural way of obtaining some MIMD parallelism is by
means of branches within a pure function which depend on argument 
values.  These branches can be governed by content-based or index-based 
conditionals (the latter in a FORALL context).  For example:
                                                              \CODE
    FUNCTION f (x, i)
      !HPF$ PURE f
      IF (x > 0) THEN     ! content-based conditional
        ...
      ELSE IF (i==1 .OR. i==n) THEN    ! index-based conditional
        ...
      ENDIF
    END FUNCTION

    ...
    FORALL (i=1:n)  x (i) = f(x(i), i)
    ...
                                                              \EDOC
Content-based conditionals can be exploited generally, including in
array assignments (see below), which may sometimes obviate the need for 
WHERE-ELSEWHERE constructs or sequences of masked FORALLs with their 
potential synchronisation overhead. 


\subsection{Comments}

This section should be moved to the comments chapter of the final draft.

Comments on pure procedures:
\begin{itemize}

\item We believe that the constraints for a pure procedure guarantee
freedom from side-effects, thus ensuring that it can be invoked
concurrently at each
`element' of an array (where an `element' may itself be a data-structure, 
including an array).

\item All constraints can be statically checked, thus providing safety
for the programmer.  Of course, this means that there are some
functions without side effects that are not pure.

\item An earlier draft of this proposal contained a constraint disallowing 
pure procedures from accessing global data objects, particularly
distributed data objects.
This constraint has been dropped as inessential to the side-effect freedom 
that the HPF committee requested.
However, it may well be that some machines will have great difficulty 
implementing FORALL without this constraint.

\item One of us (JHM) is still in favour of disallowing access to global 
variables for a number of reasons: 
\begin{enumerate}
\item Aesthetically, it is in keeping with the
nature of a `pure' function, i.e. a function in the mathematical
sense, and in practical terms it imposes no real restrictions on the 
programmer, as global data can be passed-in via the argument list; 
\item Without this constraint HPF programs can no longer be implemented 
by pure message-passing, or at least not efficiently, i.e. without
sequentialising FORALL statements containing function calls and greatly
complicating their implementation; 
\item Absence of this restriction may inhibit optimisation of FORALLs
and array assignments, as the optimisation of assigning the {\it expr\/}
directly to the assignment variable rather than to a temporary intermediate
array may now require interprocedural analysis rather than just local 
analysis.
\end{enumerate}

\item Fortran 90 introduces the concept of `elemental procedures',
which are defined for scalar arguments but may also be applied to
conforming array-valued arguments.  
For an elemental function,
each element of the result, if any, is as would have been obtained by
applying the function to corresponding elements of the arguments.
Examples are the mathematical intrinsics, e.g.\ SIN(X).
For an elemental subroutine, the effect on each element of an INTENT(OUT)
or INTENT(INOUT) array argument is as would be obtained by calling the 
subroutine with the corresponding elements of the arguments.
An example is the intrinsic subroutine MVBITS.

However, Fortran~90 restricts the application of elemental
procedures to a subset of the intrinsic procedures --- the programmer
cannot define his own.  Obviously, elemental invocation is equivalent to 
concurrent invocation, so extra constraints beyond those for normal
Fortran procedures are required to allow this to be done safely
(e.g. deterministically).  Appropriate constraints in this case are
the same as those for function calls in FORALL---indeed, the latter are 
virtually equivalent to elemental invocation in an array assignment, 
given the close correspondence between FORALL and array assignment.
Hence, we propose that pure procedures may also be invoked elementally,
subject to the additional constraint that their dummy arguments 
(and, for a function, result) are scalar.

\item The original draft proposed allowing pure procedures 
to be invoked elementally even if their dummy arguments or results 
were array-valued.  These provisions have been dropped to avoid 
promoting storage order to a higher level in Fortran~90
(i.e.\ to avoid introducing the concept of `arrays-of-arrays', 
which Fortran~90 seems to strenuously avoid!)   In practical terms,
the current proposal provides the same functionality as the original 
one for functions, though not for subroutines.  If a programmer wants 
elemental function behaviour, but also wants the `elements' to be
array-valued, this can be achieved using FORALL.

\item In a typical FORALL or elemental implementation, a pure procedure 
would be called independently in each process, and its dummy arguments 
would be associated with `elements' local to that process. 
However, access to large global data structures such as look-up tables
is often useful within functions that are otherwise mathematically pure. 
This is the reason for restricting data mapping directives within the 
bodies of such procedures.
Note that, particularly in elemental invocations, the actual arguments
can be distributed arrays which need not be `co-distributed'; if not,
a typical implementation would in general perform all data communications 
prior to calling the procedure, and would then pass-in the required 
elements locally via its argument list.

\end{itemize}
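
The correspondence noted above between a function call in a FORALL and
an elemental invocation in an array assignment can be sketched as
follows.  This is only an illustration under stated assumptions: it
borrows the ELEMENTAL prefix of later Fortran standards as a stand-in
for the pure-procedure declaration proposed here, and the names PUREFN,
F and USEPURE are invented for the example.
                                                              \CODE
      MODULE PUREFN
      IMPLICIT NONE
      CONTAINS
!  Scalar dummy arguments and result, no side effects.
        ELEMENTAL FUNCTION F(X, I)
          REAL, INTENT(IN) :: X
          INTEGER, INTENT(IN) :: I
          REAL :: F
          F = X + REAL(I)
        END FUNCTION F
      END MODULE PUREFN

      PROGRAM USEPURE
      USE PUREFN
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 4
      REAL :: X(N), Y(N)
      INTEGER :: I
      X = 1.0
      Y = X
!  Call at each element from a FORALL ...
      FORALL (I=1:N)  X(I) = F(X(I), I)
!  ... and the same function invoked elementally on whole arrays.
      Y = F(Y, (/ (I, I=1,N) /))
      PRINT *, ALL(X .EQ. Y)
      END PROGRAM USEPURE
                                                              \EDOC
Both forms apply F once per element with corresponding arguments, which
is why the same constraints are appropriate for each.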


\section{The INDEPENDENT Directive}

\label{do-independent}

\footnote{Version of August 20, 1992
 - Guy Steele, Thinking Machines Corporation, and 
Charles Koelbel, Rice University.  Approved at second reading on
September 10, 1992; however, the INDEPENDENT subgroup was directed to
examine methods of allowing reductions to be performed within
INDEPENDENT constructs.}
The INDEPENDENT directive can precede a DO loop or FORALL statement or
construct.
Intuitively, it asserts to the compiler that the operations in the
following construct
may be executed independently--that is, in any order, or
interleaved, or concurrently--without changing the semantics
of the program.

The syntax of the INDEPENDENT directive is
                                                  \CODE
independent-dir	\IS	!HPF$INDEPENDENT [ (integer-variable-list) ]
                                                  \EDOC

\noindent
Constraint: An {\it independent-dir\/} must immediately precede a DO or FORALL
statement.

\noindent
Constraint: If the {\it integer-variable-list\/} is present, then the
variables named must be the index variables of a set of perfectly nested
DO loops or indices from the same FORALL header.

The directive is said to apply to the indices named in its {\it
integer-variable-list}, or equivalently to the loops or FORALL indexed
by those variables.
If no {\it integer-variable-list\/} is present, then it is as if it
were present and contained the index variable for the DO or FORALL
immediately following the directive.


When applied to a nest of DO loops, an INDEPENDENT directive is an
assertion by the programmer that no iteration may affect any other
iteration, either directly or indirectly.
This implies that there are no exits from the construct other than
normal loop termination, and no I/O is performed by the loop.
A sufficient condition for ensuring this is that
during
the execution of the loop(s), no iteration assigns to any scalar
data object which is 
accessed (i.e.\ read or written) by any other iteration.
The directive is purely advisory, and a compiler is free
to ignore it if it cannot make use of the information.


For example:
                                                  \CODE
!HPF$INDEPENDENT
      DO I=1,100
        A(P(I)) = B(I)
      END DO
                                                  \EDOC
asserts that the array P does not have any repeated entries (else they
would cause interference when A was assigned).
It also limits how A and B may be storage associated.
(The remaining examples in this
section assume that no variables are storage or sequence associated.)
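
A programmer who is unsure whether such an assertion holds can test it
at run time before relying on it.  The following is a minimal sketch,
assuming that every entry of P lies in the range 1 to N; the program
name CHKPERM and the work array HITS are invented for the example.
                                                  \CODE
      PROGRAM CHKPERM
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 5
      INTEGER :: P(N), HITS(N), I
      REAL :: A(N), B(N)
      P = (/ 3, 1, 5, 2, 4 /)
      B = (/ 10.0, 20.0, 30.0, 40.0, 50.0 /)
      A = 0.0
!  Count how often each element of A is named by P.
      HITS = 0
      DO I = 1, N
        HITS(P(I)) = HITS(P(I)) + 1
      END DO
      IF (ANY(HITS .GT. 1)) THEN
        PRINT *, 'P has repeated entries: assertion would be false'
      ELSE
!HPF$ INDEPENDENT
        DO I = 1, N
          A(P(I)) = B(I)
        END DO
        PRINT *, A
      END IF
      END PROGRAM CHKPERM
                                                  \EDOC
The check itself is an ordinary sequential loop; only the assignment
loop carries the assertion.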

Another example:
                                                  \CODE
!HPF$INDEPENDENT (I1,I2,I3)
      DO I1 = 1,N1
        DO I2 = 1,N2
          DO I3 = 1,N3
            DO I4 = 1,N4   !The inner loop is not independent!
              A(I1,I2,I3) = A(I1,I2,I3) + B(I1,I2,I4)*C(I2,I3,I4)
            END DO
          END DO
        END DO
      END DO
                                                  \EDOC
The inner loop is not independent because each element of A is
assigned repeatedly.
However, the three outer loops are independent because they access
different elements of A.
It is not relevant that the outer loops read the same elements from B
and C, because those arrays are not assigned.
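
One way to see the split between the independent outer loops and the
accumulating inner loop is to write the inner loop as a reduction.  The
sketch below is an illustration only (the program name OUTER3, the
second array A2 and the small extents are assumptions made for the
example); it compares the loop nest above with a FORALL over the three
outer indices that uses the SUM intrinsic for the I4 accumulation.
                                                  \CODE
      PROGRAM OUTER3
      IMPLICIT NONE
      INTEGER, PARAMETER :: N1=2, N2=3, N3=2, N4=4
      REAL :: A(N1,N2,N3), A2(N1,N2,N3)
      REAL :: B(N1,N2,N4), C(N2,N3,N4)
      INTEGER :: I1, I2, I3, I4
      B = RESHAPE((/ (REAL(I1), I1=1,N1*N2*N4) /), (/ N1,N2,N4 /))
      C = RESHAPE((/ (REAL(I1), I1=1,N2*N3*N4) /), (/ N2,N3,N4 /))
      A = 0.0
      A2 = 0.0
!HPF$ INDEPENDENT (I1,I2,I3)
      DO I1 = 1, N1
        DO I2 = 1, N2
          DO I3 = 1, N3
            DO I4 = 1, N4
              A(I1,I2,I3) = A(I1,I2,I3) + B(I1,I2,I4)*C(I2,I3,I4)
            END DO
          END DO
        END DO
      END DO
!  Same computation with the I4 accumulation written as SUM.
      FORALL (I1=1:N1, I2=1:N2, I3=1:N3)
        A2(I1,I2,I3) = A2(I1,I2,I3) + SUM(B(I1,I2,:)*C(I2,I3,:))
      END FORALL
      PRINT *, ALL(A .EQ. A2)
      END PROGRAM OUTER3
                                                  \EDOC
The test data are small whole numbers, so the comparison is not
disturbed by the order in which the products are added.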

The interpretation of INDEPENDENT for FORALL is similar to that for
DO: it asserts that no combination of the indices that INDEPENDENT
applies to may affect another combination.
This is only possible if one combination of index values assigns to a
scalar data object accessed by another
combination.
A DO and a FORALL with the same body are equivalent if they both
have the INDEPENDENT directive.
In the case of a FORALL, any of the variables may be mentioned in the
INDEPENDENT directive:
                                                                \CODE
!HPF$INDEPENDENT (I1,I3)
    FORALL(I1=1:N1,I2=1:N2,I3=1:N3) 
      A(I1,I2,I3) = A(I1,I2-1,I3)
    END FORALL
                                                                \EDOC
This means that for any given values for I1 and I3,
all the right-hand sides for all values of I2 must
be computed before any assignments are done for that
specific pair of (I1,I3) values; but assignments for
one pair of (I1,I3) values need not wait for rhs
evaluation for a different pair of (I1,I3) values.
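
The ordering described above can be imitated by hand.  In the following
sketch (an illustration only; the program name PAIRS, the copy R and the
temporary T are invented, and the second dimension of A is given a lower
bound of 0 so that the reference to I2-1 stays in bounds) the (I1,I3)
pairs are visited in reverse order, but within each pair all right-hand
sides are saved before any assignment is made.
                                                                \CODE
      PROGRAM PAIRS
      IMPLICIT NONE
      INTEGER, PARAMETER :: N1=3, N2=4, N3=2
      INTEGER :: A(N1,0:N2,N3), R(N1,0:N2,N3), T(N2)
      INTEGER :: I1, I2, I3
      A = RESHAPE((/ (I1, I1=1,N1*(N2+1)*N3) /), (/ N1,N2+1,N3 /))
      R = A
!HPF$ INDEPENDENT (I1,I3)
      FORALL (I1=1:N1, I2=1:N2, I3=1:N3)
        A(I1,I2,I3) = A(I1,I2-1,I3)
      END FORALL
!  Hand-written version: the pairs are taken in reverse order,
!  but the I2 direction keeps FORALL semantics within each pair.
      DO I3 = N3, 1, -1
        DO I1 = N1, 1, -1
          DO I2 = 1, N2
            T(I2) = R(I1,I2-1,I3)
          END DO
          DO I2 = 1, N2
            R(I1,I2,I3) = T(I2)
          END DO
        END DO
      END DO
      PRINT *, ALL(A .EQ. R)
      END PROGRAM PAIRS
                                                                \EDOC
Because I2 is not named in the directive, the temporary T cannot be
dropped: a plain ascending loop over I2 would propagate one value along
the whole column.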

Graphically, the INDEPENDENT directive can be visualized as
eliminating edges from a precedence graph representing the program.
Figure~\ref{fig-dep} shows the dependences that may normally be
present in a DO and a FORALL.
An arrow from a left-hand-side node (for example, ``lhsa(1)'') 
to a right-hand-side node (e.g.\ ``rhsb(1)'') means that the RHS
computation may use values assigned in the LHS node; thus the
right-hand side must be computed after the left-hand side completes
its store.
Similarly, an arrow from a RHS node to a LHS node means that the LHS
may overwrite a value needed by the RHS computation, again forcing an
ordering.
Edges from the ``BEGIN'' and to the ``END'' nodes represent control
dependences.
The INDEPENDENT directive asserts that the only dependences that a
compiler need enforce are those in Figure~\ref{fig-indep}.
That is, the programmer who uses INDEPENDENT is certifying that if the
compiler only enforces these edges, then the resulting program will be
equivalent to the one in which all the edges are present.
Note that the set of asserted dependences is identical for INDEPENDENT
DO and FORALL constructs.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Here come the pictures!
%

{

%length for use in pictures
\setlength{\unitlength}{0.03in}

%nodes used in all pictures
\newsavebox{\nodes}
\savebox{\nodes}{
    \small\sf
    \begin{picture}(80,105)(10,-2.5)
    \put(50.0,100){\makebox(0,0){BEGIN}}
    \put(20.0,80.0){\makebox(0,0){rhsa(1)}}
    \put(50.0,80.0){\makebox(0,0){rhsa(2)}}
    \put(80.0,80.0){\makebox(0,0){rhsa(3)}}
    \put(20.0,60.0){\makebox(0,0){lhsa(1)}}
    \put(50.0,60.0){\makebox(0,0){lhsa(2)}}
    \put(80.0,60.0){\makebox(0,0){lhsa(3)}}
    \put(20.0,40.0){\makebox(0,0){rhsb(1)}}
    \put(50.0,40.0){\makebox(0,0){rhsb(2)}}
    \put(80.0,40.0){\makebox(0,0){rhsb(3)}}
    \put(20.0,20.0){\makebox(0,0){lhsb(1)}}
    \put(50.0,20.0){\makebox(0,0){lhsb(2)}}
    \put(80.0,20.0){\makebox(0,0){lhsb(3)}}
    \put(50.0,0){\makebox(0,0){END}}
    \put(50.0,100){\oval(25,5)}
    \put(20.0,80.0){\oval(20,5)}
    \put(50.0,80.0){\oval(20,5)}
    \put(80.0,80.0){\oval(20,5)}
    \put(20.0,60.0){\oval(20,5)}
    \put(50.0,60.0){\oval(20,5)}
    \put(80.0,60.0){\oval(20,5)}
    \put(20.0,40.0){\oval(20,5)}
    \put(50.0,40.0){\oval(20,5)}
    \put(80.0,40.0){\oval(20,5)}
    \put(20.0,20.0){\oval(20,5)}
    \put(50.0,20.0){\oval(20,5)}
    \put(80.0,20.0){\oval(20,5)}
    \put(50.0,0){\oval(25,5)}
    \put(50,97.5){\vector(-2,-1){30}}
    \put(50,97.5){\vector(0,-1){15}}
    \put(50,97.5){\vector(2,-1){30}}
    \put(20,17.5){\vector(2,-1){30}}
    \put(50,17.5){\vector(0,-1){15}}
    \put(80,17.5){\vector(-2,-1){30}}
    \end{picture}
}

\begin{figure}

\begin{minipage}{2.70in}
\CODE
FORALL ( i = 1:3 )
  lhsa(i) = rhsa(i)
  lhsb(i) = rhsb(i)
END FORALL
\EDOC

\centering
\begin{picture}(80,105)(10,-2.5)
\small\sf
%save the messy part of the picture & reuse it
\newsavebox{\web}
\savebox{\web}{
    \begin{picture}(60,15)(0,0)
    \put(0,15){\vector(0,-1){15}}
    \put(0,15){\vector(2,-1){30}}
    \put(0,15){\vector(4,-1){60}}
    \put(30,15){\vector(-2,-1){30}}
    \put(30,15){\vector(0,-1){15}}
    \put(30,15){\vector(2,-1){30}}
    \put(60,15){\vector(0,-1){15}}
    \put(60,15){\vector(-2,-1){30}}
    \put(60,15){\vector(-4,-1){60}}
    \end{picture}
}
\put(10,-2.5){\usebox\nodes}
\put(20,62.5){\usebox\web}
\put(20,42.5){\usebox\web}
\put(20,22.5){\usebox\web}
\end{picture}
\end{minipage}
%
\hfill
%
\begin{minipage}{2.70in}
\CODE
DO i = 1, 3
  lhsa(i) = rhsa(i)
  lhsb(i) = rhsb(i)
END DO
\EDOC

\centering
\begin{picture}(80,105)(10,-2.5)
\small\sf
%save the messy part of the picture & reuse it
\newsavebox{\chain}
\savebox{\chain}{
    \begin{picture}(20,70)(0,0)
    \put(2.5,2.5){\oval(5,5)[bl]}
    \put(2.5,0){\vector(1,0){5}}
    \put(7.5,2.5){\oval(5,5)[br]}
    \put(10,2.5){\vector(0,1){32.5}}
    \put(10,35){\line(0,1){32.5}}
    \put(12.5,67.5){\oval(5,5)[tl]}
    \put(12.5,70){\vector(1,0){5}}
    \put(17.5,67.5){\oval(5,5)[tr]}
    \end{picture}
}
\put(10,-2.5){\usebox\nodes}
\put(20,77.5){\vector(0,-1){15}}
\put(20,57.5){\vector(0,-1){15}}
\put(20,37.5){\vector(0,-1){15}}
\put(25,15){\usebox\chain}
\put(50,77.5){\vector(0,-1){15}}
\put(50,57.5){\vector(0,-1){15}}
\put(50,37.5){\vector(0,-1){15}}
\put(55,15){\usebox\chain}
\put(80,77.5){\vector(0,-1){15}}
\put(80,57.5){\vector(0,-1){15}}
\put(80,37.5){\vector(0,-1){15}}
\end{picture}
\end{minipage}

\caption{Dependences in DO and FORALL without
INDEPENDENT assertions}
\label{fig-dep}
\end{figure}

\begin{figure}

%Draw the picture once, use it twice
\newsavebox{\easy}
\savebox{\easy}{
    \small\sf
    \begin{picture}(80,105)(10,-2.5)
    \put(10,-2.5){\usebox\nodes}
    \put(20,77.5){\vector(0,-1){15}}
    \put(20,57.5){\vector(0,-1){15}}
    \put(20,37.5){\vector(0,-1){15}}
    \put(50,77.5){\vector(0,-1){15}}
    \put(50,57.5){\vector(0,-1){15}}
    \put(50,37.5){\vector(0,-1){15}}
    \put(80,77.5){\vector(0,-1){15}}
    \put(80,57.5){\vector(0,-1){15}}
    \put(80,37.5){\vector(0,-1){15}}
    \end{picture}
}

\begin{minipage}{2.70in}
\CODE
!HPF$ INDEPENDENT
FORALL ( i = 1:3 )
  lhsa(i) = rhsa(i)
  lhsb(i) = rhsb(i)
END FORALL
\EDOC

\centering
\usebox\easy
\end{minipage}
%
\hfill
%
\begin{minipage}{2.70in}
\CODE
!HPF$ INDEPENDENT
DO i = 1, 3
  lhsa(i) = rhsa(i)
  lhsb(i) = rhsb(i)
END DO
\EDOC

\centering
\usebox\easy
\end{minipage}

\caption{Dependences in DO and FORALL with
INDEPENDENT assertions}
\label{fig-indep}
\end{figure}

}

%
%
% End of pictures
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


The compiler is justified in producing
a warning if it can prove that one of these assertions is incorrect.
It is not required to do so, however.
A program containing any false assertion of this type is not
standard-conforming, and the compiler may take any action it deems necessary.
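
A false assertion can sometimes be exposed by running the iterations in
two different orders and comparing the results.  The sketch below is an
illustration only, with invented names (ORDERS, F, R); it applies this
test to a loop with a carried dependence, for which an INDEPENDENT
assertion would be incorrect.
                                                  \CODE
      PROGRAM ORDERS
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 6
      INTEGER :: F(0:N), R(0:N), I
      F = 1
      R = F
!  Forward order: the loop as the programmer wrote it.
      DO I = 1, N
        F(I) = F(I-1) + 1
      END DO
!  Reverse order: one of the orders INDEPENDENT would permit.
      DO I = N, 1, -1
        R(I) = R(I-1) + 1
      END DO
      IF (ANY(F .NE. R)) THEN
        PRINT *, 'orders disagree: loop is not INDEPENDENT'
      ELSE
        PRINT *, 'no difference seen for this input'
      END IF
      END PROGRAM ORDERS
                                                  \EDOC
Agreement between two orders on one input does not prove independence;
the test can only refute the assertion, never establish it.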


This directive is of course similar to the DOSHARED directive
of Cray MPP Fortran.  A different name is offered here to avoid
even the hint of commitment to execution by a shared memory machine.
Also, the ``mechanism'' syntax is omitted here, though we might want
to adopt it as further advice to the compiler about appropriate
implementation strategies, if we can agree on a desirable set
of options.


\section{Other Proposals}

The following are proposals made for modification or replacement of the 
above sections.

\subsection{ALLOCATE in FORALL}

\label{forall-allocate}

\footnote{Version of July 28, 1992 
- Guy Steele, Thinking Machines Corporation.
At the September 10-11 meeting, this was not included as part of the
FORALL because it seemed too big a leap from the allowed assignment
statements.}
Proposal:  ALLOCATE, DEALLOCATE, and NULLIFY statements may appear
	in the body of a FORALL.

Rationale: these are just another kind of assignment.  They may have
	a kind of side effect (storage management), but it is a
	benign side effect (even milder than random number generation).

Example:
                                                            \CODE
      TYPE SCREEN
        INTEGER, POINTER :: P(:,:)
      END TYPE SCREEN
      TYPE(SCREEN) :: S(N)
      INTEGER IERR(N)
      ...
!  Lots of arrays with different aspect ratios
      FORALL (J=1:N)  ALLOCATE(S(J)%P(J,N/J),STAT=IERR(J))
      IF(ANY(IERR .NE. 0)) GO TO 99999
                                                            \EDOC
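
For comparison, the same allocations can be written today with an
ordinary DO loop.  The following is a sketch only: the program name
SCREENS and the value of N are assumptions, and the loop is a sequential
stand-in for the proposed FORALL, not a statement of how the proposal
would be implemented.
                                                            \CODE
      PROGRAM SCREENS
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 6
      TYPE SCREEN
        INTEGER, POINTER :: P(:,:)
      END TYPE SCREEN
      TYPE(SCREEN) :: S(N)
      INTEGER :: IERR(N), J
!  Each iteration allocates a different pointer component and
!  records its own status, so no iteration touches another's data.
      DO J = 1, N
        ALLOCATE(S(J)%P(J,N/J), STAT=IERR(J))
      END DO
      IF (ANY(IERR .NE. 0)) STOP 'allocation failed'
      DO J = 1, N
        PRINT *, J, SHAPE(S(J)%P)
        DEALLOCATE(S(J)%P)
      END DO
      END PROGRAM SCREENS
                                                            \EDOC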

\subsection{Generalized Data References}

\label{data-ref}

\footnote{Version of July 28, 1992 
- Guy Steele, Thinking Machines Corporation.
This was not acted on at the September 10-11 meeting because the
FORALL subgroup wanted to minimize changes to the Fortran~90 standard.}
Proposal:  Delete the constraint in section 6.1.2 of the Fortran 90
	standard (page 63, lines 7 and 8):
\begin{quote}
	Constraint: In a data-ref, there must not be more than one
		part-ref with nonzero rank.  A part-name to the right
		of a part-ref with nonzero rank must not have the
		POINTER attribute.
\end{quote}

Rationale: further opportunities for parallelism.

Example:
                                                                     \CODE
      TYPE(MONARCH) :: C(N), W(N)
      ...
C  Munch that butterfly
      C = C + W * A%P		!Currently illegal in Fortran 90
                                                                      \EDOC
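
The effect of the quoted constraint can be seen in the loop a programmer
must currently write instead.  The sketch below is an illustration only
and is not a reconstruction of the MONARCH example, whose type
definition is not given here; the type CELL, the program name DREFS and
the array extents are all invented.
                                                                     \CODE
      PROGRAM DREFS
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 3, M = 2
      TYPE CELL
        REAL, POINTER :: P(:)
      END TYPE CELL
      TYPE(CELL) :: A(N)
      REAL :: C(N,M), W(N,M)
      INTEGER :: I
      C = 1.0
      W = 2.0
      DO I = 1, N
        ALLOCATE(A(I)%P(M))
        A(I)%P = REAL(I)
      END DO
!  A%P would name two part-refs of nonzero rank (A and P), which
!  the quoted constraint forbids, so each A(I)%P is taken in turn.
      DO I = 1, N
        C(I,:) = C(I,:) + W(I,:) * A(I)%P
      END DO
      PRINT *, C
      END PROGRAM DREFS
                                                                      \EDOC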


\subsection{FORALL with INDEPENDENT Directives}
\label{begin-independent}

\footnote{Version of July 21, 1992 - Min-You Wu.
This was rejected at the FORALL subgroup meeting on September 9, 1992,
because it only offered syntactic sugar for capabilities already in
the FORALL INDEPENDENT.  It was also suggested that the BEGIN
INDEPENDENT syntax
should be reserved for other uses, such as MIMD features.}
This proposal is an extension of Guy Steele's INDEPENDENT proposal.
We propose a block FORALL with the directives for independent 
execution of statements.  The INDEPENDENT directives are used
in a block style.  

The block FORALL is in the form of
                                                         \CODE
      FORALL (...) [ON (...)]
        a block of statements
      END FORALL
                                                         \EDOC
where the block can consist of a restricted class of statements 
and the following INDEPENDENT directives:
                                                         \CODE
!HPF$BEGIN INDEPENDENT
!HPF$END INDEPENDENT
                                                         \EDOC
The two directives must be used as a pair.  
A sub-block of statements 
enclosed by the two directives is called an {\em asynchronous} 
sub-block or {\em independent} sub-block.  
The statements that are 
not in an asynchronous sub-block are in {\em synchronized} sub-blocks
or {\em non-independent} sub-blocks.  
The synchronized sub-block is 
the same as Guy Steele's synchronized FORALL statement, and the 
asynchronous sub-block is the same as the FORALL with the INDEPENDENT 
directive.  
Thus, the block FORALL
                                                          \CODE
      FORALL (e)
        b1
!HPF$BEGIN INDEPENDENT
        b2
!HPF$END INDEPENDENT
        b3
      END FORALL
                                                           \EDOC
means roughly the same as
                                                           \CODE
      FORALL (e)
        b1
      END FORALL
!HPF$INDEPENDENT
      FORALL (e)
        b2
      END FORALL
      FORALL (e)
        b3
      END FORALL
                                                          \EDOC
														  
Statements in a synchronized sub-block are tightly synchronized.
Statements in an asynchronous sub-block are completely independent.
The INDEPENDENT directives indicate to the compiler that there is no 
dependence and, consequently, that synchronizations are not necessary.
It is the user's responsibility to ensure there is no dependence
between instances in an asynchronous sub-block.
A compiler can do dependence analysis for the asynchronous sub-blocks
and issue an error message when there exists a dependence or a warning
when it finds a possible dependence.

\subsubsection{What does ``no dependence between instances'' mean?}

It means that there is no true dependence, anti-dependence,
or output dependence between instances.
Examples of these dependences are shown below:
\begin{enumerate}
\item True dependence:
                                                            \CODE
      FORALL (i = 1:N)
        x(i) = ... 
        ...  = x(i+1)
      END FORALL
                                                            \EDOC
Notice that dependences in a FORALL are different from those in a DO loop.
If the above example were a DO loop, it would be an anti-dependence.

\item Anti-dependence:
                                                            \CODE
      FORALL (i = 1:N)
        ...  = x(i+1)
        x(i) = ...
      END FORALL
                                                            \EDOC

\item Output dependence:
                                                            \CODE
      FORALL (i = 1:N)
        x(i+1) = ... 
        x(i) = ...
      END FORALL
                                                            \EDOC
\end{enumerate}
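
The difference between the two constructs noted in the first item can be
observed directly.  In the following sketch (an illustration with
invented names DEPKIND, Y, XD and YD) the same two statements are
executed once as a FORALL and once as a DO loop.
                                                            \CODE
      PROGRAM DEPKIND
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 4
      INTEGER :: X(N+1), Y(N), XD(N+1), YD(N), I
      X = 0
      XD = 0
!  FORALL: the first statement completes for every I before the
!  second starts, so Y(I) sees the new X(I+1) (true dependence).
      FORALL (I = 1:N)
        X(I) = I
        Y(I) = X(I+1)
      END FORALL
!  DO: iteration I reads XD(I+1) before iteration I+1 writes it,
!  so YD(I) sees the old value (anti-dependence).
      DO I = 1, N
        XD(I) = I
        YD(I) = XD(I+1)
      END DO
      PRINT *, 'FORALL: ', Y
      PRINT *, 'DO:     ', YD
      END PROGRAM DEPKIND
                                                            \EDOC
With these initial values Y receives 2, 3, 4, 0, while YD remains
entirely zero.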

Independent does not imply no communication.  One instance may access 
data in the other instances, as long as it does not cause a dependence.  
The following example is an independent block:
                                                            \CODE
      FORALL (i = 1:N)
!HPF$BEGIN INDEPENDENT
        x(i) = a(i-1)
        y(i-1) = a(i+1)
!HPF$END INDEPENDENT
      END FORALL
                                                            \EDOC

\subsubsection{Statements that can appear in FORALL}

FORALL statements, WHERE-ELSEWHERE statements, some intrinsic functions 
(and possibly elemental functions and subroutines) can appear in the
FORALL:
\begin{enumerate}
\item FORALL statement
                                                            \CODE
      FORALL (I = 1 : N)
        A(I,0) = A(I-1,0)
        FORALL (J = 1 : N)
!HPF$BEGIN INDEPENDENT
          A(I,J) = A(I,0) + B(I-1,J-1)
          C(I,J) = A(I,J)
!HPF$END INDEPENDENT
        END FORALL
      END FORALL
                                                            \EDOC

\item WHERE
                                                            \CODE
      FORALL(I = 1 : N)
!HPF$BEGIN INDEPENDENT
        WHERE(A(I,:) .EQ. B(I,:))
          A(I,:) = 0
        ELSEWHERE
          A(I,:) = B(I,:)
        END WHERE
!HPF$END INDEPENDENT
      END FORALL
                                                            \EDOC
\end{enumerate}


\subsubsection{Rationale}

\begin{enumerate}
\item A FORALL with a single asynchronous sub-block as shown below is 
the same as a do independent (or doall, or doeach, or parallel do, etc.).
                                                            \CODE
      FORALL (e)
!HPF$BEGIN INDEPENDENT
        b1
!HPF$END INDEPENDENT
      END FORALL
                                                            \EDOC
A FORALL without any INDEPENDENT directive is the same as a tightly 
synchronized FORALL.  We only need to define one type of parallel 
construct, which includes both synchronized and asynchronous blocks.  
Furthermore, combining asynchronous and synchronized FORALLs, we 
have a loosely synchronized FORALL which is more flexible for many 
loosely synchronous applications.

\item With INDEPENDENT directives, the user can indicate which blocks
need not be synchronized.  The INDEPENDENT directives can act 
as barrier synchronizations.  One may suggest a smart compiler 
that can recognize dependences and eliminate unnecessary 
synchronizations automatically.  However, it might be extremely 
difficult or impossible in some cases to identify all dependences.  
When the compiler cannot determine whether there is a dependence, 
it must assume so and use a synchronization for safety, which 
results in unnecessary synchronizations and consequently, high 
communication overhead.
\end{enumerate}


\subsection{A Proposal for MIMD Support in HPF}

\label{mimd-support}
	          

\subsubsection{Abstract}

\footnote{Version of July 18, 1992 - Clemens-August Thole, GMD I1.T.
Although the FORALL subgroup discussed this proposal at the
meeting on September 9, 1992, the feeling was that it would be better
to wait for the second round of HPF before pursuing these features.}
This proposal tries to supply sufficient language support in order 
to deal with loosely synchronous programs, some of which have been 
identified in my ``A vote for explicit MIMD support''.
This is a proposal for the support of MIMD parallelism, which extends
Section~\ref{do-independent}. 
It is more oriented
towards the CRAY MPP Fortran Programming Model and the PCF proposal. 
The fine-grain synchronization of PCF is not proposed for implementation.
Instead of the CRAY mechanisms for assigning work to processors, an 
extension of the ON clause is used.
Due to the lack of fine-grain synchronization, the constructs can be
executed on SIMD or sequential architectures just by ignoring the
additional information.


\subsubsection{Summary of the current situation of MIMD support as part of
HPF}

According to Charles Koelbel's (Rice) mail of March 20th, ``Working
Group 4 - Issues for discussion'', MIMD support is a topic for discussion
within working group 4. 

Dave Loveman (DEC) has produced a document on FORALL statements 
(incorporated in Sections~\ref{forall-stmt} and \ref{forall-construct})
which summarizes the discussion. Marc Snir proposed some extensions. These
constructs allow SIMD-style operations to be described more generally than
is possible with array assignments. 

A topic for working papers is the interface of HPF Fortran to program units
which execute in SPMD mode. Proposals for ``Local Subroutines'' have been
made by Marc Snir and Guy Steele (Chapter~\ref{foreign}). Both proposals
define local subroutines as program units which are executed by all
processors independently of each other. Each processor has access only
to the data contained in its local memory. Parts of distributed data
objects can be accessed and updated by calls to a special library. Any
message-passing library might be used for synchronization and communication.
This approach does not really integrate MIMD support into HPF programming.

The MPP Fortran proposal by Douglas M. Pase, Tom MacDonald, and Andrew
Meltzer (CRAY) contained the following features in order to support
integrated MIMD features:
\begin{itemize}
   \item  parallel directive
   \item  shared loops 
   \item  private variables
   \item  barrier synchronization
   \item  no-barrier directive for removing synchronization
   \item  locks, events, critical sections and atomic update
   \item  functions, to examine the mapping of data objects.
\end{itemize}

Steele's ``Proposal for loops in HPF'' (02.04.92) included a proposal for a 
directive ``!HPF\$ INDEPENDENT (integer\_variable\_list)'', which specifies
for the next set of nested loops that the loops with the specified
loop variables can be executed independently of each other.
(Section~\ref{do-independent} is a short version of this proposal.) 

Charles Koelbel gave an overview of different styles for parallel loops
in ``Parallel Loops Position Paper''. No specific proposal was made.

Min-You Wu's ``Proposal for FORALL, May 1992'' extended Guy Steele's 
``!HPF\$ INDEPENDENT'' proposal to use the directive in a block style.

Clemens-August Thole's ``A vote for explicit MIMD support'' contains 3
examples from different application areas, which seem to require MIMD
support for efficient execution. 

\paragraph{Summary}

In contrast to FORALL extensions, MIMD support is currently not
well established as part of HPF Fortran. The examples in ``A vote for
explicit MIMD support'' show clearly the need for such features. Local
subroutines do not fulfill the requirements because they force the use
of a distributed memory programming model, which should not be necessary
in most cases.

With the exception of parallel sections, all interesting features
are contained in the MPP proposal. I would like to split the discussion
on specifying parallelism, synchronization and mapping into three different
topics. Furthermore, I would like to see corresponding features
expressed in the style of the current X3H5 proposal, if possible, in order
to be in line with upcoming standards.


\subsubsection{Proposal for MIMD support}

In order to support the specification of MIMD-type parallelism, the
following features are taken from the ``Fortran 77 Binding of X3H5 Model for 
Parallel Programming Constructs'': 
\begin{itemize}
    \item   PARALLEL DO construct/directive
    \item   PARALLEL SECTIONS worksharing construct/directive
    \item   NEW statement/directive
\end{itemize}

These constructs are not used with PCF-like options for mapping or 
synchronisation but are combined with the ON clause for mapping operations
onto the parallel architecture. 

\paragraph{PARALLEL DO}

\subparagraph{Explicit Syntax}

The PARALLEL DO construct specifies parallelism among the 
iterations of a block of code. The PARALLEL DO construct has the same
syntax as a DO statement. For a directive approach, the directive
!HPF\$ PARALLEL can be used in front of a DO statement.
After the PARALLEL DO statement, a new-declaration may be inserted.

A PARALLEL DO construct might be nested with other parallel constructs. 

\subparagraph{Interpretation}

The PARALLEL DO is used to specify parallel execution of the iterations of
a block of code. Each iteration of a PARALLEL DO is an independent unit
of work. The iterations of a PARALLEL DO must be data independent. Iterations
are data independent if the storage sequence associated with each variable
or array element that is assigned a value by one iteration is not
referenced by any other iteration. 

A program is not HPF conforming if, for any iteration, a statement is
executed which causes a transfer of control out of the block defined by
the PARALLEL DO construct. 

The value of the loop index of a PARALLEL DO is undefined outside the scope
of the PARALLEL DO construct. 
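
A loop that meets these requirements might look as follows.  This is a
sketch only: the directive spelling follows the directive approach
described above, a compiler that does not recognise it treats it as a
comment, and the names PARDO, A and B are invented for the example.
                                                              \CODE
      PROGRAM PARDO
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 8
      REAL :: A(N), B(N)
      INTEGER :: I
      B = (/ (REAL(I), I=1,N) /)
!HPF$ PARALLEL
      DO I = 1, N
!  Each iteration assigns only A(I) and reads only B(I), so no
!  iteration references storage assigned by another one.
        A(I) = 2.0 * B(I) + 1.0
      END DO
      PRINT *, A
      END PROGRAM PARDO
                                                              \EDOC
The loop body contains no transfer of control out of the construct, and
the value of I is not used after the loop.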


\paragraph{PARALLEL SECTIONS}

The parallel sections construct is used to specify parallelism among
sections
of code.

\subparagraph{Explicit Syntax}


                                                              \CODE
        !HPF$ PARALLEL SECTIONS
        !HPF$ SECTION
        !HPF$ END PARALLEL SECTIONS
                                                              \EDOC
structured as
                                                              \CODE
        !HPF$ PARALLEL SECTIONS
        [new-declaration-stmt-list]
        [section-block]
        [section-block-list]
        !HPF$ END PARALLEL SECTIONS
                                                              \EDOC
where [section-block] is
                                                              \CODE
        !HPF$ SECTION
        [execution-part]
                                                              \EDOC

\subparagraph{Interpretation}

The parallel sections construct is used to specify parallelism among
sections of code. Each section of the code is an independent unit of work.
A program is not standard conforming if, during the execution of any
parallel sections construct, a transfer of control out of the blocks
defined by the parallel sections construct is performed. 
In a standard conforming program the sections of code shall be data 
independent. Sections are data independent if the storage sequence
associated with each variable or array element that is assigned a value
by one section is not referenced by any other section. 
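
Two data-independent sections might be written as shown below.  This is
a sketch under the syntax proposed above, with invented names (PARSEC,
B, U, V); a compiler that ignores the directives simply executes the
sections one after the other.
                                                              \CODE
      PROGRAM PARSEC
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 4
      REAL :: B(N), U(N), V(N)
      B = (/ 1.0, 4.0, 9.0, 16.0 /)
!HPF$ PARALLEL SECTIONS
!HPF$ SECTION
!  First unit of work: assigns only U, reads only B.
      U = SQRT(B)
!HPF$ SECTION
!  Second unit of work: assigns only V, reads only B.
      V = B * B
!HPF$ END PARALLEL SECTIONS
      PRINT *, U
      PRINT *, V
      END PROGRAM PARSEC
                                                              \EDOC
Both sections read B, which is permitted because neither assigns to it.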


\paragraph{Data scoping}

Data objects which are local to a subroutine are different between 
distinct units of work, even if they execute the same subroutine.


\paragraph{NEW statement/directive}

The NEW statement/directive allows the user to generate new instances of 
objects with the same name as an object, which can currently be referenced.


\subparagraph{Explicit Syntax}

A [new-declaration-stmt] is
                                                                \CODE
       !HPF$ NEW variable-name-list
                                                                \EDOC

\subparagraph{Coding rules}

A [variable-name] shall not be
\begin{itemize} 
\item    the name of an assumed size array, dummy argument, common block, 
function or entry point
\item    of type character with an assumed length
\item    specified in a SAVE or DATA statement
\item    associated with any object that is shared for this parallel
construct.
\end{itemize}

\subparagraph{Interpretation}
 
Listing a variable on a NEW statement causes the object to be explicitly
private for the parallel construct. For each unit of work of the parallel 
construct a new instance of the object is created and referenced with the
specific name. 
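
The coding rules and interpretation above can be illustrated with a
scalar temporary.  The following is a sketch only, written in the
directive form proposed above; the names NEWDEMO, SMOOTH and TMP are
invented, and a compiler that ignores the directives still produces the
same result because each iteration defines TMP before using it.
                                                                \CODE
      PROGRAM NEWDEMO
      IMPLICIT NONE
      REAL :: A(4)
      A = (/ 1.0, 2.0, 3.0, 4.0 /)
      CALL SMOOTH(A, 4)
      PRINT *, A
      END PROGRAM NEWDEMO

      SUBROUTINE SMOOTH(A, N)
      IMPLICIT NONE
      INTEGER, INTENT(IN) :: N
      REAL, INTENT(INOUT) :: A(N)
      REAL :: TMP
      INTEGER :: I
!HPF$ PARALLEL
      DO I = 1, N
!  TMP is a local scalar that is neither SAVEd nor initialised
!  in DATA, so it may be listed in NEW; the dummy arguments A
!  and N may not.
!HPF$ NEW TMP
        TMP = A(I) * A(I)
        A(I) = TMP + 1.0
      END DO
      END SUBROUTINE SMOOTH
                                                                \EDOC
Without the NEW declaration, TMP would be a single object assigned by
every iteration, and the iterations would no longer be data independent.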

\end{document}

From henk@ph.tn.tudelft.nl  Fri Oct  2 11:30:52 1992
Received: from ph.tn.tudelft.nl by cs.rice.edu (AA26488); Fri, 2 Oct 92 11:30:52 CDT
Received: from [192.31.126.72] (hsmac.ph.tn.tudelft.nl) by ph.tn.tudelft.nl (4.1/HB-1.18)
	id AA22326; Fri, 2 Oct 92 17:29:07 +0100
Message-Id: <9210021629.AA22326@ph.tn.tudelft.nl>
Date: Fri, 2 Oct 1992 17:31:52 +0100
To: Tin-Fook Ngai <ngai@hpltfn.hpl.hp.com>
From: henk@ph.tn.tudelft.nl
Subject: Re: A proposal for EXECUTE-ON directive
Cc: hpff-forall@cs.rice.edu, hpff-distribute@cs.rice.edu

 
>I have sent the following proposal to the FORALL subgroup. Since I 
>strongly believe that the proposed (or alike) feature should be in HPF, I 
>wish you will seriously consider it for inclusion in the first official 
>draft.
>
>
>Tin-Fook
>

I'm not sure that the FORALL group is the right forum for your proposal.
The EXECUTE-ON is a more elaborate form of the "ON HOME xx" type of
annotations as in FORTRAN-D. In fact it is a form of *computation
annotation* as opposed to *data annotation*. As far as I know only the
Booster language is completely orthogonal in this sense. FORTRAN-D only
allows a limited form within a forall. 

As the proposal also relies on templates, I think the distribute-subgroup
is the one who should discuss this. 


There are a few remarks on the proposal: 

1. The execution model of HPFF (not completely approved yet) states: 

-: The code compiled by an HPF compiler ought to do no worse than code
compiled using the owner compute rule.

This is more relaxed than saying "it uses the owner compute rule". Your
EXECUTE ON is much more specific towards fixing execution on a specified
processor. 
 

2. A template is not executed, so one can't say EXECUTE x ON T. Something
like EXECUTE_ON_HOME T should be adopted. (Since templates are currently
not objects, one could even deny this.)


3. Adopting EXECUTE x ON on DO loops without any independence requirements,
as your proposal seems to allow, can yield all kinds of intricate
synchronization schemes when iterations are not independent (or must be
assumed to be dependent). This seems to go further than the first simple
step that HPFF ought to be.


4. Binding iterations to templates can currently only be done statically,
since the current draft does not allow dynamic templates. So iteration
boundaries must be known at compile time. One has to apply the subroutine
trick to allow this, which is not very neat.


5. Allowing EXECUTE x ON on groups of statements raises a scoping issue, so
there should also be something like END EXECUTE x ON to undo the
annotation.


6. Again we have complicated scoping problems. How about this example:


Template T1(N), T2(N)
DO I=1,N
   EXECUTE (I) ON T1(I)
   C(I) = D(I)
  DO J=1,N
    EXECUTE (J) ON T2(J)
    A(I,J) =  A(I,J) + B(I,J) 

This example satisfies the constraint only if by entering the J-loop, the
I-index is dereferenced from the assertion just after the I-loop. Although
logical, it might be confusing to users. However, in the program

Template T1(N), T2(N)
DO I=1,N
   EXECUTE (I) ON T1(I)
   C(I) = D(I)
  DO J=1,N
    A(I,J) =  A(I,J) + B(I,J) 

there is no such dereferencing. 
Now if the J-loop is embedded in a subroutine call, we are in trouble:

Template T1(N), T2(N)
DO I=1,N
   EXECUTE (I) ON T1(I)
   C(I) = D(I)
  DO J=1,N
    CALL FOO(A(I,J),B(I,J))
     
SUBROUTINE FOO(X,Y)
X = X + Y

We cannot compile this subroutine independently. So we could forbid
subroutine calls.

The question then is what kind of restrictions do we have to impose and how
many are there ?


7. The wrap feature of templates will probably be deleted from the draft.
The same thing (shifting data each iteration) can be reached by using CSHIFT
or the subroutine trick and making the template as large as the iteration
space.


With these remarks I hope to have shown that there are many strings
attached to your proposal.


- henk
 

================================================================
Henk J. Sips

Delft University of Technology
Lorentzweg 1
2628 CJ  Delft, the Netherlands
tel: +31.15.783191                     fax: +31.15.626740
e-mail: henk@ph.tn.tudelft.nl


From ngai@hpltfn.hpl.hp.com  Fri Oct  2 17:47:14 1992
Received: from hplms2.hpl.hp.com by cs.rice.edu (AA06972); Fri, 2 Oct 92 17:47:14 CDT
Received: from hpltfn.hpl.hp.com by hplms2.hpl.hp.com with SMTP
	(16.5/15.5+IOS 3.20) id AA02475; Fri, 2 Oct 92 15:47:11 -0700
Received: by hpltfn.hpl.hp.com
	(16.6/15.5+IOS 3.14) id AA07009; Fri, 2 Oct 92 15:47:05 -0700
Date: Fri, 2 Oct 92 15:47:05 -0700
From: Tin-Fook Ngai <ngai@hpltfn.hpl.hp.com>
Message-Id: <9210022247.AA07009@hpltfn.hpl.hp.com>
To: henk@ph.tn.tudelft.nl
Cc: hpff-forall@cs.rice.edu, hpff-distribute@cs.rice.edu
In-Reply-To: henk@ph.tn.tudelft.nl's message of Fri, 2 Oct 1992 17:31:52 +0100 <9210021629.AA22326@ph.tn.tudelft.nl>
Subject: A proposal for EXECUTE-ON directive


Thanks for the comments.  I would like to see more discussion, and to find a
better solution for the proposed feature or one like it.

You wrote:

> 1. The execution model of HPFF (not completely approved yet) states: 

> -: The code compiled by an HPF compiler ought to do no worse than code
> compiled using the owner compute rule.

> This is more relaxed than saying "it uses the owner compute rule". Your
> EXECUTE ON is much more specific towards fixing execution on a specified
> processor. 
>  

?? (The EXECUTE ON is a directive.)

> 2. A template is not executed, so one can't say EXECUTE x ON T. Something
> like EXECUTE_ON_HOME T should be adopted. (Since templates are currently
> not objects, one could even deny this.)

Agreed.  I never felt comfortable with the keywords I used.  I have no
objection to "EXECUTE x ON_HOME T".  Any other suggestions are also
welcome.


> 3. Adopting EXECUTE x ON on DO loops without any independence requirements,
> as your proposal seems to allow, can yield all kinds of intricate
> synchronization schemes when iterations are not independent (or must be
> assumed to be dependent). This seems to go further than the first simple
> step that HPFF ought to be.

Clearly, the proposed feature is primarily intended for INDEPENDENT DO,
FORALL and other parallel indexed assignments.  Before making the
proposal, I also thought about ordinary DO loops, as you point out
here. Code generation does not seem to be a problem: if the user chooses to
specify the execution location of an iteration of an ordinary DO loop, simple
compilation requires only one synchronization at the end of each iteration
to ensure the sequential DO semantics. This naive compilation looks dumb,
but the user may still gain from the existing data distribution.  (A
smarter compiler can of course do a better job, but that is definitely not
required.)  That is why I don't restrict EXECUTE ON to INDEPENDENT DOs,
which also keeps the rule simpler.


> 4. Binding iterations to templates can currently only be done statically,
> since the current draft does not allow dynamic templates. So iteration
> boundaries must be known at compile time. One has to apply the subroutine
> trick to allow this, which is not very neat.

That is the intention of the proposal:  Only static binding is allowed.
Even if the loop index is bounded by a runtime variable, the binding to a
template node is still static.


> 5. Allowing EXECUTE x ON on groups of statements raises a scoping issue, so
> there should also be something like END EXECUTE x ON to undo the
> annotation.

The current proposal seems sufficient on this issue.  The scope for a single
statement (FORALL statement, single statement in FORALL construct, and array
assignment statement) is clear.  For groups of statements, EXECUTE ON can
only apply to either the entire body within a FORALL construct or the
entire iteration of a DO loop.


> 6. Again we have complicated scoping problems. How about this example:

> Template T1(N), T2(N)
> DO I=1,N
>    EXECUTE (I) ON T1(I)
>    C(I) = D(I)
>   DO J=1,N
>     EXECUTE (J) ON T2(J)
>     A(I,J) =  A(I,J) + B(I,J) 

> This example satisfies the constraint only if by entering the J-loop, the
> I-index is dereferenced from the assertion just after the I-loop. Although
> logical, it might be confusing to users. However, in the program

> Template T1(N), T2(N)
> DO I=1,N
>    EXECUTE (I) ON T1(I)
>    C(I) = D(I)
>   DO J=1,N
>     A(I,J) =  A(I,J) + B(I,J) 

> there is no such dereferencing. 

Good examples.  This bug surely needs to be fixed.  Here is my solution:

- For nested EXECUTE ON directives, only the immediately enclosing EXECUTE ON
  directive is effective.

In the former example, the statement "C(I) = D(I)" will be executed on the
home of T1(I) while the statement "A(I,J) =  A(I,J) + B(I,J)" will be
executed on home of T2(J) for all I.  In the latter case, the entire
I-loop body that includes the DO J loop is executed on the home of T1(I).
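
The rule can be modeled in a few lines. What follows is only an
illustrative Python sketch under stated assumptions, not part of the
proposal: the names Stmt, Loop and effective_homes are hypothetical, and a
directive is carried as a plain string such as "T1(I)". It resolves, for each
statement in the two examples, which EXECUTE ON directive is in effect.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Stmt:
    text: str


@dataclass
class Loop:
    index: str
    # Target of the EXECUTE ON directive heading this loop body, or None
    # when the loop body carries no directive of its own (hypothetical model).
    directive: Optional[str] = None
    body: List[Union["Stmt", "Loop"]] = field(default_factory=list)


def effective_homes(node, inherited=None):
    """Return (statement, home) pairs; the innermost directive wins."""
    home = node.directive if node.directive is not None else inherited
    result = []
    for item in node.body:
        if isinstance(item, Stmt):
            result.append((item.text, home))
        else:
            result.extend(effective_homes(item, home))
    return result


# Former example: both loops carry a directive.
former = Loop("I", "T1(I)", [
    Stmt("C(I) = D(I)"),
    Loop("J", "T2(J)", [Stmt("A(I,J) = A(I,J) + B(I,J)")]),
])

# Latter example: only the I-loop carries a directive.
latter = Loop("I", "T1(I)", [
    Stmt("C(I) = D(I)"),
    Loop("J", None, [Stmt("A(I,J) = A(I,J) + B(I,J)")]),
])

for name, tree in (("former", former), ("latter", latter)):
    for text, home in effective_homes(tree):
        print(f"{name}: {text!r} executes on home of {home}")
```

In this model a loop without its own directive simply inherits the one
from the enclosing loop, which matches the reading given for the latter
example.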


> Now if the J-loop is embedded in a subroutine call, we are in trouble:

> Template T1(N), T2(N)
> DO I=1,N
>    EXECUTE (I) ON T1(I)
>    C(I) = D(I)
>   DO J=1,N
>     CALL FOO(A(I,J),B(I,J))
>      
> SUBROUTINE FOO(X,Y)
> X = X + Y

> We cannot compile this subroutine independently. So we could forbid
> subroutine calls.

Why can't FOO be compiled independently?  The EXECUTE ON directive means
that FOO is called on the home of T1(I).  


> The question then is what kind of restrictions do we have to impose and how
> many are there ?

I hope not many restrictions are needed for this important feature.


> 7. The wrap feature of templates will probably be deleted from the draft.
> The same thing (shifting data each iteration) can be reached by using CSHIFT
> or the subroutine trick and making the template as large as the iteration
> space.

Wolfe and I discussed the example (Example 6 in the proposal) long before
our revision of the wrap feature.  Sorry for any confusion caused by this
example.  (However, this example also illustrates the use of wrap in data
distribution -- we should come up with a cleaner solution at the next meeting.)


> With these remarks I hope to have shown that there are many strings
> attached to your proposal.

> - henk
>  

Appreciated.


Tin-Fook


From henk@ph.tn.tudelft.nl  Mon Oct  5 05:31:06 1992
Received: from ph.tn.tudelft.nl by cs.rice.edu (AA29235); Mon, 5 Oct 92 05:31:06 CDT
Received: from [192.31.126.72] (hsmac.ph.tn.tudelft.nl) by ph.tn.tudelft.nl (4.1/HB-1.18)
	id AA14672; Mon, 5 Oct 92 11:29:24 +0100
Message-Id: <9210051029.AA14672@ph.tn.tudelft.nl>
Date: Mon, 5 Oct 1992 11:32:18 +0100
To: Tin-Fook Ngai <ngai@hpltfn.hpl.hp.com>
From: henk@ph.tn.tudelft.nl
Subject: Re: A proposal for EXECUTE-ON directive
Cc: hpff-forall@cs.rice.edu, hpff-distribute@cs.rice.edu

>
>> We cannot compile this subroutine independently. So we could forbid
>> subroutine calls.
>
>Why can't FOO be compiled independently?  The EXECUTE ON directive means
>that FOO is called on the home of T1(I).  
>
>


Sorry, it was Friday afternoon. The example I was thinking of is:


Template T1(N)
  DO I=1,N
    EXECUTE (I) ON T1(I)
    C(I) = D(I)
   DO J=1,N
     A(I,J) = B(I,J)

Here A(I,J) is calculated on the home of T1(I). If we encapsulate the J-loop
into a subroutine we get something like:


Template T1(N)
  DO I=1,N
    EXECUTE (I) ON T1(I)
    C(I) = D(I)
    CALL FOO(A(I,:),B(I,:))
     
 SUBROUTINE FOO(AA,BB)
 ALIGN AA,BB with *
    DO J=1,N
    AA(J) = BB(J)

The dummy arguments AA and BB are aligned to the incoming array sections.
Normally, without any EXECUTE_ON, the subroutine would execute AA(J), which
is equivalent to A(I,J), on the home of A(I,J). However, with an EXECUTE_ON in
the main program there is no way the subroutine can know that AA(J) should be
executed on T1(I). The consequence is that any subroutine call has to *undo*
the EXECUTE_ON on entry and *redo* the EXECUTE_ON on return.

- henk
  

================================================================
Henk J. Sips

Delft University of Technology
Lorentzweg 1
2628 CJ  Delft, the Netherlands
tel: +31.15.783191                     fax: +31.15.626740
e-mail: henk@ph.tn.tudelft.nl


From gmap11@f1ibmsv2.gmd.de  Tue Oct  6 09:06:09 1992
Received: from gmdzi.gmd.de by cs.rice.edu (AA24755); Tue, 6 Oct 92 09:06:09 CDT
Received: from f1ibmsv2.gmd.de (f1ibmsv2) by gmdzi.gmd.de with SMTP id AA24614
  (5.65c/IDA-1.4.4 for <hpff-forall@cs.rice.edu>); Tue, 6 Oct 1992 15:05:45 +0100
Received: by f1ibmsv2.gmd.de id AA32934; Tue, 6 Oct 92 15:05:51 GMT
Date: Tue, 6 Oct 92 15:05:51 GMT
From: Clemens-August.Thole@gmd.de (C.A. Thole)
Message-Id: <9210061505.AA32934@f1ibmsv2.gmd.de>
To: hpff-forall@cs.rice.edu
Subject: ON-Clause / Execute
Cc: gmap11@f1ibmsv2.gmd.de


I agree with Tin-Fook that something like the on-clause should be 
contained in HPF. I brought a proposal with me to the last HPF meeting 
which was distributed by Chuck, but neither the FORALL working group
nor the plenary had time to discuss the proposal.

I have enclosed a slightly modified version of the original proposal.
Tin-Fook's proposal and my proposal both use templates as reference
objects.
Tin-Fook's proposal is restricted to DO constructs and indexed
parallel assignments, while my proposal deals with any statement 
(in particular array assignment statements). 
Tin-Fook's proposal has the optional LOCAL directive. 
This proposal introduces statement blocks for more compact 
specification of on-clauses. 

I would appreciate comments on the different features.

Clemens


------------------------------------------------------------------

Proposal for a statement grouping syntax and ON clause

original:          September 9, 1992
slightly modified: October 5, 1992

Clemens-August Thole
GMD-I1.T
Schloss Birlinghoven
D-5205 Sankt Augustin 1

e-mail: thole@gmd.de


1. Introduction

This proposal introduces an extension to HPF to group several 
statements in order to be able to specify properties for a whole block
of statements at once. A block of statements is called an HPF-section.
HPF-sections can be used to describe properties for independent execution
between blocks of statements as well as the mapping of their execution.

For the specification of a specific mapping of the execution of statements
or HPF-sections, the ON-clause is introduced. A subset of a template is used
as a reference object onto which the statements are mapped in a canonical
manner. Careful selection of the reference template makes it possible to
specify how the execution of the code is mapped onto the parallel architecture.


2. HPF-sections

The HPF directives SECTIONS, SECTION, and END SECTIONS are used to specify
grouping of statements. SECTIONS and END SECTIONS specify the beginning
and end of a list of HPF-sections and SECTION the beginning of the next 
HPF-section. The syntax is as follows:

	!HPF$ SECTIONS
		[HPF-section-list]
	!HPF$ END SECTIONS

where HPF-section is

	!HPF$ SECTION
		[execution-part]

Constraint: During the execution of an HPF-section, control must under no
circumstances be transferred outside of its execution-part.

Example:
	
	!HPF$ SECTIONS
	!HPF$ SECTION
		A = A + B
		B = C + D
	!HPF$ SECTION
		E = B
		IF (E.GT.F) GOTO 10
			E = 0D0
	 10	CONTINUE
	!HPF$ END SECTIONS

This example specifies a list of two HPF-sections. The control statement in
the second HPF-section is valid because after the transfer of control the
execution continues in the same HPF-section.


3. ON-clause

The ON-clause specifies a subsection of a template, which is used as a reference
object for the execution of the next statement, construct, or HPF-section.
If the left-hand-side of an assignment coincides in shape with the reference
object, the evaluation of the right-hand-side and the assignment for 
a specific element of the left-hand-side are performed on the processor onto
which the corresponding element of the reference object is mapped.

Syntax:

Add the following rules:

[executable-construct] is

	!HPF$ ON [on-spec]
		[executable-construct]

and

[HPF-section] is

	!HPF$ ON [on-spec]
		[HPF-section]
	
where [on-spec] is

	[align-spec].

The [executable-construct] or [HPF-section] is called the on-clause-target.

Constraints:

1. No [executable-construct] may be used as object of the on-clause, which
   generates any transfer of control out of the construct itself. This
   includes the entry-statement. 
2. [statement-block]s used in constructs must fulfill the constraints of
   HPF-sections.
3. The shape of the [on-spec] must cover in each dimension the shape
   of any left-hand-side of an assignment statement that is the target of an
   on-clause. If a "*" is used in the [on-spec], this dimension is skipped
   when constructing the shape of the [on-spec].
4. If an on-clause is contained in the on-clause-target, the new [on-spec]
   must be a subsection of the [on-spec] of the outer on-clause.

Example:

		REAL, DIMENSION(n) :: a, b, c, d
	!HPF$	TEMPLATE grid(n)
	!HPF$	ALIGN WITH grid :: a, b, c, d

	!HPF$	ON grid(2:n)
		a(1:n-1) = a(2:n) + b(2:n) + c(2:n)

The on-clause indicates that the evaluation of the right-hand-side is 
performed on those processors which hold the data elements of the 
right-hand-side. For the assignment to the left-hand-side, data movement is
necessary.
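
To make the data movement concrete, here is a small Python sketch. It is
an illustration only: the proposal does not say how grid is distributed, so
a BLOCK distribution over nprocs processors is assumed, and the helper
names block_owner and remote_stores are hypothetical. Each element is
computed where grid(t) lives and stored into a(t-1), which lives where
grid(t-1) lives.

```python
import math


def block_owner(k, n, nprocs):
    """Processor (0-based) owning template element k (1-based).

    Assumes a BLOCK distribution with block size ceil(n / nprocs).
    """
    block = math.ceil(n / nprocs)
    return (k - 1) // block


def remote_stores(n, nprocs):
    """List (t, computing processor, storing processor) for every element
    of a(1:n-1) = a(2:n) + b(2:n) + c(2:n) under ON grid(2:n) whose result
    must be sent to another processor."""
    moves = []
    for t in range(2, n + 1):
        compute_on = block_owner(t, n, nprocs)      # home of grid(t)
        store_on = block_owner(t - 1, n, nprocs)    # home of a(t-1)
        if compute_on != store_on:
            moves.append((t, compute_on, store_on))
    return moves


print(remote_stores(16, 4))
```

Under these assumptions only the elements at block boundaries have to be
moved; all other results are stored on the processor that computed them.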

Interpretation:

The interpretation of the on-clause depends on the type of the on-clause-target.

If the on-clause-target is an assignment statement, the [on-spec] is used to
determine where the assignment statement is executed. If the shape of the 
right-hand-side is identical to the shape of the [on-spec], the computation for
a specific element of the assignment statement is performed where the 
corresponding element of the [on-spec] is mapped. If the shape of the 
[on-spec] is larger, the compiler may use any sufficiently large subsection.
The use of "*" in the [on-spec] specifies that the same computations are
mapped onto the corresponding line of processors, so several processors
will do the same update. This may save communication operations.
In the case of the where-statement, the forall-statement, and the 
forall-construct, the same mapping is applied to the evaluation of the 
conditions and each assignment.

If the on-clause is placed in front of the if-construct, the case-construct,
or the do-construct, the [on-spec] is used for the evaluation of the 
conditions and the loop bounds as well as for the execution of the
statement-blocks that are part of the construct. For the statement-blocks the
interpretation rules for HPF-sections apply.

With respect to the allocate, deallocate, nullify, and I/O related statements
the [on-spec] is used for the evaluation of the parameters of the statements
and the evaluation of I/O objects. 

In the case of subroutine calls and functions the [on-spec] is used for the
evaluation of the parameters. It determines also the mapping of the resulting 
object. The [on-spec] determines also the set of processors, which will be
used for the evaluation of the subroutine. 

In the case of HPF-sections the on-clause is applied to each statement of the
execution part. Control transfer statements are allowed in this case and the 
constraints ensure, that the context on the same [on-spec] is not lost.

Additional example:

		REAL, DIMENSION(n,n) :: a, b, c, d
	!HPF$	TEMPLATE grid(n,n)
	!HPF$	ALIGN WITH grid :: a, b, c, d

	!HPF$	ON grid(2:n,2:n)
		DO i=2,n
	!HPF$		ON grid(i,2:n)
			DO j=2,n
	!HPF$			ON grid(i,j)
				a(i-1,j-1) = a(i,j) + b(i,j)*c(i,j)
			ENDDO
		ENDDO

Comment:

The compiler should be able to adjust the span of the loops to the local extent,
owing to the restrictions on the specifiers of the sections of the [on-spec].

From ngai@hpltfn.hpl.hp.com  Wed Oct 14 14:47:59 1992
Received: from hplms2.hpl.hp.com by cs.rice.edu (AA05207); Wed, 14 Oct 92 14:47:59 CDT
Received: from hpltfn.hpl.hp.com by hplms2.hpl.hp.com with SMTP
	(16.5/15.5+IOS 3.20) id AA18449; Wed, 14 Oct 92 12:47:55 -0700
Received: by hpltfn.hpl.hp.com
	(16.6/15.5+IOS 3.14) id AA11644; Wed, 14 Oct 92 12:47:42 -0700
Date: Wed, 14 Oct 92 12:47:42 -0700
From: Tin-Fook Ngai <ngai@hpltfn.hpl.hp.com>
Message-Id: <9210141947.AA11644@hpltfn.hpl.hp.com>
To: chk@cs.rice.edu
Cc: hpff-forall@cs.rice.edu, hpff-core@cs.rice.edu
Subject: Revised proposal on EXECUTE-ON


Here is my revised proposal on the EXECUTE-ON-HOME directive.  (Thanks to
Henk for his comments.)  I will later submit a LaTeX version for inclusion
in the HPF Language Specification as required.

A remark regarding Clemens's ON-Clause: I wrote my proposal after I read
Clemens's proposal.  As pointed out by Clemens, the application of my
EXECUTE-ON-HOME is more restricted than his ON-Clause.  But the
EXECUTE-ON-HOME is more specific and useful for those parallel executions
supported in the current HPF proposal. The EXECUTE-ON-HOME directive
allows explicit mapping of indexed executions to template nodes in a way
similar to the current HPF alignment, while Clemens's ON-Clause relies on
the underlying shape to determine the mapping.  I believe the explicit
mapping in EXECUTE-ON-HOME is more general and useful.


Tin-Fook

------------------

==========
Changes:
1. Change keyword from EXECUTE ON to EXECUTE ON_HOME 
2. Allow nested EXECUTE-ON-HOME directives.  Only the immediately preceding
   directive is effective.
3. EXECUTE-ON-HOME directives have effect only on the caller of a subroutine
   call not on the called subroutine.  
4. Rewrite Example 6 to conform with current HPFF proposal that does not
   support automatic wrap-around mapping
==========


A PROPOSAL FOR EXECUTE-ON-HOME DIRECTIVE IN HPF 

Originated: September 14, 1992
Revised: October 14, 1992

Tin-Fook Ngai
Hewlett-Packard Laboratories
Email: ngai@hpl.hp.com


The proposed EXECUTE-ON-HOME directive is used to suggest where an
iteration of a DO construct or an indexed parallel assignment should be
executed.  The directive informs the compiler which data access should be
local and which data access may be remote.


SYNTAX

!HPF$ EXECUTE (subscript-list) ON_HOME align-spec [; LOCAL array-name-list]


CONSTRAINT

Each point in the index space must be executed on only one template
node.


USAGE AND SCOPE

1. The EXECUTE-ON-HOME directive must immediately precede the
corresponding DO loop body, array assignment, FORALL statement, FORALL
construct or individual assignment statement in a FORALL construct.

2. The scope of an EXECUTE-ON-HOME directive is the entire loop body of
the enclosing DO construct, or the following array assignment, FORALL
statement, FORALL construct or assignment statement in a FORALL construct.

3. EXECUTE-ON-HOME directives can be nested, but only the immediately
preceding EXECUTE-ON-HOME directive is effective.


INTERPRETATION

The subscript-list identifies a distinct iteration index or an indexed
parallel assignment.  The align-spec identifies a template node.  The
EXECUTE-ON-HOME directive suggests that the iteration or parallel
assignment should be executed on the processor to which the template node
is mapped.  When the EXECUTE-ON-HOME directive is applied to a subroutine
call, it affects only the execution location of the caller, not the
execution location of the called subroutine.

The optional LOCAL directive informs the compiler that all data accesses
to the specified array-name-list can be handled as local data accesses if
the related HPF data mapping directives are honored.


EXAMPLES

Example 1

      REAL A(N), B(N), C(N)
!HPF$ TEMPLATE T(N)
!HPF$ ALIGN WITH T:: A, B, C
!HPF$ DISTRIBUTE T(CYCLIC(2))

!HPF$ INDEPENDENT            
      DO I = 1, N/2 
!HPF$ EXECUTE (I) ON_HOME T(2*I); LOCAL A, B, C
      ! we know that P(2*I-1) and P(2*I) are a permutation of 2*I-1 and 2*I
        A(P(2*I - 1)) = B(2*I - 1) + C(2*I - 1)    
        A(P(2*I)) = B(2*I) + C(2*I)
      END DO
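
Why the LOCAL assertion in Example 1 is plausible can be checked with a
short Python sketch. This is an illustration, not part of the proposal: it
assumes the usual reading of CYCLIC(2) as blocks of two template elements
dealt round-robin to the processors, and the names cyclic_owner and
iteration_is_local are hypothetical.

```python
def cyclic_owner(k, block, nprocs):
    """Processor (0-based) owning template element k (1-based), assuming
    CYCLIC(block) deals blocks of `block` elements round-robin."""
    return ((k - 1) // block) % nprocs


def iteration_is_local(i, nprocs, block=2):
    """True if iteration i, executed on the home of T(2*i), only touches
    elements 2*i-1 and 2*i that live on that same processor."""
    home = cyclic_owner(2 * i, block, nprocs)
    touched = (2 * i - 1, 2 * i)
    return all(cyclic_owner(k, block, nprocs) == home for k in touched)


n, nprocs = 16, 4
for i in range(1, n // 2 + 1):
    print(i, cyclic_owner(2 * i, 2, nprocs), iteration_is_local(i, nprocs))
```

Because P only permutes within each pair (2*I-1, 2*I), the elements
written through A(P(...)) fall into the same block as the elements read,
so under these assumptions every access in an iteration is local.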


Example 2

      REAL A(N,N), B(N,N)
!HPF$ TEMPLATE T(N,N)
!HPF$ ALIGN WITH T:: A, B
!HPF$ EXECUTE (I,J) ON_HOME T(I+1,J-1)
      FORALL (I=1:N-1, J=2:N)   A(I,J) = A(I+1,J-1) + B(I+1,J-1)


Example 3

      REAL A(N,N), B(N,N)
!HPF$ TEMPLATE T(N,N)
!HPF$ ALIGN WITH T:: A, B
!HPF$ EXECUTE (I,J) ON_HOME T(I,J)   ! applies to the entire FORALL construct
      FORALL (I=1:N-1, J=2:N) 
	A(I,J) = A(I+1,J-1) + B(I+1,J-1)
	B(I,J) = A(I,J) + B(I+1,J-1)
      END FORALL


Example 4

      REAL A(N,N), B(N,N)
!HPF$ TEMPLATE T(N,N)
!HPF$ ALIGN WITH T:: A, B
      FORALL (I=1:N-1, J=2:N) 
!HPF$ EXECUTE (I,J) ON_HOME T(I,J)  ! applies only to the following assignment
	A(I,J) = A(I+1,J-1) + B(I+1,J-1)
	B(I,J) = A(I,J) + B(I+1,J-1)
      END FORALL
   

Example 5 

      REAL A(N,N), B(N,N)
!HPF$ TEMPLATE T(N,N)
!HPF$ ALIGN WITH T:: A, B

!HPF$ EXECUTE (I,J) ON_HOME T(I+1,J-1)
      A(1:N-1,2:N) = A(2:N,1:N-1) + B(2:N,1:N-1)

   
Example 6 
 
      !* Original program due to Michael Wolfe of Oregon Graduate Institute

      !* This program performs matrix multiplication C = A x B
      !* In each step, array B is rotated by row-blocks, multiplied
      !* diagonal-block-wise in parallel with A, results are accumulated in C 

      !* Note: without the EXECUTE-ON-HOME and LOCAL directives, the compiler
      !* will have a hard time figuring out that all A, B and C accesses are 
      !* actually local, and thus be unable to generate the most efficient code 
      !* (i.e. communication-free and with no runtime checking in the parallel 
      !* loop body).
 
      REAL A(N,N), B(N,N), C(N,N)

      PARAMETER(NOP = NUMBER_OF_PROCESSORS())
!HPF$ REALIGNABLE B
!HPF$ TEMPLATE T(2*N,N)                 !* to allow wrap around mapping
!HPF$ ALIGN (I,J) WITH T(I,J):: A, C      
!HPF$ ALIGN B(I,J) WITH T(N+I,J)
!HPF$ DISTRIBUTE T(CYCLIC(N/NOP),*)     !* A,B,C are distributed by row blocks

      IB = N/NOP

      DO IT = 0, NOP-1

!HPF$ REALIGN B(I,J) WITH T(N-IT*IB+I,J)  !* in effect, rotate by row-blocks

!HPF$ INDEPENDENT                       !* data parallel loop
        DO IP = 0, NOP-1     
!HPF$ EXECUTE (IP) ON_HOME T(IP*IB+1,1); LOCAL A, B, C
          ITP = MOD( IT+IP, NOP )

          DO I = 1, IB
            DO J = 1, N
              DO K = 1, IB
                C(IP*IB+I,J) = C(IP*IB+I,J) + A(IP*IB+I,ITP*IB+K)*B(ITP*IB+K,J)
              ENDDO  !* K
            ENDDO  !* J
          ENDDO  !* I
        ENDDO  !* IP

      ENDDO  !* IT
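
The loop structure of Example 6 can be checked sequentially. The Python
sketch below is an illustration only: it ignores the data mapping and
REALIGN directives entirely, assumes N is a multiple of NOP (as IB = N/NOP
suggests), and uses the hypothetical names rotated_block_matmul and
direct_matmul. It confirms that visiting block ITP = MOD(IT+IP, NOP) in
step IT accumulates the full product C = A x B.

```python
def rotated_block_matmul(a, b, nop):
    """Accumulate C = A x B using the IT / IP / ITP loop nest of Example 6
    (0-based indices; n must be a multiple of nop)."""
    n = len(a)
    ib = n // nop
    c = [[0.0] * n for _ in range(n)]
    for it in range(nop):              # rotation step
        for ip in range(nop):          # one iteration per processor
            itp = (it + ip) % nop      # block of B visited in this step
            for i in range(ib):
                for j in range(n):
                    for k in range(ib):
                        c[ip * ib + i][j] += (
                            a[ip * ib + i][itp * ib + k] * b[itp * ib + k][j]
                        )
    return c


def direct_matmul(a, b):
    """Plain triple-loop matrix product used as the reference result."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


n, nop = 4, 2
a = [[i * n + j + 1 for j in range(n)] for i in range(n)]
b = [[(i + 2 * j) % 5 for j in range(n)] for i in range(n)]
print(rotated_block_matmul(a, b, nop) == direct_matmul(a, b))
```

For a fixed IP, ITP takes each value 0..NOP-1 exactly once as IT runs over
the rotation steps, which is why every block product is accumulated once.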


END OF PROPOSAL

------------------------------------------------------------------------------


From chk@erato.cs.rice.edu  Wed Oct 14 15:56:21 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA07745); Wed, 14 Oct 92 15:56:21 CDT
Received: from localhost.cs.rice.edu by erato.cs.rice.edu (AA14106); Wed, 14 Oct 92 15:56:09 CDT
Message-Id: <9210142056.AA14106@erato.cs.rice.edu>
To: hpff-forall@erato.cs.rice.edu
Word-Of-The-Day: nugatory : (adj) having little or no consequence
Subject: new draft proposal
Date: Wed, 14 Oct 92 15:56:07 -0500
From: chk@erato.cs.rice.edu

Below is the latest and greatest version of the HPF FORALL chapter.
Unless major errors or new proposals come up before the weekend (and I
have reason to believe they will), this will be the version presented
at the HPFF meeting next week.  If you have comments, please send them
to the group before next Tuesday so the meeting attendees have a
chance of hearing them.

Changes from the last draft:
1. Substantially rewritten PURE functions - thanks/blame goes to John
Merlin
2. Included ON clause proposals from Tin-Fook Ngai and Clemens Thole.
3. Included nested WHERE proposal from Guy Steele.
4. Minor reorganization of "Other Proposals"

						Chuck

---------------- cut here -------------

%chapter-head.tex

%Version of August 5, 1992 - David Loveman, Digital Equipment Corporation

\documentstyle[twoside,11pt]{report}
\pagestyle{headings}
\pagenumbering{arabic}
\marginparwidth 0pt
\oddsidemargin=.25in
\evensidemargin  .25in
\marginparsep 0pt
\topmargin=-.5in
\textwidth=6.0in
\textheight=9.0in
\parindent=2em

%the file syntax-macs.tex is physically included below

%syntax-macs.tex

%Version of July 29, 1992 - Guy Steele, Thinking Machines

\newdimen\bnfalign         \bnfalign=2in
\newdimen\bnfopwidth       \bnfopwidth=.3in
\newdimen\bnfindent        \bnfindent=.2in
\newdimen\bnfsep           \bnfsep=6pt
\newdimen\bnfmargin        \bnfmargin=0.5in
\newdimen\codemargin       \codemargin=0.5in
\newdimen\intrinsicmargin  \intrinsicmargin=3em
\newdimen\casemargin       \casemargin=0.75in
\newdimen\argumentmargin   \argumentmargin=1.8in

\def\IT{\it}
\def\RM{\rm}
\let\CHAR=\char
\let\CATCODE=\catcode
\let\DEF=\def
\let\GLOBAL=\global
\let\RELAX=\relax
\let\BEGIN=\begin
\let\END=\end


\def\FUNNYCHARACTIVE{\CATCODE`\a=13 \CATCODE`\b=13 \CATCODE`\c=13 \CATCODE`\d=13
		     \CATCODE`\e=13 \CATCODE`\f=13 \CATCODE`\g=13 \CATCODE`\h=13
		     \CATCODE`\i=13 \CATCODE`\j=13 \CATCODE`\k=13 \CATCODE`\l=13
		     \CATCODE`\m=13 \CATCODE`\n=13 \CATCODE`\o=13 \CATCODE`\p=13
		     \CATCODE`\q=13 \CATCODE`\r=13 \CATCODE`\s=13 \CATCODE`\t=13
		     \CATCODE`\u=13 \CATCODE`\v=13 \CATCODE`\w=13 \CATCODE`\x=13
		     \CATCODE`\y=13 \CATCODE`\z=13 \CATCODE`\[=13 \CATCODE`\]=13
                     \CATCODE`\-=13}

\def\RETURNACTIVE{\CATCODE`\
=13}

\makeatletter
\def\section{\@startsection {section}{1}{\z@}{-3.5ex plus -1ex minus 
 -.2ex}{2.3ex plus .2ex}{\large\sf}}
\def\subsection{\@startsection{subsection}{2}{\z@}{-3.25ex plus -1ex minus 
 -.2ex}{1.5ex plus .2ex}{\large\sf}}
\def\alternative#1 #2#3{\def\@tempa{#1}\def\@tempb{A}\ifx\@tempa\@tempb\else
    \expandafter\@altbumpdown\string#2\@foo\fi
    #2{Version #1: #3}}
\def\@altbumpdown#1#2\@foo{\global\expandafter\advance\csname c@#2\endcsname-1}

\def\@ifpar#1#2{\let\@tempe\par \def\@tempa{#1}\def\@tempb{#2}\futurelet
    \@tempc\@ifnch}

\def\?#1.{\begingroup\def\@tempq{#1}\list{}{\leftmargin\intrinsicmargin}\relax
  \item[]{\bf\@tempq.} \@intrinsictest}
\def\@intrinsictest{\@ifpar{\@intrinsicpar\@intrinsicdesc}{\@intrinsicpar\relax}}
\long\def\@intrinsicdesc#1{\list{}{\relax
  \def\@tempb{ Arguments}\ifx\@tempq\@tempb
			  \leftmargin\argumentmargin
			  \else \leftmargin\casemargin \fi
  \labelwidth\leftmargin  \advance\labelwidth -\labelsep
  \parsep 4pt plus 2pt minus 1pt
  \let\makelabel\@intrinsiclabel}#1\endlist}
\long\def\@intrinsicpar#1#2\\{#1{#2}\@ifstar{\@intrinsictest}{\endlist\endgroup}}
\def\@intrinsiclabel#1{\setbox0=\hbox{\rm #1}\ifnum\wd0>\labelwidth
  \box0 \else \hbox to \labelwidth{\box0\hfill}\fi}
\def\Case(#1):{\item[{\it Case (#1):}]}
\def\ {\@ifnextchar({\def\@tempq{#1}\@intrinsicopt}{\item[#1]}}
\def\@intrinsicopt(#1){\item[{\@tempq} (#1)]}

\def\MATRIX#1{\relax
    \@ifnextchar,{\@MATRIXTABS{}#1,\@FOO, \hskip0pt plus 1filll\penalty-1\@gobble
  }{\@ifnextchar;{\@MATRIXTABS{}#1,\@FOO; \hskip0pt plus 1filll\penalty-1\@gobble
  }{\@ifnextchar:{\@MATRIXTABS{}#1,\@FOO: \hskip0pt plus 1filll\penalty-1\@gobble
  }{\@ifnextchar.{\hfill\penalty1\null\penalty10000\hskip0pt plus 1filll
		  \@MATRIXTABS{}#1,\@FOO.\penalty-50\@gobble
  }{\@MATRIXTABS{}#1,\@FOO{ }\hskip0pt plus 1filll\penalty-1}}}}}

\def\@MATRIXTABS#1#2,{\@ifnextchar\@FOO{\@MATRIX{#1#2}}{\@MATRIXTABS{#1#2&}}}
\def\@MATRIX#1\@FOO{\(\left[\begin{array}{rrrrrrrrrr}#1\end{array}\right]\)}

\def\@IFSPACEORRETURNNEXT#1#2{\def\@tempa{#1}\def\@tempb{#2}\futurelet\@tempc\@ifspnx}

{
\FUNNYCHARACTIVE
\GLOBAL\DEF\FUNNYCHARDEF{\RELAX
    \DEFa{{\IT\CHAR"61}}\DEFb{{\IT\CHAR"62}}\DEFc{{\IT\CHAR"63}}\RELAX
    \DEFd{{\IT\CHAR"64}}\DEFe{{\IT\CHAR"65}}\DEFf{{\IT\CHAR"66}}\RELAX
    \DEFg{{\IT\CHAR"67}}\DEFh{{\IT\CHAR"68}}\DEFi{{\IT\CHAR"69}}\RELAX
    \DEFj{{\IT\CHAR"6A}}\DEFk{{\IT\CHAR"6B}}\DEFl{{\IT\CHAR"6C}}\RELAX
    \DEFm{{\IT\CHAR"6D}}\DEFn{{\IT\CHAR"6E}}\DEFo{{\IT\CHAR"6F}}\RELAX
    \DEFp{{\IT\CHAR"70}}\DEFq{{\IT\CHAR"71}}\DEFr{{\IT\CHAR"72}}\RELAX
    \DEFs{{\IT\CHAR"73}}\DEFt{{\IT\CHAR"74}}\DEFu{{\IT\CHAR"75}}\RELAX
    \DEFv{{\IT\CHAR"76}}\DEFw{{\IT\CHAR"77}}\DEFx{{\IT\CHAR"78}}\RELAX
    \DEFy{{\IT\CHAR"79}}\DEFz{{\IT\CHAR"7A}}\DEF[{{\RM\CHAR"5B}}\RELAX
    \DEF]{{\RM\CHAR"5D}}\DEF-{\@IFSPACEORRETURNNEXT{{\CHAR"2D}}{{\IT\CHAR"2D}}}}
}

%%% Warning!  Devious return-character machinations in the next several lines!
%%%           Don't even *breathe* on these macros!
{\RETURNACTIVE\global\def\RETURNDEF{\def
{\@ifnextchar\FNB{}{\@stopline\@ifnextchar
{\@NEWBNFRULE}{\penalty\@M\@startline\ignorespaces}}}}\global\def\@NEWBNFRULE
{\vskip\bnfsep\@startline\ignorespaces}\global\def\@ifspnx{\ifx\@tempc\@sptoken \let\@tempd\@tempa \else \ifx\@tempc
\let\@tempd\@tempa \else \let\@tempd\@tempb \fi\fi \@tempd}}
%%% End of bizarro return-character machinations.

\def\IS{\@stopfield\global\setbox\@curfield\hbox\bgroup
  \hskip-\bnfindent \hskip-\bnfopwidth  \hskip-\bnfalign
  \hbox to \bnfalign{\unhbox\@curfield\hfill}\hbox to \bnfopwidth{\bf is \hfill}}
\def\OR{\@stopfield\global\setbox\@curfield\hbox\bgroup
  \hskip-\bnfindent \hskip-\bnfopwidth \hbox to \bnfopwidth{\bf or \hfill}}
\def\R#1 {\hbox to 0pt{\hskip-\bnfmargin R#1\hfill}}
\def\XBNF{\FUNNYCHARDEF\FUNNYCHARACTIVE\RETURNDEF\RETURNACTIVE
  \def\@underbarchar{{\char"5F}}\tt\frenchspacing
  \advance\@totalleftmargin\bnfmargin \tabbing
  \hskip\bnfalign\hskip\bnfopwidth\hskip\bnfindent\=\kill\>\+\@gobblecr}
\def\endXBNF{\-\endtabbing}

\def\BNF{\BEGIN{XBNF}}
\def\FNB{\END{XBNF}}

\begingroup \catcode `|=0 \catcode`\\=12
|gdef|@XCODE#1\EDOC{#1|endtrivlist|end{tt}}
|endgroup

\def\CODE{\begin{tt}\advance\@totalleftmargin\codemargin \@verbatim
   \def\@underbarchar{{\char"5F}}\frenchspacing \@vobeyspaces \@XCODE}
\def\ICODE{\begin{tt}\advance\@totalleftmargin\codemargin \@verbatim
   \def\@underbarchar{{\char"5F}}\frenchspacing \@vobeyspaces
   \FUNNYCHARDEF\FUNNYCHARACTIVE \UNDERBARACTIVE\UNDERBARDEF \@XCODE}

\def\@underbarsub#1{{\ifmmode _{#1}\else {$_{#1}$}\fi}}
\let\@underbarchar\_
\def\@underbar{\let\@tempq\@underbarsub\if\@tempz A\let\@tempq\@underbarchar\fi
  \if\@tempz B\let\@tempq\@underbarchar\fi\if\@tempz C\let\@tempq\@underbarchar\fi
  \if\@tempz D\let\@tempq\@underbarchar\fi\if\@tempz E\let\@tempq\@underbarchar\fi
  \if\@tempz F\let\@tempq\@underbarchar\fi\if\@tempz G\let\@tempq\@underbarchar\fi
  \if\@tempz H\let\@tempq\@underbarchar\fi\if\@tempz I\let\@tempq\@underbarchar\fi
  \if\@tempz J\let\@tempq\@underbarchar\fi\if\@tempz K\let\@tempq\@underbarchar\fi
  \if\@tempz L\let\@tempq\@underbarchar\fi\if\@tempz M\let\@tempq\@underbarchar\fi
  \if\@tempz N\let\@tempq\@underbarchar\fi\if\@tempz O\let\@tempq\@underbarchar\fi
  \if\@tempz P\let\@tempq\@underbarchar\fi\if\@tempz Q\let\@tempq\@underbarchar\fi
  \if\@tempz R\let\@tempq\@underbarchar\fi\if\@tempz S\let\@tempq\@underbarchar\fi
  \if\@tempz T\let\@tempq\@underbarchar\fi\if\@tempz U\let\@tempq\@underbarchar\fi
  \if\@tempz V\let\@tempq\@underbarchar\fi\if\@tempz W\let\@tempq\@underbarchar\fi
  \if\@tempz X\let\@tempq\@underbarchar\fi\if\@tempz Y\let\@tempq\@underbarchar\fi
  \if\@tempz Z\let\@tempq\@underbarchar\fi\@tempq}
\def\@under{\futurelet\@tempz\@underbar}

\def\UNDERBARACTIVE{\CATCODE`\_=13}
\UNDERBARACTIVE
\def\UNDERBARDEF{\def_{\protect\@under}}
\UNDERBARDEF

\catcode`\$=11  

%the following line would allow derived-type component references 
%FOO%BAR in running text, but not allow LaTeX comments
%without this line, write FOO\%BAR
%\catcode`\%=11 

\makeatother

%end of file syntax-macs.tex


\title{{\em D R A F T} \\High Performance Fortran \\ FORALL Proposal}
\author{High Performance Fortran Forum}
\date{October 14, 1992}

\hyphenation{RE-DIS-TRIB-UT-ABLE sub-script Wil-liam-son}

\begin{document}

\maketitle

\newpage

\pagenumbering{roman}

\vspace*{4.5in}

This is the result of a LaTeX run of a draft of a single chapter of 
the HPFF Final Report document.

\vspace*{3.0in}

\copyright 1992 Rice University, Houston Texas.  Permission to copy 
without fee all or part of this material is granted, provided the 
Rice University copyright notice and the title of this document 
appear, and notice is given that copying is by permission of Rice 
University.

\tableofcontents

\newpage

\pagenumbering{arabic}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%put text of chapter here


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%put \end{document} here
%statements.tex


%Revision history:
%August 2, 1992 - Original version of David Loveman, Digital Equipment
%	Corporation and Charles Koelbel, Rice University
%August 19, 1992 - chk - cleaned up discrepancies with Fortran 90 array 
%	expressions
%August 20, 1992 - chk - added DO INDEPENDENT section, Guy Steele's 
%	pointer proposals
%August 24, 1992 - chk - ELEMENTAL functions proposal
%August 31, 1992 - chk - PURE functions proposal
%September 3, 1992 - chk - reorganized sections
%September 21, 1992 - chk - began incorporating updates from Sept
%	10-11 meeting
%October 14, 1992 - chk - Incorporated ON and revised PURE 


\newenvironment{constraints}{
        \begin{list}{Constraint:}{
                \settowidth{\labelwidth}{Constraint:}
                \settowidth{\labelsep}{w}
                \settowidth{\leftmargin}{Constraint:w}
                \setlength{\rightmargin}{0cm}
        }
}{
        \end{list}
}


\chapter{Statements}
\label{statements}

\section{Overview}

\footnote{Version of September 21, 1992 
- Charles Koelbel, Rice University.}
The purpose of the FORALL construct is to provide a convenient syntax for 
simultaneous assignments to large groups of array elements.
In this respect it is very similar to the functionality provided by array 
assignments and WHERE constructs.
FORALL differs from these constructs primarily in its syntax, which is 
intended to be more suggestive of local operations on each element of an 
array.
It is also possible to specify slightly more general array regions than 
are allowed by the basic array triplet notation.
Both single-statement and block FORALLs are defined in this proposal.

The FORALL statement, in both its single-statement and block forms, was
accepted by the High Performance Fortran Forum working group on its
second reading on September 10, 1992.
This vote was contingent on a more complete definition of PURE
functions.
The idea of PURE functions was accepted by the HPFF working group at
its first reading on September 10, 1992.
However, the definition at that time was not completely acceptable due to
technical errors; the errors discussed at that time have been
corrected in this draft.
The single-statement form of FORALL was accepted by the HPFF working
group as part of the official HPF subset in a first reading on
September 11, 1992; the block FORALL was excluded from the subset at
the same time.

The purpose of the INDEPENDENT directive is to allow the programmer to
give additional information to the compiler.
The user can assert that no data object is defined by one iteration of
a loop and used (read or written) by another; similar information can
be provided about the combinations of index values in a FORALL
statement.
A compiler may rely on this information to make optimizations, such as
parallelization or reorganizing communication.
If the assertion is true, the semantics of the program are not
changed; if it is false, the program is not standard-conforming and
has no defined meaning.
The ``Other Proposals'' section contains a number of additional
assertions with this flavor.

The INDEPENDENT assertion was accepted by the High Performance Fortran
Forum working group on its second reading on September 10, 1992.
The group also directed the FORALL subgroup to further explore methods for
allowing reduction operations to be accomplished in INDEPENDENT loops.

The following proposals are designed as a modification of the Fortran 90 
standard; all references to rule numbers and section numbers pertain to 
that document unless otherwise noted.


\section{Element Array Assignment - FORALL}
 

\label{forall-stmt}

\footnote{Version of September 21, 1992 - David
Loveman, Digital Equipment Corporation and Charles Koelbel, Rice
University.
Approved at second reading on September 10, 1992.}
The element array
assignment statement (FORALL statement) is used to specify an array
assignment in terms of array elements or groups of array sections.
The element array assignment may be
masked with a scalar logical expression.  
In functionality, it is similar to array assignment statements;
however, more general array sections can be assigned in FORALL.

Rule R215 for {\it
executable-construct} is extended to include the {\it forall-stmt}.

\subsection{General Form of Element Array Assignment}

                                                                       \BNF
forall-stmt          \IS FORALL (forall-triplet-spec-list
                       [,scalar-mask-expr ]) forall-assignment

forall-triplet-spec  \IS subscript-name = subscript : subscript 
                          [ : stride]
                                                                       \FNB

\noindent
Constraint:  {\it subscript-name} must be a {\it scalar-name} of type
integer.

\noindent
Constraint:  A {\it subscript} or a {\it stride} in a {\it
forall-triplet-spec} must not contain a reference to any {\it
subscript-name} in the {\it forall-triplet-spec-list}.

                                                                       \BNF
forall-assignment    \IS array-element = expr
                     \OR array-element => target
                     \OR array-section = expr
                                                                       \FNB

\noindent
Constraint:  The {\it array-section} or {\it array-element} in a {\it
forall-assignment} must reference all of the {\it forall-triplet-spec
subscript-names}.

\noindent
Constraint: In the case of simple assignment, the {\it array-element} and 
{\it expr} have the same constraints as the {\it variable} and {\it expr} 
in an {\it assignment-stmt}.

\noindent
Constraint: In the case of pointer assignment, the {\it array-element} 
and {\it target} have the same constraints as the {\it pointer-object} 
and {\it target}, respectively, in a {\it pointer-assignment-stmt}.

\noindent
Constraint: In the case of array section assignment, the {\it 
array-section} and 
{\it expr} have the same constraints as the {\it variable} and {\it expr} 
in an {\it assignment-stmt}.

\noindent Constraint: If any subexpression in {\it expr}, {\it 
array-element}, or {\it array-section} is a {\it function-reference}, 
then the {\it function-name} must be a ``pure'' function as defined in
Section~\ref{forall-pure}.


For each subscript name in the {\it forall-assignment}, the set of
permitted values is determined on entry to the statement and is
\[  m1 + (k-1) * m3, \mbox{ where } k = 1, 2, \ldots, \left\lfloor \frac{m2 - m1 +
m3}{m3} \right\rfloor  \]
and where {\it m1}, {\it m2}, and {\it m3} are the values of the first
subscript, the second subscript, and the stride respectively in the
{\it forall-triplet-spec}.  If {\it stride} is missing, it is as if it
were present with a value of the integer 1.  The expression {\it
stride} must not have the value 0.  If for some subscript name 
\(\lfloor (m2 - m1 + m3) / m3 \rfloor \leq 0\), the {\it
forall-assignment} is not executed.
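
As an illustration (the array A and its bounds are chosen here only for
the example), the first FORALL below has {\it m1}, {\it m2}, and {\it m3}
equal to 1, 10, and 3, so I takes the values 1, 4, 7, and 10, the same
values that a DO loop with these bounds would visit.
The second FORALL has an empty set of permitted values, so its
assignment is not executed.

                                                                  \CODE
REAL A(10)

A = 0.0

! I takes the values 1, 4, 7, 10
FORALL (I = 1:10:3) A(I) = 1.0
! A is now 1 0 0 1 0 0 1 0 0 1

! Empty index set (stride defaults to 1): nothing is assigned
FORALL (I = 5:1) A(I) = 2.0
                                                                  \EDOC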

A ``pure'' function is defined in Section~\ref{forall-pure}; the
intuition is that a pure function cannot have side effects.
The PURE declaration places syntactic constraints on the function to
ensure this.

Examples of element array assignments are:

                                                                  \CODE
REAL H(N,N), X(N,N), Y(N,N)
TYPE MONARCH
    INTEGER, POINTER :: P
END TYPE MONARCH
TYPE(MONARCH) :: A(N)
INTEGER, TARGET :: B(N)
      ...
FORALL (I=1:N, J=1:N) H(I,J) = 1.0 / REAL(I + J - 1)

FORALL (I=1:N, J=1:N, Y(I,J) .NE. 0.0) X(I,J) = 1.0 / Y(I,J)

! Set up a butterfly pattern
FORALL (J=1:N)  A(J)%P => B(1+IEOR(J-1,2**K))
                                                                  \EDOC 

\subsection{Interpretation of Element Array Assignments}  

Execution of an element array assignment consists of the following steps:

\begin{enumerate}

\item Evaluation in any order of the subscript and stride expressions in 
the {\it forall-triplet-spec-list}.
The set of valid combinations of {\it subscript-name} values is then the 
cartesian product of the sets defined by these triplets.

\item Evaluation of the {\it scalar-mask-expr} for all valid combinations 
of {\em subscript-name} values.
The mask elements may be evaluated in any order.
The set of active combinations of {\it subscript-name} values is the 
subset of the valid combinations for which the mask evaluates to true.

\item Evaluation in any order of the {\it expr} or {\it target} and all 
subscripts contained in the 
{\it array-element} or {\it array-section} in the {\it forall-assignment} 
for all active combinations of {\em subscript-name} values.
In the case of pointer assignment where the {\it target} is not a 
pointer, the evaluation consists of identifying the object referenced 
rather than computing its value.

\item Assignment of the computed {\it expr} values to the corresponding 
elements specified by {\it array-element} or {\it array-section}.
The assignments may be made in any order.
In the case of a pointer assignment where the {\it target} is not a 
pointer, this assignment consists of associating the {\it array-element} 
with the object referenced.

\end{enumerate}

If the scalar mask expression is omitted, it is as if it were
present with the value true.
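
The following sketch illustrates the consequence of steps 3 and 4: every
right-hand side is evaluated before any element is assigned.
The arrays A and B are hypothetical and serve only to contrast the FORALL
with a sequential DO loop over the same index range.

                                                                  \CODE
INTEGER A(5), B(5)

A = (/ 1, 2, 3, 4, 5 /)
B = A

! All right-hand sides are evaluated first, so each A(I-1) is an
! old value:  A becomes 1 1 2 3 4
FORALL (I = 2:5) A(I) = A(I-1)

! A sequential loop reads values it has just written:
! B becomes 1 1 1 1 1
DO I = 2, 5
  B(I) = B(I-1)
END DO
                                                                  \EDOC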

The scope of a
{\it subscript-name} is the FORALL statement itself. 


The {\it forall-assignment} must not cause any element of the array
being assigned to be assigned a value more than once.
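
For instance, with an index vector on the left-hand side, whether the
statement is permitted depends on the values in that vector.
The names IX, A, and B below are assumptions made for this sketch.

                                                                  \CODE
INTEGER IX(4)
REAL A(4), B(4)
      ...
! Permitted: IX holds no repeated values, so each element of A
! is assigned at most once
IX = (/ 3, 1, 4, 2 /)
FORALL (I = 1:4) A(IX(I)) = B(I)

! Not permitted: A(1) would be assigned for both I = 1 and I = 2
IX = (/ 1, 1, 2, 3 /)
FORALL (I = 1:4) A(IX(I)) = B(I)
                                                                  \EDOC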

Note that if a function called in a FORALL is declared PURE, then it 
is impossible for that function's evaluation to affect other expressions' 
evaluations, either for the same combination of 
{\it subscript-name} values or for a different combination.
In addition, it is possible that the compiler can perform 
more extensive optimizations when all functions are declared PURE.


\subsection{Scalarization of the FORALL Statement}

A {\it forall-stmt} of the general form:

                                                                   \CODE
FORALL (v1=l1:u1:s1, v2=l2:u2:s2, ..., vn=ln:un:sn , mask ) &
      a(e1,...,em) = rhs
                                                                   \EDOC

\noindent
is equivalent to the following standard Fortran 90 code:

\raggedbottom
                                                                   \CODE
!evaluate subscript and stride expressions in any order

templ1 = l1
tempu1 = u1
temps1 = s1
templ2 = l2
tempu2 = u2
temps2 = s2
  ...
templn = ln
tempun = un
tempsn = sn

!then evaluate the scalar mask expression

DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        tempmask(v1,v2,...,vn) = mask
      END DO
    ...
  END DO
END DO

!then evaluate the expr in the forall-assignment for all valid 
!combinations of subscript names for which the scalar mask 
!expression is true (it is safe to avoid saving the subscript 
!expressions because of the conditions on FORALL expressions)

DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        IF (tempmask(v1,v2,...,vn)) THEN
          temprhs(v1,v2,...,vn) = rhs
        END IF
      END DO
    ...
  END DO
END DO

!then perform the assignment of these values to the corresponding 
!elements of the array being assigned to

DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        IF (tempmask(v1,v2,...,vn)) THEN
          a(e1,...,em) = temprhs(v1,v2,...,vn)
        END IF
      END DO
    ...
  END DO
END DO
                                                                      \EDOC
\flushbottom
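
As a concrete instance of this expansion, consider a FORALL with a single
subscript name.
This is only a sketch: the arrays A and B, the extent N, and the
temporaries TMASK and TRHS are names introduced here for illustration.

                                                                   \CODE
! Original statement:
!   FORALL (I = 1:N, B(I) .NE. 0.0) A(I) = A(N+1-I) / B(I)

REAL A(N), B(N), TRHS(N)
LOGICAL TMASK(N)
      ...
! evaluate the scalar mask expression
DO I = 1, N
  TMASK(I) = B(I) .NE. 0.0
END DO

! evaluate the right-hand side for the active values of I
DO I = 1, N
  IF (TMASK(I)) TRHS(I) = A(N+1-I) / B(I)
END DO

! perform the assignments
DO I = 1, N
  IF (TMASK(I)) A(I) = TRHS(I)
END DO
                                                                   \EDOC

\noindent
The temporary TRHS cannot be dropped in this case: if the last two loops
were merged, A(N+1-I) could be overwritten before it is read.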

\subsection{Consequences of the Definition of the FORALL Statement}

This section should be moved to the comments chapter in the final
draft.

\begin{itemize}

\item The {\it scalar-mask-expr} may depend on the {\it subscript-name}s.

\item Each of the {\it subscript-name}s must appear within the
subscript expression(s) on the left-hand side.  
(This is a syntactic
consequence of the semantic rule that no two execution instances of the
body may assign to the same array element.)

\item Right-hand sides and subscripts on the left-hand side of a {\it 
forall-assignment} are
evaluated only for valid combinations of subscript names for which the
scalar mask expression is true.

\item The intent of ``pure'' functions is to provide a class of
functions without side-effects, and to allow this side-effect freedom
to be checked syntactically.

\end{itemize}


\section{FORALL Construct}

\label{forall-construct}

\footnote{Version of August 20, 1992 -
David Loveman, Digital Equipment Corporation and 
Charles Koelbel, Rice University.  Approved at second reading on
September 10, 1992.}
The FORALL construct is a generalization of the element array
assignment statement allowing multiple assignments, masked array 
assignments, and nested FORALL statements to be
controlled by a single {\it forall-triplet-spec-list}.  Rule R215 for
{\it executable-construct} is extended to include the {\it
forall-construct}.

\subsection{General Form of the FORALL Construct}

                                                                    \BNF
forall-construct   \IS FORALL (forall-triplet-spec-list [,scalar-mask-expr ])
                               forall-body-stmt-list
                            END FORALL

forall-body-stmt     \IS forall-assignment
                     \OR where-stmt
                     \OR forall-stmt
                     \OR forall-construct
                                                                    \FNB

\noindent
Constraint:  {\it subscript-name} must be a {\it scalar-name} of type
integer.

\noindent
Constraint:  A {\it subscript} or a {\it stride} in a {\it
forall-triplet-spec} must not contain a reference to any {\it
subscript-name} in the {\it forall-triplet-spec-list}.

\noindent
Constraint:  Any left-hand side {\it array-section} or {\it 
array-element} in any {\it forall-body-stmt}
must reference all of the {\it forall-triplet-spec
subscript-names}.

\noindent
Constraint: If a {\it forall-stmt} or {\it forall-construct} is nested 
within a {\it forall-construct}, then the inner FORALL may not redefine 
any {\it subscript-name} used in the outer {\it forall-construct}.
This rule applies recursively in the event of multiple nesting levels.

For each subscript name in the {\it forall-assignment}s, the set of
permitted values is determined on entry to the construct and is
\[  m1 + (k-1) * m3, \mbox{ where } k = 1, 2, \ldots, \left\lfloor \frac{m2 - m1 +
m3}{m3} \right\rfloor  \]
and where {\it m1}, {\it m2}, and {\it m3} are the values of the first
subscript, the second subscript, and the stride respectively in the
{\it forall-triplet-spec}.  If {\it stride} is missing, it is as if it
were present with a value of the integer 1.  The expression {\it
stride} must not have the value 0.  If for some subscript name 
\(\lfloor (m2 - m1 + m3) / m3 \rfloor \leq 0\), the {\it
forall-assignment}s are not executed.

Examples of the FORALL construct are:

                                                                 \CODE
FORALL ( I = 2:N-1, J = 2:N-1 )
  A(I,J) = A(I,J-1) + A(I,J+1) + A(I-1,J) + A(I+1,J)
  B(I,J) = A(I,J)
END FORALL

FORALL ( I = 1:N-1 )
  FORALL ( J = I+1:N )
    A(I,J) = A(J,I)
  END FORALL
END FORALL

FORALL ( I = 1:N, J = 1:N )
  A(I,J) = MERGE( A(I,J), A(I,J)**2, I.EQ.J )
  WHERE ( .NOT. DONE(I,J,1:M) )
    B(I,J,1:M) = B(I,J,1:M)*X
  END WHERE
END FORALL
                                                                \EDOC


\subsection{Interpretation of the FORALL Construct}

Execution of a FORALL construct consists of the following steps:

\begin{enumerate}

\item Evaluation in any order of the subscript and stride expressions in 
the {\it forall-triplet-spec-list}.
The set of valid combinations of {\it subscript-name} values is then the 
cartesian product of the sets defined by these triplets.

\item Evaluation of the {\it scalar-mask-expr} for all valid combinations 
of {\em subscript-name} values.
The mask elements may be evaluated in any order.
The set of active combinations of {\it subscript-name} values is the 
subset of the valid combinations for which the mask evaluates to true.

\item Execution of the {\it forall-body-stmts} in the order they appear.
Each statement is executed completely (that is, for all active 
combinations of {\it subscript-name} values) according to the following 
interpretation:

\begin{enumerate}

\item Assignment statements, pointer assignment statements, and array
assignment statements (i.e.,\ statements in the {\it forall-assignment}
category) evaluate the right-hand side {\it expr} and any left-hand side
subscripts for all active combinations of {\it subscript-name} values,
then assign those results to the corresponding left-hand side references.

\item WHERE statements evaluate their {\it mask-expr} for all active 
combinations of values of {\it subscript-name}s.
All elements of all masks may be evaluated in any order. 
The assignments within the WHERE branch of the statement are then 
executed in order using the above interpretation of array assignments 
within the FORALL, but the only array elements assigned are those 
selected by both the active {\it subscript-names} and the WHERE mask.
Finally, the assignments in the ELSEWHERE branch are executed if that 
branch is present.
The assignments here are also treated as array assignments, but elements 
are only assigned if they are selected by both the active combinations 
and by the negation of the WHERE mask.

\item FORALL statements and FORALL constructs first evaluate the 
subscript and stride expressions in 
the {\it forall-triplet-spec-list} for all active combinations of the 
outer FORALL constructs.
The set of valid combinations of {\it subscript-names} for the inner 
FORALL is then the union of the sets defined by these bounds and strides 
for each active combination of the outer {\it subscript-names}.
For example, the valid set of the inner FORALL in the second example in 
the last section is the upper triangle (not including the main diagonal) 
of the \(n \times n\) matrix a.
The scalar mask expression is then evaluated for all valid combinations 
of the inner FORALL's {\it subscript-names} to produce the set of active 
combinations.
If there is no scalar mask expression, it is assumed to be always true.
Each statement in the inner FORALL is then executed for each active 
combination (of the inner FORALL), recursively following the 
interpretations given in this section.

\end{enumerate}

\end{enumerate}

If the scalar mask expression is omitted, it is as if it were
present with the value true.
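
The interaction between the FORALL mask and a nested WHERE can be
sketched as follows.
The arrays B and C and the logical array ROWOK are hypothetical names
chosen for this example.

                                                                 \CODE
REAL B(N,M), C(N,M)
LOGICAL ROWOK(N)
      ...
FORALL ( I = 1:N, ROWOK(I) )
  WHERE ( B(I,1:M) .GT. 0.0 )
    ! assigned where ROWOK(I) is true and B(I,J) is positive
    C(I,1:M) = SQRT(B(I,1:M))
  ELSEWHERE
    ! assigned where ROWOK(I) is true and B(I,J) is not positive
    C(I,1:M) = 0.0
  END WHERE
END FORALL
! rows of C for which ROWOK(I) is false are left unchanged
                                                                 \EDOC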

The scope of a
{\it subscript-name} is the FORALL construct itself. 

A single assignment or array assignment statement in a {\it 
forall-construct} must obey the same restrictions as a {\it 
forall-assignment} in a simple {\it forall-stmt}.
(Note that the lowest level of nested statements must always be an 
assignment statement.)
For example, an assignment may not cause the same array element to be 
assigned more than once.
Different statements may, however, assign to the 
same array element, and assignments made in one
statement may affect the execution of a later statement.
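
A small sketch of this ordering, using hypothetical arrays A and B:
the first statement is completed for all values of I before the second
begins, so the second statement sees the updated elements of A.

                                                                 \CODE
INTEGER A(4), B(4)

A = (/ 1, 2, 3, 4 /)
B = 0

FORALL ( I = 2:4 )
  A(I) = A(I-1)      ! uses old values of A:  A becomes 1 1 2 3
  B(I) = A(I)        ! uses new values of A:  B becomes 0 1 2 3
END FORALL
                                                                 \EDOC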

\subsection{Scalarization of the FORALL Construct}

A {\it forall-construct} of the form:

                                                                \CODE
FORALL (... e1 ... e2 ... en ...)
    s1
    s2
     ...
    sn
END FORALL
                                                                \EDOC

where each si is an assignment, is equivalent to the following scalar code:

                                                                \CODE
temp1 = e1
temp2 = e2
 ...
tempn = en
FORALL (... temp1 ... temp2 ... tempn ...) s1
FORALL (... temp1 ... temp2 ... tempn ...) s2
   ...
FORALL (... temp1 ... temp2 ... tempn ...) sn
                                                                \EDOC

A similar statement can be made using FORALL constructs when the 
si may be WHERE or FORALL constructs.

A {\it forall-construct} of the form:

                                                                \CODE
FORALL ( v1=l1:u1:s1, mask )
  WHERE ( mask2(l2:u2) )
    a(v1,l2:u2) = rhs1
  ELSEWHERE
    a(v1,l2:u2) = rhs2
  END WHERE
END FORALL
                                                                \EDOC

is equivalent to the following standard Fortran 90 code:

                                                                \CODE
!evaluate subscript and stride expressions in any order

templ1 = l1
tempu1 = u1
temps1 = s1

!then evaluate the FORALL mask expression

DO v1=templ1,tempu1,temps1
  tempmask(v1) = mask
END DO

!then evaluate the masks for the WHERE

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    tmpl2(v1) = l2
    tmpu2(v1) = u2
    tempmask2(v1,tmpl2(v1):tmpu2(v1)) = mask2(tmpl2(v1):tmpu2(v1))
  END IF
END DO

!then evaluate the WHERE branch

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    WHERE ( tempmask2(v1,tmpl2(v1):tmpu2(v1)) )
      temprhs1(v1,tmpl2(v1):tmpu2(v1)) = rhs1
    END WHERE
  END IF
END DO
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    WHERE ( tempmask2(v1,tmpl2(v1):tmpu2(v1)) )
      a(v1,tmpl2(v1):tmpu2(v1)) = temprhs1(v1,tmpl2(v1):tmpu2(v1))
    END WHERE
  END IF
END DO

!then evaluate the ELSEWHERE branch

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    WHERE ( .not. tempmask2(v1,tmpl2(v1):tmpu2(v1)) )
      temprhs2(v1,tmpl2(v1):tmpu2(v1)) = rhs2
    END WHERE
  END IF
END DO
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    WHERE ( .not. tempmask2(v1,tmpl2(v1):tmpu2(v1)) )
      a(v1,tmpl2(v1):tmpu2(v1)) = temprhs2(v1,tmpl2(v1):tmpu2(v1))
    END WHERE
  END IF
END DO
                                                                   \EDOC


A {\it forall-construct} of the form:

                                                                   \CODE
FORALL ( v1=l1:u1:s1, mask )
  FORALL ( v2=l2:u2:s2, mask2 )
    a(e1,e2) = rhs1
    b(e3,e4) = rhs2
  END FORALL
END FORALL
                                                                  \EDOC

is equivalent to the following standard Fortran 90 code:


                                                                   \CODE
!evaluate subscript and stride expressions in any order

templ1 = l1
tempu1 = u1
temps1 = s1

!then evaluate the FORALL mask expression

DO v1=templ1,tempu1,temps1
  tempmask(v1) = mask
END DO

!then evaluate the inner FORALL bounds, etc

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    templ2(v1) = l2
    tempu2(v1) = u2
    temps2(v1) = s2
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      tempmask2(v1,v2) = mask2
    END DO
  END IF
END DO

!first statement

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      IF ( tempmask2(v1,v2) ) THEN
        temprhs1(v1,v2) = rhs1
      END IF
    END DO
  END IF
END DO
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      IF ( tempmask2(v1,v2) ) THEN
        a(e1,e2) = temprhs1(v1,v2)
      END IF
    END DO
  END IF
END DO

!second statement

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      IF ( tempmask2(v1,v2) ) THEN
        temprhs2(v1,v2) = rhs2
      END IF
    END DO
  END IF
END DO
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      IF ( tempmask2(v1,v2) ) THEN
        b(e3,e4) = temprhs2(v1,v2)
      END IF
    END DO
  END IF
END DO
                                                                   \EDOC


\subsection{Consequences of the Definition of the FORALL Construct}

This section should be moved to the comments chapter of the final
draft.

\begin{itemize}

\item A block FORALL means roughly the same as replicating the FORALL
header in front of each array assignment statement in the block, except
that any expressions in the FORALL header are evaluated only once,
rather than being re-evaluated before each of the statements in the body.
(The exceptions to this rule are nested FORALL statements and WHERE
statements, which introduce syntactic and functional complications
into the copying.)

\item One may think of a block FORALL as synchronizing twice per
contained assignment statement: once after handling the rhs and other 
expressions
but before performing assignments, and once after all assignments have
been performed but before commencing the next statement.  (In practice,
appropriate dependence analysis will often permit the compiler to
eliminate unnecessary synchronizations.)

\item The {\it scalar-mask-expr} may depend on the {\it subscript-name}s.

\item In general, any expression in a FORALL is evaluated only for valid 
combinations of all surrounding subscript names for which all the
scalar mask expressions are true.

\item Nested FORALL bounds and strides can depend on outer FORALL {\it 
subscript-names}.  They cannot redefine those names, even temporarily (if 
they did, there would be no way to avoid multiple assignments to the same 
array element).

\item Dependences are allowed from one statement to later statements, but 
never from an assignment statement to itself.

\end{itemize}
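
The dependence of inner bounds on an outer {\it subscript-name} can be
seen in a small fixed-size version of the triangular example.
The array A and its contents are assumptions made for this sketch.

                                                                 \CODE
INTEGER A(3,3)

A = RESHAPE( (/ 1, 2, 3, 4, 5, 6, 7, 8, 9 /), (/ 3, 3 /) )

! The inner lower bound depends on the outer index I.  The valid
! (I,J) combinations are (1,2), (1,3), and (2,3), which is the
! upper triangle without the main diagonal.
FORALL ( I = 1:2 )
  FORALL ( J = I+1:3 )
    A(I,J) = A(J,I)
  END FORALL
END FORALL
! A is now symmetric
                                                                 \EDOC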


\section{Pure Procedures and Elemental Reference}

\label{forall-pure}

\footnote{Version of October 14, 
1992 - John Merlin, University of Southampton, 
and Charles Koelbel, Rice University. Approved at first reading on
September 10, 1992, subject to technical revisions for correctness.
The suggestions made there have been incorporated in this draft.}
A {\it pure function\/} is one that produces no side effects.  This
means that the only effect of a pure function reference on the state 
of a program is to return a result---it does not modify the values, 
pointer associations or data mapping of any of its arguments or global 
data, and performs no I/O.
A {\em pure subroutine\/} is one that produces no side effects
except for modifying the values and/or pointer associations of certain
arguments.  

A pure procedure (i.e.\ function or subroutine) may be used in any way 
that a normal procedure can.
In addition, a procedure is required to be pure if it is used in any 
of the following contexts:
\begin{itemize}
        \item a FORALL statement or construct;
        \item an elemental reference (see section \ref{elem-ref-of-pure-procs});
        \item within the body of a pure procedure;
        \item as an actual argument in a pure procedure reference.
\end{itemize}

The side-effect freedom of a pure function ensures that it can be invoked
concurrently in a FORALL or elemental reference without undesirable
consequences such as non-determinism, and additionally assists the efficient
implementation of concurrent execution.  A pure subroutine can be
invoked concurrently in an elemental reference, and since its side effects
are limited to a known subset of its arguments (as we shall see later), 
an implementation can check that a reference obeys Fortran~90's restrictions 
on argument association and is consequently deterministic.


\subsection{Pure procedure declaration and interface}

If a non-intrinsic procedure is used in a context that requires it to be 
pure, then its interface must be explicit in the scope of that use, 
and both its interface body (if provided) and its definition must contain 
the PURE declaration.  The form of this declaration is 
a directive immediately after the {\it function-stmt\/} or {\it
subroutine-stmt\/} of the procedure interface body or definition:
                                                                 \BNF
pure-directive \IS !HPF$ PURE [procedure-name]
                                                                 \FNB

Intrinsic functions, including HPF intrinsic functions, are always pure 
and require no explicit declaration of this fact;  intrinsic subroutines 
are pure if they are elemental (e.g.\ MVBITS) but not otherwise.
A statement function is pure if and only if all functions that it
references are pure.
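
A minimal sketch of such a declaration follows.
The function HALF, the subroutine HALVE, and the arrays A and B are
hypothetical names used only for illustration.
The directive appears immediately after the {\it function-stmt\/} in both
the definition and the interface body, and the interface is explicit in
the scope where HALF is referenced in the FORALL.

                                                                 \CODE
REAL FUNCTION HALF(X)
!HPF$ PURE HALF
  REAL, INTENT(IN) :: X
  HALF = 0.5 * X
END FUNCTION HALF

SUBROUTINE HALVE(A, B, N)
  INTEGER N
  REAL A(N), B(N)
  INTERFACE
    REAL FUNCTION HALF(X)
    !HPF$ PURE HALF
      REAL, INTENT(IN) :: X
    END FUNCTION HALF
  END INTERFACE
  ! HALF is declared pure, so it may be referenced in a FORALL
  FORALL (I = 1:N) A(I) = HALF(B(I))
END SUBROUTINE HALVE
                                                                 \EDOC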

\subsubsection{Pure function definition}

To define pure functions, Rule~R1215 of the Fortran~90 standard is changed 
to:
                                                                 \BNF
function-subprogram \IS         function-stmt
                                [pure-directive]
                                [specification-part]
                                [execution-part]
                                [internal-subprogram-part]
                                end-function-stmt
                                                                \FNB
with the following additional constraints in Section~12.5.2.2 of the 
Fortran~90 standard:
\begin{constraints}

        \item If a {\it procedure-name\/} is present in the 
{\it pure-directive\/}, it must match the {\it function-name\/} in the 
{\it function-stmt\/}.

        \item In a pure function, a local variable must not have the 
SAVE attribute. (Note that this means that a local variable cannot be 
initialised in a {\it type-declaration-stmt\/} or a
{\it data-stmt\/}, which imply the SAVE attribute.)

        \item A pure function must not use a dummy argument, a global 
variable, or an object that is storage associated with a global variable,
or a subobject thereof, in the following contexts:
        \begin{itemize}
                \item as the assignment variable of an {\it assignment-stmt\/}
or {\it forall-assignment\/};
                \item as a DO variable or implied DO variable, or as a 
{\it subscript-name\/} in a {\it forall-triplet-spec\/};
                \item in an {\it assign-stmt\/};
                \item as the {\it pointer-object\/} or {\it target\/}
of a {\it pointer-assignment-stmt\/};
                \item as the {\it expr\/} of an {\it assignment-stmt\/}
or {\it forall-assignment\/} whose assignment variable is of a derived 
type, or is a pointer to a derived type, that has a pointer component 
at any level of component selection;
                \item as an {\it allocate-object\/} or {\it stat-variable\/}
in an {\it allocate-stmt\/} or {\it deallocate-stmt\/}, or as a
{\it pointer-object\/} in a {\it nullify-stmt\/};
                \item as an actual argument associated with a dummy 
argument with INTENT (OUT) or (INOUT) or with the POINTER attribute.
        \end{itemize}

        \item Any procedure referenced in a pure function, including
one referenced via a defined operation or assignment, must be pure.

        \item In a pure function, a dummy argument or local variable 
must not appear in an {\it align-directive\/}, {\it realign-directive\/},
{\it distribute-directive\/}, {\it redistribute-directive\/},
{\it realignable-directive\/}, {\it redistributable-directive\/} or
{\it combined-directive}.

        \item In a pure function, a global variable must not appear in
a {\it realign-directive\/} or {\it redistribute-directive\/}.

        \item A pure function must not contain a {\it pause-stmt\/},
{\it stop-stmt\/} or I/O statement (including a file operation).

\end{constraints}
To declare that a function is pure, a {\it pure-directive\/} must be given.

The above constraints are designed to guarantee that a pure function
is free from side effects (i.e.\ modifications of data visible outside
the function), which means that it is safe to reference concurrently, 
as explained earlier.

The second constraint ensures that a pure function does not retain
an internal state between calls, which would allow side-effects between 
calls to the same procedure.

The third constraint ensures that dummy arguments and global variables
are not modified by the function.
In the case of a dummy or global pointer, this applies to both its 
pointer association and its target value, so it cannot be subject to 
a pointer assignment or to an ALLOCATE, DEALLOCATE or NULLIFY
statement.
Incidentally, these constraints imply that only local variables and the
dummy result variable can be subject to assignment or pointer assignment.

In addition, a dummy or global data object cannot be the {\it target\/}
of a pointer assignment (i.e.\ it cannot be used as the right hand side
of a pointer assignment to a local pointer or to the result variable), 
for then its value could be modified via the pointer.

In connection with the last point, it should be noted that an ordinary 
(as opposed to pointer) assignment to a variable of derived type that has 
a pointer component at any level of component selection may result in a 
{\em pointer\/} assignment to the pointer component of the variable.
That is certainly the case for an intrinsic assignment.  In that case
the expression on the right hand side of the assignment has the same type 
as the assignment variable, and the assignment results in a pointer 
assignment of the pointer components of the expression result to the
corresponding components of the variable (see section 7.5.1.5 of the 
Fortran~90 standard).  However, it may also be the case for a 
{\em defined\/} assignment to such a variable, even if the data type of 
the expression has no pointer components;  the defined assignment may still 
involve pointer assignment of part or all of the expression result to the 
pointer components of the assignment variable.  Therefore, a dummy or 
global object cannot be used as the right hand side of any assignment to 
a variable of derived type with pointer components, for then it, or part 
of it, might be the target of a pointer assignment, in violation of the 
restriction mentioned above.

(Incidentally, the last two paragraphs only prevent the reference of 
a dummy or global object as the {\em only\/} object on the right hand
side of a pointer assignment or an assignment to a variable with pointer
components.  There are no constraints on its reference as an operand, 
actual argument, subscript expression, etc.\ in these circumstances.)
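
As a sketch of the rule on derived types with pointer components, consider
the following pure function.  The module \verb@list_types@, the type
\verb@node@ and the names \verb@head_value@, \verb@cell@ and \verb@copy@
are hypothetical and are chosen only for illustration:
                                                              \CODE
    MODULE list_types
      TYPE node
        REAL :: value
        TYPE (node), POINTER :: next
      END TYPE node
    END MODULE list_types

    REAL FUNCTION head_value (cell)
      !HPF$ PURE head_value
      USE list_types
      TYPE (node), INTENT(IN) :: cell
      TYPE (node) :: copy
      ! copy = cell        ! not permitted: the intrinsic assignment would
      !                    ! perform  copy%next => cell%next
      copy%value = 2.0 * cell%value  ! permitted: 'cell' is only an operand
      NULLIFY (copy%next)            ! permitted: 'copy' is a local variable
      head_value = copy%value
    END FUNCTION head_value
                                                              \EDOC
The commented-out assignment is the case excluded above: the dummy
argument would be the whole right hand side of an assignment to a variable
whose type has a pointer component.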

Finally, a dummy or global data object cannot be used in a procedure 
reference as an actual argument associated with a dummy argument of
INTENT (OUT) or (INOUT) or with a dummy pointer, for then it may be
modified by the procedure reference.  
This constraint, like the others, can be statically checked, since any
procedure referenced within a pure function must be either a pure 
function, which does not modify its arguments, or a pure subroutine, 
whose interface must specify the INTENT or POINTER attributes of its 
arguments (see below).
Incidentally, notice that in this context it is assumed that an actual 
argument associated with a dummy pointer is modified, since Fortran~90 
does not allow its intent to be specified.

Constraint 4 ensures that all procedures called from a pure function 
are themselves pure and hence side effect free, except, in the case of
subroutines, for modifying actual arguments associated with dummy pointers 
or dummy arguments with INTENT(OUT) or (INOUT).  As we have just 
explained, it can be checked that global or dummy objects are not used
in such arguments, which would violate the required side-effect freedom.

Constraints 5 and 6 protect dummy and global data objects from realignment 
and redistribution (another type of side effect).  
In addition, constraint 5 prevents explicit declaration of the mapping 
(i.e.\ alignment and distribution) of dummy arguments and local variables.  
This is because the function may be invoked concurrently, with each 
invocation operating on a segment of data whose distribution is specific 
to that invocation.  Thus, the distribution of a dummy object must be 
`assumed' from the corresponding actual argument.  
Also, it is left to the implementation to determine a suitable mapping 
of the local variables, which would typically depend on the mapping of 
the dummy arguments.

Constraint 7 prevents I/O, whose order would be non-deterministic in 
the context of concurrent execution.  A PAUSE statement requires input
and so is disallowed for the same reason.
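
The following sketch shows a function that is intended to satisfy the
above constraints, together with some commented-out statements that
would violate them.  The names \verb@scale_sum@, \verb@total@,
\verb@bad@ and \verb@k@ are hypothetical:
                                                              \CODE
    REAL FUNCTION scale_sum (x, w)
      !HPF$ PURE scale_sum
      REAL, INTENT(IN) :: x(:), w
      REAL :: total            ! local; no SAVE and no initialisation
      INTEGER :: k             ! local DO variable
      ! REAL :: bad = 0.0      ! constraint 2: initialisation implies SAVE
      total = 0.0
      DO k = 1, SIZE (x)
        total = total + w * x(k)
      END DO
      ! x(1) = 0.0             ! constraint 3: assignment to a dummy argument
      ! PRINT *, total         ! constraint 7: I/O statement
      scale_sum = total        ! permitted: assignment to the result variable
    END FUNCTION scale_sum
                                                              \EDOC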


\subsubsection{Pure subroutine definition}

To define pure subroutines, Rule~R1219 is changed to:
                                                                 \BNF
subroutine-subprogram \IS       subroutine-stmt
                                [pure-directive]
                                [specification-part]
                                [execution-part]
                                [internal-subprogram-part]
                                end-subroutine-stmt
                                                                \FNB
with the following additional constraints in Section~12.5.2.3 of the 
Fortran~90 standard:
\begin{constraints}

        \item If a {\it procedure-name\/} is present in the 
{\it pure-directive\/}, it must match the {\it sub\-rou\-tine-name\/} in the 
{\it subroutine-stmt\/}.

        \item The {\it specification-part\/} of a pure subroutine must 
specify the intents of all non-pointer and non-procedure dummy arguments.

        \item In a pure subroutine, a local variable must not have the 
SAVE attribute. (Note that this means that a local variable cannot be 
initialised in a {\it type-declaration-stmt\/} or a {\it data-stmt\/}, 
which imply the SAVE attribute.)

        \item A pure subroutine must not use a dummy argument with 
INTENT(IN), a global variable, or an 
object that is storage associated with a global variable, or a subobject 
thereof, in the following contexts:
        \begin{itemize}
                \item as the assignment variable of an {\it assignment-stmt\/}
or {\it forall-assignment\/};
                \item as a DO variable or implied DO variable, or as a 
{\it subscript-name\/} in a {\it forall-triplet-spec\/};
                \item in an {\it assign-stmt\/};
                \item as the {\it pointer-object\/} or {\it target\/}
of a {\it pointer-assignment-stmt\/};
                \item as the {\it expr\/} of an {\it assignment-stmt\/}
or {\it forall-assignment\/} whose assignment variable is of a derived 
type, or is a pointer to a derived type, that has a pointer component 
at any level of component selection;
                \item as an {\it allocate-object\/} or {\it stat-variable\/}
in an {\it allocate-stmt\/} or {\it deallocate-stmt\/}, or as a
{\it pointer-object\/} in a {\it nullify-stmt\/};
                \item as an actual argument associated with a dummy 
argument with INTENT (OUT) or (INOUT) or with the POINTER attribute.
        \end{itemize}

        \item Any procedure referenced in a pure subroutine, including
one referenced via a defined operation or assignment, must be pure.

        \item In a pure subroutine, a dummy argument or local variable 
must not appear in an {\it align-directive\/}, {\it realign-directive\/},
{\it distribute-directive\/}, {\it redistribute-directive\/},
{\it realignable-directive\/}, {\it redistributable-directive\/} or
{\it combined-directive}.

        \item In a pure subroutine, a global variable must not appear in
a {\it realign-directive\/} or {\it redistribute-directive\/}.

        \item A pure subroutine must not contain a {\it pause-stmt\/},
{\it stop-stmt\/} or I/O statement (including a file operation).

\end{constraints}
To declare that a subroutine is pure, a {\it pure-directive\/} must be
given.

The constraints for pure subroutines are based on the same principles 
as for pure functions, except that now side effects to dummy arguments 
are permitted.  
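
As an illustration, the following is a minimal sketch of a pure
subroutine; the name \verb@clamp@ and its arguments are hypothetical.
All dummy arguments have their intents specified, as the second
constraint requires, and only the INTENT(INOUT) argument is modified:
                                                              \CODE
    SUBROUTINE clamp (lo, hi, x)
      !HPF$ PURE clamp
      REAL, INTENT(IN)    :: lo, hi
      REAL, INTENT(INOUT) :: x
      x = MAX (lo, MIN (hi, x))   ! permitted: x is INTENT(INOUT)
      ! lo = 0.0                  ! not permitted: lo is INTENT(IN)
    END SUBROUTINE clamp
                                                              \EDOC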


\subsubsection{Pure procedure interfaces}
\label{pure-proc-interface}

To define interface specifications for pure procedures, Rule~R1204 is 
changed to:
                                                                \BNF
interface-body \IS      function-stmt
                        [pure-directive]
                        [specification-part]
                        end-function-stmt
                \OR     subroutine-stmt
                        [pure-directive]
                        [specification-part]
                        end-subroutine-stmt
                                                                \FNB
with the following constraint in addition to those in
Section~12.3.2.1 of the Fortran~90 standard:
\begin{constraints}

        \item An {\it interface-body\/} of a pure subroutine must specify
the intents of all non-pointer and non-procedure dummy arguments.

\end{constraints}

The procedure characteristics defined by an interface body must be
consistent with the procedure's definition.
Regarding pure procedures, this is interpreted as follows:
\begin{enumerate}
        \item A procedure that is declared pure at its definition may be
declared pure in an interface block, but this is not required.
        \item A procedure that is not declared pure at its definition must 
not be declared pure in an interface block.
\end{enumerate}
That is, if an interface body contains a {\it pure-directive\/}, then the 
corresponding procedure definition must also contain it, though the 
reverse is not true.
When a procedure definition with a {\it pure-directive\/}
is compiled, the compiler may check that it satisfies the necessary 
constraints.
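
The following sketch illustrates these rules for a hypothetical pure
function \verb@sq@.  The definition carries the directive, and the
interface body in a calling scope repeats it, so that \verb@sq@ may be
used in that scope in a context that requires a pure procedure:
                                                              \CODE
    ! Definition: declared pure
    REAL FUNCTION sq (x)
      !HPF$ PURE sq
      REAL, INTENT(IN) :: x
      sq = x * x
    END FUNCTION sq

    ! Interface body in a calling scope: repeats the declaration
    INTERFACE
      REAL FUNCTION sq (x)
        !HPF$ PURE sq
        REAL, INTENT(IN) :: x
      END FUNCTION sq
    END INTERFACE
                                                              \EDOC
If the directive were omitted from the interface body, the interface
would still be consistent with the definition, but \verb@sq@ could not
then be used in that scope where a pure procedure is required.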


\subsection{Pure procedure reference}
To define pure procedure references, the following extra constraint is 
added to Section~12.4.1 of the Fortran~90 standard:
\begin{constraints}

        \item In a reference to a pure procedure, a {\it procedure-name\/} 
{\it actual-arg\/} must be the name of a pure procedure.

\end{constraints}


\subsection{Elemental reference of pure procedures}
\label{elem-ref-of-pure-procs}

Fortran~90 introduces the concept of `elemental procedures', which are 
defined for scalar arguments but may also be applied to conforming 
array-valued arguments.  The latter type of reference to an elemental 
procedure is called an `elemental' reference.    For an elemental function, 
each element of the result, if any, is as would have been obtained by
applying the function to corresponding elements of the arguments.
Examples are the mathematical intrinsics, e.g.\ SIN(X).  For an elemental 
subroutine, the effect on each element of an INTENT(OUT) or INTENT(INOUT) 
array argument is as would be obtained by calling the subroutine with 
the corresponding elements of the arguments.  An example is the intrinsic 
subroutine MVBITS.

However, Fortran~90 restricts elemental reference to a subset of 
the intrinsic procedures --- programmers cannot define their own 
elemental procedures.  Obviously, elemental invocation is equivalent 
to concurrent invocation, so extra constraints beyond those for normal 
Fortran procedures are required to allow this to be done safely
(e.g.\ deterministically).  Appropriate constraints in this case are
the same as for function calls in FORALL;  indeed, the latter are 
virtually equivalent to elemental reference of the function in an 
array assignment, given the close correspondence between FORALL and 
array assignment.  Hence, pure procedures may also be referenced 
elementally, subject to certain additional constraints given below.

\subsubsection{Elemental reference of pure functions}

A non-intrinsic pure function may be referenced {\em elementally\/} 
in array expressions, with a similar interpretation to the elemental
reference of Fortran~90 elemental intrinsic functions, provided it
satisfies the additional constraints that:
\begin{enumerate}
        \item Its non-procedure dummy arguments and dummy result are 
scalar and do not have the POINTER attribute.
        \item The length of any character dummy argument or result is 
independent of argument values (though it may be assumed, or depend on the 
lengths of other character arguments and/or a character result).
\end{enumerate}
We call non-intrinsic pure functions that satisfy these constraints 
`elemental non-intrinsic functions'.

The interpretation of an elemental reference of such a function is as 
follows (adapted from Section 12.4.3 of the Fortran~90 standard):
\begin{quotation}

A reference to an elemental non-intrinsic function is an elemental
reference if one or more non-procedure actual arguments are arrays
and all array arguments have the same shape.  If any actual argument 
is a function, its result must have the same shape as that of the 
corresponding function dummy procedure.  A reference to an elemental 
intrinsic function is an elemental reference if one or more actual 
arguments are arrays and all arrays have the same shape.

The result of such a reference has the same shape as the array arguments,
and the value of each element of the result, if any, is obtained by 
evaluating the function using the scalar and procedure arguments and
the corresponding elements of the array arguments.  The elements of
the result may be evaluated in any order.

For example, if \verb@foo@ is a pure function with the following interface:
                                                \CODE
    INTERFACE
      REAL FUNCTION foo (x, y, z, dummy_func)
        !HPF$ PURE foo
        REAL, INTENT(IN) :: x, y, z
        INTERFACE        ! interface for 'dummy_func'
          REAL FUNCTION dummy_func (x)
            !HPF$ PURE dummy_func
            REAL, INTENT(IN) :: x
          END FUNCTION dummy_func
        END INTERFACE
      END FUNCTION foo
    END INTERFACE
                                                \EDOC
and \verb@a@ and \verb@b@ are arrays of shape \verb@(m,n)@ and \verb@sin@
is the Fortran~90 elemental intrinsic function, then:
                                                \CODE
    foo (a, 0.0, b, sin)
                                                \EDOC
is an array expression of shape \verb@(m,n)@ whose \verb@(i,j)@ element
has the value:
                                                \CODE
    foo (a(i,j), 0.0, b(i,j), sin)
                                                \EDOC
\end{quotation}

To define elemental references of elemental non-intrinsic functions, 
the following extra constraints are added after Rule~R1209 
({\it function-reference\/}):
\begin{constraints}

        \item A non-intrinsic function that is referenced elementally 
must be a pure function with an explicit interface, and must satisfy 
the following additional constraints:
        \begin{itemize}
                \item Its non-procedure dummy arguments and dummy result
must be scalar and must not have the POINTER attribute.
                \item The length of any character dummy argument or a 
character dummy result must not depend on argument values (though it may 
be assumed, or depend on the lengths of other character arguments and/or a 
character result).
        \end{itemize}

        \item In an elemental reference of a non-intrinsic function,
a {\it function-name\/} {\it actual-arg\/} must have a result whose shape 
agrees with that of the corresponding function dummy procedure.

\end{constraints}

The reasons for these constraints are explained in the next section.


\subsubsection{Elemental reference of pure subroutines}

A non-intrinsic pure subroutine may be referenced {\em elementally\/}, 
with a similar interpretation to the elemental reference of Fortran~90 
elemental intrinsic subroutines, provided it satisfies the additional 
constraints that:
\begin{enumerate}
        \item Its non-procedure dummy arguments are scalar and do not 
have the POINTER attribute.
        \item The length of any character dummy argument is independent 
of argument values (though it may be assumed, or depend on the lengths of 
other character arguments).
\end{enumerate}
We call non-intrinsic pure subroutines that satisfy these constraints 
`elemental non-intrinsic subroutines'.

The interpretation of an elemental reference of such a subroutine 
is as follows (adapted from Section 12.4.5 of the Fortran~90 standard):
\begin{quotation}

A reference to an elemental non-intrinsic subroutine is an elemental
reference if all actual arguments corresponding to INTENT(OUT) and
INTENT(INOUT) dummy arguments are arrays that have the same shape 
and the remaining non-procedure actual arguments are conformable with 
them.  If any actual argument is a function, its result must have the 
same shape as that of the corresponding function dummy procedure.
A reference to an elemental intrinsic subroutine is an elemental 
reference if all actual arguments corresponding to INTENT(OUT) and 
INTENT(INOUT) dummy arguments are arrays that have the same shape and 
the remaining actual arguments are conformable with them.

The values of the elements of the arrays that correspond to INTENT(OUT)
and INTENT(INOUT) dummy arguments are the same as if the subroutine were 
invoked separately, in any order, using the scalar and procedure arguments 
and corresponding elements of the array arguments.

\end{quotation}

To define elemental references of elemental non-intrinsic subroutines, 
the following constraints are added after Rule~R1210 ({\it call-stmt\/}):
\begin{constraints}

        \item A non-intrinsic subroutine that is referenced elementally 
must be a pure subroutine with an explicit interface, and must satisfy 
the following additional constraints:
        \begin{itemize}
                \item Its non-procedure dummy arguments must be scalar 
and must not have the POINTER attribute.
                \item The length of any character dummy argument must 
not depend on argument values (though it may be assumed, or depend on 
the lengths of other character arguments).
        \end{itemize}

        \item In an elemental reference of a non-intrinsic subroutine,
a {\it function-name\/} {\it actual-arg\/} must have a result whose shape 
agrees with that of the corresponding function dummy procedure.

\end{constraints}

It is perhaps worth outlining the reasons for the extra constraints 
imposed on pure procedures in order for them to be referenced elementally.  

The dummy result of a function or `output' arguments of a subroutine
are not allowed to have the POINTER attribute because of a Fortran~90
technicality, namely, that under elemental reference the corresponding 
actual arguments must be array variables, and Fortran~90 does not permit 
an array of pointers to be referenced.\footnote{
        See the final constraint after Rule~R613 of the Fortran~90 standard.
Note the difference between an {\em array of pointers\/}, which cannot 
be declared or referenced in Fortran~90, and a {\em pointer array\/},
which can.
}
The `input' arguments of an elemental reference are prohibited from 
having the POINTER attribute for consistency with the output arguments 
or result.  However, this last constraint does not impose 
any real restrictions on an elemental reference, as the corresponding 
actual arguments {\em can\/} be pointers, in which case they are 
`de-referenced' and their targets are associated with the dummy arguments.  
In fact, the only reason for a dummy argument to be a pointer is so that
its pointer association can be changed, which is not allowed for `input'
arguments.  (Incidentally, since a pure function has only `input' 
arguments, there would be no loss of generality in disallowing dummy 
pointers in pure functions generally.)  Note that the prohibition of 
dummy pointers in pure subroutines that are elementally referenced means 
that all their non-procedure dummy arguments can have their intent 
explicitly specified (and indeed this is required by the constraints for 
pure subroutine interfaces---see Section \ref{pure-proc-interface}) which 
assists the checking of argument usage.

In an elemental reference, any actual argument that is a function
must have a result whose shape agrees with that of the corresponding 
function dummy procedure.  That is, elemental usage does not extend to 
function arguments, as Fortran~90 does not support the concept of an `array' 
of functions.
Naively it might appear that a function actual argument that is associated 
with a scalar dummy function could return an array result provided it 
conforms with the other array arguments of the elemental reference.  
However, this is not meaningful under elemental reference, as an 
array-valued function cannot be decomposed into an `array' of scalar 
function references, as would be required in this context.

Finally, the length of any character dummy argument or a character
dummy result cannot depend on argument {\em values\/} (though it can
be assumed, or depend on the lengths of other character arguments and/or
a character result).  This ensures that under elemental reference, all 
elements of an array argument or result of character type will have the 
same length, as required by Fortran~90.
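
The distinction can be sketched with two hypothetical pure functions,
\verb@tag@ and \verb@stars@.  The first has a result whose length
depends only on the length of its argument and so could be referenced
elementally.  The second has a result whose length depends on an
argument value and so could not:
                                                              \CODE
    FUNCTION tag (s)                    ! may be referenced elementally
      !HPF$ PURE tag
      CHARACTER (LEN=*), INTENT(IN) :: s
      CHARACTER (LEN=LEN(s)+1) :: tag   ! length depends only on LEN(s)
      tag = s // '*'
    END FUNCTION tag

    FUNCTION stars (n)                  ! pure, but not elemental
      !HPF$ PURE stars
      INTEGER, INTENT(IN) :: n
      CHARACTER (LEN=n) :: stars        ! length depends on the value of n
      stars = REPEAT ('*', n)
    END FUNCTION stars
                                                              \EDOC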


\subsection{Examples of pure procedure usage}

\subsubsection{FORALL statements and constructs}

Pure functions may be used in expressions in FORALL statements and 
constructs, unlike general functions.  
Because a {\it forall-assignment\/}
may be an {\it array-assignment\/}, the pure function can have an array
result.  
For example:
                                                              \CODE
INTERFACE
  FUNCTION f (x)
    !HPF$ PURE f
    REAL, DIMENSION(3) :: f, x
  END FUNCTION f
END INTERFACE
REAL  v (3,10,10)
...
FORALL (i=1:10, j=1:10)  v(:,i,j) = f (v(:,i,j)) 
                                                              \EDOC


\subsubsection{Elemental references}
Examples of elemental function usage are
                                                              \CODE
INTERFACE 
  REAL FUNCTION foo (x, y, z)
    !HPF$ PURE foo
    REAL, INTENT(IN) :: x, y, z
  END FUNCTION foo
END INTERFACE

REAL a(100), b(100), c(100)
REAL p, q, r

a(1:n) = foo (a(1:n), b(1:n), c(1:n))
a(1:n) = foo (a(1:n), q, r)
a = sin(b)
                                                              \EDOC
An example involving a WHERE-ELSEWHERE construct is
                                                              \CODE
INTERFACE
  REAL FUNCTION f_edge (x)
    !HPF$ PURE
    REAL x
  END FUNCTION f_edge
  REAL FUNCTION f_interior (x)
    !HPF$ PURE
    REAL x
  END FUNCTION f_interior
END INTERFACE

REAL a (10,10)
LOGICAL edges (10,10)

WHERE (edges)
  a = f_edge (a)
ELSE WHERE
  a = f_interior (a)
END WHERE
                                                          \EDOC

Examples of elemental subroutine usage are
                                                                \CODE
INTERFACE 
  SUBROUTINE solve_simul(tol, y, z)
    !HPF$ PURE solve_simul
    REAL, INTENT(IN) :: tol
    REAL, INTENT(INOUT) :: y, z
  END SUBROUTINE
END INTERFACE

REAL a(100), b(100), c(100)
INTEGER bits(10)

CALL solve_simul( 0.1, a, b )
CALL solve_simul( c, a, b )
CALL mvbits( bits, 0, 4, bits, 4) ! Fortran 90 elemental intrinsic
                                                                \EDOC
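
Under the interpretation of elemental subroutine reference given
earlier, the call \verb@CALL solve_simul( c, a, b )@ has the same effect
as the following loop, whose iterations may be performed in any order.
The index \verb@k@ is hypothetical and is introduced only for this sketch:
                                                                \CODE
INTEGER k

DO k = 1, 100
  CALL solve_simul( c(k), a(k), b(k) )
END DO
                                                                \EDOC
In the call \verb@CALL solve_simul( 0.1, a, b )@, the scalar \verb@0.1@
is used for every element in the same way.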

User-defined elemental procedures have several potential advantages.
They are a convenient programming tool, as the same procedure 
can be applied to actual arguments of any rank.

In addition, the implementation of an elemental function returning an
array-valued result in an array expression is likely to be more 
efficient than that of an equivalent array function.  One reason is 
that it requires less temporary storage for the result (i.e.\ storage 
for a single result versus storage for the entire array of results).  
Another is that it saves on looping if an array expression is 
implemented by sequential iteration over the component elemental 
expressions (as may be done for the `segment' of the array expression 
local to each process).  This is because, in the sequential version, 
the elemental function can be invoked elementally in situ within the 
expression.  The array function, on the other hand, must be executed 
before the expression is evaluated, storing its result in a temporary 
array for use within the expression.  Looping is then required during 
the execution of the array function body as well as the expression 
evaluation.
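
As a rough sketch of this difference, consider a sequential
implementation of the assignment \verb@a = b + g(c)@ over arrays of
size \verb@n@.  The elemental function \verb@g@, the equivalent array
function \verb@g_array@, the temporary \verb@t@ and the index \verb@k@
are all hypothetical names:
                                                              \CODE
    ! Elemental function g: a single loop, with g invoked in situ
    DO k = 1, n
      a(k) = b(k) + g (c(k))
    END DO

    ! Array function g_array: the result is first stored in a temporary t,
    ! so looping occurs inside g_array as well as in the expression
    t = g_array (c)
    DO k = 1, n
      a(k) = b(k) + t(k)
    END DO
                                                              \EDOC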


\subsection{MIMD parallelism via pure procedures}

We have seen that a pure procedure may be invoked concurrently at each
`element' of an array if it is referenced elementally or in a FORALL 
statement or construct (where an `element' may itself be an array in
a non-elemental reference).  In these cases, a limited form of MIMD 
parallelism can be obtained by means of branches within the pure procedure 
which depend on arguments associated with array elements or their 
subscripts (the latter especially in a FORALL context).  For example:
                                                              \CODE
    FUNCTION f (x, i)
      !HPF$ PURE f
      REAL x       ! associated with array element
      INTEGER i    ! associated with array subscript
      IF (x > 0.0) THEN     ! content-based conditional
        ...
      ELSE IF (i==1 .OR. i==n) THEN    ! subscript-based conditional
        ...
      ENDIF
    END FUNCTION

    ...
    REAL a(n)
    INTEGER i
    ...
    FORALL (i=1:n)  a(i) = f( a(i), i)
    ...
    a = f( a, (/ (i, i=1,n) /) ) ! an elemental reference equivalent
                                ! to the above FORALL

                                                              \EDOC
This may sometimes provide an alternative to using
WHERE-ELSEWHERE constructs or sequences of masked FORALLs with their 
potential synchronisation overhead. 


\subsection{Comments}

This section should be moved to the comments chapter of the final draft.

\subsubsection{Pure procedures}

\begin{itemize}

\item The constraints for a pure procedure guarantee
freedom from side-effects, thus ensuring that it can be invoked
concurrently at each
`element' of an array (where an `element' may itself be a data structure, 
including an array).

\item All constraints can be statically checked, thus providing safety
for the programmer.

Of course, a price that must be paid for this additional security is
that the constraints must be quite rigorous, which means that it
is possible to write a function that is side-effect free in behaviour
but which nevertheless fails to satisfy the constraints 
(e.g.\ a function that contains an assignment to a global variable,
but in a branch that is not executed in any invocation of the function
during a particular program execution).


\item It is expected that most High Performance Fortran library 
procedures will conform to the constraints required of pure procedures
(by the very nature of library procedures), and so can be declared pure 
and referenced in FORALL statements and constructs (if they are functions) 
and within user-defined pure procedures.  It is also anticipated that 
most library procedures will not reference global data, whose use may 
sometimes inhibit concurrent execution (see below).

The constraints on pure procedures are limited to those necessary 
for statically checkable side-effect freedom and the elimination 
of saved internal state.  Subject to these restrictions, maximum 
functionality has been preserved in the definition of pure procedures.
This has been done to make elemental reference and function calls in 
FORALL as widely available as possible, and so that quite general library 
procedures can be classified as pure.  

A drawback of this flexibility is that pure procedures permit certain 
features whose use may hinder, and in the worst case prevent, concurrent 
execution in FORALL and elemental references (that is, such references 
may have to be implemented by sequentialisation).  
Foremost among these features are the access of global data, particularly 
distributed global data, and the fact that the arguments and, for a pure 
function, the result may be pointers or data structures with pointer 
components, including recursive data structures such as lists and trees.
The programmer should be aware of the potential performance penalties 
of using such features.


\item An earlier draft of this proposal contained a constraint disallowing 
pure procedures from accessing global data objects, particularly
distributed data objects.
This constraint has been dropped as inessential to the side-effect freedom 
that the HPF committee requested.
However, it may well be that some machines will have great difficulty 
implementing FORALL without this constraint.


\item One of us (JHM) is still in favour of disallowing access to global 
variables for a number of reasons: 
\begin{enumerate}
\item Aesthetically, it is in keeping with the
nature of a `pure' function, i.e.\ a function in the mathematical
sense, and in practical terms it imposes no real restrictions on the 
programmer, as global data can be passed in via the argument list; 
\item Without this constraint HPF programs can no longer be implemented 
by pure message-passing, or at least not efficiently, i.e.\ without
sequentialising FORALL statements containing function calls and greatly
complicating their implementation; 
\item Absence of this restriction may inhibit optimisation of FORALLs
and array assignments, as the optimisation of assigning the {\it expr\/}
directly to the assignment variable rather than to a temporary intermediate
array now requires interprocedural analysis rather than just local 
analysis.
\end{enumerate}

\end{itemize}
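
As a sketch of the alternative raised in the last item above, a look-up
table can be passed to a pure function through its argument list instead of
being accessed as global data.  The names TABLES, LOOKUP, TAB, IDX and DEMO
are hypothetical, and the example assumes that a pure function is marked by
a PURE prefix on the FUNCTION statement.
                                                  \CODE
      MODULE TABLES
      CONTAINS
        PURE FUNCTION LOOKUP(TAB, K) RESULT(V)
          ! TAB is passed in explicitly, not read from global storage.
          REAL, INTENT(IN) :: TAB(:)
          INTEGER, INTENT(IN) :: K
          REAL :: V
          ! Clamp K into the bounds of TAB so the reference is valid.
          V = TAB(MIN(MAX(K, 1), SIZE(TAB)))
        END FUNCTION LOOKUP
      END MODULE TABLES

      PROGRAM DEMO
        USE TABLES
        INTEGER, PARAMETER :: N = 4
        REAL :: T(3) = (/ 10.0, 20.0, 30.0 /)
        INTEGER :: IDX(N) = (/ 3, 1, 2, 5 /)
        REAL :: Y(N)
        INTEGER :: I
        ! Every reference receives the table through its argument list.
        FORALL (I = 1:N) Y(I) = LOOKUP(T, IDX(I))
        PRINT *, Y
      END PROGRAM DEMO
                                                  \EDOC
Written this way, the caller decides how the table is mapped and the
function body itself contains no reference to global data.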

\subsubsection{Elemental references}

\begin{itemize}

\item The original draft proposed allowing pure procedures 
to be invoked elementally even if their dummy arguments or results 
were array-valued.  These provisions have been dropped to avoid 
promoting storage order to a higher level in Fortran~90
(i.e.\ to avoid introducing the concept of `arrays-of-arrays', 
which Fortran~90 seems to strenuously avoid!)   In practical terms,
the current proposal provides the same functionality as the original 
one for functions, though not for subroutines.  If a programmer wants 
elemental function behaviour, but also wants the `elements' to be
array-valued, this can be achieved using FORALL.

\item In a typical FORALL or elemental implementation, a pure procedure 
would be called independently in each process, and its dummy arguments 
would be associated with `elements' local to that process. 
This is the reason for disallowing data mapping directives for
local and dummy variables within the bodies of such procedures.
Note that, particularly in elemental invocations, the actual arguments
can be distributed arrays which need not be `co-distributed'; if not,
a typical implementation would in general perform all data communications 
prior to calling the procedure, and would then pass-in the required 
elements locally via its argument list.

However, access to large global data structures such as look-up tables
is often useful within functions that are otherwise mathematically pure,
and these are allowed to be distributed.

\end{itemize}
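
The FORALL alternative mentioned in the first item above can be sketched as
follows.  Each column of A plays the role of one array-valued `element'.
The names VECS, NORMALIZE and DEMO are hypothetical, and the example assumes
that a pure function is marked by a PURE prefix and that no column of A is
the zero vector.
                                                  \CODE
      MODULE VECS
      CONTAINS
        PURE FUNCTION NORMALIZE(V) RESULT(U)
          ! Pure function with an array-valued argument and result.
          REAL, INTENT(IN) :: V(3)
          REAL :: U(3)
          U = V / SQRT(SUM(V*V))      ! assumes V is not the zero vector
        END FUNCTION NORMALIZE
      END MODULE VECS

      PROGRAM DEMO
        USE VECS
        INTEGER, PARAMETER :: N = 2
        REAL :: A(3,N), B(3,N)
        INTEGER :: I
        A(:,1) = (/ 3.0, 0.0, 4.0 /)
        A(:,2) = (/ 0.0, 2.0, 0.0 /)
        ! One reference per column; the references do not interact.
        FORALL (I = 1:N) B(:,I) = NORMALIZE(A(:,I))
        PRINT *, B
      END PROGRAM DEMO
                                                  \EDOC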


\section{The INDEPENDENT Directive}

\label{do-independent}

\footnote{Version of August 20, 1992
 - Guy Steele, Thinking Machines Corporation, and 
Charles Koelbel, Rice University.  Approved at second reading on
September 10, 1992; however, the INDEPENDENT subgroup was directed to
examine methods of allowing reductions to be performed within
INDEPENDENT constructs.}
The INDEPENDENT directive can precede a DO loop or FORALL statement or
construct.
Intuitively, it asserts to the compiler that the operations in the
following construct
may be executed independently--that is, in any order, or
interleaved, or concurrently--without changing the semantics
of the program.

The syntax of the INDEPENDENT directive is
                                                  \BNF
independent-dir	\IS	!HPF$INDEPENDENT [ (integer-variable-list) ]
                                                  \FNB

\noindent
Constraint: An {\it independent-dir\/} must immediately precede a DO or FORALL
statement.

\noindent
Constraint: If the {\it integer-variable-list\/} is present, then the
variables named must be the index variables of a set of perfectly nested
DO loops or indices from the same FORALL header.

The directive is said to apply to the indices named in its {\it
integer-variable-list}, or equivalently to the loops or FORALL indexed
by those variables.
If no {\it integer-variable-list\/} is present, then it is as if it
were present and contained the index variable for the DO or FORALL
immediately following the directive.
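
For instance, under the default rule above the two directives in the
following sketch should be read as equivalent, since I is the index of the
DO that immediately follows each of them.  The arrays X and Y are
hypothetical.
                                                  \CODE
      REAL X(100), Y(100)
      Y = 1.0
!HPF$INDEPENDENT
      DO I = 1, 100
        X(I) = 2.0 * Y(I)
      END DO
!HPF$INDEPENDENT (I)
      DO I = 1, 100
        X(I) = 2.0 * Y(I)
      END DO
                                                  \EDOC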


When applied to a nest of DO loops, an INDEPENDENT directive is an
assertion by the programmer that no iteration may affect any other
iteration, either directly or indirectly.
This implies that there are no exits from the construct other than
normal loop termination, and no I/O is performed by the loop.
A sufficient condition for ensuring this is that
during
the execution of the loop(s), no iteration assigns to any scalar
data object which is 
accessed (i.e.\ read or written) by any other iteration.
The directive is purely advisory and a compiler is free
to ignore it if it cannot make use of the information.
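
By contrast, the following loops are sketches of cases in which the
assertion would be false, so the directive must not be applied to them.
The arrays A and B and the scalars S, KEY and IFOUND are hypothetical.
                                                  \CODE
      ! Iteration I reads A(I-1), which iteration I-1 assigns.
      DO I = 2, N
        A(I) = A(I-1) + B(I)
      END DO

      ! Every iteration assigns the same scalar S.
      DO I = 1, N
        S = S + B(I)
      END DO

      ! The loop may be left other than by normal termination.
      DO I = 1, N
        IF (B(I) .EQ. KEY) THEN
          IFOUND = I
          EXIT
        END IF
      END DO
                                                  \EDOC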


For example:
                                                  \CODE
!HPF$INDEPENDENT
      DO I=1,100
        A(P(I)) = B(I)
      END DO
                                                  \EDOC
asserts that the array P does not have any repeated entries (else they
would cause interference when A was assigned).
It also limits how A and B may be storage associated.
(The remaining examples in this
section assume that no variables are storage or sequence associated.)
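
A program that cannot guarantee this property of P statically might verify
it before entering the loop.  The function NO\_REPEATS below is a
hypothetical helper, not part of HPF, and assumes that every P(I) lies in
the range 1 to M, where M is the extent of A.
                                                  \CODE
      LOGICAL FUNCTION NO_REPEATS(P, N, M)
      ! .TRUE. if no value occurs more than once in P(1:N).
      ! Assumes 1 <= P(I) <= M for all I.
      INTEGER N, M, I
      INTEGER P(N)
      LOGICAL SEEN(M)
      SEEN = .FALSE.
      NO_REPEATS = .TRUE.
      DO I = 1, N
        ! A value already marked as seen is a repeated entry.
        IF (SEEN(P(I))) NO_REPEATS = .FALSE.
        SEEN(P(I)) = .TRUE.
      END DO
      END FUNCTION NO_REPEATS
                                                  \EDOC
If NO\_REPEATS returns .FALSE., the loop should be executed without relying
on the INDEPENDENT assertion.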

Another example:
                                                  \CODE
!HPF$INDEPENDENT (I1,I2,I3)
      DO I1 = 1,N1
        DO I2 = 1,N2
          DO I3 = 1,N3
            DO I4 = 1,N4   !The inner loop is not independent!
              A(I1,I2,I3) = A(I1,I2,I3) + B(I1,I2,I4)*C(I2,I3,I4)
            END DO
          END DO
        END DO
      END DO
                                                  \EDOC
The inner loop is not independent because each element of A is
assigned repeatedly.
However, the three outer loops are independent because they access
different elements of A.
It is not relevant that the outer loops read the same elements from B
and C, because those arrays are not assigned.

The interpretation of INDEPENDENT for FORALL is similar to that for
DO: it asserts that no combination of the indices that INDEPENDENT
applies to may affect another combination.
Such interference is only possible if one combination of index values
assigns to a scalar data object accessed by another
combination.
A DO and a FORALL with the same body are equivalent if they both
have the INDEPENDENT directive.
In the case of a FORALL, any of the variables may be mentioned in the
INDEPENDENT directive:
                                                                \CODE
!HPF$INDEPENDENT (I1,I3)
    FORALL(I1=1:N1,I2=1:N2,I3=1:N3) 
      A(I1,I2,I3) = A(I1,I2-1,I3)
    END FORALL
                                                                \EDOC
This means that for any given values for I1 and I3,
all the right-hand sides for all values of I2 must
be computed before any assignments are done for that
specific pair of (I1,I3) values; but assignments for
one pair of (I1,I3) values need not wait for rhs
evaluation for a different pair of (I1,I3) values.
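
One ordering permitted by this directive can be sketched with DO loops over
I1 and I3 and an array assignment over I2.  The sketch assumes that the
second dimension of A has lower bound 0, so that A(I1,0,I3) exists.  The
array assignment evaluates its whole right-hand side before storing, which
preserves the FORALL semantics along I2, while the (I1,I3) pairs may be
processed in any order.
                                                                \CODE
!HPF$INDEPENDENT (I1,I3)
      DO I1 = 1,N1
        DO I3 = 1,N3
          ! All right-hand sides for this (I1,I3) are read before any store.
          A(I1,1:N2,I3) = A(I1,0:N2-1,I3)
        END DO
      END DO
                                                                \EDOC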

Graphically, the INDEPENDENT directive can be visualized as
eliminating edges from a precedence graph representing the program.
Figure~\ref{fig-dep} shows the dependences that may normally be
present in a DO and a FORALL.
An arrow from a left-hand-side node (for example, ``lhsa(1)'') 
to a right-hand-side node (e.g. ``rhsb(1)'') means that the RHS
computation may use values assigned in the LHS node; thus the
right-hand side must be computed after the left-hand side completes
its store.
Similarly, an arrow from a RHS node to a LHS node means that the LHS
may overwrite a value needed by the RHS computation, again forcing an
ordering.
Edges from the ``BEGIN'' and to the ``END'' nodes represent control
dependences.
The INDEPENDENT directive asserts that the only dependences that a
compiler need enforce are those in Figure~\ref{fig-indep}.
That is, the programmer who uses INDEPENDENT is certifying that if the
compiler only enforces these edges, then the resulting program will be
equivalent to the one in which all the edges are present.
Note that the set of asserted dependences is identical for INDEPENDENT
DO and FORALL constructs.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Here come the pictures!
%

{

%length for use in pictures
\setlength{\unitlength}{0.03in}

%nodes used in all pictures
\newsavebox{\nodes}
\savebox{\nodes}{
    \small\sf
    \begin{picture}(80,105)(10,-2.5)
    \put(50.0,100){\makebox(0,0){BEGIN}}
    \put(20.0,80.0){\makebox(0,0){rhsa(1)}}
    \put(50.0,80.0){\makebox(0,0){rhsa(2)}}
    \put(80.0,80.0){\makebox(0,0){rhsa(3)}}
    \put(20.0,60.0){\makebox(0,0){lhsa(1)}}
    \put(50.0,60.0){\makebox(0,0){lhsa(2)}}
    \put(80.0,60.0){\makebox(0,0){lhsa(3)}}
    \put(20.0,40.0){\makebox(0,0){rhsb(1)}}
    \put(50.0,40.0){\makebox(0,0){rhsb(2)}}
    \put(80.0,40.0){\makebox(0,0){rhsb(3)}}
    \put(20.0,20.0){\makebox(0,0){lhsb(1)}}
    \put(50.0,20.0){\makebox(0,0){lhsb(2)}}
    \put(80.0,20.0){\makebox(0,0){lhsb(3)}}
    \put(50.0,0){\makebox(0,0){END}}
    \put(50.0,100){\oval(25,5)}
    \put(20.0,80.0){\oval(20,5)}
    \put(50.0,80.0){\oval(20,5)}
    \put(80.0,80.0){\oval(20,5)}
    \put(20.0,60.0){\oval(20,5)}
    \put(50.0,60.0){\oval(20,5)}
    \put(80.0,60.0){\oval(20,5)}
    \put(20.0,40.0){\oval(20,5)}
    \put(50.0,40.0){\oval(20,5)}
    \put(80.0,40.0){\oval(20,5)}
    \put(20.0,20.0){\oval(20,5)}
    \put(50.0,20.0){\oval(20,5)}
    \put(80.0,20.0){\oval(20,5)}
    \put(50.0,0){\oval(25,5)}
    \put(50,97.5){\vector(-2,-1){30}}
    \put(50,97.5){\vector(0,-1){15}}
    \put(50,97.5){\vector(2,-1){30}}
    \put(20,17.5){\vector(2,-1){30}}
    \put(50,17.5){\vector(0,-1){15}}
    \put(80,17.5){\vector(-2,-1){30}}
    \end{picture}
}

\begin{figure}

\begin{minipage}{2.70in}
\CODE
FORALL ( i = 1:3 )
  lhsa(i) = rhsa(i)
  lhsb(i) = rhsb(i)
END FORALL
\EDOC

\centering
\begin{picture}(80,105)(10,-2.5)
\small\sf
%save the messy part of the picture & reuse it
\newsavebox{\web}
\savebox{\web}{
    \begin{picture}(60,15)(0,0)
    \put(0,15){\vector(0,-1){15}}
    \put(0,15){\vector(2,-1){30}}
    \put(0,15){\vector(4,-1){60}}
    \put(30,15){\vector(-2,-1){30}}
    \put(30,15){\vector(0,-1){15}}
    \put(30,15){\vector(2,-1){30}}
    \put(60,15){\vector(0,-1){15}}
    \put(60,15){\vector(-2,-1){30}}
    \put(60,15){\vector(-4,-1){60}}
    \end{picture}
}
\put(10,-2.5){\usebox\nodes}
\put(20,62.5){\usebox\web}
\put(20,42.5){\usebox\web}
\put(20,22.5){\usebox\web}
\end{picture}
\end{minipage}
%
\hfill
%
\begin{minipage}{2.70in}
\CODE
DO i = 1, 3
  lhsa(i) = rhsa(i)
  lhsb(i) = rhsb(i)
END DO
\EDOC

\centering
\begin{picture}(80,105)(10,-2.5)
\small\sf
%save the messy part of the picture & reuse it
\newsavebox{\chain}
\savebox{\chain}{
    \begin{picture}(20,70)(0,0)
    \put(2.5,2.5){\oval(5,5)[bl]}
    \put(2.5,0){\vector(1,0){5}}
    \put(7.5,2.5){\oval(5,5)[br]}
    \put(10,2.5){\vector(0,1){32.5}}
    \put(10,35){\line(0,1){32.5}}
    \put(12.5,67.5){\oval(5,5)[tl]}
    \put(12.5,70){\vector(1,0){5}}
    \put(17.5,67.5){\oval(5,5)[tr]}
    \end{picture}
}
\put(10,-2.5){\usebox\nodes}
\put(20,77.5){\vector(0,-1){15}}
\put(20,57.5){\vector(0,-1){15}}
\put(20,37.5){\vector(0,-1){15}}
\put(25,15){\usebox\chain}
\put(50,77.5){\vector(0,-1){15}}
\put(50,57.5){\vector(0,-1){15}}
\put(50,37.5){\vector(0,-1){15}}
\put(55,15){\usebox\chain}
\put(80,77.5){\vector(0,-1){15}}
\put(80,57.5){\vector(0,-1){15}}
\put(80,37.5){\vector(0,-1){15}}
\end{picture}
\end{minipage}

\caption{Dependences in DO and FORALL without
INDEPENDENT assertions}
\label{fig-dep}
\end{figure}

\begin{figure}

%Draw the picture once, use it twice
\newsavebox{\easy}
\savebox{\easy}{
    \small\sf
    \begin{picture}(80,105)(10,-2.5)
    \put(10,-2.5){\usebox\nodes}
    \put(20,77.5){\vector(0,-1){15}}
    \put(20,57.5){\vector(0,-1){15}}
    \put(20,37.5){\vector(0,-1){15}}
    \put(50,77.5){\vector(0,-1){15}}
    \put(50,57.5){\vector(0,-1){15}}
    \put(50,37.5){\vector(0,-1){15}}
    \put(80,77.5){\vector(0,-1){15}}
    \put(80,57.5){\vector(0,-1){15}}
    \put(80,37.5){\vector(0,-1){15}}
    \end{picture}
}

\begin{minipage}{2.70in}
\CODE
!HPF$ INDEPENDENT
FORALL ( i = 1:3 )
  lhsa(i) = rhsa(i)
  lhsb(i) = rhsb(i)
END FORALL
\EDOC

\centering
\usebox\easy
\end{minipage}
%
\hfill
%
\begin{minipage}{2.70in}
\CODE
!HPF$ INDEPENDENT
DO i = 1, 3
  lhsa(i) = rhsa(i)
  lhsb(i) = rhsb(i)
END DO
\EDOC

\centering
\usebox\easy
\end{minipage}

\caption{Dependences in DO and FORALL with
INDEPENDENT assertions}
\label{fig-indep}
\end{figure}

}

%
%
% End of pictures
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


The compiler is justified in producing
a warning if it can prove that one of these assertions is incorrect.
It is not required to do so, however.
A program containing any false assertion of this type is not
standard-conforming, and the compiler may take any action it deems necessary.


This directive is of course similar to the DOSHARED directive
of Cray MPP Fortran.  A different name is offered here to avoid
even the hint of commitment to execution by a shared memory machine.
Also, the "mechanism" syntax is omitted here, though we might want
to adopt it as further advice to the compiler about appropriate
implementation strategies, if we can agree on a desirable set
of options.


\section{Other Proposals}

The following are proposals made for modification or replacement of the 
above sections.

\subsection{A Proposal for MIMD Support in HPF}

\label{mimd-support}
	          

\subsubsection{Abstract}

\footnote{Version of July 18, 1992 - Clemens-August Thole, GMD I1.T.
In the interest of time, these features were not considered for inclusion 
in the first round of HPFF.}
This proposal tries to supply sufficient language support 
to deal with loosely synchronous programs, some of which have been 
identified in my "A vote for explicit MIMD support".
It is a proposal for the support of MIMD parallelism, which extends
Section~\ref{do-independent}. 
It is oriented more
towards the CRAY MPP Fortran Programming Model and the PCF proposal. 
The fine-grain synchronization of PCF is not proposed for implementation.
Instead of the CRAY mechanisms for assigning work to processors, an 
extension of the ON clause is used.
Because fine-grain synchronization is absent, the constructs can be
executed
on SIMD or sequential architectures simply by ignoring the additional
information.


\subsubsection{Summary of the current situation of MIMD support as part of
HPF}

According to Charles Koelbel's (Rice) mail dated March 20th, "Working
Group 4 - Issues for discussion", MIMD support is a topic for discussion
within working group 4. 

Dave Loveman (DEC) has produced a document on FORALL statements 
(incorporated in Sections~\ref{forall-stmt} and \ref{forall-construct})
which
summarizes the discussion. Marc Snir proposed some extensions. These
constructs allow SIMD-style operations to be described more generally than
array assignments permit. 

A topic for working papers is the interface of HPF Fortran to program units
which execute in SPMD mode. Proposals for "Local Subroutines" have been
made
by Marc Snir and Guy Steele
(Chapter~\ref{foreign}). Both proposals
define local subroutines as program units that are executed by all
processors independently of each other. Each processor has access only
to the data contained in its local memory. Parts of distributed data
objects
can be accessed and updated by calls to a special library. Any
message-passing
library might be used for synchronization and communication.
This approach does not really integrate MIMD-support into HPF programming.

The MPP Fortran proposal by Douglas M. Pase, Tom MacDonald, Andrew Meltzer
(CRAY)
contained the following features in order to support integrated MIMD
features:
\begin{itemize}
   \item  parallel directive
   \item  shared loops 
   \item  private variables
   \item  barrier synchronization
   \item  no-barrier directive for removing synchronization
   \item  locks, events, critical sections and atomic update
   \item  functions, to examine the mapping of data objects.
\end{itemize}

Steele's "Proposal for loops in HPF" (02.04.92) included a proposal for a 
directive "!HPF$ INDEPENDENT( integer_variable_list)", which specifies
for the next set of nested loops that the loops with the specified
loop variables can be executed independently of each other.
(Section~\ref{do-independent} is a short version of this proposal.) 

Charles Koelbel gave an overview of different styles for parallel loops
in "Parallel Loops Position Paper". No specific proposal was made.

Min-You Wu "Proposal for FORALL, May 1992" extended Guy Steele's 
"!HPF$ INDEPENDENT" proposal to use the directive in a block style.

Clemens-August Thole "A vote for explicit MIMD support" contains 3 examples
from different application areas, which seem to require MIMD support for
efficient execution. 

\paragraph{Summary}

In contrast to the FORALL extensions, MIMD support is currently not
well established
as part of HPF Fortran. The examples in "A vote for explicit MIMD support"
show clearly the need for such features. Local subroutines do not fulfill
the requirements because they force the use of a distributed memory
programming model,
which should not be necessary in most cases.

With the exception of parallel sections, all interesting features
are contained in the MPP proposal. I would like to split the discussion
on specifying parallelism, synchronization and mapping into three different
topics. Furthermore, I would like to see the corresponding features
expressed
in the style of the current X3H5 proposal, if possible, in order to
be in line with upcoming standards.


\subsubsection{Proposal for MIMD support}

In order to support the specification of MIMD-type parallelism, the
following
features are taken from the "Fortran 77 Binding of X3H5 Model for 
Parallel Programming Constructs": 
\begin{itemize}
    \item   PARALLEL DO construct/directive
    \item   PARALLEL SECTIONS worksharing construct/directive
    \item   NEW statement/directive
\end{itemize}

These constructs are not used with PCF-like options for mapping or 
synchronisation, but are combined with the ON clause for mapping operations
onto the parallel architecture. 

\paragraph{PARALLEL DO}

\subparagraph{Explicit Syntax}

The PARALLEL DO construct specifies parallelism among the 
iterations of a block of code. The PARALLEL DO construct has the same
syntax as a DO statement. For a directive approach the directive
!HPF$ PARALLEL can be used in front of a DO statement.
After the PARALLEL DO statement a new-declaration may be inserted.

A PARALLEL DO construct might be nested with other parallel constructs. 

\subparagraph{Interpretation}

The PARALLEL DO is used to specify parallel execution of the iterations of
a block of code. Each iteration of a PARALLEL DO is an independent unit
of work. The iterations of a PARALLEL DO must be data independent.
Iterations
are data independent if the storage sequence associated with each variable
or array element that is assigned a value by an iteration is not
referenced
by any other iteration. 

A program is not HPF conforming if any iteration executes a statement
that causes a transfer of control out of the block defined by the PARALLEL
DO construct. 

The value of the loop index of a PARALLEL DO is undefined outside the scope
of the PARALLEL DO construct. 


\paragraph{PARALLEL SECTIONS}

The parallel sections construct is used to specify parallelism among
sections
of code.

\subparagraph{Explicit Syntax}


                                                              \CODE
        !HPF$ PARALLEL SECTIONS
        !HPF$ SECTION
        !HPF$ END PARALLEL SECTIONS
                                                              \EDOC
structured as
                                                              \CODE
        !HPF$ PARALLEL SECTIONS
        [new-declaration-stmt-list]
        [section-block]
        [section-block-list]
        !HPF$ END PARALLEL SECTIONS
                                                              \EDOC
where [section-block] is
                                                              \CODE
        !HPF$ SECTION
        [execution-part]
                                                              \EDOC

\subparagraph{Interpretation}

The parallel sections construct is used to specify parallelism among
sections
of code. Each section of the code is an independent unit of work. A program
is not standard conforming if during the execution of any parallel sections
construct a transfer of control out of the blocks defined by the Parallel
Sections construct is performed. 
In a standard conforming program the sections of code shall be data 
independent. Sections are data independent if the storage sequence
associated 
with each variable or array element that is assigned a value by a
section
is not referenced by any other section. 
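
As an illustration of the syntax above, the following sketch has two
sections that assign disjoint arrays and are therefore data independent.
The arrays U and V are hypothetical.
                                                              \CODE
      REAL U(100), V(100)
      U = 1.0
      V = 2.0
!HPF$ PARALLEL SECTIONS
!HPF$ SECTION
      ! This section reads and assigns only U.
      U = 2.0 * U
!HPF$ SECTION
      ! This section reads and assigns only V.
      V = V + 1.0
!HPF$ END PARALLEL SECTIONS
                                                              \EDOC
Because the directives are comments, a compiler that ignores them executes
the two sections one after the other and obtains the same result.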


\paragraph{Data scoping}

Data objects that are local to a subroutine are distinct in 
distinct units of work, even if they execute the same subroutine.


\paragraph{NEW statement/directive}

The NEW statement/directive allows the user to generate new instances of 
objects with the same name as an object that can currently be referenced.


\subparagraph{Explicit Syntax}

A [new-declaration-stmt] is
                                                                \CODE
       !HPF$ NEW variable-name-list
                                                                \EDOC

\subparagraph{Coding rules}

A [variable-name] shall not be
\begin{itemize} 
\item    the name of an assumed size array, dummy argument, common block, 
function or entry point
\item    of type character with an assumed length
\item    specified in a SAVE or DATA statement
\item    associated with any object that is shared for this parallel
construct.
\end{itemize}

\subparagraph{Interpretation}
 
Listing a variable on a NEW statement causes the object to be explicitly
private for the parallel construct. For each unit of work of the parallel 
construct a new instance of the object is created and referenced with the
specific name. 
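
A sketch of a PARALLEL DO in its directive form, combined with a NEW
declaration, is given below.  Without NEW the scalar T would be assigned by
every iteration, so the iterations would not be data independent; listing
it on NEW gives each iteration its own instance.  The names X, Y and T are
hypothetical, and the placement of the NEW directive directly after the DO
statement is an assumption based on the description of PARALLEL DO above.
                                                                \CODE
      REAL X(100), Y(100), T
      X = 1.0
!HPF$ PARALLEL
      DO I = 1, 100
!HPF$ NEW T
        ! T is private to this iteration.
        T = X(I) * X(I)
        Y(I) = T + 1.0
      END DO
                                                                \EDOC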


\subsection{Nested WHERE statements}

\label{nested-where}

\footnote{Version of September 15, 1992 - Guy Steele, Thinking Machines 
Corporation.  This section has not been discussed.}
Here is the text of a proposal once sent to X3J3:
\begin{quote}
Briefly put, the less WHERE is like IF, the more difficult it is to
translate existing serial codes into array notation.  Such codes tend to
have the general structure of one or more DO loops iterating over array
indices and surrounding a body of code to be applied to array elements.
Conversion to array notation frequently involves simply deleting the DO
loops and changing array element references to array sections or whole
array references.  If the loop body contains logical IF statements, these
are easily converted to WHERE statements.  The same is true for translating
IF-THEN constructs to WHERE constructs, except in two cases.  If the IF
constructs are nested (or contain IF statements), or if ELSE IF is used,
then conversion suddenly becomes disproportionately complex, requiring the
user to create temporary variables or duplicate mask expressions and to use
explicit .AND. operators to simulate the effects of nesting.

Users also find it confusing that ELSEWHERE is syntactically and
semantically analogous to ELSE rather than to ELSE IF.

We propose that the syntax of WHERE constructs be extended and
changed to have the form
                                                                \BNF
where-construct       \IS  where-construct-stmt
 				    [ where-body-construct ]...
 				  [ elsewhere-stmt
 				    [ where-body-construct ]... ]...
 				  [ where-else-stmt
 				    [ where-body-construct ]... ]
 				  end-where-stmt
 
 	where-construct-stmt  \IS  WHERE ( mask-expr )
 
 	elsewhere-stmt        \IS  ELSE WHERE ( mask-expr )
 
 	where-else-stmt       \IS  ELSE WHERE
 
 	end-where-stmt        \IS  END WHERE
 
 	mask-expr             \IS  logical-expr
 
 	where-body-construct  \IS  assignment-stmt
 			      \IS  where-stmt
 			      \IS  where-construct
                                                                \FNB                                                     	

\noindent Constraint: In each assignment-stmt, the mask-expr and the variable
being defined must be arrays of the same shape.  If a
where-construct contains a where-stmt, an elsewhere-stmt,
or another where-construct, then the two mask-expr's must
be arrays of the same shape.
 
The meaning of such statements may be understood by rewrite rules.  First
one may eliminate all occurrences of ELSE WHERE:
                                                                \CODE
WHERE (m1)		
    xxx			
ELSE WHERE (m2)		
    yyy				
END WHERE
	                                                            \EDOC
becomes
                                                                \CODE
WHERE (m1)
    xxx
ELSE
    WHERE (m2)
        yyy
    END WHERE
END WHERE
                                                                \EDOC
where xxx and yyy represent any sequences of statements, so long as the
original WHERE, ELSE WHERE, and END WHERE match, and the ELSE WHERE is the
first ELSE WHERE of the construct (that is, yyy may include additional ELSE
WHERE or ELSE statements of the construct).  Next one eliminates ELSE:
                                                                \CODE
WHERE (m)
    xxx
ELSE
    yyy
END WHERE
                                                                \EDOC
becomes
                                                                \CODE
temp = m
WHERE (temp)
    xxx
END WHERE
WHERE (.NOT. temp)
    yyy
END WHERE
                                                                \EDOC

Finally one eliminates nested WHERE constructs:
                                                                \CODE
WHERE (m1)
    xxx
    WHERE (m2)
        yyy
    END WHERE
    zzz
END WHERE
                                                                \EDOC
becomes
                                                                \CODE
temp = m1
WHERE (temp)
    xxx
END WHERE
WHERE (temp .AND. (m2))
    yyy
END WHERE
WHERE (temp)
    zzz
END WHERE
                                                                \EDOC
and similarly for nested WHERE statements.

The effects of these rules will surely be a familiar or obvious possibility
to all the members of the committee; I enumerate them explicitly here only
so that there can be no doubt as to the meaning I intend to support.

Such rewriting rules are simple for a compiler to apply, or the code may
easily be compiled even more directly.  But such transformations are
tedious for our users to make by hand and result in code that is
unnecessarily clumsy and difficult to maintain.

One might propose to make WHERE and IF even more similar by making two
other changes.  First, require the noise word THERE to appear in a WHERE
and ELSE WHERE statement after the parenthesized mask-expr, in exactly the
same way that the noise word THEN must appear in IF and ELSE IF statements.
(Read aloud, the results might sound a trifle old-fashioned--"Where knights
dare not go, there be dragons!"--but technically would be as grammatically
correct English as the results of reading an IF construct aloud.)  Second,
allow a WHERE construct to be named, and allow the name to appear in ELSE
WHERE, ELSE, and END WHERE statements.  I do not feel very strongly one way
or the other about these no doubt obvious points, but offer them for your
consideration lest the possibilities be overlooked.
\end{quote}

Now, for compatibility with Fortran 90, HPF should continue to
use ELSEWHERE instead of ELSE, but this causes no ambiguity:

                                                                \CODE
      WHERE(...)
        ...
      ELSE WHERE(...)
        ...
      ELSEWHERE
        ...
      END WHERE
                                                                \EDOC

is perfectly unambiguous, even when blanks are not significant.
Since X3J3 declined to adopt the keyword THERE, it should not be
used in HPF either (alas).
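
As a sketch of the conversion that the proposal is meant to simplify,
consider a serial loop with a nested IF and an ELSE IF, followed by its
array form under the extended syntax.  The arrays X and Y are hypothetical;
the two forms are intended to assign the same values to Y, since X is not
modified and each element of Y is assigned in at most one branch.
                                                                \CODE
      REAL X(N), Y(N)

      ! Serial form.
      DO I = 1, N
        IF (X(I) .GT. 0.0) THEN
          IF (Y(I) .GT. X(I)) THEN
            Y(I) = X(I)
          END IF
        ELSE IF (X(I) .LT. 0.0) THEN
          Y(I) = -Y(I)
        ELSE
          Y(I) = 0.0
        END IF
      END DO

      ! Array form using nested WHERE and a masked ELSE WHERE.
      WHERE (X .GT. 0.0)
        WHERE (Y .GT. X)
          Y = X
        END WHERE
      ELSE WHERE (X .LT. 0.0)
        Y = -Y
      ELSEWHERE
        Y = 0.0
      END WHERE
                                                                \EDOC
Without the extension, the same effect requires a temporary mask and
explicit .AND. operators, as in the rewrite rules quoted above.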

\alternative A
\subsection{
A Proposal for EXECUTE-ON Directive in HPF
}

\label{on-clause}

\footnote{Version of 
September 14, 1992
--
Tin-Fook Ngai,
Hewlett-Packard Laboratories.
This section has not been discussed.}
The proposed EXECUTE-ON directive is used to suggest where an iteration of
a DO construct or an indexed parallel assignment should be executed.  The
directive informs the compiler which data access should be local and
which data access may be remote.  


\subsubsection{Syntax}
                                                                \BNF
on-clause \IS !HPF$ EXECUTE (subscript-list) ON align-spec 
                [; LOCAL array-name-list]
                                                                \FNB

\noindent Constraint:
Each point in the index space must be executed on only one template node.


\subsubsection{Usage}

The EXECUTE-ON directive must immediately precede the corresponding DO
loop body, array assignment, FORALL statement, FORALL construct or
individual assignment statement in a FORALL construct.


\subsubsection{Interpretation}

The subscript-list identifies a distinct iteration index or an indexed
parallel assignment.  The align-spec identifies a template node.  The
EXECUTE-ON directive suggests that the iteration or parallel assignment
should be executed on the processor to which the template node is
mapped.  The optional LOCAL directive informs the compiler that all
data accesses to the specified array-name-list can be handled as local
data accesses if the related HPF data mapping directives are honored.


\subsubsection{Examples}

\paragraph{Example 1}
                                                                \CODE
      REAL A(N), B(N), C(N)
!HPF$ TEMPLATE T(N)
!HPF$ ALIGN WITH T:: A, B, C
!HPF$ DISTRIBUTE T(CYCLIC(2))

      !HPF$ INDEPENDENT            
      DO I = 1, N/2 
      !HPF$ EXECUTE (I) ON T(2*I); LOCAL A, B, C
      ! we know that P(2*I-1) and P(2*I) are a permutation
      ! of 2*I-1 and 2*I
        A(P(2*I - 1)) = B(2*I - 1) + C(2*I - 1)    
        A(P(2*I)) = B(2*I) + C(2*I)
      END DO
                                                                \EDOC

\paragraph{Example 2}
                                                                \CODE
      REAL A(N,N), B(N,N)
!HPF$ TEMPLATE T(N,N)
!HPF$ ALIGN WITH T:: A, B

      !HPF$ EXECUTE (I,J) ON T(I+1,J-1)
      FORALL (I=1:N-1, J=2:N)   A(I,J) = A(I+1,J-1) + B(I+1,J-1)
                                                                \EDOC

\paragraph{Example 3}
                                                                \CODE
      REAL A(N,N), B(N,N)
!HPF$ TEMPLATE T(N,N)
!HPF$ ALIGN WITH T:: A, B

      !HPF$ EXECUTE (I,J) ON T(I,J)       
      ! applies to the entire FORALL construct
      FORALL (I=1:N-1, J=2:N) 
          A(I,J) = A(I+1,J-1) + B(I+1,J-1)
          B(I,J) = A(I,J) + B(I+1,J-1)
      END FORALL
                                                                \EDOC

\paragraph{Example 4}
                                                                \CODE
      REAL A(N,N), B(N,N)
!HPF$ TEMPLATE T(N,N)
!HPF$ ALIGN WITH T:: A, B

      FORALL (I=1:N-1, J=2:N) 
      !HPF$ EXECUTE (I,J) ON T(I,J)
      ! applies only to the following assignment
          A(I,J) = A(I+1,J-1) + B(I+1,J-1)
          B(I,J) = A(I,J) + B(I+1,J-1)
      END FORALL
                                                                \EDOC

\paragraph{Example 5}
                                                                \CODE
      REAL A(N,N), B(N,N)
!HPF$ TEMPLATE T(N,N)
!HPF$ ALIGN WITH T:: A, B

      !HPF$ EXECUTE (I,J) ON T(I+1,J-1)
      A(1:N-1,2:N) = A(2:N,1:N-1) + B(2:N,1:N-1)
                                                                \EDOC
   
\paragraph{Example 6} 

The original program for this example is due to Michael Wolfe of Oregon 
Graduate Institute.

This program performs the matrix multiplication \(C = A \times B\).
In each step, array B is rotated by row blocks and multiplied
diagonal-block-wise in parallel with A, and the results are accumulated in C.

Note that without the EXECUTE-ON and LOCAL directives, the compiler
will have a hard time determining that all A, B and C accesses are 
actually local, and will thus be unable to generate the most efficient code 
(i.e. communication-free and with no runtime checking in the parallel 
loop body).
 
                                                                \CODE
      REAL A(N,N), B(N,N), C(N,N)
!HPF$ REALIGNABLE B

!* A,B,C are distributed by row blocks
!HPF$ TEMPLATE T(N,N)
!HPF$ ALIGN (:,:) WITH T:: A, B, C      
!HPF$ DISTRIBUTE T(BLOCK,*)

      NOP = NUMBER_OF_PROCESSORS()
      IB = N/NOP

      DO IT = 0, NOP-1

!* assuming wrap-around realignment
!HPF$ REALIGN B(I,J) WITH T(I-IT*IB,J)  

        !HPF$ INDEPENDENT
        DO IP = 0, NOP-1     
          !HPF$ EXECUTE (IP) ON T(IP*IB+1,1); LOCAL A, B, C
          ITP = MOD( IT+IP, NOP )

          DO I = 1, IB
            DO J = 1, N
              DO K = 1, IB
                C(IP*IB+I,J) = C(IP*IB+I,J) +             &
                  A(IP*IB+I,ITP*IB+K)*B(ITP*IB+K,J)
              ENDDO  !* K
            ENDDO  !* J
          ENDDO  !* I
        ENDDO  !* IP

      ENDDO  !* IT
                                                                \EDOC

\subsubsection{Commentary}

The following is a discussion between Henk Sips and Tin-Fook Ngai from 
the mailing list.  It is included to clarify some issues in the preceding proposal.

\begin{enumerate}
\item Sips: The execution model of HPFF (not completely approved yet) states: 
\begin{quote}
The code compiled by an HPF compiler ought do no worse than code
compiled using the owner compute rule.
\end{quote}
This is more relaxed than saying ``it uses the owner compute rule''. Your
EXECUTE ON is much more specific towards fixing execution on a specified
processor. 

Ngai: What are you objecting to? The EXECUTE ON is a directive.

\item Sips: A template is not executed, so one can't say EXECUTE x ON T.
Something like EXECUTE\_ON\_HOME T should be adopted.  (Since templates
are currently not objects, one could even dispute this.)

Ngai: Agreed.  I never felt comfortable with the keywords I used.  I have no
objection to ``EXECUTE x ON\_HOME T''.  Any other suggestions are also
welcome.


\item Sips: Adopting EXECUTE x ON on DO loops without any independence
requirements, as your proposal seems to allow, can yield all kinds of
intricate synchronization schemes when iterations are not independent (or
must be assumed to be dependent). This seems to go further than the simple
first step that HPFF ought to be.

Ngai: Clearly, the proposed feature is primarily intended for INDEPENDENT DO,
FORALL, and other parallel indexed assignments.  Before making the
proposal, I also thought about ordinary DO loops, as you point out
here. Code generation does not seem to be a problem: if the user chooses to
specify the execution location of an iteration of an ordinary DO loop, simple
compilation requires only one synchronization at the end of each iteration
to ensure the sequential DO semantics. This naive compilation looks dumb,
but the user may still gain from the existing data distribution.  (A
smarter compiler can of course do a better job, but that is definitely not
required.)  That is why I don't restrict EXECUTE ON to INDEPENDENT DOs,
which keeps the rule simpler.


\item Sips: Binding iterations to templates can currently only be done
statically, since the current draft does not allow dynamic templates. So
iteration boundaries must be known at compile time. One has to apply the
subroutine trick to allow this, which is not very neat.

Ngai: That is the intention of the proposal:  only static binding is allowed.
Even if the loop index is bounded by a runtime variable, the binding to a
template node is still static.


\item Sips: Allowing EXECUTE x ON on groups of statements raises a scoping
issue, so there should also be something like END EXECUTE x ON to undo the
annotation.

Ngai: The current proposal seems sufficient on this issue.  The scope for a
single statement (FORALL statement, single statement in a FORALL construct, and
array assignment statement) is clear.  For groups of statements, EXECUTE ON can
only apply to either the entire body of a FORALL construct or the
entire iteration of a DO loop.


\item Sips: Again we have complicated scoping problems. How about this example:
                                                                \CODE
!HPF$ TEMPLATE T1(N), T2(N)

DO I=1,N
    !HPF$ EXECUTE (I) ON T1(I)
    C(I) = D(I)
    DO J=1,N
      !HPF$ EXECUTE (J) ON T2(J)
      A(I,J) =  A(I,J) + B(I,J) 
    ENDDO
ENDDO
                                                                \EDOC
This example satisfies the constraint only if by entering the J-loop, the
I-index is dereferenced from the assertion just after the I-loop. Although
logical, it might be confusing to users. However, in the program
                                                                \CODE
!HPF$ TEMPLATE T1(N), T2(N)
DO I=1,N
    !HPF$ EXECUTE (I) ON T1(I)
    C(I) = D(I)
    DO J=1,N
        A(I,J) =  A(I,J) + B(I,J) 
    ENDDO
ENDDO
                                                                \EDOC
there is no such dereferencing. 

Ngai: Good examples.  This bug surely needs to be fixed.  Here is my solution:
\begin{itemize}
\item For nested EXECUTE ON directives, only the immediately enclosing
  EXECUTE ON directive is effective.
\end{itemize}
In the former example, the statement ``C(I) = D(I)'' will be executed on the
home of T1(I), while the statement ``A(I,J) =  A(I,J) + B(I,J)'' will be
executed on the home of T2(J) for all I.  In the latter case, the entire
I-loop body, including the DO J loop, is executed on the home of T1(I).

\item Sips: The wrap feature of templates will probably be deleted from the draft.
The same thing (shifting data each iteration) can be achieved by using CSHIFT
or the subroutine trick and making the template as large as the iteration
space.

Ngai: Wolfe and I discussed the example (Example 6 in the proposal) long before
our revision of the wrap feature.  Sorry for any confusion caused by this
example.  (However, this example also illustrates the use of wrap in data
distribution -- we should come up with a cleaner solution at the next meeting.)

\item Sips: We cannot do separate compilation in some examples:
                                                                \CODE
!HPF$ TEMPLATE T1(N)
  DO I=1,N
    !HPF$ EXECUTE (I) ON T1(I)
    C(I) = D(I)
    DO J=1,N
      A(I,J) = B(I,J)
    ENDDO
  ENDDO
                                                                \EDOC
Here A(I,J) is calculated on T1(I). If we encapsulate the J-loop in a
subroutine, we get something like:
                                                                \CODE
!HPF$ TEMPLATE T1(N)
  DO I=1,N
    !HPF$ EXECUTE (I) ON T1(I)
    C(I) = D(I)
    CALL FOO(A(I,:),B(I,:))
  ENDDO

...

SUBROUTINE FOO(AA,BB)
!HPF$ ALIGN AA,BB with *
    DO J=1,N
      AA(J) = BB(J)
    ENDDO
                                                                \EDOC
The dummy arguments AA and BB are aligned to the incoming array sections.
Normally, without any EXECUTE\_ON, the subroutine would execute AA(J), which
is equivalent to A(I,J), on the home of A(I,J). However, with an EXECUTE\_ON
in the main program there is no way the subroutine can know that AA(J) should
be executed on T1(I). The consequence is that any subroutine call must
{\em undo} the EXECUTE\_ON on entry and {\em redo} the EXECUTE\_ON on return.

(Ngai did not respond publicly.)
\end{enumerate}

\alternative B
\subsection{Proposal for a Statement Grouping Syntax and ON Clause}

\footnote{Version of October 5, 1992 -- Clemens-August Thole, GMD-I1.T,
Sankt Augustin.
This section has not been discussed.}
I agree with Tin-Fook that something like the ON-clause should be
included in HPF. I brought a proposal with me to the last HPF meeting,
which was distributed by Chuck, but neither the FORALL working group
nor the plenary had time to discuss it.

I would appreciate comments on the various features.

\subsubsection{Introduction}

This proposal introduces an extension to HPF that groups several
statements so that properties can be specified for a whole block
of statements at once. Such a block of statements is called an HPF-section.
HPF-sections can be used to describe independent execution
between blocks of statements as well as the mapping of their execution.

To specify a particular mapping of the execution of statements
or HPF-sections, the ON-clause is introduced. A subset of a template is used
as a reference object onto which the statements are mapped in a canonical
manner. Careful selection of the reference template makes it possible to
specify how the execution of the code is mapped onto the parallel architecture.


\subsubsection{HPF-sections}

The HPF directives SECTIONS, SECTION, and END SECTIONS are used to specify
grouping of statements. SECTIONS and END SECTIONS specify the beginning
and end of a list of HPF-sections and SECTION the beginning of the next 
HPF-section. The syntax is as follows:
                                                                \BNF
hpf-block \IS        !HPF$ SECTIONS
                [hpf-section-list]
        !HPF$ END SECTIONS

hpf-section \IS        !HPF$ SECTION
                [execution-part]
                                                                \FNB
\noindent Constraint: During the execution of an {\em hpf-section}, no
transfer of control to code outside of its {\em execution-part} may be
performed.

\paragraph{Example}
                                                                \CODE
        !HPF$ SECTIONS
        !HPF$ SECTION
                A = A + B
                B = C + D
        !HPF$ SECTION
                E = B
                IF (E.GT.F) GOTO 10
                        E = 0D0
         10     CONTINUE
        !HPF$ END SECTIONS
                                                                \EDOC
This example specifies a list of two HPF-sections. The control statement in
the second HPF-section is valid because after the transfer of control the
execution continues in the same HPF-section.
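
By contrast, the following fragment is a sketch of a case that the
constraint is presumably meant to rule out: the GOTO in the first
HPF-section transfers control to a label in the second one. It reuses the
names of the example above and is given only as an illustration.
                                                                \CODE
        !HPF$ SECTIONS
        !HPF$ SECTION
                A = A + B
        !       Not allowed: label 10 lies in the next HPF-section
                IF (A.GT.F) GOTO 10
                B = C + D
        !HPF$ SECTION
                E = B
         10     CONTINUE
        !HPF$ END SECTIONS
                                                                \EDOC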


\subsubsection{ON-clause}

The ON-clause specifies a subsection of a template, which is used as a reference
object for the execution of the next statement, construct, or HPF-section.
If the left-hand side of an assignment coincides in shape with the reference
object, the evaluation of the right-hand side and the assignment for
a specific element of the left-hand side are performed on the processor onto
which the corresponding element of the reference object is mapped.

\paragraph{Syntax}

Add the following rules:
                                                                \BNF
executable-construct \IS        !HPF$ ON on-spec
                executable-construct

hpf-section \IS        !HPF$ ON on-spec
                hpf-section
        
on-spec \IS        align-spec
                                                                \FNB
The {\it executable-construct} or {\it hpf-section} is called the on-clause-target.

\paragraph{Constraints}
\begin{enumerate}
\item No {\it executable-construct} that generates any transfer of control
   out of the construct itself may be used as the object of an on-clause.
   This includes the entry-statement.
\item {\it Statement-block}s used in constructs must fulfill the constraints of
   HPF-sections.
\item The shape of the {\it on-spec} must cover in each dimension the shape
   of any left-hand side of an assignment statement that is the target of an
   on-clause. If a ``*'' is used in the {\it on-spec}, this dimension is skipped
   when constructing the shape of the {\it on-spec}.
\item If an on-clause is contained in the on-clause-target, the new {\it on-spec}
   must be a subsection of the {\it on-spec} of the outer on-clause.
\end{enumerate}

\paragraph{Example}
                                                                \CODE
                REAL, DIMENSION(n) :: a, b, c, d
        !HPF$   TEMPLATE grid(n)
        !HPF$   ALIGN WITH grid :: a, b, c, d

        !HPF$   ON grid(2:n)
                a(1:n-1) = a(2:n) + b(2:n) + c(2:n)
                                                                \EDOC
The on-clause indicates that the evaluation of the right-hand side is
performed on those processors that hold the data elements of the
right-hand side. Data movement is necessary for the assignment to the
left-hand side.
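
One way to read this example, offered as a sketch only, is as a two-step
computation: each element of the right-hand side is first evaluated on the
home of the matching element of grid(2:n), and the result is then shifted
by one position into a. The temporary tmp and the two-step form are
assumptions introduced here for illustration; they are not part of the
proposed syntax, and a compiler need not generate code this way.
                                                                \CODE
                REAL, DIMENSION(n) :: a, b, c, tmp
        !HPF$   TEMPLATE grid(n)
        !HPF$   ALIGN WITH grid :: a, b, c, tmp

        ! Step 1: all operands and tmp(i) live on the home of grid(i),
        !         so no communication is needed (tmp is hypothetical).
                tmp(2:n) = a(2:n) + b(2:n) + c(2:n)
        ! Step 2: the shift by one element needs data movement
        !         wherever a(i-1) and tmp(i) have different homes.
                a(1:n-1) = tmp(2:n)
                                                                \EDOC
Because a Fortran array assignment evaluates its whole right-hand side
before storing, the two steps give the same values as the single statement.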

\paragraph{Interpretation}

The interpretation of the on-clause depends on the type of the on-clause-target.

If the on-clause-target is an assignment statement, the {\it on-spec} is used to
determine where the assignment statement is executed. If the shape of the
right-hand side is identical to the shape of the {\it on-spec}, the computation for
a specific element of the assignment statement is performed where the
corresponding element of the {\it on-spec} is mapped. If the shape of the
{\it on-spec} is larger, the compiler may use any sufficiently large subsection.
The use of ``*'' in the {\it on-spec} specifies that the same computations are
mapped onto the corresponding line of processors, so that several processors
perform the same update. This may save communication operations.
In the case of the where-statement, the forall-statement, and the
forall-construct, the same mapping is applied to the evaluation of the
conditions and to each assignment.

If the on-clause is placed in front of an if-construct, case-construct,
or do-construct, the {\it on-spec} is used for the evaluation of the
conditions and loop bounds as well as for the execution of the statement-blocks
that are part of the construct. For the statement-blocks, the interpretation
rules for HPF-sections apply.

With respect to the allocate, deallocate, nullify, and I/O related statements
the {\it on-spec} is used for the evaluation of the parameters of the statements
and the evaluation of I/O objects. 

In the case of subroutine calls and functions, the {\it on-spec} is used for the
evaluation of the parameters. It also determines the mapping of the resulting
object and the set of processors that will be
used for the evaluation of the subroutine.

In the case of HPF-sections, the on-clause is applied to each statement of the
execution part. Control transfer statements are allowed in this case, and the
constraints ensure that the context of the {\it on-spec} is not lost.

\paragraph{Additional example}
                                                                \CODE
        REAL, DIMENSION(n,n) :: a, b, c, d
!HPF$   TEMPLATE grid(n,n)
!HPF$   ALIGN WITH grid :: a, b, c, d

!HPF$   ON grid(2:n,2:n)
        DO i=2,n
!HPF$       ON grid(i,2:n)
            DO j=2,n
!HPF$           ON grid(i,j)
                a(i-1,j-1) = a(i,j) + b(i,j)*c(i,j)
            ENDDO
        ENDDO
                                                                \EDOC

\paragraph{Comment}

The compiler should be able to adjust the span of the loops to the local
extent because of the restrictions on the section specifiers of the
{\it on-spec}.
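
As a rough sketch of such an adjustment, assume that grid is distributed
by row blocks and that each processor knows the first and last rows it
owns. The names ilo and ihi are hypothetical and stand for those bounds;
the proposal does not define them. On each processor, the loop nest of the
additional example might then be restricted as follows. The sketch ignores
the communication and synchronization needed at block boundaries to
preserve the sequential DO semantics.
                                                                \CODE
!       ilo, ihi: first and last rows of grid owned by this processor
!       (hypothetical names; how they are obtained is not specified)
        DO i = MAX(2,ilo), MIN(n,ihi)
            DO j = 2, n
!               a(i,j), b(i,j), c(i,j) are local; the store to
!               a(i-1,j-1) is remote only when i equals ilo
                a(i-1,j-1) = a(i,j) + b(i,j)*c(i,j)
            ENDDO
        ENDDO
                                                                \EDOC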


\subsection{ALLOCATE in FORALL}

\label{forall-allocate}

\footnote{Version of July 28, 1992 
- Guy Steele, Thinking Machines Corporation.
At the September 10-11 meeting, this was not included as part of the
FORALL because it seemed too big a leap from the allowed assignment
statements.}
Proposal:  ALLOCATE, DEALLOCATE, and NULLIFY statements may appear
	in the body of a FORALL.

Rationale: these are just another kind of assignment.  They may have
	a kind of side effect (storage management), but it is a
	benign side effect (even milder than random number generation).

Example:
                                                            \CODE
      TYPE SCREEN
        INTEGER, POINTER :: P(:,:)
      END TYPE SCREEN
      TYPE(SCREEN) :: S(N)
      INTEGER IERR(N)
      ...
!  Lots of arrays with different aspect ratios
      FORALL (J=1:N)  ALLOCATE(S(J)%P(J,N/J),STAT=IERR(J))
      IF (ANY(IERR .NE. 0)) GO TO 99999
                                                            \EDOC
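
For comparison, here is a sketch of how the same allocations could be
written with an ordinary DO loop in standard Fortran 90. It assumes the
declarations of the example above and tests the status values against
zero; the sequential loop gives up the parallelism that the proposal is
meant to expose.
                                                            \CODE
      DO J = 1, N
!       one allocation per iteration, status kept per element
        ALLOCATE(S(J)%P(J,N/J), STAT=IERR(J))
      END DO
      IF (ANY(IERR .NE. 0)) GO TO 99999
                                                            \EDOC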

\subsection{Generalized Data References}

\label{data-ref}

\footnote{Version of July 28, 1992 
- Guy Steele, Thinking Machines Corporation.
This was not acted on at the September 10-11 meeting because the
FORALL subgroup wanted to minimize changes to the Fortran~90 standard.}
Proposal:  Delete the constraint in section 6.1.2 of the Fortran 90
	standard (page 63, lines 7 and 8):
\begin{quote}
	Constraint: In a data-ref, there must not be more than one
		part-ref with nonzero rank.  A part-name to the right
		of a part-ref with nonzero rank must not have the
		POINTER attribute.
\end{quote}

Rationale: further opportunities for parallelism.

Example:
                                                                     \CODE
TYPE(MONARCH) :: C(N), W(N)
      ...
! Munch that butterfly
C = C + W * A%P		! Illegal in Fortran 90
                                                                      \EDOC


\subsection{FORALL with INDEPENDENT Directives}
\label{begin-independent}

\footnote{Version of July 21, 1992 - Min-You Wu.
This was rejected at the FORALL subgroup meeting on September 9, 1992,
because it only offered syntactic sugar for capabilities already in
the FORALL INDEPENDENT.  It was also suggested that the BEGIN
INDEPENDENT syntax
should be reserved for other uses, such as MIMD features.}
This proposal is an extension of Guy Steele's INDEPENDENT proposal.
We propose a block FORALL with the directives for independent 
execution of statements.  The INDEPENDENT directives are used
in a block style.  

The block FORALL is in the form of
                                                         \CODE
      FORALL (...) [ON (...)]
        a block of statements
      END FORALL
                                                         \EDOC
where the block can consist of a restricted class of statements 
and the following INDEPENDENT directives:
                                                         \CODE
!HPF$BEGIN INDEPENDENT
!HPF$END INDEPENDENT
                                                         \EDOC
The two directives must be used as a pair.
A sub-block of statements
enclosed by the two directives is called an {\em asynchronous}
sub-block or {\em independent} sub-block.
The statements that are
not in an asynchronous sub-block are in {\em synchronized} sub-blocks
or {\em non-independent} sub-blocks.
The synchronized sub-block is
the same as Guy Steele's synchronized FORALL statement, and the
asynchronous sub-block is the same as the FORALL with the INDEPENDENT
directive.
Thus, the block FORALL
                                                          \CODE
      FORALL (e)
        b1
!HPF$BEGIN INDEPENDENT
        b2
!HPF$END INDEPENDENT
        b3
      END FORALL
                                                           \EDOC
means roughly the same as
                                                           \CODE
      FORALL (e)
        b1
      END FORALL
!HPF$INDEPENDENT
      FORALL (e)
        b2
      END FORALL
      FORALL (e)
        b3
      END FORALL
                                                          \EDOC
														  
Statements in a synchronized sub-block are tightly synchronized.
Statements in an asynchronous sub-block are completely independent.
The INDEPENDENT directives indicate to the compiler that there is no
dependence and that, consequently, synchronizations are not necessary.
It is the user's responsibility to ensure that there is no dependence
between instances in an asynchronous sub-block.
A compiler can do dependence analysis for the asynchronous sub-blocks
and issue an error message when there exists a dependence or a warning
when it finds a possible dependence.

\subsubsection{What does ``no dependence between instances'' mean?}

It means that there is no true dependence, anti-dependence,
or output dependence between instances.
Examples of these dependences are shown below:
\begin{enumerate}
\item True dependence:
                                                            \CODE
      FORALL (i = 1:N)
        x(i) = ... 
        ...  = x(i+1)
      END FORALL
                                                            \EDOC
Notice that dependences in a FORALL differ from those in a DO loop.
If the above example were a DO loop, it would be an anti-dependence.

\item Anti-dependence:
                                                            \CODE
      FORALL (i = 1:N)
        ...  = x(i+1)
        x(i) = ...
      END FORALL
                                                            \EDOC

\item Output dependence:
                                                            \CODE
      FORALL (i = 1:N)
        x(i+1) = ... 
        x(i) = ...
      END FORALL
                                                            \EDOC
\end{enumerate}
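
The difference noted under the first item can be checked with a small test
program. The following is a minimal sketch, assuming a compiler that
accepts the FORALL construct; the program name, the array y, and the
initial values are choices made here for illustration. In the FORALL, the
first assignment completes for all i before the second begins, so y sees
the new values of x (a true dependence). In the DO loop, each iteration
reads x(i+1) before a later iteration overwrites it (an anti-dependence).
                                                            \CODE
      PROGRAM DEPDEMO
      INTEGER, PARAMETER :: N = 4
      INTEGER :: I
      INTEGER :: X(N+1), Y(N)

!     FORALL: statement by statement over all I
      X = 0
      FORALL (I = 1:N)
        X(I) = I
        Y(I) = X(I+1)
      END FORALL
      PRINT *, Y        ! Y is (2,3,4,0): new values of X are read

!     DO: iteration by iteration
      X = 0
      DO I = 1, N
        X(I) = I
        Y(I) = X(I+1)
      END DO
      PRINT *, Y        ! Y is (0,0,0,0): old values of X are read
      END PROGRAM DEPDEMO
                                                            \EDOC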

Independent does not imply no communication.  One instance may access 
data in the other instances, as long as it does not cause a dependence.  
The following example is an independent block:
                                                            \CODE
      FORALL (i = 1:N)
!HPF$BEGIN INDEPENDENT
        x(i) = a(i-1)
        y(i-1) = a(i+1)
!HPF$END INDEPENDENT
      END FORALL
                                                            \EDOC

\subsubsection{Statements that can appear in FORALL}

FORALL statements, WHERE-ELSEWHERE statements, some intrinsic functions 
(and possibly elemental functions and subroutines) can appear in the
FORALL:
\begin{enumerate}
\item FORALL statement
                                                            \CODE
      FORALL (I = 1 : N)
        A(I,0) = A(I-1,0)
        FORALL (J = 1 : N)
!HPF$BEGIN INDEPENDENT
          A(I,J) = A(I,0) + B(I-1,J-1)
          C(I,J) = A(I,J)
!HPF$END INDEPENDENT
        END FORALL
      END FORALL
                                                            \EDOC

\item WHERE
                                                            \CODE
      FORALL(I = 1 : N)
!HPF$BEGIN INDEPENDENT
        WHERE(A(I,:).EQ.B(I,:))
          A(I,:) = 0
        ELSEWHERE
          A(I,:) = B(I,:)
        END WHERE
!HPF$END INDEPENDENT
      END FORALL
                                                            \EDOC
\end{enumerate}


\subsubsection{Rationale}

\begin{enumerate}
\item A FORALL with a single asynchronous sub-block as shown below is 
the same as a do independent (or doall, or doeach, or parallel do, etc.).
                                                            \CODE
      FORALL (e)
!HPF$BEGIN INDEPENDENT
        b1
!HPF$END INDEPENDENT
      END FORALL
                                                            \EDOC
A FORALL without any INDEPENDENT directive is the same as a tightly 
synchronized FORALL.  We need to define only one type of parallel 
construct, which includes both synchronized and asynchronous blocks.  
Furthermore, by combining asynchronous and synchronized FORALLs, we 
have a loosely synchronized FORALL, which is more flexible for many 
loosely synchronous applications.

\item With INDEPENDENT directives, the user can indicate which blocks
need not be synchronized.  The INDEPENDENT directives can act 
as barrier synchronizations.  One may suggest a smart compiler 
that can recognize dependences and eliminate unnecessary 
synchronizations automatically.  However, it might be extremely 
difficult or impossible in some cases to identify all dependences.  
When the compiler cannot determine whether there is a dependence, 
it must assume so and use a synchronization for safety, which 
results in unnecessary synchronizations and consequently, high 
communication overhead.
\end{enumerate}


\end{document}

From chk@erato.cs.rice.edu  Wed Oct 14 19:17:42 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA07745); Wed, 14 Oct 92 15:56:21 CDT
Received: from localhost.cs.rice.edu by erato.cs.rice.edu (AA14106); Wed, 14 Oct 92 15:56:09 CDT
Message-Id: <9210142056.AA14106@erato.cs.rice.edu>
To: hpff-forall@erato.cs.rice.edu
Word-Of-The-Day: nugatory : (adj) having little or no consequence
Subject: new draft proposal
Date: Wed, 14 Oct 92 15:56:07 -0500
From: chk@erato.cs.rice.edu

Below is the latest and greatest version of the HPF FORALL chapter.
Unless major errors or new proposals come up before the weekend (and I
have reason to believe they will), this will be the version presented
at the HPFF meeting next week.  If you have comments, please send them
to the group before next Tuesday so the meeting attendees have a
chance of hearing them.

Changes from the last draft:
1. Substantially rewritten PURE functions - thanks/blame goes to John
Merlin
2. Included ON clause proposals from Tin-Fook Ngai and Clemens Thole.
3. Included nested WHERE proposal from Guy Steele.
4. Minor reorganization of "Other Proposals"

						Chuck

---------------- cut here -------------

%chapter-head.tex

%Version of August 5, 1992 - David Loveman, Digital Equipment Corporation

\documentstyle[twoside,11pt]{report}
\pagestyle{headings}
\pagenumbering{arabic}
\marginparwidth 0pt
\oddsidemargin=.25in
\evensidemargin  .25in
\marginparsep 0pt
\topmargin=-.5in
\textwidth=6.0in
\textheight=9.0in
\parindent=2em

%the file syntax-macs.tex is physically included below

%syntax-macs.tex

%Version of July 29, 1992 - Guy Steele, Thinking Machines

\newdimen\bnfalign         \bnfalign=2in
\newdimen\bnfopwidth       \bnfopwidth=.3in
\newdimen\bnfindent        \bnfindent=.2in
\newdimen\bnfsep           \bnfsep=6pt
\newdimen\bnfmargin        \bnfmargin=0.5in
\newdimen\codemargin       \codemargin=0.5in
\newdimen\intrinsicmargin  \intrinsicmargin=3em
\newdimen\casemargin       \casemargin=0.75in
\newdimen\argumentmargin   \argumentmargin=1.8in

\def\IT{\it}
\def\RM{\rm}
\let\CHAR=\char
\let\CATCODE=\catcode
\let\DEF=\def
\let\GLOBAL=\global
\let\RELAX=\relax
\let\BEGIN=\begin
\let\END=\end


\def\FUNNYCHARACTIVE{\CATCODE`\a=13 \CATCODE`\b=13 \CATCODE`\c=13 \CATCODE`\d=13
		     \CATCODE`\e=13 \CATCODE`\f=13 \CATCODE`\g=13 \CATCODE`\h=13
		     \CATCODE`\i=13 \CATCODE`\j=13 \CATCODE`\k=13 \CATCODE`\l=13
		     \CATCODE`\m=13 \CATCODE`\n=13 \CATCODE`\o=13 \CATCODE`\p=13
		     \CATCODE`\q=13 \CATCODE`\r=13 \CATCODE`\s=13 \CATCODE`\t=13
		     \CATCODE`\u=13 \CATCODE`\v=13 \CATCODE`\w=13 \CATCODE`\x=13
		     \CATCODE`\y=13 \CATCODE`\z=13 \CATCODE`\[=13 \CATCODE`\]=13
                     \CATCODE`\-=13}

\def\RETURNACTIVE{\CATCODE`\
=13}

\makeatletter
\def\section{\@startsection {section}{1}{\z@}{-3.5ex plus -1ex minus 
 -.2ex}{2.3ex plus .2ex}{\large\sf}}
\def\subsection{\@startsection{subsection}{2}{\z@}{-3.25ex plus -1ex minus 
 -.2ex}{1.5ex plus .2ex}{\large\sf}}
\def\alternative#1 #2#3{\def\@tempa{#1}\def\@tempb{A}\ifx\@tempa\@tempb\else
    \expandafter\@altbumpdown\string#2\@foo\fi
    #2{Version #1: #3}}
\def\@altbumpdown#1#2\@foo{\global\expandafter\advance\csname c@#2\endcsname-1}

\def\@ifpar#1#2{\let\@tempe\par \def\@tempa{#1}\def\@tempb{#2}\futurelet
    \@tempc\@ifnch}

\def\?#1.{\begingroup\def\@tempq{#1}\list{}{\leftmargin\intrinsicmargin}\relax
  \item[]{\bf\@tempq.} \@intrinsictest}
\def\@intrinsictest{\@ifpar{\@intrinsicpar\@intrinsicdesc}{\@intrinsicpar\relax}}
\long\def\@intrinsicdesc#1{\list{}{\relax
  \def\@tempb{ Arguments}\ifx\@tempq\@tempb
			  \leftmargin\argumentmargin
			  \else \leftmargin\casemargin \fi
  \labelwidth\leftmargin  \advance\labelwidth -\labelsep
  \parsep 4pt plus 2pt minus 1pt
  \let\makelabel\@intrinsiclabel}#1\endlist}
\long\def\@intrinsicpar#1#2\\{#1{#2}\@ifstar{\@intrinsictest}{\endlist\endgroup}}
\def\@intrinsiclabel#1{\setbox0=\hbox{\rm #1}\ifnum\wd0>\labelwidth
  \box0 \else \hbox to \labelwidth{\box0\hfill}\fi}
\def\Case(#1):{\item[{\it Case (#1):}]}
\def\ {\@ifnextchar({\def\@tempq{#1}\@intrinsicopt}{\item[#1]}}
\def\@intrinsicopt(#1){\item[{\@tempq} (#1)]}

\def\MATRIX#1{\relax
    \@ifnextchar,{\@MATRIXTABS{}#1,\@FOO, \hskip0pt plus 1filll\penalty-1\@gobble
  }{\@ifnextchar;{\@MATRIXTABS{}#1,\@FOO; \hskip0pt plus 1filll\penalty-1\@gobble
  }{\@ifnextchar:{\@MATRIXTABS{}#1,\@FOO: \hskip0pt plus 1filll\penalty-1\@gobble
  }{\@ifnextchar.{\hfill\penalty1\null\penalty10000\hskip0pt plus 1filll
		  \@MATRIXTABS{}#1,\@FOO.\penalty-50\@gobble
  }{\@MATRIXTABS{}#1,\@FOO{ }\hskip0pt plus 1filll\penalty-1}}}}}

\def\@MATRIXTABS#1#2,{\@ifnextchar\@FOO{\@MATRIX{#1#2}}{\@MATRIXTABS{#1#2&}}}
\def\@MATRIX#1\@FOO{\(\left[\begin{array}{rrrrrrrrrr}#1\end{array}\right]\)}

\def\@IFSPACEORRETURNNEXT#1#2{\def\@tempa{#1}\def\@tempb{#2}\futurelet\@tempc\@ifspnx}

{
\FUNNYCHARACTIVE
\GLOBAL\DEF\FUNNYCHARDEF{\RELAX
    \DEFa{{\IT\CHAR"61}}\DEFb{{\IT\CHAR"62}}\DEFc{{\IT\CHAR"63}}\RELAX
    \DEFd{{\IT\CHAR"64}}\DEFe{{\IT\CHAR"65}}\DEFf{{\IT\CHAR"66}}\RELAX
    \DEFg{{\IT\CHAR"67}}\DEFh{{\IT\CHAR"68}}\DEFi{{\IT\CHAR"69}}\RELAX
    \DEFj{{\IT\CHAR"6A}}\DEFk{{\IT\CHAR"6B}}\DEFl{{\IT\CHAR"6C}}\RELAX
    \DEFm{{\IT\CHAR"6D}}\DEFn{{\IT\CHAR"6E}}\DEFo{{\IT\CHAR"6F}}\RELAX
    \DEFp{{\IT\CHAR"70}}\DEFq{{\IT\CHAR"71}}\DEFr{{\IT\CHAR"72}}\RELAX
    \DEFs{{\IT\CHAR"73}}\DEFt{{\IT\CHAR"74}}\DEFu{{\IT\CHAR"75}}\RELAX
    \DEFv{{\IT\CHAR"76}}\DEFw{{\IT\CHAR"77}}\DEFx{{\IT\CHAR"78}}\RELAX
    \DEFy{{\IT\CHAR"79}}\DEFz{{\IT\CHAR"7A}}\DEF[{{\RM\CHAR"5B}}\RELAX
    \DEF]{{\RM\CHAR"5D}}\DEF-{\@IFSPACEORRETURNNEXT{{\CHAR"2D}}{{\IT\CHAR"2D}}}}
}

%%% Warning!  Devious return-character machinations in the next several lines!
%%%           Don't even *breathe* on these macros!
{\RETURNACTIVE\global\def\RETURNDEF{\def
{\@ifnextchar\FNB{}{\@stopline\@ifnextchar
{\@NEWBNFRULE}{\penalty\@M\@startline\ignorespaces}}}}\global\def\@NEWBNFRULE
{\vskip\bnfsep\@startline\ignorespaces}\global\def\@ifspnx{\ifx\@tempc\@sptoken \let\@tempd\@tempa \else \ifx\@tempc
\let\@tempd\@tempa \else \let\@tempd\@tempb \fi\fi \@tempd}}
%%% End of bizarro return-character machinations.

\def\IS{\@stopfield\global\setbox\@curfield\hbox\bgroup
  \hskip-\bnfindent \hskip-\bnfopwidth  \hskip-\bnfalign
  \hbox to \bnfalign{\unhbox\@curfield\hfill}\hbox to \bnfopwidth{\bf is \hfill}}
\def\OR{\@stopfield\global\setbox\@curfield\hbox\bgroup
  \hskip-\bnfindent \hskip-\bnfopwidth \hbox to \bnfopwidth{\bf or \hfill}}
\def\R#1 {\hbox to 0pt{\hskip-\bnfmargin R#1\hfill}}
\def\XBNF{\FUNNYCHARDEF\FUNNYCHARACTIVE\RETURNDEF\RETURNACTIVE
  \def\@underbarchar{{\char"5F}}\tt\frenchspacing
  \advance\@totalleftmargin\bnfmargin \tabbing
  \hskip\bnfalign\hskip\bnfopwidth\hskip\bnfindent\=\kill\>\+\@gobblecr}
\def\endXBNF{\-\endtabbing}

\def\BNF{\BEGIN{XBNF}}
\def\FNB{\END{XBNF}}

\begingroup \catcode `|=0 \catcode`\\=12
|gdef|@XCODE#1\EDOC{#1|endtrivlist|end{tt}}
|endgroup

\def\CODE{\begin{tt}\advance\@totalleftmargin\codemargin \@verbatim
   \def\@underbarchar{{\char"5F}}\frenchspacing \@vobeyspaces \@XCODE}
\def\ICODE{\begin{tt}\advance\@totalleftmargin\codemargin \@verbatim
   \def\@underbarchar{{\char"5F}}\frenchspacing \@vobeyspaces
   \FUNNYCHARDEF\FUNNYCHARACTIVE \UNDERBARACTIVE\UNDERBARDEF \@XCODE}

\def\@underbarsub#1{{\ifmmode _{#1}\else {$_{#1}$}\fi}}
\let\@underbarchar\_
\def\@underbar{\let\@tempq\@underbarsub\if\@tempz A\let\@tempq\@underbarchar\fi
  \if\@tempz B\let\@tempq\@underbarchar\fi\if\@tempz C\let\@tempq\@underbarchar\fi
  \if\@tempz D\let\@tempq\@underbarchar\fi\if\@tempz E\let\@tempq\@underbarchar\fi
  \if\@tempz F\let\@tempq\@underbarchar\fi\if\@tempz G\let\@tempq\@underbarchar\fi
  \if\@tempz H\let\@tempq\@underbarchar\fi\if\@tempz I\let\@tempq\@underbarchar\fi
  \if\@tempz J\let\@tempq\@underbarchar\fi\if\@tempz K\let\@tempq\@underbarchar\fi
  \if\@tempz L\let\@tempq\@underbarchar\fi\if\@tempz M\let\@tempq\@underbarchar\fi
  \if\@tempz N\let\@tempq\@underbarchar\fi\if\@tempz O\let\@tempq\@underbarchar\fi
  \if\@tempz P\let\@tempq\@underbarchar\fi\if\@tempz Q\let\@tempq\@underbarchar\fi
  \if\@tempz R\let\@tempq\@underbarchar\fi\if\@tempz S\let\@tempq\@underbarchar\fi
  \if\@tempz T\let\@tempq\@underbarchar\fi\if\@tempz U\let\@tempq\@underbarchar\fi
  \if\@tempz V\let\@tempq\@underbarchar\fi\if\@tempz W\let\@tempq\@underbarchar\fi
  \if\@tempz X\let\@tempq\@underbarchar\fi\if\@tempz Y\let\@tempq\@underbarchar\fi
  \if\@tempz Z\let\@tempq\@underbarchar\fi\@tempq}
\def\@under{\futurelet\@tempz\@underbar}

\def\UNDERBARACTIVE{\CATCODE`\_=13}
\UNDERBARACTIVE
\def\UNDERBARDEF{\def_{\protect\@under}}
\UNDERBARDEF

\catcode`\$=11  

%the following line would allow derived-type component references 
%FOO%BAR in running text, but not allow LaTeX comments
%without this line, write FOO\%BAR
%\catcode`\%=11 

\makeatother

%end of file syntax-macs.tex


\title{{\em D R A F T} \\High Performance Fortran \\ FORALL Proposal}
\author{High Performance Fortran Forum}
\date{October 14, 1992}

\hyphenation{RE-DIS-TRIB-UT-ABLE sub-script Wil-liam-son}

\begin{document}

\maketitle

\newpage

\pagenumbering{roman}

\vspace*{4.5in}

This is the result of a LaTeX run of a draft of a single chapter of 
the HPFF Final Report document.

\vspace*{3.0in}

\copyright\ 1992 Rice University, Houston, Texas.  Permission to copy 
without fee all or part of this material is granted, provided the 
Rice University copyright notice and the title of this document 
appear, and notice is given that copying is by permission of Rice 
University.

\tableofcontents

\newpage

\pagenumbering{arabic}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%put text of chapter here


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%put \end{document} here
%statements.tex


%Revision history:
%August 2, 1992 - Original version of David Loveman, Digital Equipment
%	Corporation and Charles Koelbel, Rice University
%August 19, 1992 - chk - cleaned up discrepancies with Fortran 90 array 
%	expressions
%August 20, 1992 - chk - added DO INDEPENDENT section, Guy Steele's 
%	pointer proposals
%August 24, 1992 - chk - ELEMENTAL functions proposal
%August 31, 1992 - chk - PURE functions proposal
%September 3, 1992 - chk - reorganized sections
%September 21, 1992 - chk - began incorporating updates from Sept
%	10-11 meeting
%October 14, 1992 - chk - Incorporated ON and revised PURE 


\newenvironment{constraints}{
        \begin{list}{Constraint:}{
                \settowidth{\labelwidth}{Constraint:}
                \settowidth{\labelsep}{w}
                \settowidth{\leftmargin}{Constraint:w}
                \setlength{\rightmargin}{0cm}
        }
}{
        \end{list}
}


\chapter{Statements}
\label{statements}

\section{Overview}

\footnote{Version of September 21, 1992 
- Charles Koelbel, Rice University.}
The purpose of the FORALL construct is to provide a convenient syntax for 
simultaneous assignments to large groups of array elements.
In this respect it is very similar to the functionality provided by array 
assignments and WHERE constructs.
FORALL differs from these constructs primarily in its syntax, which is 
intended to be more suggestive of local operations on each element of an 
array.
It is also possible to specify slightly more general array regions than 
are allowed by the basic array triplet notation.
Both single-statement and block FORALLs are defined in this proposal.

The FORALL statement, in both its single-statement and block forms, was
accepted by the High Performance Fortran Forum working group on its
second reading September 10, 1992.
This vote was contingent on a more complete definition of PURE
functions.
The idea of PURE functions was accepted by the HPFF working group at
its first reading on September 10, 1992.
However, the definition presented at that time was not completely
acceptable because of technical errors; the errors identified then have
been corrected in this draft.
The single-statement form of FORALL was accepted by the HPFF working
group as part of the official HPF subset in a first reading on
September 11, 1992; the block FORALL was excluded from the subset at
the same time.

The purpose of the INDEPENDENT directive is to allow the programmer to
give additional information to the compiler.
The user can assert that no data object is defined by one iteration of
a loop and used (read or written) by another; similar information can
be provided about the combinations of index values in a FORALL
statement.
A compiler may rely on this information to make optimizations, such as
parallelization or reorganizing communication.
If the assertion is true, the semantics of the program are not
changed; if it is false, the program is not standard-conforming and
has no defined meaning.
The ``Other Proposals'' section contains a number of additional
assertions with this flavor.

The INDEPENDENT assertion was accepted by the High Performance Fortran
Forum working group on its second reading on September 10, 1992.
The group also directed the FORALL subgroup to further explore methods for
allowing reduction operations to be accomplished in INDEPENDENT loops.

The following proposals are designed as a modification of the Fortran 90 
standard; all references to rule numbers and section numbers pertain to 
that document unless otherwise noted.


\section{Element Array Assignment - FORALL}
 

\label{forall-stmt}

\footnote{Version of September 21, 1992 - David
Loveman, Digital Equipment Corporation and Charles Koelbel, Rice
University.
Approved at second reading on September 10, 1992.}
The element array
assignment statement (FORALL statement) is used to specify an array
assignment in terms of array elements or groups of array sections.
The element array assignment may be
masked with a scalar logical expression.  
In functionality, it is similar to array assignment statements;
however, more general array sections can be assigned in FORALL.

Rule R215 for {\it
executable-construct} is extended to include the {\it forall-stmt}.

\subsection{General Form of Element Array Assignment}

                                                                       \BNF
forall-stmt          \IS FORALL (forall-triplet-spec-list
                       [,scalar-mask-expr ]) forall-assignment

forall-triplet-spec  \IS subscript-name = subscript : subscript 
                          [ : stride]
                                                                       \FNB

\noindent
Constraint:  {\it subscript-name} must be a {\it scalar-name} of type
integer.

\noindent
Constraint:  A {\it subscript} or a {\it stride} in a {\it
forall-triplet-spec} must not contain a reference to any {\it
subscript-name} in the {\it forall-triplet-spec-list}.

                                                                       \BNF
forall-assignment    \IS array-element = expr
                     \OR array-element => target
                     \OR array-section = expr
                                                                       \FNB

\noindent
Constraint:  The {\it array-section} or {\it array-element} in a {\it
forall-assignment} must reference all of the {\it forall-triplet-spec
subscript-names}.

\noindent
Constraint: In the case of simple assignment, the {\it array-element} and 
{\it expr} have the same constraints as the {\it variable} and {\it expr} 
in an {\it assignment-stmt}.

\noindent
Constraint: In the case of pointer assignment, the {\it array-element} 
and {\it target} have the same constraints as the {\it pointer-object} 
and {\it target}, respectively, in a {\it pointer-assignment-stmt}.

\noindent
Constraint: In the case of array section assignment, the {\it 
array-section} and 
{\it expr} have the same constraints as the {\it variable} and {\it expr} 
in an {\it assignment-stmt}.

\noindent Constraint: If any subexpression in {\it expr}, {\it 
array-element}, or {\it array-section} is a {\it function-reference}, 
then the {\it function-name} must be a ``pure'' function as defined in
Section~\ref{forall-pure}.


For each subscript name in the {\it forall-assignment}, the set of
permitted values is determined on entry to the statement and is
\[  m1 + (k-1) \times m3, \mbox{ where } k = 1, 2, \ldots, \left\lfloor \frac{m2 - m1 +
1}{m3} \right\rfloor  \]
and where {\it m1}, {\it m2}, and {\it m3} are the values of the first
subscript, the second subscript, and the stride respectively in the
{\it forall-triplet-spec}.  If {\it stride} is missing, it is as if it
were present with a value of the integer 1.  The expression {\it
stride} must not have the value 0.  If for some subscript name 
\(\lfloor (m2 -m1 + 1) / m3 \rfloor \leq 0\), the {\it
forall-assignment} is not executed.
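
As a small worked illustration (the bounds are chosen here only as an
example and do not come from the proposal), consider a {\it
forall-triplet-spec} of the form I = 2:5.
Here \(m1 = 2\), \(m2 = 5\), and the omitted stride gives \(m3 = 1\), so
\(\lfloor (5 - 2 + 1) / 1 \rfloor = 4\) and the permitted values of I
are 2, 3, 4, and 5.
For I = 5:2, in contrast, 
\(\lfloor (2 - 5 + 1) / 1 \rfloor = -2 \leq 0\), so the set of permitted
values is empty and the {\it forall-assignment} is not executed.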

A ``pure'' function is defined in Section~\ref{forall-pure}; the
intuition is that a pure function cannot have side effects.
The PURE declaration places syntactic constraints on the function to
ensure this.

Examples of element array assignments are:

                                                                  \CODE
REAL H(N,N), X(N,N), Y(N,N)
TYPE MONARCH
    INTEGER, POINTER :: P
END TYPE MONARCH
TYPE(MONARCH) :: A(N)
INTEGER, TARGET :: B(N)
      ...
FORALL (I=1:N, J=1:N) H(I,J) = 1.0 / REAL(I + J - 1)

FORALL (I=1:N, J=1:N, Y(I,J) .NE. 0.0) X(I,J) = 1.0 / Y(I,J)

! Set up a butterfly pattern
FORALL (J=1:N)  A(J)%P => B(1+IEOR(J-1,2**K))
                                                                  \EDOC 

\subsection{Interpretation of Element Array Assignments}  

Execution of an element array assignment consists of the following steps:

\begin{enumerate}

\item Evaluation in any order of the subscript and stride expressions in 
the {\it forall-triplet-spec-list}.
The set of valid combinations of {\it subscript-name} values is then the 
cartesian product of the sets defined by these triplets.

\item Evaluation of the {\it scalar-mask-expr} for all valid combinations 
of {\em subscript-name} values.
The mask elements may be evaluated in any order.
The set of active combinations of {\it subscript-name} values is the 
subset of the valid combinations for which the mask evaluates to true.

\item Evaluation in any order of the {\it expr} or {\it target} and all 
subscripts contained in the 
{\it array-element} or {\it array-section} in the {\it forall-assignment} 
for all active combinations of {\em subscript-name} values.
In the case of pointer assignment where the {\it target} is not a 
pointer, the evaluation consists of identifying the object referenced 
rather than computing its value.

\item Assignment of the computed {\it expr} values to the corresponding 
elements specified by {\it array-element} or {\it array-section}.
The assignments may be made in any order.
In the case of a pointer assignment where the {\it target} is not a 
pointer, this assignment consists of associating the {\it array-element} 
with the object referenced.

\end{enumerate}
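
The practical effect of steps 3 and 4 is that every right-hand side is
computed from the values that existed before any assignment takes place.
The following fragment is only an illustrative sketch; the array A, its
size, and its initial values are assumptions made for this example and
are not part of the proposal.
It contrasts a FORALL with the corresponding sequential DO loop.

                                                                  \CODE
INTEGER A(5), I

A = (/ 1, 2, 3, 4, 5 /)

! All right-hand sides A(I-1) are evaluated before any element
! of A is assigned, so the old values are shifted by one place
FORALL (I=2:5) A(I) = A(I-1)
! A now holds 1, 1, 2, 3, 4

A = (/ 1, 2, 3, 4, 5 /)

! The sequential loop instead propagates A(1) through the array
DO I = 2, 5
  A(I) = A(I-1)
END DO
! A now holds 1, 1, 1, 1, 1
                                                                  \EDOC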

If the scalar mask expression is omitted, it is as if it were
present with the value true.

The scope of a
{\it subscript-name} is the FORALL statement itself. 


The {\it forall-assignment} must not cause any element of the array
being assigned to be assigned a value more than once.

Note that if a function called in a FORALL is declared PURE, then it 
is impossible for that function's evaluation to affect other expressions' 
evaluations, either for the same combination of 
{\it subscript-name} values or for a different combination.
In addition, it is possible that the compiler can perform 
more extensive optimizations when all functions are declared PURE.


\subsection{Scalarization of the FORALL Statement}

A {\it forall-stmt} of the general form:

                                                                   \CODE
FORALL (v1=l1:u1:s1, v2=l2:u2:s2, ..., vn=ln:un:sn , mask ) &
      a(e1,...,em) = rhs
                                                                   \EDOC

\noindent
is equivalent to the following standard Fortran 90 code:

\raggedbottom
                                                                   \CODE
!evaluate subscript and stride expressions in any order

templ1 = l1
tempu1 = u1
temps1 = s1
templ2 = l2
tempu2 = u2
temps2 = s2
  ...
templn = ln
tempun = un
tempsn = sn

!then evaluate the scalar mask expression

DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        tempmask(v1,v2,...,vn) = mask
      END DO
    ...
  END DO
END DO

!then evaluate the expr in the forall-assignment for all valid 
!combinations of subscript names for which the scalar mask 
!expression is true (it is safe to avoid saving the subscript 
!expressions because of the conditions on FORALL expressions)

DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        IF (tempmask(v1,v2,...,vn)) THEN
          temprhs(v1,v2,...,vn) = rhs
        END IF
      END DO
    ...
  END DO
END DO

!then perform the assignment of these values to the corresponding 
!elements of the array being assigned to

DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        IF (tempmask(v1,v2,...,vn)) THEN
          a(e1,...,em) = temprhs(v1,v2,...,vn)
        END IF
      END DO
    ...
  END DO
END DO
                                                                      \EDOC
\flushbottom
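
As a concrete instance of this scheme, the masked example given earlier
could be scalarized roughly as sketched below.
The temporary names TMASK and TRHS are assumptions introduced for this
illustration, and the bound temporaries are omitted on the assumption
that the bounds 1 and N are not changed by the statement.

                                                                   \CODE
! FORALL (I=1:N, J=1:N, Y(I,J) .NE. 0.0) X(I,J) = 1.0 / Y(I,J)

LOGICAL TMASK(N,N)
REAL TRHS(N,N)

! evaluate the scalar mask expression
DO I = 1, N
  DO J = 1, N
    TMASK(I,J) = Y(I,J) .NE. 0.0
  END DO
END DO

! evaluate the right-hand side where the mask is true
DO I = 1, N
  DO J = 1, N
    IF (TMASK(I,J)) THEN
      TRHS(I,J) = 1.0 / Y(I,J)
    END IF
  END DO
END DO

! assign the saved values to the left-hand side
DO I = 1, N
  DO J = 1, N
    IF (TMASK(I,J)) THEN
      X(I,J) = TRHS(I,J)
    END IF
  END DO
END DO
                                                                   \EDOC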

\subsection{Consequences of the Definition of the FORALL Statement}

This section should be moved to the comments chapter in the final
draft.

\begin{itemize}

\item The {\it scalar-mask-expr} may depend on the {\it subscript-name}s.

\item Each of the {\it subscript-name}s must appear within the
subscript expression(s) on the left-hand side.  
(This is a syntactic
consequence of the semantic rule that no two execution instances of the
body may assign to the same array element.)

\item Right-hand sides and subscripts on the left-hand side of a {\it 
forall-assignment} are
evaluated only for valid combinations of subscript names for which the
scalar mask expression is true.

\item The intent of ``pure'' functions is to provide a class of
functions without side-effects, and to allow this side-effect freedom
to be checked syntactically.

\end{itemize}


\section{FORALL Construct}

\label{forall-construct}

\footnote{Version of August 20, 1992 -
David Loveman, Digital Equipment Corporation and 
Charles Koelbel, Rice University.  Approved at second reading on
September 10, 1992.}
The FORALL construct is a generalization of the element array
assignment statement allowing multiple assignments, masked array 
assignments, and nested FORALL statements to be
controlled by a single {\it forall-triplet-spec-list}.  Rule R215 for
{\it executable-construct} is extended to include the {\it
forall-construct}.

\subsection{General Form of the FORALL Construct}

                                                                    \BNF
forall-construct   \IS FORALL (forall-triplet-spec-list [,scalar-mask-expr ])
                               forall-body-stmt-list
                            END FORALL

forall-body-stmt     \IS forall-assignment
                     \OR where-stmt
                     \OR forall-stmt
                     \OR forall-construct
                                                                    \FNB

\noindent
Constraint:  {\it subscript-name} must be a {\it scalar-name} of type
integer.

\noindent
Constraint:  A {\it subscript} or a {\it stride} in a {\it
forall-triplet-spec} must not contain a reference to any {\it
subscript-name} in the {\it forall-triplet-spec-list}.

\noindent
Constraint:  Any left-hand side {\it array-section} or {\it 
array-element} in any {\it forall-body-stmt}
must reference all of the {\it forall-triplet-spec
subscript-names}.

\noindent
Constraint: If a {\it forall-stmt} or {\it forall-construct} is nested 
within a {\it forall-construct}, then the inner FORALL may not redefine 
any {\it subscript-name} used in the outer {\it forall-construct}.
This rule applies recursively in the event of multiple nesting levels.

For each subscript name in the {\it forall-assignment}s, the set of
permitted values is determined on entry to the construct and is
\[  m1 + (k-1) \times m3, \mbox{ where } k = 1, 2, \ldots, \left\lfloor \frac{m2 - m1 +
1}{m3} \right\rfloor  \]
and where {\it m1}, {\it m2}, and {\it m3} are the values of the first
subscript, the second subscript, and the stride respectively in the
{\it forall-triplet-spec}.  If {\it stride} is missing, it is as if it
were present with a value of the integer 1.  The expression {\it
stride} must not have the value 0.  If for some subscript name 
\(\lfloor (m2 -m1 + 1) / m3 \rfloor \leq 0\), the {\it
forall-assignment}s are not  executed.

Examples of the FORALL construct are:

                                                                 \CODE
FORALL ( I = 2:N-1, J = 2:N-1 )
  A(I,J) = A(I,J-1) + A(I,J+1) + A(I-1,J) + A(I+1,J)
  B(I,J) = A(I,J)
END FORALL

FORALL ( I = 1:N-1 )
  FORALL ( J = I+1:N )
    A(I,J) = A(J,I)
  END FORALL
END FORALL

FORALL ( I = 1:N, J = 1:N )
  A(I,J) = MERGE( A(I,J), A(I,J)**2, I.EQ.J )
  WHERE ( .NOT. DONE(I,J,1:M) )
    B(I,J,1:M) = B(I,J,1:M)*X
  END WHERE
END FORALL
                                                                \EDOC


\subsection{Interpretation of the FORALL Construct}

Execution of a FORALL construct consists of the following steps:

\begin{enumerate}

\item Evaluation in any order of the subscript and stride expressions in 
the {\it forall-triplet-spec-list}.
The set of valid combinations of {\it subscript-name} values is then the 
cartesian product of the sets defined by these triplets.

\item Evaluation of the {\it scalar-mask-expr} for all valid combinations 
of {\em subscript-name} values.
The mask elements may be evaluated in any order.
The set of active combinations of {\it subscript-name} values is the 
subset of the valid combinations for which the mask evaluates to true.

\item Execution of the {\it forall-body-stmts} in the order in which they appear.
Each statement is executed completely (that is, for all active 
combinations of {\it subscript-name} values) according to the following 
interpretation:

\begin{enumerate}

\item Assignment statements, pointer assignment statements, and array
assignment statements (i.e.\ statements in the {\it forall-assignment} 
category) evaluate the 
right-hand side {\it expr} and any left-hand side subscripts for all 
active combinations of {\it subscript-name} values,
then assign those results to the corresponding left-hand side references.

\item WHERE statements evaluate their {\it mask-expr} for all active 
combinations of values of {\it subscript-name}s.
All elements of all masks may be evaluated in any order. 
The assignments within the WHERE branch of the statement are then 
executed in order using the above interpretation of array assignments 
within the FORALL, but the only array elements assigned are those 
selected by both the active {\it subscript-names} and the WHERE mask.
Finally, the assignments in the ELSEWHERE branch are executed if that 
branch is present.
The assignments here are also treated as array assignments, but elements 
are only assigned if they are selected by both the active combinations 
and by the negation of the WHERE mask.

\item FORALL statements and FORALL constructs first evaluate the 
subscript and stride expressions in 
the {\it forall-triplet-spec-list} for all active combinations of the 
outer FORALL constructs.
The set of valid combinations of {\it subscript-names} for the inner 
FORALL is then the union of the sets defined by these bounds and strides 
for each active combination of the outer {\it subscript-names}.
For example, the valid set of the inner FORALL in the second example in 
the last section is the upper triangle (not including the main diagonal) 
of the \(n \times n\) matrix A.
The scalar mask expression is then evaluated for all valid combinations 
of the inner FORALL's {\it subscript-names} to produce the set of active 
combinations.
If there is no scalar mask expression, it is assumed to be always true.
Each statement in the inner FORALL is then executed for each active 
combination (of the inner FORALL), recursively following the 
interpretations given in this section.

\end{enumerate}

\end{enumerate}
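
To illustrate the statement-by-statement execution described in step 3,
consider the sketch below.
The arrays A and B and the extent N are assumed for this example, and
the array assignments shown are only one possible equivalent formulation
in Fortran 90.

                                                                 \CODE
REAL A(N), B(N)

FORALL (I=2:N-1)
  A(I) = A(I-1) + A(I+1)
  B(I) = A(I)
END FORALL

! Starting from the same initial values, the construct above has
! the same effect as the following two array assignments executed
! in sequence; the second statement sees the values of A stored
! by the first
A(2:N-1) = A(1:N-2) + A(3:N)
B(2:N-1) = A(2:N-1)
                                                                 \EDOC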

If the scalar mask expression is omitted, it is as if it were
present with the value true.

The scope of a
{\it subscript-name} is the FORALL construct itself. 

A single assignment or array assignment statement in a {\it 
forall-construct} must obey the same restrictions as a {\it 
forall-assignment} in a simple {\it forall-stmt}.
(Note that the lowest level of nested statements must always be an 
assignment statement.)
For example, an assignment may not cause the same array element to be 
assigned more than once.
Different statements may, however, assign to the 
same array element, and assignments made in one
statement may affect the execution of a later statement.

\subsection{Scalarization of the FORALL Construct}

A {\it forall-construct} of the form:

                                                                \CODE
FORALL (... e1 ... e2 ... en ...)
    s1
    s2
     ...
    sn
END FORALL
                                                                \EDOC

where each si is an assignment, is equivalent to the following sequence of FORALL statements:

                                                                \CODE
temp1 = e1
temp2 = e2
 ...
tempn = en
FORALL (... temp1 ... temp2 ... tempn ...) s1
FORALL (... temp1 ... temp2 ... tempn ...) s2
   ...
FORALL (... temp1 ... temp2 ... tempn ...) sn
                                                                \EDOC

A similar statement can be made using FORALL constructs when the 
si may be WHERE or FORALL constructs.

A {\it forall-construct} of the form:

                                                                \CODE
FORALL ( v1=l1:u1:s1, mask )
  WHERE ( mask2(l2:u2) )
    a(v1,l2:u2) = rhs1
  ELSEWHERE
    a(v1,l2:u2) = rhs2
  END WHERE
END FORALL
                                                                \EDOC

is equivalent to the following standard Fortran 90 code:

                                                                \CODE
!evaluate subscript and stride expressions in any order

templ1 = l1
tempu1 = u1
temps1 = s1

!then evaluate the FORALL mask expression

DO v1=templ1,tempu1,temps1
  tempmask(v1) = mask
END DO

!then evaluate the masks for the WHERE

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    tmpl2(v1) = l2
    tmpu2(v1) = u2
    tempmask2(v1,tmpl2(v1):tmpu2(v1)) = mask2(tmpl2(v1):tmpu2(v1))
  END IF
END DO

!then evaluate the WHERE branch

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    WHERE ( tempmask2(v1,tmpl2(v1):tmpu2(v1)) )
      temprhs1(v1,tmpl2(v1):tmpu2(v1)) = rhs1
    END WHERE
  END IF
END DO
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    WHERE ( tempmask2(v1,tmpl2(v1):tmpu2(v1)) )
      a(v1,tmpl2(v1):tmpu2(v1)) = temprhs1(v1,tmpl2(v1):tmpu2(v1))
    END WHERE
  END IF
END DO

!then evaluate the ELSEWHERE branch

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    WHERE ( .not. tempmask2(v1,tmpl2(v1):tmpu2(v1)) )
      temprhs2(v1,tmpl2(v1):tmpu2(v1)) = rhs2
    END WHERE
  END IF
END DO
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    WHERE ( .not. tempmask2(v1,tmpl2(v1):tmpu2(v1)) )
      a(v1,tmpl2(v1):tmpu2(v1)) = temprhs2(v1,tmpl2(v1):tmpu2(v1))
    END WHERE
  END IF
END DO
                                                                   \EDOC


A {\it forall-construct} of the form:

                                                                   \CODE
FORALL ( v1=l1:u1:s1, mask )
  FORALL ( v2=l2:u2:s2, mask2 )
    a(e1,e2) = rhs1
    b(e3,e4) = rhs2
  END FORALL
END FORALL
                                                                  \EDOC

is equivalent to the following standard Fortran 90 code:


                                                                   \CODE
!evaluate subscript and stride expressions in any order

templ1 = l1
tempu1 = u1
temps1 = s1

!then evaluate the FORALL mask expression

DO v1=templ1,tempu1,temps1
  tempmask(v1) = mask
END DO

!then evaluate the inner FORALL bounds, etc

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    templ2(v1) = l2
    tempu2(v1) = u2
    temps2(v1) = s2
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      tempmask2(v1,v2) = mask2
    END DO
  END IF
END DO

!first statement

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      IF ( tempmask2(v1,v2) ) THEN
        temprhs1(v1,v2) = rhs1
      END IF
    END DO
  END IF
END DO
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      IF ( tempmask2(v1,v2) ) THEN
        a(e1,e2) = temprhs1(v1,v2)
      END IF
    END DO
  END IF
END DO

!second statement

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      IF ( tempmask2(v1,v2) ) THEN
        temprhs2(v1,v2) = rhs2
      END IF
    END DO
  END IF
END DO
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      IF ( tempmask2(v1,v2) ) THEN
        b(e3,e4) = temprhs2(v1,v2)
      END IF
    END DO
  END IF
END DO
                                                                   \EDOC


\subsection{Consequences of the Definition of the FORALL Construct}

This section should be moved to the comments chapter of the final
draft.

\begin{itemize}

\item A block FORALL means roughly the same as replicating the FORALL
header in front of each array assignment statement in the block, except
that any expressions in the FORALL header are evaluated only once,
rather than being re-evaluated before each of the statements in the body.
(The exceptions to this rule are nested FORALL statements and WHERE
statements, which introduce syntactic and functional complications
into the copying.)

\item One may think of a block FORALL as synchronizing twice per
contained assignment statement: once after handling the rhs and other 
expressions
but before performing assignments, and once after all assignments have
been performed but before commencing the next statement.  (In practice,
appropriate dependence analysis will often permit the compiler to
eliminate unnecessary synchronizations.)

\item The {\it scalar-mask-expr} may depend on the {\it subscript-name}s.

\item In general, any expression in a FORALL is evaluated only for valid 
combinations of all surrounding subscript names for which all the
scalar mask expressions are true.

\item Nested FORALL bounds and strides can depend on outer FORALL {\it 
subscript-names}.  They cannot redefine those names, even temporarily (if 
they did there  would be no way to avoid multiple assignments to the same 
array element).

\item Dependences are allowed from one statement to later statements, but 
never from an assignment statement to itself.

\end{itemize}


\section{Pure Procedures and Elemental Reference}

\label{forall-pure}

\footnote{Version of October 14, 
1992 - John Merlin, University of Southampton, 
and Charles Koelbel, Rice University. Approved at first reading on
September 10, 1992, subject to technical revisions for correctness.
The suggestions made there have been incorporated in this draft.}
A {\it pure function\/} is one that produces no side effects.  This
means that the only effect of a pure function reference on the state 
of a program is to return a result---it does not modify the values, 
pointer associations or data mapping of any of its arguments or global 
data, and performs no I/O.
A {\em pure subroutine\/} is one that produces no side effects
except for modifying the values and/or pointer associations of certain
arguments.  

A pure procedure (i.e.\ function or subroutine) may be used in any way 
that a normal procedure can.
In addition, a procedure is required to be pure if it is used in any 
of the following contexts:
\begin{itemize}
        \item a FORALL statement or construct;
        \item an elemental reference (see Section~\ref{elem-ref-of-pure-procs});
        \item within the body of a pure procedure;
        \item as an actual argument in a pure procedure reference.
\end{itemize}

The side-effect freedom of a pure function ensures that it can be invoked
concurrently in a FORALL or elemental reference without undesirable
consequences such as non-determinism, and additionally assists the efficient
implementation of concurrent execution.  A pure subroutine can be
invoked concurrently in an elemental reference, and since its side effects
are limited to a known subset of its arguments (as we shall see later), 
an implementation can check that a reference obeys Fortran~90's restrictions 
on argument association and is consequently deterministic.


\subsection{Pure procedure declaration and interface}

If a non-intrinsic procedure is used in a context that requires it to be 
pure, then its interface must be explicit in the scope of that use, 
and both its interface body (if provided) and its definition must contain 
the PURE declaration.  The form of this declaration is 
a directive immediately after the {\it function-stmt\/} or {\it
subroutine-stmt\/} of the procedure interface body or definition:
                                                                 \BNF
pure-directive \IS !HPF$ PURE [procedure-name]
                                                                 \FNB

Intrinsic functions, including HPF intrinsic functions, are always pure 
and require no explicit declaration of this fact;  intrinsic subroutines 
are pure if they are elemental (e.g.\ MVBITS) but not otherwise.
A statement function is pure if and only if all functions that it
references are pure.

\subsubsection{Pure function definition}

To define pure functions, Rule~R1215 of the Fortran~90 standard is changed 
to:
                                                                 \BNF
function-subprogram \IS         function-stmt
                                [pure-directive]
                                [specification-part]
                                [execution-part]
                                [internal-subprogram-part]
                                end-function-stmt
                                                                \FNB
with the following additional constraints in Section~12.5.2.2 of the 
Fortran~90 standard:
\begin{constraints}

        \item If a {\it procedure-name\/} is present in the 
{\it pure-directive\/}, it must match the {\it function-name\/} in the 
{\it function-stmt\/}.

        \item In a pure function, a local variable must not have the 
SAVE attribute. (Note that this means that a local variable cannot be 
initialised in a {\it type-declaration-stmt\/} or a
{\it data-stmt\/}, which imply the SAVE attribute.)

        \item A pure function must not use a dummy argument, a global 
variable, or an object that is storage associated with a global variable,
or a subobject thereof, in the following contexts:
        \begin{itemize}
                \item as the assignment variable of an {\it assignment-stmt\/}
or {\it forall-assignment\/};
                \item as a DO variable or implied DO variable, or as a 
{\it subscript-name\/} in a {\it forall-triplet-spec\/};
                \item in an {\it assign-stmt\/};
                \item as the {\it pointer-object\/} or {\it target\/}
of a {\it pointer-assignment-stmt\/};
                \item as the {\it expr\/} of an {\it assignment-stmt\/}
or {\it forall-assignment\/} whose assignment variable is of a derived 
type, or is a pointer to a derived type, that has a pointer component 
at any level of component selection;
                \item as an {\it allocate-object\/} or {\it stat-variable\/}
in an {\it allocate-stmt\/} or {\it deallocate-stmt\/}, or as a
{\it pointer-object\/} in a {\it nullify-stmt\/};
                \item as an actual argument associated with a dummy 
argument with INTENT (OUT) or (INOUT) or with the POINTER attribute.
        \end{itemize}

        \item Any procedure referenced in a pure function, including
one referenced via a defined operation or assignment, must be pure.

        \item In a pure function, a dummy argument or local variable 
must not appear in an {\it align-directive\/}, {\it realign-directive\/},
{\it distribute-directive\/}, {\it redistribute-directive\/},
{\it realignable-directive\/}, {\it redistributable-directive\/} or
{\it combined-directive}.

        \item In a pure function, a global variable must not appear in
a {\it realign-directive\/} or {\it redistribute-directive\/}.

        \item A pure function must not contain a {\it pause-stmt\/},
{\it stop-stmt\/} or I/O statement (including a file operation).

\end{constraints}
To declare that a function is pure, a {\it pure-directive\/} must be given.
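
As an illustration of these rules, the following sketch shows a function
that would appear to satisfy the constraints under the directive syntax
proposed in this draft, together with a FORALL that references it.
The names HALF, SMOOTH, U, and V are hypothetical and are introduced
only for this example.

                                                                 \CODE
REAL FUNCTION HALF(X)
!HPF$ PURE HALF
  REAL, INTENT(IN) :: X
  ! Only the result variable is assigned; no dummy argument or
  ! global variable is modified and no I/O is performed
  HALF = 0.5 * X
END FUNCTION HALF

SUBROUTINE SMOOTH(U, V, N)
  INTEGER N, I
  REAL U(N), V(N)
  ! The interface of HALF is made explicit here, and the
  ! interface body repeats the PURE declaration
  INTERFACE
    REAL FUNCTION HALF(X)
    !HPF$ PURE HALF
      REAL, INTENT(IN) :: X
    END FUNCTION HALF
  END INTERFACE
  FORALL (I=2:N-1) V(I) = HALF(U(I-1) + U(I+1))
END SUBROUTINE SMOOTH
                                                                 \EDOC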

The above constraints are designed to guarantee that a pure function
is free from side effects (i.e.\ modifications of data visible outside
the function), which means that it is safe to reference concurrently, 
as explained earlier.

The second constraint ensures that a pure function does not retain
an internal state between calls, which would allow side-effects between 
calls to the same procedure.

The third constraint ensures that dummy arguments and global variables
are not modified by the function.
In the case of a dummy or global pointer, this applies to both its 
pointer association and its target value, so it cannot be subject to 
a pointer assignment or to an ALLOCATE, DEALLOCATE or NULLIFY
statement.
Incidentally, these constraints imply that only local variables and the
dummy result variable can be subject to assignment or pointer assignment.

In addition, a dummy or global data object cannot be the {\it target\/}
of a pointer assignment (i.e.\ it cannot be used as the right hand side
of a pointer assignment to a local pointer or to the result variable), 
for then its value could be modified via the pointer.

In connection with the last point, it should be noted that an ordinary 
(as opposed to pointer) assignment to a variable of derived type that has 
a pointer component at any level of component selection may result in a 
{\em pointer\/} assignment to the pointer component of the variable.
That is certainly the case for an intrinsic assignment.  In that case
the expression on the right hand side of the assignment has the same type 
as the assignment variable, and the assignment results in a pointer 
assignment of the pointer components of the expression result to the
corresponding components of the variable (see section 7.5.1.5 of the 
Fortran~90 standard).  However, it may also be the case for a 
{\em defined\/} assignment to such a variable, even if the data type of 
the expression has no pointer components;  the defined assignment may still 
involve pointer assignment of part or all of the expression result to the 
pointer components of the assignment variable.  Therefore, a dummy or 
global object cannot be used as the right hand side of any assignment to 
a variable of derived type with pointer components, for then it, or part 
of it, might be the target of a pointer assignment, in violation of the 
restriction mentioned above.

(Incidentally, the last two paragraphs only prevent the reference of 
a dummy or global object as the {\em only\/} object on the right hand
side of a pointer assignment or an assignment to a variable with pointer
components.  There are no constraints on its reference as an operand, 
actual argument, subscript expression, etc.\ in these circumstances).
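
As an illustration of the preceding three paragraphs, the following sketch 
marks which uses of a dummy argument with a pointer component would be 
accepted and which would be rejected under the constraints above.  
The module \verb@cell_mod@, the type \verb@cell@ and the function 
\verb@first_val@ are hypothetical names chosen for this example only; the 
rejected statements are shown as comments.
                                                \CODE
    MODULE cell_mod
      TYPE cell
        REAL, POINTER :: val
      END TYPE cell
    END MODULE cell_mod

    REAL FUNCTION first_val (c)
      !HPF$ PURE first_val
      USE cell_mod
      TYPE(cell), INTENT(IN) :: c    ! dummy with a pointer component
      TYPE(cell) :: tmp              ! local variable
      REAL, POINTER :: p             ! local pointer
      ! tmp = c        ! rejected: intrinsic assignment implies
      !                !           tmp%val => c%val
      ! p => c%val     ! rejected: subobject of a dummy used as a target
      first_val = 2.0 * c%val        ! accepted: c is only an operand
    END FUNCTION first_val
                                                \EDOC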

Finally, a dummy or global data object cannot be used in a procedure 
reference as an actual argument associated with a dummy argument of
INTENT (OUT) or (INOUT) or with a dummy pointer, for then it may be
modified by the procedure reference.  
This constraint, like the others, can be statically checked, since any
procedure referenced within a pure function must be either a pure 
function, which does not modify its arguments, or a pure subroutine, 
whose interface must specify the INTENT or POINTER attributes of its 
arguments (see below).
Incidentally, notice that in this context it is assumed that an actual 
argument associated with a dummy pointer is modified, since Fortran~90 
does not allow its intent to be specified.

Constraint 4 ensures that all procedures called from a pure function 
are themselves pure and hence side effect free, except, in the case of
subroutines, for modifying actual arguments associated with dummy pointers 
or dummy arguments with INTENT(OUT) or (INOUT).  As we have just 
explained, it can be checked that global or dummy objects are not used
in such arguments, which would violate the required side-effect freedom.

Constraints 5 and 6 protect dummy and global data objects from realignment 
and redistribution (another type of side effect).  
In addition, constraint 5 prevents explicit declaration of the mapping 
(i.e.\ alignment and distribution) of dummy arguments and local variables.  
This is because the function may be invoked concurrently, with each 
invocation operating on a segment of data whose distribution is specific 
to that invocation.  Thus, the distribution of a dummy object must be 
`assumed' from the corresponding actual argument.  
Also, it is left to the implementation to determine a suitable mapping 
of the local variables, which would typically depend on the mapping of 
the dummy arguments.

Constraint 7 prevents I/O, whose order would be non-deterministic in 
the context of concurrent execution.  A PAUSE statement requires input
and so is disallowed for the same reason.
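
A minimal sketch of a function that would satisfy the constraints above 
is given below.  The names \verb@table_mod@, \verb@scale@ and 
\verb@scaled_norm@ are assumptions made for illustration and are not part 
of the proposal; statements that would violate a constraint are shown as 
comments.
                                                \CODE
    MODULE table_mod
      REAL :: scale = 2.0        ! global data
    END MODULE table_mod

    REAL FUNCTION scaled_norm (x, y)
      !HPF$ PURE scaled_norm
      USE table_mod
      REAL, INTENT(IN) :: x, y
      REAL :: t                  ! local variable without SAVE
      ! REAL, SAVE :: calls      ! rejected: local with SAVE attribute
      t = x*x + y*y              ! accepted: assignment to a local
      scaled_norm = scale * SQRT (t)   ! accepted: global is only read
      ! scale = 1.0              ! rejected: assignment to a global
      ! PRINT *, t               ! rejected: I/O statement
    END FUNCTION scaled_norm
                                                \EDOC
Note that reading the global variable \verb@scale@ is permitted; only 
contexts that could modify it are excluded.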


\subsubsection{Pure subroutine definition}

To define pure subroutines, Rule~R1219 is changed to:
                                                                 \BNF
subroutine-subprogram \IS       subroutine-stmt
                                [pure-directive]
                                [specification-part]
                                [execution-part]
                                [internal-subprogram-part]
                                end-subroutine-stmt
                                                                \FNB
with the following additional constraints in Section~12.5.2.3 of the 
Fortran~90 standard:
\begin{constraints}

        \item If a {\it procedure-name\/} is present in the 
{\it pure-directive\/}, it must match the {\it sub\-rou\-tine-name\/} in the 
{\it subroutine-stmt\/}.

        \item The {\it specification-part\/} of a pure subroutine must 
specify the intents of all non-pointer and non-procedure dummy arguments.

        \item In a pure subroutine, a local variable must not have the 
SAVE attribute. (Note that this means that local variables cannot be 
initialised in a {\it type-declaration-stmt\/} or a {\it data-stmt\/}.)

        \item A pure subroutine must not use a dummy argument with 
INTENT(IN), a global variable, or an 
object that is storage associated with a global variable, or a subobject 
thereof, in the following contexts:
        \begin{itemize}
                \item as the assignment variable of an {\it assignment-stmt\/}
or {\it forall-assignment\/};
                \item as a DO variable or implied DO variable, or as a 
{\it subscript-name\/} in a {\it forall-triplet-spec\/};
                \item in an {\it assign-stmt\/};
                \item as the {\it pointer-object\/} or {\it target\/}
of a {\it pointer-assignment-stmt\/};
                \item as the {\it expr\/} of an {\it assignment-stmt\/}
or {\it forall-assignment\/} whose assignment variable is of a derived 
type, or is a pointer to a derived type, that has a pointer component 
at any level of component selection;
                \item as an {\it allocate-object\/} or {\it stat-variable\/}
in an {\it allocate-stmt\/} or {\it deallocate-stmt\/}, or as a
{\it pointer-object\/} in a {\it nullify-stmt\/};
                \item as an actual argument associated with a dummy 
argument with INTENT (OUT) or (INOUT) or with the POINTER attribute.
        \end{itemize}

        \item Any procedure referenced in a pure subroutine, including
one referenced via a defined operation or assignment, must be pure.

        \item In a pure subroutine, a dummy argument or local variable 
must not appear in an {\it align-directive\/}, {\it realign-directive\/},
{\it distribute-directive\/}, {\it redistribute-directive\/},
{\it realignable-directive\/}, {\it redistributable-directive\/} or
{\it combined-directive}.

        \item In a pure subroutine, a global variable must not appear in a
{\it realign-directive\/} or {\it redistribute-directive\/}.

        \item A pure subroutine must not contain a {\it pause-stmt\/},
{\it stop-stmt\/} or I/O statement (including a file operation).

\end{constraints}
To declare that a subroutine is pure, a {\it pure-directive\/} must be
given.

The constraints for pure subroutines are based on the same principles 
as for pure functions, except that now side effects to dummy arguments 
are permitted.  
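
For instance, a subroutine along the following lines would be expected to 
satisfy the constraints.  This is a sketch only, and the name 
\verb@swap_if_greater@ is hypothetical.  Every dummy argument has its 
intent specified, and the only objects modified are the local variable 
\verb@t@ and the dummy arguments with INTENT(INOUT) or INTENT(OUT).
                                                \CODE
    SUBROUTINE swap_if_greater (lo, hi, swapped)
      !HPF$ PURE swap_if_greater
      REAL, INTENT(INOUT) :: lo, hi
      LOGICAL, INTENT(OUT) :: swapped
      REAL :: t                  ! local variable without SAVE
      swapped = lo > hi
      IF (swapped) THEN
        t  = lo
        lo = hi
        hi = t
      END IF
    END SUBROUTINE swap_if_greater
                                                \EDOC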


\subsubsection{Pure procedure interfaces}
\label{pure-proc-interface}

To define interface specifications for pure procedures, Rule~R1204 is 
changed to:
                                                                \BNF
interface-body \IS      function-stmt
                        [pure-directive]
                        [specification-part]
                        end-function-stmt
                \OR     subroutine-stmt
                        [pure-directive]
                        [specification-part]
                        end-subroutine-stmt
                                                                \FNB
with the following constraint in addition to those in
Section~12.3.2.1 of the Fortran~90 standard:
\begin{constraints}

        \item An {\it interface-body\/} of a pure subroutine must specify
the intents of all non-pointer and non-procedure dummy arguments.

\end{constraints}

The procedure characteristics defined by an interface body must be
consistent with the procedure's definition.
Regarding pure procedures, this is interpreted as follows:
\begin{enumerate}
        \item A procedure that is declared pure at its definition may be
declared pure in an interface block, but this is not required.
        \item A procedure that is not declared pure at its definition must 
not be declared pure in an interface block.
\end{enumerate}
That is, if an interface body contains a {\it pure-directive\/}, then the 
corresponding procedure definition must also contain it, though the 
reverse is not true.
When a procedure definition with a {\it pure-directive\/}
is compiled, the compiler may check that it satisfies the necessary 
constraints.


\subsection{Pure procedure reference}
To define pure procedure references, the following extra constraint is 
added to Section~12.4.1 of the Fortran~90 standard:
\begin{constraints}

        \item In a reference to a pure procedure, a {\it procedure-name\/} 
{\it actual-arg\/} must be the name of a pure procedure.

\end{constraints}


\subsection{Elemental reference of pure procedures}
\label{elem-ref-of-pure-procs}

Fortran~90 introduces the concept of `elemental procedures', which are 
defined for scalar arguments but may also be applied to conforming 
array-valued arguments.  The latter type of reference to an elemental 
procedure is called an `elemental' reference.    For an elemental function, 
each element of the result, if any, is as would have been obtained by
applying the function to corresponding elements of the arguments.
Examples are the mathematical intrinsics, e.g.\ SIN(X).  For an elemental 
subroutine, the effect on each element of an INTENT(OUT) or INTENT(INOUT) 
array argument is as would be obtained by calling the subroutine with 
the corresponding elements of the arguments.  An example is the intrinsic 
subroutine MVBITS.

However, Fortran~90 restricts elemental reference to a subset of 
the intrinsic procedures --- programmers cannot define their own 
elemental procedures.  Obviously, elemental invocation is equivalent 
to concurrent invocation, so extra constraints beyond those for normal 
Fortran procedures are required to allow this to be done safely
(e.g.\ deterministically).  Appropriate constraints in this case are
the same as for function calls in FORALL;  indeed, the latter are 
virtually equivalent to elemental reference of the function in an 
array assignment, given the close correspondence between FORALL and 
array assignment.  Hence, pure procedures may also be referenced 
elementally, subject to certain additional constraints given below.

\subsubsection{Elemental reference of pure functions}

A non-intrinsic pure function may be referenced {\em elementally\/} 
in array expressions, with a similar interpretation to the elemental
reference of Fortran~90 elemental intrinsic functions, provided it
satisfies the additional constraints that:
\begin{enumerate}
        \item Its non-procedure dummy arguments and dummy result are 
scalar and do not have the POINTER attribute.
        \item The length of any character dummy argument or result is 
independent of argument values (though it may be assumed, or depend on the 
lengths of other character arguments and/or a character result).
\end{enumerate}
We call non-intrinsic pure functions that satisfy these constraints 
`elemental non-intrinsic functions'.

The interpretation of an elemental reference of such a function is as 
follows (adapted from Section 12.4.3 of the Fortran~90 standard):
\begin{quotation}

A reference to an elemental non-intrinsic function is an elemental
reference if one or more non-procedure actual arguments are arrays
and all array arguments have the same shape.  If any actual argument 
is a function, its result must have the same shape as that of the 
corresponding function dummy procedure.  A reference to an elemental 
intrinsic function is an elemental reference if one or more actual 
arguments are arrays and all arrays have the same shape.

The result of such a reference has the same shape as the array arguments,
and the value of each element of the result, if any, is obtained by 
evaluating the function using the scalar and procedure arguments and
the corresponding elements of the array arguments.  The elements of
the result may be evaluated in any order.

For example, if \verb@foo@ is a pure function with the following interface:
                                                \CODE
    INTERFACE
      REAL FUNCTION foo (x, y, z, dummy_func)
        !HPF$ PURE foo
        REAL, INTENT(IN) :: x, y, z
        INTERFACE        ! interface for 'dummy_func'
          REAL FUNCTION dummy_func (x)
            !HPF$ PURE dummy_func
            REAL, INTENT(IN) :: x
          END FUNCTION dummy_func
        END INTERFACE
      END FUNCTION foo
    END INTERFACE
                                                \EDOC
and \verb@a@ and \verb@b@ are arrays of shape \verb@(m,n)@ and \verb@sin@
is the Fortran~90 elemental intrinsic function, then:
                                                \CODE
    foo (a, 0.0, b, sin)
                                                \EDOC
is an array expression of shape \verb@(m,n)@ whose \verb@(i,j)@ element
has the value:
                                                \CODE
    foo (a(i,j), 0.0, b(i,j), sin)
                                                \EDOC
\end{quotation}

To define elemental references of elemental non-intrinsic functions, 
the following extra constraints are added after Rule~R1209 
({\it function-reference\/}):
\begin{constraints}

        \item A non-intrinsic function that is referenced elementally 
must be a pure function with an explicit interface, and must satisfy 
the following additional constraints:
        \begin{itemize}
                \item Its non-procedure dummy arguments and dummy result
must be scalar and must not have the POINTER attribute.
                \item The length of any character dummy argument or a 
character dummy result must not depend on argument values (though it may 
be assumed, or depend on the lengths of other character arguments and/or a 
character result).
        \end{itemize}

        \item In an elemental reference of a non-intrinsic function,
a {\it function-name\/} {\it actual-arg\/} must have a result whose shape 
agrees with that of the corresponding function dummy procedure.

\end{constraints}

The reasons for these constraints are explained in the next section.


\subsubsection{Elemental reference of pure subroutines}

A non-intrinsic pure subroutine may be referenced {\em elementally\/}, 
with a similar interpretation to the elemental reference of Fortran~90 
elemental intrinsic subroutines, provided it satisfies the additional 
constraints that:
\begin{enumerate}
        \item Its non-procedure dummy arguments are scalar and do not 
have the POINTER attribute.
        \item The length of any character dummy argument is independent 
of argument values (though it may be assumed, or depend on the lengths of 
other character arguments).
\end{enumerate}
We call non-intrinsic pure subroutines that satisfy these constraints 
`elemental non-intrinsic subroutines'.

The interpretation of an elemental reference of such a subroutine 
is as follows (adapted from Section 12.4.5 of the Fortran~90 standard):
\begin{quotation}

A reference to an elemental non-intrinsic subroutine is an elemental
reference if all actual arguments corresponding to INTENT(OUT) and
INTENT(INOUT) dummy arguments are arrays that have the same shape 
and the remaining non-procedure actual arguments are conformable with 
them.  If any actual argument is a function, its result must have the 
same shape as that of the corresponding function dummy procedure.
A reference to an elemental intrinsic subroutine is an elemental 
reference if all actual arguments corresponding to INTENT(OUT) and 
INTENT(INOUT) dummy arguments are arrays that have the same shape and 
the remaining actual arguments are conformable with them.

The values of the elements of the arrays that correspond to INTENT(OUT)
and INTENT(INOUT) dummy arguments are the same as if the subroutine were 
invoked separately, in any order, using the scalar and procedure arguments 
and corresponding elements of the array arguments.

\end{quotation}
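
The interpretation quoted above can be sketched as follows.  The subroutine 
\verb@clamp@ and the arrays \verb@a@ and \verb@lower@ are hypothetical names 
used only for this illustration.  The elemental reference and the explicit 
loop are alternatives that are intended to leave the same values in 
\verb@a@, and the iterations of the loop could be taken in any order.
                                                \CODE
    INTERFACE
      SUBROUTINE clamp (lo, hi, v)
        !HPF$ PURE clamp
        REAL, INTENT(IN) :: lo, hi
        REAL, INTENT(INOUT) :: v
      END SUBROUTINE clamp
    END INTERFACE

    REAL a(100), lower(100)
    INTEGER k

    CALL clamp (lower, 1.0, a)     ! elemental reference

    ! or, equivalently, element by element:
    DO k = 1, 100
      CALL clamp (lower(k), 1.0, a(k))
    END DO
                                                \EDOC
Here \verb@a@ corresponds to the INTENT(INOUT) dummy argument and fixes the 
shape of the reference; \verb@lower@ has the same shape, and the scalar 
\verb@1.0@ is conformable with both.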

To define elemental references of elemental non-intrinsic subroutines, 
the following constraints are added after Rule~R1210 ({\it call-stmt\/}):
\begin{constraints}

        \item A non-intrinsic subroutine that is referenced elementally 
must be a pure subroutine with an explicit interface, and must satisfy 
the following additional constraints:
        \begin{itemize}
                \item Its non-procedure dummy arguments must be scalar 
and must not have the POINTER attribute.
                \item The length of any character dummy argument must 
not depend on argument values (though it may be assumed, or depend on 
the lengths of other character arguments).
        \end{itemize}

        \item In an elemental reference of a non-intrinsic subroutine,
a {\it function-name\/} {\it actual-arg\/} must have a result whose shape 
agrees with that of the corresponding function dummy procedure.

\end{constraints}

It is perhaps worth outlining the reasons for the extra constraints 
imposed on pure procedures in order for them to be referenced elementally.  

The dummy result of a function or `output' arguments of a subroutine
are not allowed to have the POINTER attribute because of a Fortran~90
technicality, namely, that under elemental reference the corresponding 
actual arguments must be array variables, and Fortran~90 does not permit 
an array of pointers to be referenced.\footnote{
        See the final constraint after Rule~R613 of the Fortran~90 standard.
Note the difference between an {\em array of pointers\/}, which cannot 
be declared or referenced in Fortran~90, and a {\em pointer array\/},
which can.
}
The `input' arguments of an elemental reference are prohibited from 
having the POINTER attribute for consistency with the output arguments 
or result.  However, this last constraint does not impose 
any real restrictions on an elemental reference, as the corresponding 
actual arguments {\em can\/} be pointers, in which case they are 
`de-referenced' and their targets are associated with the dummy arguments.  
In fact, the only reason for a dummy argument to be a pointer is so that
its pointer association can be changed, which is not allowed for `input'
arguments.  (Incidentally, since a pure function has only `input' 
arguments, there would be no loss of generality in disallowing dummy 
pointers in pure functions generally.)  Note that the prohibition of 
dummy pointers in pure subroutines that are elementally referenced means 
that all their non-procedure dummy arguments can have their intent 
explicitly specified (and indeed this is required by the constraints for 
pure subroutine interfaces---see Section \ref{pure-proc-interface}) which 
assists the checking of argument usage.

In an elemental reference, any actual argument that is a function
must have a result whose shape agrees with that of the corresponding 
function dummy procedure.  That is, elemental usage does not extend to 
function arguments, as Fortran~90 does not support the concept of an `array' 
of functions.
Naively it might appear that a function actual argument that is associated 
with a scalar dummy function could return an array result provided it 
conforms with the other array arguments of the elemental reference.  
However, this is not meaningful under elemental reference, as an 
array-valued function cannot be decomposed into an `array' of scalar 
function references, as would be required in this context.

Finally, the length of any character dummy argument or a character
dummy result cannot depend on argument {\em values\/} (though it can
be assumed, or depend on the lengths of other character arguments and/or
a character result).  This ensures that under elemental reference, all 
elements of an array argument or result of character type will have the 
same length, as required by Fortran~90.
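
The distinction between a length that depends on other lengths and one that 
depends on argument values can be sketched as follows (both function names 
are hypothetical).  Under the constraints above, \verb@bracket@ could be 
referenced elementally, whereas \verb@repeat_n@ could not, although it may 
still be declared pure.
                                                \CODE
    FUNCTION bracket (s)
      !HPF$ PURE bracket
      CHARACTER(LEN=*), INTENT(IN) :: s       ! assumed length
      ! result length depends only on the length of s
      CHARACTER(LEN=LEN(s)+2) :: bracket
      bracket = '[' // s // ']'
    END FUNCTION bracket

    FUNCTION repeat_n (s, n)
      !HPF$ PURE repeat_n
      CHARACTER(LEN=*), INTENT(IN) :: s
      INTEGER, INTENT(IN) :: n
      ! result length depends on the value of n
      CHARACTER(LEN=n*LEN(s)) :: repeat_n
      repeat_n = REPEAT (s, n)
    END FUNCTION repeat_n
                                                \EDOC
If \verb@repeat_n@ were applied elementally to an array of differing 
\verb@n@ values, the elements of the result would have differing lengths.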


\subsection{Examples of pure procedure usage}

\subsubsection{FORALL statements and constructs}

Pure functions may be used in expressions in FORALL statements and 
constructs, unlike general functions.  
Because a {\it forall-assignment}
may be an {\it array-assignment} the pure function can have an array
result.  
For example:
                                                              \CODE
INTERFACE
  FUNCTION f (x)
    !HPF$ PURE f
    REAL, DIMENSION(3) :: f, x
  END FUNCTION f
END INTERFACE
REAL  v (3,10,10)
...
FORALL (i=1:10, j=1:10)  v(:,i,j) = f (v(:,i,j)) 
                                                              \EDOC


\subsubsection{Elemental references}
Examples of elemental function usage are
                                                              \CODE
INTERFACE 
  REAL FUNCTION foo (x, y, z)
    !HPF$ PURE foo
    REAL, INTENT(IN) :: x, y, z
  END FUNCTION foo
END INTERFACE

REAL a(100), b(100), c(100)
REAL p, q, r

a(1:n) = foo (a(1:n), b(1:n), c(1:n))
a(1:n) = foo (a(1:n), q, r)
a = sin(b)
                                                              \EDOC
An example involving a WHERE-ELSEWHERE construct is
                                                              \CODE
INTERFACE
  REAL FUNCTION f_edge (x)
    !HPF$ PURE
    REAL x
  END FUNCTION f_edge
  REAL FUNCTION f_interior (x)
    !HPF$ PURE
    REAL x
  END FUNCTION f_interior
END INTERFACE

REAL a (10,10)
LOGICAL edges (10,10)

WHERE (edges)
  a = f_edge (a)
ELSE WHERE
  a = f_interior (a)
END WHERE
                                                          \EDOC

Examples of elemental subroutine usage are
                                                                \CODE
INTERFACE 
  SUBROUTINE solve_simul(tol, y, z)
    !HPF$ PURE solve_simul
    REAL, INTENT(IN) :: tol
    REAL, INTENT(INOUT) :: y, z
  END SUBROUTINE
END INTERFACE

REAL a(100), b(100), c(100)
INTEGER bits(10)

CALL solve_simul( 0.1, a, b )
CALL solve_simul( c, a, b )
CALL mvbits( bits, 0, 4, bits, 4) ! Fortran 90 elemental intrinsic
                                                                \EDOC

User-defined elemental procedures have several potential advantages.
They are a convenient programming tool, as the same procedure 
can be applied to actual arguments of any rank.

In addition, the implementation of an elemental function returning an
array-valued result in an array expression is likely to be more 
efficient than that of an equivalent array function.  One reason is 
that it requires less temporary storage for the result (i.e.\ storage 
for a single result versus storage for the entire array of results).  
Another is that it saves on looping if an array expression is 
implemented by sequential iteration over the component elemental 
expressions (as may be done for the `segment' of the array expression 
local to each process).  This is because, in the sequential version, 
the elemental function can be invoked elementally in situ within the 
expression.  The array function, on the other hand, must be executed 
before the expression is evaluated, storing its result in a temporary 
array for use within the expression.  Looping is then required during 
the execution of the array function body as well as the expression 
evaluation.
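
The difference can be sketched for the array expression 
\verb@a + g(b)@, written out as the sequential loops that an 
implementation might generate.  This is an illustration under assumed 
names (\verb@g@, \verb@g_arr@, \verb@tmp@), not a description of any 
particular compiler.
                                                \CODE
    PROGRAM temporaries
      INTEGER, PARAMETER :: n = 8
      REAL a(n), b(n), c1(n), c2(n), tmp(n)
      INTEGER k
      a = 1.0
      b = 2.0

      ! (1) g is elemental: one loop, g invoked in situ,
      !     only a scalar result is held at any time
      DO k = 1, n
        c1(k) = a(k) + g (b(k))
      END DO

      ! (2) g_arr is an array function: its whole result is
      !     stored in tmp first, then a second loop is needed
      tmp = g_arr (b)
      DO k = 1, n
        c2(k) = a(k) + tmp(k)
      END DO

    CONTAINS
      REAL FUNCTION g (x)
        !HPF$ PURE g
        REAL, INTENT(IN) :: x
        g = x * x
      END FUNCTION g

      FUNCTION g_arr (x)
        REAL, INTENT(IN) :: x(:)
        REAL g_arr(SIZE(x))
        INTEGER k
        DO k = 1, SIZE(x)       ! loop inside the function body
          g_arr(k) = x(k) * x(k)
        END DO
      END FUNCTION g_arr
    END PROGRAM temporaries
                                                \EDOC
Both versions compute the same values; version (2) uses an array 
temporary and loops twice over the data.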


\subsection{MIMD parallelism via pure procedures}

We have seen that a pure procedure may be invoked concurrently at each
`element' of an array if it is referenced elementally or in a FORALL 
statement or construct (where an `element' may itself be an array in
a non-elemental reference).  In these cases, a limited form of MIMD 
parallelism can be obtained by means of branches within the pure procedure 
which depend on arguments associated with array elements or their 
subscripts (the latter especially in a FORALL context).  For example:
                                                              \CODE
    FUNCTION f (x, i)
      !HPF$ PURE f
      REAL x       ! associated with array element
      INTEGER i    ! associated with array subscript
      IF (x > 0.0) THEN     ! content-based conditional
        ...
      ELSE IF (i==1 .OR. i==n) THEN    ! subscript-based conditional
        ...
      ENDIF
    END FUNCTION

    ...
    REAL a(n)
    INTEGER i
    ...
    FORALL (i=1:n)  a(i) = f( a(i), i)
    ...
    a = f( a, (/ (i, i=1,n) /) )  ! an elemental reference equivalent
                                  ! to the above FORALL

                                                              \EDOC
This may sometimes provide an alternative to using
WHERE-ELSEWHERE constructs or sequences of masked FORALLs with their 
potential synchronisation overhead. 


\subsection{Comments}

This section should be moved to the comments chapter of the final draft.

\subsubsection{Pure procedures}

\begin{itemize}

\item The constraints for a pure procedure guarantee
freedom from side-effects, thus ensuring that it can be invoked
concurrently at each
`element' of an array (where an `element' may itself be a data-structure, 
including an array).

\item All constraints can be statically checked, thus providing safety
for the programmer.

Of course, a price that must be paid for this additional security is
that the constraints must be quite rigorous, which means that it
is possible to write a function that is side-effect free in behaviour
but which nevertheless fails to satisfy the constraints 
(e.g.\ a function that contains an assignment to a global variable,
but in a branch that is not executed in any invocation of the function
during a particular program execution).


\item It is expected that most High Performance Fortran library 
procedures will conform to the constraints required of pure procedures
(by the very nature of library procedures), and so can be declared pure 
and referenced in FORALL statements and constructs (if they are functions) 
and within user-defined pure procedures.  It is also anticipated that 
most library procedures will not reference global data, whose use may 
sometimes inhibit concurrent execution (see below).

The constraints on pure procedures are limited to those necessary 
for statically checkable side-effect freedom and the elimination 
of saved internal state.  Subject to these restrictions, maximum 
functionality has been preserved in the definition of pure procedures.
This has been done to make elemental reference and function calls in 
FORALL as widely available as possible, and so that quite general library 
procedures can be classified as pure.  

A drawback of this flexibility is that pure procedures permit certain 
features whose use may hinder, and in the worst case prevent, concurrent 
execution in FORALL and elemental references (that is, such references 
may have to be implemented by sequentialisation).  
Foremost among these features are the access of global data, particularly 
distributed global data, and the fact that the arguments and, for a pure 
function, the result may be pointers or data structures with pointer 
components, including recursive data structures such as lists and trees.
The programmer should be aware of the potential performance penalties 
of using such features.


\item An earlier draft of this proposal contained a constraint disallowing 
pure procedures from accessing global data objects, particularly
distributed data objects.
This constraint has been dropped as inessential to the side-effect freedom 
that the HPF committee requested.
However, it may well be that some machines will have great difficulty 
implementing FORALL without this constraint.


\item One of us (JHM) is still in favour of disallowing access to global 
variables for a number of reasons: 
\begin{enumerate}
\item Aesthetically, it is in keeping with the
nature of a `pure' function, i.e.\ a function in the mathematical
sense, and in practical terms it imposes no real restrictions on the 
programmer, as global data can be passed-in via the argument list; 
\item Without this constraint HPF programs can no longer be implemented 
by pure message-passing, or at least not efficiently, i.e.\ without
sequentialising FORALL statements containing function calls and greatly
complicating their implementation; 
\item Absence of this restriction may inhibit optimisation of FORALLs
and array assignments, as the optimisation of assigning the {\it expr\/}
directly to the assignment variable rather than to a temporary intermediate
array now requires interprocedural analysis rather than just local 
analysis.
\end{enumerate}

\end{itemize}
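
The third reason given above for disallowing access to global variables 
can be illustrated by the following sketch, in which the names 
\verb@data_mod@ and \verb@left@ are hypothetical.  The function reads the 
global array that the FORALL assigns to, but this is not visible from the 
FORALL statement or from the interface of the function.
                                                \CODE
    MODULE data_mod
      INTEGER, PARAMETER :: n = 100
      REAL a(n)
    END MODULE data_mod

    REAL FUNCTION left (i)
      !HPF$ PURE left
      USE data_mod
      INTEGER, INTENT(IN) :: i
      left = a(i-1)              ! reads the global array a
    END FUNCTION left

    ! In the calling program unit:
    USE data_mod
    INTERFACE
      REAL FUNCTION left (i)
        !HPF$ PURE left
        INTEGER, INTENT(IN) :: i
      END FUNCTION left
    END INTERFACE
    INTEGER i

    FORALL (i = 2:n)  a(i) = left (i)
                                                \EDOC
Since all right hand sides of a FORALL are evaluated before any 
assignment takes place, the statement should shift the old values of 
\verb@a@ by one place.  Assigning each result directly to \verb@a(i)@ in 
increasing order of \verb@i@ would instead propagate \verb@a(1)@ to every 
element, so the temporary array can only be omitted after examining the 
body of \verb@left@.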

\subsubsection{Elemental references}

\begin{itemize}

\item The original draft proposed allowing pure procedures 
to be invoked elementally even if their dummy arguments or results 
were array-valued.  These provisions have been dropped to avoid 
promoting storage order to a higher level in Fortran~90
(i.e.\ to avoid introducing the concept of `arrays-of-arrays', 
which Fortran~90 seems to strenuously avoid!)   In practical terms,
the current proposal provides the same functionality as the original 
one for functions, though not for subroutines.  If a programmer wants 
elemental function behaviour, but also wants the `elements' to be
array-valued, this can be achieved using FORALL.

\item In a typical FORALL or elemental implementation, a pure procedure 
would be called independently in each process, and its dummy arguments 
would be associated with `elements' local to that process. 
This is the reason for disallowing data mapping directives for
local and dummy variables within the bodies of such procedures.
Note that, particularly in elemental invocations, the actual arguments
can be distributed arrays which need not be `co-distributed'; if not,
a typical implementation would in general perform all data communications 
prior to calling the procedure, and would then pass-in the required 
elements locally via its argument list.

However, access to large global data structures such as look-up tables
is often useful within functions that are otherwise mathematically pure,
and these are allowed to be distributed.

\end{itemize}


\section{The INDEPENDENT Directive}

\label{do-independent}

\footnote{Version of August 20, 1992
 - Guy Steele, Thinking Machines Corporation, and 
Charles Koelbel, Rice University.  Approved at second reading on
September 10, 1992; however, the INDEPENDENT subgroup was directed to
examine methods of allowing reductions to be performed within
INDEPENDENT constructs.}
The INDEPENDENT directive can precede a DO loop or FORALL statement or
construct.
Intuitively, it asserts to the compiler that the operations in the
following construct
may be executed independently--that is, in any order, or
interleaved, or concurrently--without changing the semantics
of the program.

The syntax of the INDEPENDENT directive is
                                                  \BNF
independent-dir	\IS	!HPF$INDEPENDENT [ (integer-variable-list) ]
                                                  \FNB

\noindent
Constraint: An {\it independent-dir\/} must immediately precede a DO or FORALL
statement.

\noindent
Constraint: If the {\it integer-variable-list\/} is present, then the
variables named must be the index variables of a set of perfectly nested
DO loops or indices from the same FORALL header.

The directive is said to apply to the indices named in its {\it
integer-variable-list}, or equivalently to the loops or FORALL indexed
by those variables.
If no {\it integer-variable-list\/} is present, then it is as if it
were present and contained the index variable for the DO or FORALL
immediately following the directive.


When applied to a nest of DO loops, an INDEPENDENT directive is an
assertion by the programmer that no iteration may affect any other
iteration, either directly or indirectly.
This implies that there are no exits from the construct other than
normal loop termination, and no I/O is performed by the loop.
A sufficient condition for ensuring this is that
during
the execution of the loop(s), no iteration assigns to any scalar
data object which is 
accessed (i.e.\ read or written) by any other iteration.
The directive is purely advisory and a compiler is free
to ignore it if it cannot make use of the information.


For example:
                                                  \CODE
!HPF$INDEPENDENT
      DO I=1,100
        A(P(I)) = B(I)
      END DO
                                                  \EDOC
asserts that the array P does not have any repeated entries (else they
would cause interference when A was assigned).
It also limits how A and B may be storage associated.
(The remaining examples in this
section assume that no variables are storage or sequence associated.)
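
Since the compiler is not required to verify the assertion, a programmer
may wish to check it while testing.
The following is only a sketch of such a check and is not part of the
proposal: the names CHECKP and NOREPEATS, and the assumption that every
entry of P lies between 1 and M, are illustrative.
                                                  \CODE
      PROGRAM CHECKP
        INTEGER :: P(4) = (/ 3, 1, 4, 1 /)
        LOGICAL, EXTERNAL :: NOREPEATS
        ! P(2) and P(4) are both 1, so the check reports .FALSE.
        ! and the loop over P must not be marked INDEPENDENT.
        PRINT *, NOREPEATS(P, 4, 4)
      END PROGRAM CHECKP

      LOGICAL FUNCTION NOREPEATS(P, N, M)
        ! Returns .TRUE. if no value occurs twice in P(1:N).
        ! Assumes every entry of P lies in 1..M (illustrative assumption).
        INTEGER, INTENT(IN) :: N, M
        INTEGER, INTENT(IN) :: P(N)
        LOGICAL :: SEEN(M)
        INTEGER :: I
        SEEN = .FALSE.
        NOREPEATS = .TRUE.
        DO I = 1, N
          IF (SEEN(P(I))) THEN
            NOREPEATS = .FALSE.
            RETURN
          END IF
          SEEN(P(I)) = .TRUE.
        END DO
      END FUNCTION NOREPEATS
                                                  \EDOC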

Another example:
                                                  \CODE
!HPF$INDEPENDENT (I1,I2,I3)
      DO I1 = 1,N1
        DO I2 = 1,N2
          DO I3 = 1,N3
            DO I4 = 1,N4   !The inner loop is not independent!
              A(I1,I2,I3) = A(I1,I2,I3) + B(I1,I2,I4)*C(I2,I3,I4)
            END DO
          END DO
        END DO
      END DO
                                                  \EDOC
The inner loop is not independent because each element of A is
assigned repeatedly.
However, the three outer loops are independent because they access
different elements of A.
It is not relevant that the outer loops read the same elements from B
and C, because those arrays are not assigned.
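
One possible way to avoid the non-independent inner loop is to express
the accumulation with the Fortran~90 SUM intrinsic, so that each
combination of (I1,I2,I3) performs a single assignment.
The following sketch assumes that the third dimensions of B and C have
lower bound 1; the program name, array sizes and initial values are chosen
only for illustration.
                                                  \CODE
      PROGRAM SUMNEST
        INTEGER, PARAMETER :: N1=2, N2=3, N3=2, N4=4
        REAL :: A(N1,N2,N3), B(N1,N2,N4), C(N2,N3,N4)
        INTEGER :: I1, I2, I3
        A = 0.0
        B = 1.0
        C = 2.0
!HPF$INDEPENDENT (I1,I2,I3)
        DO I1 = 1,N1
          DO I2 = 1,N2
            DO I3 = 1,N3
              ! One assignment per (I1,I2,I3); the sum over the last
              ! dimension replaces the inner DO loop.
              A(I1,I2,I3) = A(I1,I2,I3) +                 &
                SUM(B(I1,I2,1:N4)*C(I2,I3,1:N4))
            END DO
          END DO
        END DO
        ! With these inputs every element of A is 4*1.0*2.0 = 8.0
        PRINT *, A(1,1,1)
      END PROGRAM SUMNEST
                                                  \EDOC
In general the order in which SUM adds its terms is not specified, so the
result may differ from that of the explicit inner loop by floating-point
rounding.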

The interpretation of INDEPENDENT for FORALL is similar to that for
DO: it asserts that no combination of the indices that INDEPENDENT
applies to may affect another combination.
This is only possible if one combination of index values assigns to a
scalar data object accessed by another
combination.
A DO and a FORALL with the same body are equivalent if they both
have the INDEPENDENT directive.
In the case of a FORALL, any of the variables may be mentioned in the
INDEPENDENT directive:
                                                                \CODE
!HPF$INDEPENDENT (I1,I3)
    FORALL(I1=1:N1,I2=1:N2,I3=1:N3) 
      A(I1,I2,I3) = A(I1,I2-1,I3)
    END FORALL
                                                                \EDOC
This means that for any given values for I1 and I3,
all the right-hand sides for all values of I2 must
be computed before any assignments are done for that
specific pair of (I1,I3) values; but assignments for
one pair of (I1,I3) values need not wait for rhs
evaluation for a different pair of (I1,I3) values.
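
To make this ordering concrete, the following sketch shows one sequential
execution that the directive permits.
It assumes that the second dimension of A has lower bound 0 and uses an
illustrative temporary TMP; the sizes and the order chosen for the
(I1,I3) pairs are arbitrary.
                                                                \CODE
      PROGRAM SHIFTI2
        INTEGER, PARAMETER :: N1=2, N2=4, N3=3
        REAL :: A(N1,0:N2,N3), TMP(N2)
        INTEGER :: I1, I2, I3
        ! Fill A so that A(I1,I2,I3) = I2
        DO I2 = 0, N2
          A(:,I2,:) = REAL(I2)
        END DO
        ! The (I1,I3) pairs may be visited in any order;
        ! here I3 runs backwards as one arbitrary choice.
        DO I3 = N3, 1, -1
          DO I1 = 1, N1
            ! All right-hand sides for this pair are evaluated first ...
            TMP = A(I1,0:N2-1,I3)
            ! ... and only then are the assignments for this pair done.
            A(I1,1:N2,I3) = TMP
          END DO
        END DO
        PRINT *, A(1,:,1)
      END PROGRAM SHIFTI2
                                                                \EDOC
With this initialization, A(1,0:4,1) holds the values 0, 0, 1, 2, 3
afterwards, whichever order is used for the (I1,I3) pairs.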

Graphically, the INDEPENDENT directive can be visualized as
eliminating edges from a precedence graph representing the program.
Figure~\ref{fig-dep} shows the dependences that may normally be
present in a DO and a FORALL.
An arrow from a left-hand-side node (for example, ``lhsa(1)'') 
to a right-hand-side node (e.g.\ ``rhsb(1)'') means that the RHS
computation may use values assigned in the LHS node; thus the
right-hand side must be computed after the left-hand side completes
its store.
Similarly, an arrow from a RHS node to a LHS node means that the LHS
may overwrite a value needed by the RHS computation, again forcing an
ordering.
Edges from the ``BEGIN'' and to the ``END'' nodes represent control
dependences.
The INDEPENDENT directive asserts that the only dependences that a
compiler need enforce are those in Figure~\ref{fig-indep}.
That is, the programmer who uses INDEPENDENT is certifying that if the
compiler only enforces these edges, then the resulting program will be
equivalent to the one in which all the edges are present.
Note that the set of asserted dependences is identical for INDEPENDENT
DO and FORALL constructs.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Here come the pictures!
%

{

%length for use in pictures
\setlength{\unitlength}{0.03in}

%nodes used in all pictures
\newsavebox{\nodes}
\savebox{\nodes}{
    \small\sf
    \begin{picture}(80,105)(10,-2.5)
    \put(50.0,100){\makebox(0,0){BEGIN}}
    \put(20.0,80.0){\makebox(0,0){rhsa(1)}}
    \put(50.0,80.0){\makebox(0,0){rhsa(2)}}
    \put(80.0,80.0){\makebox(0,0){rhsa(3)}}
    \put(20.0,60.0){\makebox(0,0){lhsa(1)}}
    \put(50.0,60.0){\makebox(0,0){lhsa(2)}}
    \put(80.0,60.0){\makebox(0,0){lhsa(3)}}
    \put(20.0,40.0){\makebox(0,0){rhsb(1)}}
    \put(50.0,40.0){\makebox(0,0){rhsb(2)}}
    \put(80.0,40.0){\makebox(0,0){rhsb(3)}}
    \put(20.0,20.0){\makebox(0,0){lhsb(1)}}
    \put(50.0,20.0){\makebox(0,0){lhsb(2)}}
    \put(80.0,20.0){\makebox(0,0){lhsb(3)}}
    \put(50.0,0){\makebox(0,0){END}}
    \put(50.0,100){\oval(25,5)}
    \put(20.0,80.0){\oval(20,5)}
    \put(50.0,80.0){\oval(20,5)}
    \put(80.0,80.0){\oval(20,5)}
    \put(20.0,60.0){\oval(20,5)}
    \put(50.0,60.0){\oval(20,5)}
    \put(80.0,60.0){\oval(20,5)}
    \put(20.0,40.0){\oval(20,5)}
    \put(50.0,40.0){\oval(20,5)}
    \put(80.0,40.0){\oval(20,5)}
    \put(20.0,20.0){\oval(20,5)}
    \put(50.0,20.0){\oval(20,5)}
    \put(80.0,20.0){\oval(20,5)}
    \put(50.0,0){\oval(25,5)}
    \put(50,97.5){\vector(-2,-1){30}}
    \put(50,97.5){\vector(0,-1){15}}
    \put(50,97.5){\vector(2,-1){30}}
    \put(20,17.5){\vector(2,-1){30}}
    \put(50,17.5){\vector(0,-1){15}}
    \put(80,17.5){\vector(-2,-1){30}}
    \end{picture}
}

\begin{figure}

\begin{minipage}{2.70in}
\CODE
FORALL ( i = 1:3 )
  lhsa(i) = rhsa(i)
  lhsb(i) = rhsb(i)
END FORALL
\EDOC

\centering
\begin{picture}(80,105)(10,-2.5)
\small\sf
%save the messy part of the picture & reuse it
\newsavebox{\web}
\savebox{\web}{
    \begin{picture}(60,15)(0,0)
    \put(0,15){\vector(0,-1){15}}
    \put(0,15){\vector(2,-1){30}}
    \put(0,15){\vector(4,-1){60}}
    \put(30,15){\vector(-2,-1){30}}
    \put(30,15){\vector(0,-1){15}}
    \put(30,15){\vector(2,-1){30}}
    \put(60,15){\vector(0,-1){15}}
    \put(60,15){\vector(-2,-1){30}}
    \put(60,15){\vector(-4,-1){60}}
    \end{picture}
}
\put(10,-2.5){\usebox\nodes}
\put(20,62.5){\usebox\web}
\put(20,42.5){\usebox\web}
\put(20,22.5){\usebox\web}
\end{picture}
\end{minipage}
%
\hfill
%
\begin{minipage}{2.70in}
\CODE
DO i = 1, 3
  lhsa(i) = rhsa(i)
  lhsb(i) = rhsb(i)
END DO
\EDOC

\centering
\begin{picture}(80,105)(10,-2.5)
\small\sf
%save the messy part of the picture & reuse it
\newsavebox{\chain}
\savebox{\chain}{
    \begin{picture}(20,70)(0,0)
    \put(2.5,2.5){\oval(5,5)[bl]}
    \put(2.5,0){\vector(1,0){5}}
    \put(7.5,2.5){\oval(5,5)[br]}
    \put(10,2.5){\vector(0,1){32.5}}
    \put(10,35){\line(0,1){32.5}}
    \put(12.5,67.5){\oval(5,5)[tl]}
    \put(12.5,70){\vector(1,0){5}}
    \put(17.5,67.5){\oval(5,5)[tr]}
    \end{picture}
}
\put(10,-2.5){\usebox\nodes}
\put(20,77.5){\vector(0,-1){15}}
\put(20,57.5){\vector(0,-1){15}}
\put(20,37.5){\vector(0,-1){15}}
\put(25,15){\usebox\chain}
\put(50,77.5){\vector(0,-1){15}}
\put(50,57.5){\vector(0,-1){15}}
\put(50,37.5){\vector(0,-1){15}}
\put(55,15){\usebox\chain}
\put(80,77.5){\vector(0,-1){15}}
\put(80,57.5){\vector(0,-1){15}}
\put(80,37.5){\vector(0,-1){15}}
\end{picture}
\end{minipage}

\caption{Dependences in DO and FORALL without
INDEPENDENT assertions}
\label{fig-dep}
\end{figure}

\begin{figure}

%Draw the picture once, use it twice
\newsavebox{\easy}
\savebox{\easy}{
    \small\sf
    \begin{picture}(80,105)(10,-2.5)
    \put(10,-2.5){\usebox\nodes}
    \put(20,77.5){\vector(0,-1){15}}
    \put(20,57.5){\vector(0,-1){15}}
    \put(20,37.5){\vector(0,-1){15}}
    \put(50,77.5){\vector(0,-1){15}}
    \put(50,57.5){\vector(0,-1){15}}
    \put(50,37.5){\vector(0,-1){15}}
    \put(80,77.5){\vector(0,-1){15}}
    \put(80,57.5){\vector(0,-1){15}}
    \put(80,37.5){\vector(0,-1){15}}
    \end{picture}
}

\begin{minipage}{2.70in}
\CODE
!HPF$ INDEPENDENT
FORALL ( i = 1:3 )
  lhsa(i) = rhsa(i)
  lhsb(i) = rhsb(i)
END FORALL
\EDOC

\centering
\usebox\easy
\end{minipage}
%
\hfill
%
\begin{minipage}{2.70in}
\CODE
!HPF$ INDEPENDENT
DO i = 1, 3
  lhsa(i) = rhsa(i)
  lhsb(i) = rhsb(i)
END DO
\EDOC

\centering
\usebox\easy
\end{minipage}

\caption{Dependences in DO and FORALL with
INDEPENDENT assertions}
\label{fig-indep}
\end{figure}

}

%
%
% End of pictures
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


The compiler is justified in producing
a warning if it can prove that one of these assertions is incorrect.
It is not required to do so, however.
A program containing any false assertion of this type is not
standard-conforming, and the compiler may take any action it deems necessary.


This directive is of course similar to the DOSHARED directive
of Cray MPP Fortran.  A different name is offered here to avoid
even the hint of commitment to execution by a shared memory machine.
Also, the "mechanism" syntax is omitted here, though we might want
to adopt it as further advice to the compiler about appropriate
implementation strategies, if we can agree on a desirable set
of options.


\section{Other Proposals}

The following are proposals made for modification or replacement of the 
above sections.

\subsection{A Proposal for MIMD Support in HPF}

\label{mimd-support}
	          

\subsubsection{Abstract}

\footnote{Version of July 18, 1992 - Clemens-August Thole, GMD I1.T.
In the interest of time, these features were not considered for inclusion 
in the first round of HPFF.}
This proposal tries to supply sufficient language support in order 
to deal with loosely synchronous programs, some of which have been 
identified in my "A vote for explicit MIMD support".
This is a proposal for the support of MIMD parallelism, which extends
Section~\ref{do-independent}. 
It is more oriented
towards the CRAY - MPP Fortran Programming Model and the PCF proposal. 
The fine-grain synchronization of PCF is not proposed for implementation.
Instead of the CRAY mechanisms for assigning work to processors, an 
extension of the ON-clause is used.
Due to the lack of fine-grain synchronization the constructs can be
executed
on SIMD or sequential architectures just by ignoring the additional
information.


\subsubsection{Summary of the current situation of MIMD support as part of
HPF}

According to Charles Koelbel's (Rice) mail dated March 20th "Working
Group 4 -
Issues for discussion" MIMD-support is a topic for discussion within
working
group 4. 

Dave Loveman (DEC) has produced a document on FORALL statements 
(incorporated in Sections~\ref{forall-stmt} and \ref{forall-construct})
which
summarizes the discussion. Marc Snir proposed some extensions. These
constructs allow SIMD operations to be described in a more general way
than array assignments. 

A topic for working papers is the interface of HPF Fortran to program units
which execute in SPMD mode. Proposals for "Local Subroutines" have been
made
by Marc Snir and Guy Steele
(Chapter~\ref{foreign}). Both proposals
define local subroutines as program units, which are executed by all
processors independently of each other. Each processor has access only
to the data contained in its local memory. Parts of distributed data
objects
can be accessed and updated by calls to a special library. Any
message-passing
library might be used for synchronization and communication.
This approach does not really integrate MIMD-support into HPF programming.

The MPP Fortran proposal by Douglas M. Pase, Tom MacDonald, Andrew Meltzer
(CRAY)
contained the following features in order to support integrated MIMD
features:
\begin{itemize}
   \item  parallel directive
   \item  shared loops 
   \item  private variables
   \item  barrier synchronization
   \item  no-barrier directive for removing synchronization
   \item  locks, events, critical sections and atomic update
   \item  functions, to examine the mapping of data objects.
\end{itemize}

Steele's "Proposal for loops in HPF" (02.04.92) included a proposal for a 
directive "!HPF$ INDEPENDENT( integer_variable_list)", which specifies
for the next set of nested loops, that the loops with the specified
loop variables can be executed independently of each other.
(Section~\ref{do-independent} is a short version of this proposal.) 

Charles Koelbel gave an overview on different styles for parallel loops
in "Parallel Loops Position Paper". No specific proposal was made.

Min-You Wu "Proposal for FORALL, May 1992" extended Guy Steele's 
"!HPF$ INDEPENDENT" proposal to use the directive in a block style.

Clemens-August Thole "A vote for explicit MIMD support" contains 3 examples
from different application areas, which seem to require MIMD support for
efficient execution. 

\paragraph{Summary}

In contrast to FORALL extensions, MIMD support is currently not
well-established
as part of HPF Fortran. The examples in "A vote for explicit MIMD support"
show clearly the need for such features. Local subroutines do not fulfill
the requirements because they force the use of a distributed-memory
programming model,
which should not be necessary in most cases.

With the exception of parallel sections, all interesting features
are contained in the MPP-proposal. I would like to split the discussion
on specifying parallelism, synchronization and mapping into three different
topics. Furthermore, I would like to see corresponding features
expressed
in the style of the current X3H5 proposal, if possible, in order to
be in line with upcoming standards.


\subsubsection{Proposal for MIMD support}

In order to support the specification of MIMD-type parallelism, the
following
features are taken from the "Fortran 77 Binding of X3H5 Model for 
Parallel Programming Constructs": 
\begin{itemize}
    \item   PARALLEL DO construct/directive
    \item   PARALLEL SECTIONS worksharing construct/directive
    \item   NEW statement/directive
\end{itemize}

These constructs are not used with PCF-like options for mapping or 
synchronization but are combined with the ON clause for mapping operations
onto the parallel architecture. 

\paragraph{PARALLEL DO}

\subparagraph{Explicit Syntax}

The PARALLEL DO construct specifies parallelism among the 
iterations of a block of code. The PARALLEL DO construct has the same
syntax as a DO statement. For a directive approach the directive
!HPF$ PARALLEL can be used in front of a DO statement.
After the PARALLEL DO statement a new-declaration may be inserted.

A PARALLEL DO construct might be nested with other parallel constructs. 

\subparagraph{Interpretation}

The PARALLEL DO is used to specify parallel execution of the iterations of
a block of code. Each iteration of a PARALLEL DO is an independent unit
of work. The iterations of PARALLEL DO must be data independent. Iterations
are data independent if the storage sequence associated with each variable
or array element that is assigned a value by each iteration is not
referenced
by any other iteration. 

A program is not HPF-conforming if, for any iteration, a statement is
executed
that causes a transfer of control out of the block defined by the PARALLEL
DO construct. 

The value of the loop index of a PARALLEL DO is undefined outside the scope
of the PARALLEL DO construct. 


\paragraph{PARALLEL SECTIONS}

The parallel sections construct is used to specify parallelism among
sections
of code.

\subparagraph{Explicit Syntax}


                                                              \CODE
        !HPF$ PARALLEL SECTIONS
        !HPF$ SECTION
        !HPF$ END PARALLEL SECTIONS
                                                              \EDOC
structured as
                                                              \CODE
        !HPF$ PARALLEL SECTIONS
        [new-declaration-stmt-list]
        [section-block]
        [section-block-list]
        !HPF$ END PARALLEL SECTIONS
                                                              \EDOC
where [section-block] is
                                                              \CODE
        !HPF$ SECTION
        [execution-part]
                                                              \EDOC

\subparagraph{Interpretation}

The parallel sections construct is used to specify parallelism among
sections
of code. Each section of the code is an independent unit of work. A program
is not standard conforming if during the execution of any parallel sections
construct a transfer of control out of the blocks defined by the Parallel
Sections construct is performed. 
In a standard-conforming program the sections of code shall be data 
independent. Sections are data independent if the storage sequence
associated 
with each variable or array element that is assigned a value by each
section
is not referenced by any other section. 


\paragraph{Data scoping}

Data objects that are local to a subroutine are different between 
distinct units of work, even if they execute the same subroutine.


\paragraph{NEW statement/directive}

The NEW statement/directive allows the user to generate new instances of 
objects with the same name as an object, which can currently be referenced.


\subparagraph{Explicit Syntax}

A [new-declaration-stmt] is
                                                                \CODE
       !HPF$ NEW variable-name-list
                                                                \EDOC

\subparagraph{Coding rules}

A [variable-name] shall not be
\begin{itemize} 
\item    the name of an assumed size array, dummy argument, common block, 
function or entry point
\item    of type character with an assumed length
\item    specified in a SAVE or DATA statement
\item    associated with any object that is shared for this parallel
construct.
\end{itemize}

\subparagraph{Interpretation}
 
Listing a variable on a NEW statement causes the object to be explicitly
private for the parallel construct. For each unit of work of the parallel 
construct a new instance of the object is created and referenced with the
specific name. 
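
The following is a minimal sketch of how the directive forms described
above might be combined, reading the proposed syntax as literally as
possible; the program and variable names are illustrative.
Without the NEW declaration the scalar T would be assigned by every
iteration, so the iterations would not be data independent.
Because the directives are comments, the program also runs unchanged on a
sequential Fortran~90 system.
                                                                \CODE
      PROGRAM NEWDEMO
        INTEGER, PARAMETER :: N = 8
        REAL :: X(N), Y(N), T
        INTEGER :: I
        DO I = 1, N
          X(I) = REAL(I)
        END DO
!HPF$ PARALLEL
        DO I = 1, N
!HPF$ NEW T
          ! Each unit of work (iteration) gets its own instance of T
          T = 2.0*X(I)
          Y(I) = T*T + 1.0
        END DO
        PRINT *, Y
      END PROGRAM NEWDEMO
                                                                \EDOC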


\subsection{Nested WHERE statements}

\label{nested-where}

\footnote{Version of September 15, 1992 - Guy Steele, Thinking Machines 
Corporation.  This section has not been discussed.}
Here is the text of a proposal once sent to X3J3:
\begin{quote}
Briefly put, the less WHERE is like IF, the more difficult it is to
translate existing serial codes into array notation.  Such codes tend to
have the general structure of one or more DO loops iterating over array
indices and surrounding a body of code to be applied to array elements.
Conversion to array notation frequently involves simply deleting the DO
loops and changing array element references to array sections or whole
array references.  If the loop body contains logical IF statements, these
are easily converted to WHERE statements.  The same is true for translating
IF-THEN constructs to WHERE constructs, except in two cases.  If the IF
constructs are nested (or contain IF statements), or if ELSE IF is used,
then conversion suddenly becomes disproportionately complex, requiring the
user to create temporary variables or duplicate mask expressions and to use
explicit .AND. operators to simulate the effects of nesting.

Users also find it confusing that ELSEWHERE is syntactically and
semantically analogous to ELSE rather than to ELSE IF.

We propose that the syntax of WHERE constructs be extended and
changed to have the form
                                                                \BNF
where-construct       \IS  where-construct-stmt
 				    [ where-body-construct ]...
 				  [ elsewhere-stmt
 				    [ where-body-construct ]... ]...
 				  [ where-else-stmt
 				    [ where-body-construct ]... ]
 				  end-where-stmt
 
 	where-construct-stmt  \IS  WHERE ( mask-expr )
 
 	elsewhere-stmt        \IS  ELSE WHERE ( mask-expr )
 
 	where-else-stmt       \IS  ELSE WHERE
 
 	end-where-stmt        \IS  END WHERE
 
 	mask-expr             \IS  logical-expr
 
 	where-body-construct  \IS  assignment-stmt
 			      \IS  where-stmt
 			      \IS  where-construct
                                                                \FNB                                                     	

\noindent Constraint: In each assignment-stmt, the mask-expr and the variable
being defined must be arrays of the same shape.  If a
where-construct contains a where-stmt, an elsewhere-stmt,
or another where-construct, then the two mask-expr's must
be arrays of the same shape.
 
The meaning of such statements may be understood by rewrite rules.  First
one may eliminate all occurrences of ELSE WHERE:
                                                                \CODE
WHERE (m1)		
    xxx			
ELSE WHERE (m2)		
    yyy				
END WHERE
	                                                            \EDOC
becomes
                                                                \CODE
WHERE (m1)
    xxx
ELSE
    WHERE (m2)
        yyy
    END WHERE
END WHERE
                                                                \EDOC
where xxx and yyy represent any sequences of statements, so long as the
original WHERE, ELSE WHERE, and END WHERE match, and the ELSE WHERE is the
first ELSE WHERE of the construct (that is, yyy may include additional ELSE
WHERE or ELSE statements of the construct).  Next one eliminates ELSE:
                                                                \CODE
WHERE (m)
    xxx
ELSE
    yyy
END WHERE
                                                                \EDOC
becomes
                                                                \CODE
temp = m
WHERE (temp)
    xxx
END WHERE
WHERE (.NOT. temp)
    yyy
END WHERE
                                                                \EDOC

Finally one eliminates nested WHERE constructs:
                                                                \CODE
WHERE (m1)
    xxx
    WHERE (m2)
        yyy
    END WHERE
    zzz
END WHERE
                                                                \EDOC
becomes
                                                                \CODE
temp = m1
WHERE (temp)
    xxx
END WHERE
WHERE (temp .AND. (m2))
    yyy
END WHERE
WHERE (temp)
    zzz
END WHERE
                                                                \EDOC
and similarly for nested WHERE statements.

The effects of these rules will surely be a familiar or obvious possibility
to all the members of the committee; I enumerate them explicitly here only
so that there can be no doubt as to the meaning I intend to support.

Such rewriting rules are simple for a compiler to apply, or the code may
easily be compiled even more directly.  But such transformations are
tedious for our users to make by hand and result in code that is
unnecessarily clumsy and difficult to maintain.

One might propose to make WHERE and IF even more similar by making two
other changes.  First, require the noise word THERE to appear in a WHERE
and ELSE WHERE statement after the parenthesized mask-expr, in exactly the
same way that the noise word THEN must appear in IF and ELSE IF statements.
(Read aloud, the results might sound a trifle old-fashioned--"Where knights
dare not go, there be dragons!"--but technically would be as grammatically
correct English as the results of reading an IF construct aloud.)  Second,
allow a WHERE construct to be named, and allow the name to appear in ELSE
WHERE, ELSE, and END WHERE statements.  I do not feel very strongly one way
or the other about these no doubt obvious points, but offer them for your
consideration lest the possibilities be overlooked.
\end{quote}

Now, for compatibility with Fortran 90, HPF should continue to
use ELSEWHERE instead of ELSE, but this causes no ambiguity:

                                                                \CODE
      WHERE(...)
	...
      ELSE WHERE(...)
	...
      ELSEWHERE
	...
      END WHERE
                                                                \EDOC

is perfectly unambiguous, even when blanks are not significant.
Since X3J3 declined to adopt the keyword THERE, it should not be
used in HPF either (alas).
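
As a concrete illustration of the rewrite rule for nested WHERE
constructs, the following sketch applies it to a small example.
The arrays and values are hypothetical; the nested form is shown only in
comments because it is not valid Fortran~90.
                                                                \CODE
      PROGRAM WHERERW
        REAL :: X(5) = (/ -2.0, -1.0, 0.0, 1.0, 2.0 /)
        REAL :: Y(5)
        LOGICAL :: TEMP(5)
        Y = 0.0
        ! Proposed nested form:
        !   WHERE (X >= 0.0)
        !     Y = 1.0
        !     WHERE (X > 1.5)
        !       Y = 2.0
        !     END WHERE
        !   END WHERE
        ! Equivalent Fortran 90 after applying the rewrite rule:
        TEMP = X >= 0.0
        WHERE (TEMP)
          Y = 1.0
        END WHERE
        WHERE (TEMP .AND. (X > 1.5))
          Y = 2.0
        END WHERE
        PRINT *, Y
      END PROGRAM WHERERW
                                                                \EDOC
After execution Y holds the values 0, 0, 1, 1, 2.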

\alternative A
\subsection{
A Proposal for EXECUTE-ON Directive in HPF
}

\label{on-clause}

\footnote{Version of 
September 14, 1992
--
Tin-Fook Ngai,
Hewlett-Packard Laboratories.
This section has not been discussed.}
The proposed EXECUTE-ON directive is used to suggest where an iteration of
a DO construct or an indexed parallel assignment should be executed.  The
directive informs the compiler which data access should be local and
which data access may be remote.  


\subsubsection{Syntax}
                                                                \BNF
on-clause \IS !HPF$ EXECUTE (subscript-list) ON align-spec 
                [; LOCAL array-name-list]
                                                                \FNB

\noindent Constraint:
Each point in the index space must be executed on only one template node.


\subsubsection{Usage}

The EXECUTE-ON directive must immediately precede the corresponding DO
loop body, array assignment, FORALL statement, FORALL construct or
individual assignment statement in a FORALL construct.


\subsubsection{Interpretation}

The subscript-list identifies a distinct iteration index or an indexed
parallel assignment.  The align-spec identifies a template node.  The
EXECUTE-ON directive suggests that the iteration or parallel assignment
should be executed on the processor to which the template node is
mapped.  The optional LOCAL directive informs the compiler that all
data accesses to the specified array-name-list can be handled as local
data accesses if the related HPF data mapping directives are honored.


\subsubsection{Examples}

\paragraph{Example 1}
                                                                \CODE
      REAL A(N), B(N), C(N)
!HPF$ TEMPLATE T(N)
!HPF$ ALIGN WITH T:: A, B, C
!HPF$ DISTRIBUTE T(CYCLIC(2))

      !HPF$ INDEPENDENT            
      DO I = 1, N/2 
      !HPF$ EXECUTE (I) ON T(2*I); LOCAL A, B, C
      ! we know that P(2*I-1) and P(2*I) are a permutation
      ! of 2*I-1 and 2*I
        A(P(2*I - 1)) = B(2*I - 1) + C(2*I - 1)    
        A(P(2*I)) = B(2*I) + C(2*I)
      END DO
                                                                \EDOC

\paragraph{Example 2}
                                                                \CODE
      REAL A(N,N), B(N,N)
!HPF$ TEMPLATE T(N,N)
!HPF$ ALIGN WITH T:: A, B

      !HPF$ EXECUTE (I,J) ON T(I+1,J-1)
      FORALL (I=1:N-1, J=2:N)   A(I,J) = A(I+1,J-1) + B(I+1,J-1)
                                                                \EDOC

\paragraph{Example 3}
                                                                \CODE
      REAL A(N,N), B(N,N)
!HPF$ TEMPLATE T(N,N)
!HPF$ ALIGN WITH T:: A, B

      !HPF$ EXECUTE (I,J) ON T(I,J)       
      ! applies to the entire FORALL construct
      FORALL (I=1:N-1, J=2:N) 
          A(I,J) = A(I+1,J-1) + B(I+1,J-1)
          B(I,J) = A(I,J) + B(I+1,J-1)
      END FORALL
                                                                \EDOC

\paragraph{Example 4}
                                                                \CODE
      REAL A(N,N), B(N,N)
!HPF$ TEMPLATE T(N,N)
!HPF$ ALIGN WITH T:: A, B

      FORALL (I=1:N-1, J=2:N) 
      !HPF$ EXECUTE (I,J) ON T(I,J)
      ! applies only to the following assignment
          A(I,J) = A(I+1,J-1) + B(I+1,J-1)
          B(I,J) = A(I,J) + B(I+1,J-1)
      END FORALL
                                                                \EDOC

\paragraph{Example 5}
                                                                \CODE
      REAL A(N,N), B(N,N)
!HPF$ TEMPLATE T(N,N)
!HPF$ ALIGN WITH T:: A, B

      !HPF$ EXECUTE (I,J) ON T(I+1,J-1)
      A(1:N-1,2:N) = A(2:N,1:N-1) + B(2:N,1:N-1)
                                                                \EDOC
   
\paragraph{Example 6} 

The original program for this example is due to Michael Wolfe of Oregon 
Graduate Institute.

This program performs the matrix multiplication \(C = A \times B\).
In each step, array B is rotated by row-blocks and multiplied
diagonal-block-wise in parallel with A; the results are accumulated in C.

Note that without the EXECUTE-ON and LOCAL directives, the compiler
will have a hard time determining that all A, B and C accesses are 
actually local, and will thus be unable to generate the most efficient code 
(i.e.\ communication-free, with no runtime checking in the parallel 
loop body).
 
                                                                \CODE
      REAL A(N,N), B(N,N), C(N,N)
!HPF$ REALIGNABLE B

!* A,B,C are distributed by row blocks
!HPF$ TEMPLATE T(N,N)
!HPF$ ALIGN (:,:) WITH T:: A, B, C      
!HPF$ DISTRIBUTE T(BLOCK,*)

      NOP = NUMBER_OF_PROCESSORS()
      IB = N/NOP

      DO IT = 0, NOP-1

!* assuming wrap-around realignment
!HPF$ REALIGN B(I,J) WITH T(I-IT*IB,J)  

        !HPF$ INDEPENDENT
        DO IP = 0, NOP-1     
          !HPF$ EXECUTE (IP) ON T(IP*IB+1,1); LOCAL A, B, C
          ITP = MOD( IT+IP, NOP )

          DO I = 1, IB
            DO J = 1, N
              DO K = 1, IB
                C(IP*IB+I,J) = C(IP*IB+I,J) +             &
                  A(IP*IB+I,ITP*IB+K)*B(ITP*IB+K,J)
              ENDDO  !* K
            ENDDO  !* J
          ENDDO  !* I
        ENDDO  !* IP

      ENDDO  !* IT
                                                                \EDOC
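
The arithmetic of the loop nest above can be checked sequentially.
The following sketch is not part of the proposal: it fixes NOP at 4 and N
at 8 purely for illustration, omits the directives (which affect only the
mapping, not the values computed), initializes C to zero, and compares the
result with the Fortran~90 MATMUL intrinsic.
                                                                \CODE
      PROGRAM BLKMM
        INTEGER, PARAMETER :: N = 8, NOP = 4, IB = N/NOP
        REAL :: A(N,N), B(N,N), C(N,N)
        INTEGER :: IT, IP, ITP, I, J, K
        ! Small integer-valued test data, so the arithmetic is exact
        DO J = 1, N
          DO I = 1, N
            A(I,J) = REAL(I + J)
            B(I,J) = REAL(I - J)
          END DO
        END DO
        C = 0.0
        DO IT = 0, NOP-1
          DO IP = 0, NOP-1
            ! Over all IT, ITP visits every block index exactly once
            ITP = MOD( IT+IP, NOP )
            DO I = 1, IB
              DO J = 1, N
                DO K = 1, IB
                  C(IP*IB+I,J) = C(IP*IB+I,J) +             &
                    A(IP*IB+I,ITP*IB+K)*B(ITP*IB+K,J)
                ENDDO
              ENDDO
            ENDDO
          ENDDO
        ENDDO
        ! Largest deviation from the reference product
        PRINT *, MAXVAL(ABS(C - MATMUL(A,B)))
      END PROGRAM BLKMM
                                                                \EDOC
Since every row block IP meets every block ITP of the inner dimension
exactly once over the NOP steps, the printed deviation should be zero for
this test data.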

\subsubsection{Commentary}

The following is a discussion between Henk Sips and Tin-Fook Ngai from 
the mailing list.  It is included to clarify some issues in the preceding.

\begin{enumerate}
\item Sips: The execution model of HPFF (not completely approved yet) states: 
\begin{quote}
The code compiled by an HPF compiler ought do no worse than code
compiled using the owner compute rule.
\end{quote}
This is more relaxed than saying "it uses the owner compute rule". Your
EXECUTE ON is much more specific towards fixing execution on a specified
processor. 

Ngai: What are you objecting to? The EXECUTE ON is a directive.

\item Sips: A template is not executed, so one can't say EXECUTE x ON T. 
Something like 
EXECUTE\_ON\_HOME T should be adopted (since templates are currently not
objects, one could even deny this).

Ngai: Agree.  I never feel comfortable with the key words I used.  I don't have
objection to ``EXECUTE x ON\_HOME T''.  Any other suggestions are also
welcome.


\item Sips: Adopting EXECUTE x ON on DO loops without any independence requirements, as
your proposal seems to allow, can yield all kinds of intricate
synchronization schemes, when iterations are not independent (or must be
assumed to be dependent). This seems to go further than the first simple
step, which HPFF ought to be.

Ngai: Clearly, the proposed feature is primarily intended for INDEPENDENT DO,
FORALL and other parallel indexed assignments.  Before making the
proposal, I have also thought about ordinary DO loops as you pointed out
here. Code generation seems not to be a problem: if the user chooses to specify
the execution location of an iteration of an ordinary DO loop, simple
compilation requires only one synchronization at the end of each iteration
to ensure the DO sequential semantics. This naive compilation looks dumb
but the user may still gain due to the existing data distribution.  (A
smarter compiler of course can do a better job but definitely is not
required.)  That is why I don't restrict EXECUTE ON to INDEPENDENT DOs and
make the rule simpler.


\item Sips: Binding iterations to templates can currently only be done statically,
since the current draft does not allow dynamic templates. So iteration
boundaries must be known at compile time. One has to apply the subroutine
trick to allow this, which is not very neat.

Ngai: That is the intention of the proposal:  only static binding is allowed.
Even if the loop index is bounded by a run-time variable, the binding to a
template node is still static.


\item Sips: Allowing EXECUTE x ON on groups of statements gives a scoping issue, so
there should also be something like END EXECUTE x ON to undo the
annotation. 

Ngai: The current proposal seems sufficient on this issue.  The scope for a single
statement (FORALL statement, single statement in a FORALL construct, and array
assignment statement) is clear.  For groups of statements, EXECUTE ON can
apply only to either the entire body within a FORALL construct or the
entire iteration of a DO loop.


\item Sips: Again we have complicated scoping problems. How about this example:
                                                                \CODE
!HPF$ TEMPLATE T1(N), T2(N)

DO I=1,N
    !HPF$ EXECUTE (I) ON T1(I)
    C(I) = D(I)
    DO J=1,N
      !HPF$ EXECUTE (J) ON T2(J)
      A(I,J) =  A(I,J) + B(I,J) 
    ENDDO
ENDDO
                                                                \EDOC
This example satisfies the constraint only if by entering the J-loop, the
I-index is dereferenced from the assertion just after the I-loop. Although
logical, it might be confusing to users. However, in the program
                                                                \CODE
!HPF$ TEMPLATE T1(N), T2(N)
DO I=1,N
    !HPF$ EXECUTE (I) ON T1(I)
    C(I) = D(I)
    DO J=1,N
        A(I,J) =  A(I,J) + B(I,J) 
    ENDDO
ENDDO
                                                                \EDOC
there is no such dereferencing. 

Ngai: Good examples.  This bug surely needs to be fixed.  Here is my solution:
\begin{itemize}
\item For nested EXECUTE ON directives, only the immediately enclosing EXECUTE ON
  directive is effective.
\end{itemize}
In the former example, the statement ``C(I) = D(I)" will be executed on the
home of T1(I) while the statement ``A(I,J) =  A(I,J) + B(I,J)" will be
executed on home of T2(J) for all I.  In the latter case, the entire
I-loop body that includes the DO J loop is executed on the home of T1(I).

\item Sips: The wrap feature of templates will probably be deleted from the draft.
The same thing (shifting data each iteration) can be achieved by using CSHIFT
or the subroutine trick and making the template as large as the iteration
space.

Ngai: Wolfe and I discussed the example (Example 6 in the proposal) long before
our revision of the wrap feature.  Sorry for any confusion from this
example.  (However, this example also illustrates the use of wrap in data
distribution -- we should come up with a cleaner solution next meeting.)

\item Sips: We cannot do separate compilation in some examples:
                                                                \CODE
!HPF$ TEMPLATE T1(N)
  DO I=1,N
    !HPF$ EXECUTE (I) ON T1(I)
    C(I) = D(I)
    DO J=1,N
      A(I,J) = B(I,J)
    ENDDO
  ENDDO
                                                                \EDOC
Here A(I,J) is calculated on the home of T1(I). If we encapsulate the J-loop
into a subroutine we get something like:
                                                                \CODE
!HPF$ TEMPLATE T1(N)
  DO I=1,N
    !HPF$ EXECUTE (I) ON T1(I)
    C(I) = D(I)
    CALL FOO(A(I,:),B(I,:))
  ENDDO

...

SUBROUTINE FOO(AA,BB)
!HPF$ ALIGN AA,BB with *
    DO J=1,N
      AA(J) = BB(J)
    ENDDO
                                                                \EDOC
The dummy arguments AA and BB are aligned to the incoming array sections.
Normally, without any EXECUTE\_ON, the subroutine would execute AA(J), which
is equivalent to A(I,J), on the home of A(I,J). However, with an EXECUTE\_ON in the
main program there is no way the subroutine can know that AA(J) should be
executed on the home of T1(I). The consequence is that any subroutine call must
{\em undo} the EXECUTE\_ON on entry and {\em redo} the EXECUTE\_ON on return.

(Ngai did not respond publicly.)
\end{enumerate}

\alternative B
\subsection{Proposal for a Statement Grouping Syntax and ON Clause}

\footnote{Version of October 5, 1992 -- Clemens-August Thole, GMD-I1.T,
Sankt Augustin.
This section has not been discussed.}
I agree with Tin-Fook that something like the on-clause should be 
contained in HPF. I brought a proposal with me to the last HPF meeting, 
which was distributed by Chuck, but neither the FORALL working group
nor the plenary had time to discuss it.

I would appreciate comments on the various features.

\subsubsection{Introduction}

This proposal introduces an extension to HPF to group several 
statements in order to be able to specify properties for a whole block
of statements at once. A block of statements is called an HPF-section.
HPF-sections can be used to describe properties for independent execution
between blocks of statements as well as the mapping of their execution.

For the specification of a specific mapping of the execution of statements
or HPF-sections the ON-clause is introduced. A subset of a template is used
as a reference object onto which the statements are mapped in a canonical
manner. Careful selection of the reference template makes it possible to specify
how the execution of the code is mapped onto the parallel architecture.


\subsubsection{HPF-sections}

The HPF directives SECTIONS, SECTION, and END SECTIONS are used to specify
grouping of statements. SECTIONS and END SECTIONS specify the beginning
and end of a list of HPF-sections and SECTION the beginning of the next 
HPF-section. The syntax is as follows:
                                                                \BNF
hpf-block \IS        !HPF$ SECTIONS
                [hpf-section-list]
        !HPF$ END SECTIONS

hpf-section \IS        !HPF$ SECTION
                [execution-part]
                                                                \FNB
\noindent Constraint: During the execution of an {\em hpf-section}, control
must never be transferred outside of its {\em execution-part}.

\paragraph{Example}
                                                                \CODE
        !HPF$ SECTIONS
        !HPF$ SECTION
                A = A + B
                B = C + D
        !HPF$ SECTION
                E = B
                IF (E.GT.F) GOTO 10
                        E = 0D0
         10     CONTINUE
        !HPF$ END SECTIONS
                                                                \EDOC
This example specifies a list of two HPF-sections. The control statement in
the second HPF-section is valid because after the transfer of control the
execution continues in the same HPF-section.


\subsubsection{ON-clause}

The ON-clause specifies a subsection of a template, which is used as a reference
object for the execution of the next statement, construct, or HPF-section.
If the left-hand-side of an assignment coincides in shape with the reference
object, the evaluation of the right-hand-side and the assignment for 
a specific element of the left-hand-side are performed on the processor onto
which the corresponding element of the reference object is mapped.

\paragraph{Syntax}

Add the following rules:
                                                                \BNF
executable-construct \IS        !HPF$ ON on-spec
                executable-construct

hpf-section \IS        !HPF$ ON on-spec
                hpf-section
        
on-spec \IS        align-spec
                                                                \FNB
The {\it executable-construct} or {\it hpf-section} is called the on-clause-target.

\paragraph{Constraints}
\begin{enumerate}
\item No {\it executable-construct} that transfers control out of the construct
   itself may be used as the object of an on-clause. This
   includes the entry-statement. 
\item {\it Statement-block}s used in constructs must fulfill the constraints of
   HPF-sections.
\item The shape of the {\it on-spec} must cover in each dimension the shape
   of any left-hand-side of an assignment statement that is the target of an
   on-clause. If a ``*'' is used in the {\it on-spec}, this dimension is skipped
   when constructing the shape of the {\it on-spec}.
\item If an on-clause is contained in the on-clause-target, the new {\it on-spec}
   must be a subsection of the {\it on-spec} of the outer on-clause.
\end{enumerate}

\paragraph{Example}
                                                                \CODE
                REAL, DIMENSION(n) :: a, b, c, d
        !HPF$   TEMPLATE grid(n)
        !HPF$   ALIGN WITH grid :: a, b, c, d

        !HPF$   ON grid(2:n)
                a(1:n-1) = a(2:n) + b(2:n) + c(2:n)
                                                                \EDOC
The on-clause indicates that the evaluation of the right-hand-side is 
performed on those processors that hold the data elements of the 
right-hand-side. Data movement is then necessary for the assignment to the
left-hand-side.

\paragraph{Interpretation}

The interpretation of the on-clause depends on the type of the on-clause-target.

If the on-clause-target is an assignment statement, the {\it on-spec} is used to
determine where the assignment statement is executed. If the shape of the 
right-hand-side is identical to the shape of the {\it on-spec}, the computation for
a specific element of the assignment statement is performed where the 
corresponding element of the {\it on-spec} is mapped. If the shape of the 
{\it on-spec} is larger, the compiler may use any sufficiently large subsection.
The use of ``*'' in the {\it on-spec} specifies that the same computations are
mapped onto the corresponding line of processors, so that several processors
perform the same update. This may save communication operations.
In the case of the where-statement, the forall-statement, and the 
forall-construct, the same mapping is applied to the evaluation of the 
conditions and to each assignment.

If the on-clause is placed in front of an if-construct, a case-construct,
or a do-construct, the {\it on-spec} is used for the evaluation of the 
conditions and the loop bounds as well as for the execution of the statement-blocks
that are part of the construct. For the statement-blocks the interpretation 
rules for HPF-sections apply.

With respect to the allocate, deallocate, nullify, and I/O related statements
the {\it on-spec} is used for the evaluation of the parameters of the statements
and the evaluation of I/O objects. 

In the case of subroutine calls and functions the {\it on-spec} is used for the
evaluation of the parameters. It determines also the mapping of the resulting 
object. The {\it on-spec} determines also the set of processors, which will be
used for the evaluation of the subroutine. 

In the case of HPF-sections the on-clause is applied to each statement of the
execution part. Control transfer statements are allowed in this case and the 
constraints ensure, that the context on the same {\it on-spec} is not lost.

\paragraph{Additional example}
                                                                \CODE
        REAL, DIMENSION(n,n) :: a, b, c, d
!HPF$   TEMPLATE grid(n,n)
!HPF$   ALIGN WITH grid :: a, b, c, d

!HPF$   ON grid(2:n,2:n)
        DO i=2,n
!HPF$       ON grid(i,2:n)
            DO j=2,n
!HPF$           ON grid(i,j)
                a(i-1,j-1) = a(i,j) + b(i,j)*c(i,j)
            ENDDO
        ENDDO
                                                                \EDOC

\paragraph{Comment}

The compiler should be able to adjust the span of the loops to the local 
extent 
due to the restrictions on the specifiers of the sections of the {\it 
on-spec}.


\subsection{ALLOCATE in FORALL}

\label{forall-allocate}

\footnote{Version of July 28, 1992 
- Guy Steele, Thinking Machines Corporation.
At the September 10-11 meeting, this was not included as part of the
FORALL because it seemed too big a leap from the allowed assignment
statements.}
Proposal:  ALLOCATE, DEALLOCATE, and NULLIFY statements may appear
	in the body of a FORALL.

Rationale: these are just another kind of assignment.  They may have
	a kind of side effect (storage management), but it is a
	benign side effect (even milder than random number generation).

Example:
                                                            \CODE
      TYPE SCREEN
        INTEGER, POINTER :: P(:,:)
      END TYPE SCREEN
      TYPE(SCREEN) :: S(N)
      INTEGER IERR(N)
      ...
!  Lots of arrays with different aspect ratios
      FORALL (J=1:N)  ALLOCATE(S(J)%P(J,N/J),STAT=IERR(J))
      IF(ANY(IERR.NE.0)) GO TO 99999
                                                            \EDOC
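
For comparison, the following is a minimal sketch of the same allocation
written without the proposed extension, using a sequential DO loop in place
of the FORALL.  The program name and the value N = 6 are assumptions made
only for illustration.
                                                            \CODE
      PROGRAM ALLOCLOOP
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 6
      TYPE SCREEN
        INTEGER, POINTER :: P(:,:)
      END TYPE SCREEN
      TYPE(SCREEN) :: S(N)
      INTEGER :: IERR(N), J

!  One allocation per iteration; STAT is recorded per element
      DO J = 1, N
        ALLOCATE(S(J)%P(J,N/J), STAT=IERR(J))
      END DO
      IF (ANY(IERR.NE.0)) STOP 'allocation failed'

!  Show the aspect ratio of each allocated array
      DO J = 1, N
        PRINT *, J, SHAPE(S(J)%P)
      END DO
      END PROGRAM ALLOCLOOP
                                                            \EDOC
The loop fixes an order on the allocations that the FORALL form would leave
open to the implementation.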

\subsection{Generalized Data References}

\label{data-ref}

\footnote{Version of July 28, 1992 
- Guy Steele, Thinking Machines Corporation.
This was not acted on at the September 10-11 meeting because the
FORALL subgroup wanted to minimize changes to the Fortran~90 standard.}
Proposal:  Delete the constraint in section 6.1.2 of the Fortran 90
	standard (page 63, lines 7 and 8):
\begin{quote}
	Constraint: In a data-ref, there must not be more than one
		part-ref with nonzero rank.  A part-name to the right
		of a part-ref with nonzero rank must not have the
		POINTER attribute.
\end{quote}

Rationale: further opportunities for parallelism.

Example:
                                                                     \CODE
TYPE(MONARCH) :: C(N), W(N)
      ...
! Munch that butterfly
C = C + W * A%P		! Illegal in Fortran 90
                                                                      \EDOC


\subsection{FORALL with INDEPENDENT Directives}
\label{begin-independent}

\footnote{Version of July 21, 1992 - Min-You Wu.
This was rejected at the FORALL subgroup meeting on September 9, 1992,
because it only offered syntactic sugar for capabilities already in
the FORALL INDEPENDENT.  It was also suggested that the BEGIN
INDEPENDENT syntax
should be reserved for other uses, such as MIMD features.}
This proposal is an extension of Guy Steele's INDEPENDENT proposal.
We propose a block FORALL with the directives for independent 
execution of statements.  The INDEPENDENT directives are used
in a block style.  

The block FORALL is in the form of
                                                         \CODE
      FORALL (...) [ON (...)]
        a block of statements
      END FORALL
                                                         \EDOC
where the block can consist of a restricted class of statements 
and the following INDEPENDENT directives:
                                                         \CODE
!HPF$BEGIN INDEPENDENT
!HPF$END INDEPENDENT
                                                         \EDOC
The two directives must be used in pairs.  
A sub-block of statements 
enclosed by the two directives is called an {\em asynchronous} 
sub-block or {\em independent} sub-block.  
The statements that are 
not in an asynchronous sub-block are in {\em synchronized} sub-blocks
or {\em non-independent} sub-blocks.  
The synchronized sub-block is 
the same as Guy Steele's synchronized FORALL statement, and the 
asynchronous sub-block is the same as the FORALL with the INDEPENDENT 
directive.  
Thus, the block FORALL
                                                          \CODE
      FORALL (e)
        b1
!HPF$BEGIN INDEPENDENT
        b2
!HPF$END INDEPENDENT
        b3
      END FORALL
                                                           \EDOC
means roughly the same as
                                                           \CODE
      FORALL (e)
        b1
      END FORALL
!HPF$INDEPENDENT
      FORALL (e)
        b2
      END FORALL
      FORALL (e)
        b3
      END FORALL
                                                          \EDOC
														  
Statements in a synchronized sub-block are tightly synchronized.
Statements in an asynchronous sub-block are completely independent.
The INDEPENDENT directives indicate to the compiler that there is no 
dependence and that, consequently, synchronizations are not necessary.
It is the user's responsibility to ensure that there is no dependence
between instances in an asynchronous sub-block.
A compiler can do dependence analysis for the asynchronous sub-blocks
and issue an error message when there exists a dependence or a warning
when it finds a possible dependence.

\subsubsection{What does ``no dependence between instances'' mean?}

It means that there is no true dependence, anti-dependence,
or output dependence between instances.
Examples of these dependences are shown below:
\begin{enumerate}
\item True dependence:
                                                            \CODE
      FORALL (i = 1:N)
        x(i) = ... 
        ...  = x(i+1)
      END FORALL
                                                            \EDOC
Notice that dependences in a FORALL are different from those in a DO loop.
If the above example were a DO loop, it would be an anti-dependence.

\item Anti-dependence:
                                                            \CODE
      FORALL (i = 1:N)
        ...  = x(i+1)
        x(i) = ...
      END FORALL
                                                            \EDOC

\item Output dependence:
                                                            \CODE
      FORALL (i = 1:N)
        x(i+1) = ... 
        x(i) = ...
      END FORALL
                                                            \EDOC
\end{enumerate}
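
The difference between FORALL and DO noted under the true dependence can be
observed directly.  The following is a minimal sketch in which the elided
right-hand side and left-hand side of the first example are filled in with
assumed values (X(I) = I and a result array Y); the program name and N = 4
are likewise illustrative.
                                                            \CODE
      PROGRAM FORALLDO
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 4
      INTEGER :: I
      INTEGER :: X(N+1), Y(N)

!  Block FORALL: the first statement completes for every I
!  before the second starts, so Y(I) sees the NEW X(I+1)
      X = 0
      FORALL (I = 1:N)
        X(I) = I
        Y(I) = X(I+1)
      END FORALL
      PRINT *, 'FORALL: ', Y

!  Sequential DO: iteration I reads X(I+1) before iteration
!  I+1 writes it, so Y(I) sees the OLD X(I+1)
      X = 0
      DO I = 1, N
        X(I) = I
        Y(I) = X(I+1)
      END DO
      PRINT *, 'DO:     ', Y
      END PROGRAM FORALLDO
                                                            \EDOC
The FORALL yields Y = 2 3 4 0, while the DO loop yields Y = 0 0 0 0, which is
why the same pair of statements is a true dependence in one case and an
anti-dependence in the other.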

Independent does not imply no communication.  One instance may access 
data in the other instances, as long as it does not cause a dependence.  
The following example is an independent block:
                                                            \CODE
      FORALL (i = 1:N)
!HPF$BEGIN INDEPENDENT
        x(i) = a(i-1)
        y(i-1) = a(i+1)
!HPF$END INDEPENDENT
      END FORALL
                                                            \EDOC
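
One informal way to gain confidence in such a block is to run its instances
in two different orders and compare the results.  The following is a sketch
under assumed array bounds (A(0:N+1), X(1:N), Y(0:N-1)) with N = 5; agreement
between the two orders is a necessary sign of independence, not a proof of it.
                                                            \CODE
      PROGRAM ORDERCHK
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 5
      INTEGER :: I
      REAL :: A(0:N+1)
      REAL :: X1(N), Y1(0:N-1), X2(N), Y2(0:N-1)

      CALL RANDOM_NUMBER(A)

!  Instances executed in increasing order of I
      DO I = 1, N
        X1(I) = A(I-1)
        Y1(I-1) = A(I+1)
      END DO

!  The same instances executed in decreasing order of I
      DO I = N, 1, -1
        X2(I) = A(I-1)
        Y2(I-1) = A(I+1)
      END DO

      PRINT *, 'same result: ',
     &         ALL(X1.EQ.X2) .AND. ALL(Y1.EQ.Y2)
      END PROGRAM ORDERCHK
                                                            \EDOC
Each instance reads only A and writes its own elements of X and Y, so both
orders produce identical arrays even though neighbouring elements of A are
accessed.  The continuation mark assumes fixed source form.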

\subsubsection{Statements that can appear in FORALL}

FORALL statements, WHERE-ELSEWHERE statements, some intrinsic functions 
(and possibly elemental functions and subroutines) can appear in the
FORALL:
\begin{enumerate}
\item FORALL statement
                                                            \CODE
      FORALL (I = 1 : N)
        A(I,0) = A(I-1,0)
        FORALL (J = 1 : N)
!HPF$BEGIN INDEPENDENT
          A(I,J) = A(I,0) + B(I-1,J-1)
          C(I,J) = A(I,J)
!HPF$END INDEPENDENT
        END FORALL
      END FORALL
                                                            \EDOC

\item WHERE
                                                            \CODE
      FORALL(I = 1 : N)
!HPF$BEGIN INDEPENDENT
        WHERE(A(I,:).EQ.B(I,:))
          A(I,:) = 0
        ELSEWHERE
          A(I,:) = B(I,:)
        END WHERE
!HPF$END INDEPENDENT
      END FORALL
                                                            \EDOC
\end{enumerate}


\subsubsection{Rationale}

\begin{enumerate}
\item A FORALL with a single asynchronous sub-block as shown below is 
the same as a do independent (or doall, or doeach, or parallel do, etc.).
                                                            \CODE
      FORALL (e)
!HPF$BEGIN INDEPENDENT
        b1
!HPF$END INDEPENDENT
      END FORALL
                                                            \EDOC
A FORALL without any INDEPENDENT directive is the same as a tightly 
synchronized FORALL.  We only need to define one type of parallel 
construct, which includes both synchronized and asynchronous blocks.  
Furthermore, combining asynchronous and synchronized FORALLs, we 
have a loosely synchronized FORALL which is more flexible for many 
loosely synchronous applications.

\item With INDEPENDENT directives, the user can indicate which blocks
need not be synchronized.  The INDEPENDENT directives can act 
as barrier synchronizations.  One may suggest a smart compiler 
that can recognize dependences and eliminate unnecessary 
synchronizations automatically.  However, it might be extremely 
difficult or impossible in some cases to identify all dependences.  
When the compiler cannot determine whether there is a dependence, 
it must assume so and use a synchronization for safety, which 
results in unnecessary synchronizations and consequently, high 
communication overhead.
\end{enumerate}


\end{document}

From ngai@hpltfn.hpl.hp.com  Thu Oct 15 12:04:59 1992
Received: from hplms2.hpl.hp.com by titan.cs.rice.edu (AA11707); Thu, 15 Oct 92 12:04:59 CDT
Received: from hpltfn.hpl.hp.com by hplms2.hpl.hp.com with SMTP
	(16.5/15.5+IOS 3.20) id AA25417; Thu, 15 Oct 92 10:04:55 -0700
Received: by hpltfn.hpl.hp.com
	(16.6/15.5+IOS 3.14) id AA12895; Thu, 15 Oct 92 10:04:42 -0700
Date: Thu, 15 Oct 92 10:04:42 -0700
From: Tin-Fook Ngai <ngai@hpltfn.hpl.hp.com>
Message-Id: <9210151704.AA12895@hpltfn.hpl.hp.com>
To: chk@cs.rice.edu
Cc: hpff-core@cs.rice.edu, hpff-forall@cs.rice.edu
In-Reply-To: chk@cs.rice.edu's message of Wed, 14 Oct 1992 16:57:50 -0600 <9210142154.AB09950@cs.rice.edu>
Subject: Revised proposal on EXECUTE-ON (LaTeX)


Here is the promised LaTeX edition using our macros.  Please update
(simply cut and paste) the corresponding subsection in the HPF Language
Specification draft.  Thanks.

Tin-Fook
----------------------

\subsection{EXECUTE-ON-HOME Directive}
\label{on-clause}

\footnote{Version of 
October 14, 1992
--
Tin-Fook Ngai,
Hewlett-Packard Laboratories.
This section has not been discussed.}

The EXECUTE-ON-HOME directive is used to suggest where an
iteration of a DO construct or an indexed parallel assignment should be
executed.  The directive informs the compiler which data access should be
local and which data access may be remote.
 
                                                                       \BNF
execute-on-home-directive  \IS  EXECUTE (subscript-list) ON_HOME align-spec 
[; LOCAL array-name-list]
                                                                       \FNB

The EXECUTE-ON-HOME directive must immediately precede the corresponding
DO loop body, array assignment, FORALL statement, FORALL construct or
individual assignment statement in a FORALL construct.

The scope of an EXECUTE-ON-HOME directive is the entire loop body of the
enclosing DO construct, or the following array assignment, FORALL
statement, FORALL construct or assignment statement in a FORALL construct.

The {\em subscript-list} identifies a distinct iteration index or an indexed
parallel assignment.  The {\em align-spec} identifies a template node.  Every
iteration index or indexed assignment must be associated with one
and only one template node.  The EXECUTE-ON-HOME directive suggests that
the iteration or parallel assignment should be executed on the processor
to which the template node is mapped.  When the EXECUTE-ON-HOME directive
is applied to a subroutine call, it affects only the execution location of
the caller but not the execution location of the called subroutine.

The optional LOCAL directive informs the compiler that all data accesses
to the specified {\em array-name-list} can be handled as local data
accesses if the related HPF data mapping directives are honored.

EXECUTE-ON-HOME directives can be nested, but only the immediately
preceding EXECUTE-ON-HOME directive is effective.


\paragraph{Example 1}
                                                                         \CODE 
      REAL A(N), B(N), C(N)
      INTEGER P(N)
!HPF$ TEMPLATE T(N)
!HPF$ ALIGN WITH T:: A, B, C
!HPF$ DISTRIBUTE T(CYCLIC(2))

!HPF$ INDEPENDENT            
      DO I = 1, N/2 
!HPF$ EXECUTE (I) ON_HOME T(2*I); LOCAL A, B, C
      ! we know that P(2*I-1) and P(2*I) are a permutation 
      ! of 2*I-1 and 2*I
        A(P(2*I - 1)) = B(2*I - 1) + C(2*I - 1)    
        A(P(2*I)) = B(2*I) + C(2*I)
      END DO
                                                                         \EDOC 
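
The locality asserted in Example 1 rests on how CYCLIC(2) places template
elements.  The following is a minimal sketch, assuming the usual block-cyclic
rule that element J of T lives on processor MOD((J-1)/2, NP) with processors
numbered from 0.  The program name, the function OWNER, and the values N = 16
and NP = 4 are illustrative assumptions and not part of the proposal.
                                                                         \CODE
      PROGRAM CYCOWNER
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 16
      INTEGER, PARAMETER :: NP = 4
      INTEGER, PARAMETER :: K = 2
      INTEGER :: I

!  For each iteration I, print the homes of T(2*I-1) and T(2*I)
      DO I = 1, N/2
        PRINT *, I, OWNER(2*I-1), OWNER(2*I)
      END DO

      CONTAINS

!  Assumed block-cyclic rule for DISTRIBUTE T(CYCLIC(K))
      INTEGER FUNCTION OWNER(J)
        INTEGER, INTENT(IN) :: J
        OWNER = MOD((J-1)/K, NP)
      END FUNCTION OWNER
      END PROGRAM CYCOWNER
                                                                         \EDOC
Every output line shows the same processor number twice: both elements
touched by iteration I share the home of T(2*I), so the accesses named in the
LOCAL clause need no communication under this rule.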


\paragraph{Example 2}
                                                                         \CODE
      REAL A(N,N), B(N,N)
!HPF$ TEMPLATE T(N,N)
!HPF$ ALIGN WITH T:: A, B
!HPF$ EXECUTE (I,J) ON_HOME T(I+1,J-1)
      FORALL (I=1:N-1, J=2:N)   A(I,J) = A(I+1,J-1) + B(I+1,J-1)
                                                                         \EDOC


\paragraph{Example 3}
                                                                         \CODE
      REAL A(N,N), B(N,N)
!HPF$ TEMPLATE T(N,N)
!HPF$ ALIGN WITH T:: A, B
!HPF$ EXECUTE (I,J) ON_HOME T(I,J)  
      ! apply to the entire FORALL construct
      FORALL (I=1:N-1, J=2:N) 
        A(I,J) = A(I+1,J-1) + B(I+1,J-1)
        B(I,J) = A(I,J) + B(I+1,J-1)
      END FORALL
                                                                         \EDOC


\paragraph{Example 4}
                                                                         \CODE
      REAL A(N,N), B(N,N)
!HPF$ TEMPLATE T(N,N)
!HPF$ ALIGN WITH T:: A, B
      FORALL (I=1:N-1, J=2:N) 
!HPF$ EXECUTE (I,J) ON_HOME T(I,J)  
      ! applies only to the following assignment
        A(I,J) = A(I+1,J-1) + B(I+1,J-1)
        B(I,J) = A(I,J) + B(I+1,J-1)
      END FORALL
                                                                         \EDOC


\paragraph{Example 5}
                                                                         \CODE
      REAL A(N,N), B(N,N)
!HPF$ TEMPLATE T(N,N)
!HPF$ ALIGN WITH T:: A, B

!HPF$ EXECUTE (I,J) ON_HOME T(I+1,J-1)
      A(1:N-1,2:N) = A(2:N,1:N-1) + B(2:N,1:N-1)
                                                                         \EDOC

   
\paragraph{Example 6}

The original program for this example is due to Michael Wolfe of Oregon 
Graduate Institute.

This program performs matrix multiplication \(C = A \times B\).
In each step, array B is rotated by row-blocks and multiplied
diagonal-block-wise in parallel with A, and the results are accumulated in C.

Note that without the EXECUTE-ON-HOME and LOCAL directives, the compiler
will have a hard time figuring out that all A, B and C accesses are actually
local, and thus it is unable to generate the most efficient code (i.e.
communication-free and with no runtime checking in the parallel loop body).
 
                                                                         \CODE
      REAL A(N,N), B(N,N), C(N,N)

      PARAMETER(NOP = NUMBER_OF_PROCESSORS())
!HPF$ REALIGNABLE B
!HPF$ TEMPLATE T(2*N,N)               ! to allow wrap around mapping
!HPF$ ALIGN (I,J) WITH T(I,J):: A, C      
!HPF$ ALIGN B(I,J) WITH T(N+I,J)
!HPF$ DISTRIBUTE T(CYCLIC(N/NOP),*)   ! distributed by row blocks

      IB = N/NOP

      DO IT = 0, NOP-1

      ! rotate B by row-blocks
!HPF$ REALIGN B(I,J) WITH T(N-IT*IB+I,J)  

!HPF$ INDEPENDENT                     ! data parallel loop
        DO IP = 0, NOP-1     
!HPF$ EXECUTE (IP) ON_HOME T(IP*IB+1,1); LOCAL A, B, C
          ITP = MOD( IT+IP, NOP )

          DO I = 1, IB
            DO J = 1, N
              DO K = 1, IB
                C(IP*IB+I,J) = C(IP*IB+I,J) +                      
     1                         A(IP*IB+I,ITP*IB+K)*B(ITP*IB+K,J)
              ENDDO  ! K 
            ENDDO  ! J 
          ENDDO  ! I 
        ENDDO  ! IP

      ENDDO  ! IT
                                                                        \EDOC
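
The index arithmetic of Example 6 can be checked sequentially.  The following
is a minimal sketch with the HPF directives removed, so that it tests only the
loop nest and not the data mapping.  The program name, the values N = 8 and
NOP = 4, and the helper indices IR and KC are assumptions made for
illustration.
                                                                        \CODE
      PROGRAM ROTCHECK
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 8
      INTEGER, PARAMETER :: NOP = 4
      INTEGER, PARAMETER :: IB = N/NOP
      REAL :: A(N,N), B(N,N), C(N,N)
      INTEGER :: IT, IP, ITP, I, J, K, IR, KC

      CALL RANDOM_NUMBER(A)
      CALL RANDOM_NUMBER(B)
      C = 0.0

      DO IT = 0, NOP-1
        DO IP = 0, NOP-1
          ! block of A columns (B rows) used in this step
          ITP = MOD(IT+IP, NOP)
          DO I = 1, IB
            IR = IP*IB + I
            DO J = 1, N
              DO K = 1, IB
                KC = ITP*IB + K
                C(IR,J) = C(IR,J) + A(IR,KC)*B(KC,J)
              ENDDO
            ENDDO
          ENDDO
        ENDDO
      ENDDO

      ! largest deviation from the intrinsic product
      PRINT *, MAXVAL(ABS(C - MATMUL(A,B)))
      END PROGRAM ROTCHECK
                                                                        \EDOC
For a fixed IP, the value of ITP runs through every block exactly once as IT
goes from 0 to NOP-1, so each row block of C accumulates the full product;
the printed deviation should be at the level of rounding error.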


From presberg@tc.cornell.edu  Thu Oct 15 14:57:58 1992
Received: from theory.TC.Cornell.EDU by titan.cs.rice.edu (AA16934); Thu, 15 Oct 92 14:57:58 CDT
Received: by theory.TC.Cornell.EDU id AA03814
  (5.65c/IDA-1.4.4); Thu, 15 Oct 1992 15:57:54 -0400
Date: Thu, 15 Oct 1992 15:57:54 -0400
From: David Presberg <presberg@tc.cornell.edu>
Message-Id: <199210151957.AA03814@theory.TC.Cornell.EDU>
To: Tin-Fook Ngai <ngai@hpltfn.hpl.hp.com>, chk@cs.rice.edu
Subject: "Revised proposal on EXECUTE-ON..." by Ngai, versus that in "nugatory"
References: <9210142154.AB09950@cs.rice.edu>
	<9210151704.AA12895@hpltfn.hpl.hp.com>
        <9210142056.AA14106@erato.cs.rice.edu>
Cc: presberg@tc.cornell.edu, hpff-core@cs.rice.edu, hpff-forall@cs.rice.edu

Gentlemen (Chuck and Tin-Fook) --

Pardon me, but I'm confused.

I just got 46 pages (post-LaTeX printed copy) of Chuck's "latest and
greatest" Forall chapter via the hpff-forall distribution list (and
deliberately NOT to hpff-core).

And then, almost immediately I got Tin-Fook's "Revised proposal..."
that was sent to both the hpff-forall AND to the hpff-core lists!!

There are quite noticeable differences, perhaps only in the spellings
of keywords, but also perhaps in semantic (static) constraints.  The
layout is sufficiently different that a machine-compare is of no use.

Do I have to do a word-by-word comparison to decide that they are in
fact the same thing?  Or are they intended to be different?

Please, could either of you comment, briefly, as to the status of
each/either version?  (I suggest that you both have to now send the
explanation to hpff-forall and to the hpff-core lists.)

----------

It seems a little late in the year to be having some problems keeping
track of proposals!  (Well, I guess I will regret that remark since I
have not volunteered to assemble any documents or keep track of any
files...)  And the real shame is that at first glance I am quite
inclined to help promote some kind of "ON clause" directive, oriented
to "iteration-space" distribution (in addition to all of our
previous "data-space" decomposition directives).

I hope our Forum procedures hold together until the end of the year.

-- Pres
- -----------------------------------------------------------------
- David L. Presberg, Parallel Systems Software Engineer, CNSF/TIG
- 740 Engineering and Theory Center Building, Cornell University,
- Ithaca, NY 14853-3801 607-254-8861 presberg@theory.tc.cornell.edu
- -----------------------------------------------------------------

From ngai@hpltfn.hpl.hp.com  Thu Oct 15 15:52:41 1992
Received: from hplms2.hpl.hp.com by titan.cs.rice.edu (AA19266); Thu, 15 Oct 92 15:52:41 CDT
Received: from hpltfn.hpl.hp.com by hplms2.hpl.hp.com with SMTP
	(16.5/15.5+IOS 3.20) id AA27343; Thu, 15 Oct 92 13:52:30 -0700
Received: by hpltfn.hpl.hp.com
	(16.6/15.5+IOS 3.14) id AA13032; Thu, 15 Oct 92 13:52:11 -0700
Date: Thu, 15 Oct 92 13:52:11 -0700
From: Tin-Fook Ngai <ngai@hpltfn.hpl.hp.com>
Message-Id: <9210152052.AA13032@hpltfn.hpl.hp.com>
To: presberg@tc.cornell.edu
Cc: chk@cs.rice.edu, presberg@tc.cornell.edu, hpff-core@cs.rice.edu,
        hpff-forall@cs.rice.edu
In-Reply-To: David Presberg's message of Thu, 15 Oct 1992 15:57:54 -0400 <199210151957.AA03814@theory.TC.Cornell.EDU>
Subject: "Revised proposal on EXECUTE-ON..." by Ngai, versus that in "nugatory"


Pres and other HPF mail recipients,

I apologize for having caused such confusion.  Chuck's 46-page draft
uses my first proposal (originated on September 14).  That subsection
should now be removed and replaced by my revised proposal (revised on
October 14).  Since I have adopted a new format that is more in line with
the existing chapters, a line-to-line comparison (or 'diff') is nearly
useless.  The changes in the proposal are:

==========
Changes:
1. Change keyword from EXECUTE ON to EXECUTE ON_HOME 
2. Allow nested EXECUTE-ON-HOME directives.  Only the immediately preceding
   directive is effective.
3. EXECUTE-ON-HOME directives have effect only on the caller of a subroutine
   call not on the called subroutine.  
4. Rewrite Example 6 to conform with current HPFF proposal that does not
   support automatic wrap-around mapping

Format changes:
1. Removed the subsubsections
2. Move the constraint into the interpretation part
3. Move usage and scope right after the syntax
==========

I have also included here the updated draft (of 49 pages) for your reference.  

Hope this will make life easier.  


Tin-Fook

------------------------

%chapter-head.tex

%Version of August 5, 1992 - David Loveman, Digital Equipment Corporation

\documentstyle[twoside,11pt]{report}
\pagestyle{headings}
\pagenumbering{arabic}
\marginparwidth 0pt
\oddsidemargin=.25in
\evensidemargin  .25in
\marginparsep 0pt
\topmargin=-.5in
\textwidth=6.0in
\textheight=9.0in
\parindent=2em

%the file syntax-macs.tex is physically included below

%syntax-macs.tex

%Version of July 29, 1992 - Guy Steele, Thinking Machines

\newdimen\bnfalign         \bnfalign=2in
\newdimen\bnfopwidth       \bnfopwidth=.3in
\newdimen\bnfindent        \bnfindent=.2in
\newdimen\bnfsep           \bnfsep=6pt
\newdimen\bnfmargin        \bnfmargin=0.5in
\newdimen\codemargin       \codemargin=0.5in
\newdimen\intrinsicmargin  \intrinsicmargin=3em
\newdimen\casemargin       \casemargin=0.75in
\newdimen\argumentmargin   \argumentmargin=1.8in

\def\IT{\it}
\def\RM{\rm}
\let\CHAR=\char
\let\CATCODE=\catcode
\let\DEF=\def
\let\GLOBAL=\global
\let\RELAX=\relax
\let\BEGIN=\begin
\let\END=\end


\def\FUNNYCHARACTIVE{\CATCODE`\a=13 \CATCODE`\b=13 \CATCODE`\c=13 \CATCODE`\d=13
		     \CATCODE`\e=13 \CATCODE`\f=13 \CATCODE`\g=13 \CATCODE`\h=13
		     \CATCODE`\i=13 \CATCODE`\j=13 \CATCODE`\k=13 \CATCODE`\l=13
		     \CATCODE`\m=13 \CATCODE`\n=13 \CATCODE`\o=13 \CATCODE`\p=13
		     \CATCODE`\q=13 \CATCODE`\r=13 \CATCODE`\s=13 \CATCODE`\t=13
		     \CATCODE`\u=13 \CATCODE`\v=13 \CATCODE`\w=13 \CATCODE`\x=13
		     \CATCODE`\y=13 \CATCODE`\z=13 \CATCODE`\[=13 \CATCODE`\]=13
                     \CATCODE`\-=13}

\def\RETURNACTIVE{\CATCODE`\
=13}

\makeatletter
\def\section{\@startsection {section}{1}{\z@}{-3.5ex plus -1ex minus 
 -.2ex}{2.3ex plus .2ex}{\large\sf}}
\def\subsection{\@startsection{subsection}{2}{\z@}{-3.25ex plus -1ex minus 
 -.2ex}{1.5ex plus .2ex}{\large\sf}}
\def\alternative#1 #2#3{\def\@tempa{#1}\def\@tempb{A}\ifx\@tempa\@tempb\else
    \expandafter\@altbumpdown\string#2\@foo\fi
    #2{Version #1: #3}}
\def\@altbumpdown#1#2\@foo{\global\expandafter\advance\csname c@#2\endcsname-1}

\def\@ifpar#1#2{\let\@tempe\par \def\@tempa{#1}\def\@tempb{#2}\futurelet
    \@tempc\@ifnch}

\def\?#1.{\begingroup\def\@tempq{#1}\list{}{\leftmargin\intrinsicmargin}\relax
  \item[]{\bf\@tempq.} \@intrinsictest}
\def\@intrinsictest{\@ifpar{\@intrinsicpar\@intrinsicdesc}{\@intrinsicpar\relax}}
\long\def\@intrinsicdesc#1{\list{}{\relax
  \def\@tempb{ Arguments}\ifx\@tempq\@tempb
			  \leftmargin\argumentmargin
			  \else \leftmargin\casemargin \fi
  \labelwidth\leftmargin  \advance\labelwidth -\labelsep
  \parsep 4pt plus 2pt minus 1pt
  \let\makelabel\@intrinsiclabel}#1\endlist}
\long\def\@intrinsicpar#1#2\\{#1{#2}\@ifstar{\@intrinsictest}{\endlist\endgroup}}
\def\@intrinsiclabel#1{\setbox0=\hbox{\rm #1}\ifnum\wd0>\labelwidth
  \box0 \else \hbox to \labelwidth{\box0\hfill}\fi}
\def\Case(#1):{\item[{\it Case (#1):}]}
\def\ {\@ifnextchar({\def\@tempq{#1}\@intrinsicopt}{\item[#1]}}
\def\@intrinsicopt(#1){\item[{\@tempq} (#1)]}

\def\MATRIX#1{\relax
    \@ifnextchar,{\@MATRIXTABS{}#1,\@FOO, \hskip0pt plus 1filll\penalty-1\@gobble
  }{\@ifnextchar;{\@MATRIXTABS{}#1,\@FOO; \hskip0pt plus 1filll\penalty-1\@gobble
  }{\@ifnextchar:{\@MATRIXTABS{}#1,\@FOO: \hskip0pt plus 1filll\penalty-1\@gobble
  }{\@ifnextchar.{\hfill\penalty1\null\penalty10000\hskip0pt plus 1filll
		  \@MATRIXTABS{}#1,\@FOO.\penalty-50\@gobble
  }{\@MATRIXTABS{}#1,\@FOO{ }\hskip0pt plus 1filll\penalty-1}}}}}

\def\@MATRIXTABS#1#2,{\@ifnextchar\@FOO{\@MATRIX{#1#2}}{\@MATRIXTABS{#1#2&}}}
\def\@MATRIX#1\@FOO{\(\left[\begin{array}{rrrrrrrrrr}#1\end{array}\right]\)}

\def\@IFSPACEORRETURNNEXT#1#2{\def\@tempa{#1}\def\@tempb{#2}\futurelet\@tempc\@ifspnx}

{
\FUNNYCHARACTIVE
\GLOBAL\DEF\FUNNYCHARDEF{\RELAX
    \DEFa{{\IT\CHAR"61}}\DEFb{{\IT\CHAR"62}}\DEFc{{\IT\CHAR"63}}\RELAX
    \DEFd{{\IT\CHAR"64}}\DEFe{{\IT\CHAR"65}}\DEFf{{\IT\CHAR"66}}\RELAX
    \DEFg{{\IT\CHAR"67}}\DEFh{{\IT\CHAR"68}}\DEFi{{\IT\CHAR"69}}\RELAX
    \DEFj{{\IT\CHAR"6A}}\DEFk{{\IT\CHAR"6B}}\DEFl{{\IT\CHAR"6C}}\RELAX
    \DEFm{{\IT\CHAR"6D}}\DEFn{{\IT\CHAR"6E}}\DEFo{{\IT\CHAR"6F}}\RELAX
    \DEFp{{\IT\CHAR"70}}\DEFq{{\IT\CHAR"71}}\DEFr{{\IT\CHAR"72}}\RELAX
    \DEFs{{\IT\CHAR"73}}\DEFt{{\IT\CHAR"74}}\DEFu{{\IT\CHAR"75}}\RELAX
    \DEFv{{\IT\CHAR"76}}\DEFw{{\IT\CHAR"77}}\DEFx{{\IT\CHAR"78}}\RELAX
    \DEFy{{\IT\CHAR"79}}\DEFz{{\IT\CHAR"7A}}\DEF[{{\RM\CHAR"5B}}\RELAX
    \DEF]{{\RM\CHAR"5D}}\DEF-{\@IFSPACEORRETURNNEXT{{\CHAR"2D}}{{\IT\CHAR"2D}}}}
}

%%% Warning!  Devious return-character machinations in the next several lines!
%%%           Don't even *breathe* on these macros!
{\RETURNACTIVE\global\def\RETURNDEF{\def
{\@ifnextchar\FNB{}{\@stopline\@ifnextchar
{\@NEWBNFRULE}{\penalty\@M\@startline\ignorespaces}}}}\global\def\@NEWBNFRULE
{\vskip\bnfsep\@startline\ignorespaces}\global\def\@ifspnx{\ifx\@tempc\@sptoken \let\@tempd\@tempa \else \ifx\@tempc
\let\@tempd\@tempa \else \let\@tempd\@tempb \fi\fi \@tempd}}
%%% End of bizarro return-character machinations.

\def\IS{\@stopfield\global\setbox\@curfield\hbox\bgroup
  \hskip-\bnfindent \hskip-\bnfopwidth  \hskip-\bnfalign
  \hbox to \bnfalign{\unhbox\@curfield\hfill}\hbox to \bnfopwidth{\bf is \hfill}}
\def\OR{\@stopfield\global\setbox\@curfield\hbox\bgroup
  \hskip-\bnfindent \hskip-\bnfopwidth \hbox to \bnfopwidth{\bf or \hfill}}
\def\R#1 {\hbox to 0pt{\hskip-\bnfmargin R#1\hfill}}
\def\XBNF{\FUNNYCHARDEF\FUNNYCHARACTIVE\RETURNDEF\RETURNACTIVE
  \def\@underbarchar{{\char"5F}}\tt\frenchspacing
  \advance\@totalleftmargin\bnfmargin \tabbing
  \hskip\bnfalign\hskip\bnfopwidth\hskip\bnfindent\=\kill\>\+\@gobblecr}
\def\endXBNF{\-\endtabbing}

\def\BNF{\BEGIN{XBNF}}
\def\FNB{\END{XBNF}}

\begingroup \catcode `|=0 \catcode`\\=12
|gdef|@XCODE#1\EDOC{#1|endtrivlist|end{tt}}
|endgroup

\def\CODE{\begin{tt}\advance\@totalleftmargin\codemargin \@verbatim
   \def\@underbarchar{{\char"5F}}\frenchspacing \@vobeyspaces \@XCODE}
\def\ICODE{\begin{tt}\advance\@totalleftmargin\codemargin \@verbatim
   \def\@underbarchar{{\char"5F}}\frenchspacing \@vobeyspaces
   \FUNNYCHARDEF\FUNNYCHARACTIVE \UNDERBARACTIVE\UNDERBARDEF \@XCODE}

\def\@underbarsub#1{{\ifmmode _{#1}\else {$_{#1}$}\fi}}
\let\@underbarchar\_
\def\@underbar{\let\@tempq\@underbarsub\if\@tempz A\let\@tempq\@underbarchar\fi
  \if\@tempz B\let\@tempq\@underbarchar\fi\if\@tempz C\let\@tempq\@underbarchar\fi
  \if\@tempz D\let\@tempq\@underbarchar\fi\if\@tempz E\let\@tempq\@underbarchar\fi
  \if\@tempz F\let\@tempq\@underbarchar\fi\if\@tempz G\let\@tempq\@underbarchar\fi
  \if\@tempz H\let\@tempq\@underbarchar\fi\if\@tempz I\let\@tempq\@underbarchar\fi
  \if\@tempz J\let\@tempq\@underbarchar\fi\if\@tempz K\let\@tempq\@underbarchar\fi
  \if\@tempz L\let\@tempq\@underbarchar\fi\if\@tempz M\let\@tempq\@underbarchar\fi
  \if\@tempz N\let\@tempq\@underbarchar\fi\if\@tempz O\let\@tempq\@underbarchar\fi
  \if\@tempz P\let\@tempq\@underbarchar\fi\if\@tempz Q\let\@tempq\@underbarchar\fi
  \if\@tempz R\let\@tempq\@underbarchar\fi\if\@tempz S\let\@tempq\@underbarchar\fi
  \if\@tempz T\let\@tempq\@underbarchar\fi\if\@tempz U\let\@tempq\@underbarchar\fi
  \if\@tempz V\let\@tempq\@underbarchar\fi\if\@tempz W\let\@tempq\@underbarchar\fi
  \if\@tempz X\let\@tempq\@underbarchar\fi\if\@tempz Y\let\@tempq\@underbarchar\fi
  \if\@tempz Z\let\@tempq\@underbarchar\fi\@tempq}
\def\@under{\futurelet\@tempz\@underbar}

\def\UNDERBARACTIVE{\CATCODE`\_=13}
\UNDERBARACTIVE
\def\UNDERBARDEF{\def_{\protect\@under}}
\UNDERBARDEF

\catcode`\$=11  

%the following line would allow derived-type component references 
%FOO%BAR in running text, but not allow LaTeX comments
%without this line, write FOO\%BAR
%\catcode`\%=11 

\makeatother

%end of file syntax-macs.tex


\title{{\em D R A F T} \\High Performance Fortran \\ FORALL Proposal}
\author{High Performance Fortran Forum}
\date{October 14, 1992}

\hyphenation{RE-DIS-TRIB-UT-ABLE sub-script Wil-liam-son}

\begin{document}

\maketitle

\newpage

\pagenumbering{roman}

\vspace*{4.5in}

This is the result of a LaTeX run of a draft of a single chapter of 
the HPFF Final Report document.

\vspace*{3.0in}

\copyright 1992 Rice University, Houston Texas.  Permission to copy 
without fee all or part of this material is granted, provided the 
Rice University copyright notice and the title of this document 
appear, and notice is given that copying is by permission of Rice 
University.

\tableofcontents

\newpage

\pagenumbering{arabic}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%put text of chapter here


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%put \end{document} here
%statements.tex


%Revision history:
%August 2, 1992 - Original version of David Loveman, Digital Equipment
%	Corporation and Charles Koelbel, Rice University
%August 19, 1992 - chk - cleaned up discrepancies with Fortran 90 array 
%	expressions
%August 20, 1992 - chk - added DO INDEPENDENT section, Guy Steele's 
%	pointer proposals
%August 24, 1992 - chk - ELEMENTAL functions proposal
%August 31, 1992 - chk - PURE functions proposal
%September 3, 1992 - chk - reorganized sections
%September 21, 1992 - chk - began incorporating updates from Sept
%	10-11 meeting
%October 14, 1992 - chk - Incorporated ON and revised PURE 


\newenvironment{constraints}{
        \begin{list}{Constraint:}{
                \settowidth{\labelwidth}{Constraint:}
                \settowidth{\labelsep}{w}
                \settowidth{\leftmargin}{Constraint:w}
                \setlength{\rightmargin}{0cm}
        }
}{
        \end{list}
}


\chapter{Statements}
\label{statements}

\section{Overview}

\footnote{Version of September 21, 1992 
- Charles Koelbel, Rice University.}
The purpose of the FORALL construct is to provide a convenient syntax for 
simultaneous assignments to large groups of array elements.
In this respect it is very similar to the functionality provided by array 
assignments and WHERE constructs.
FORALL differs from these constructs primarily in its syntax, which is 
intended to be more suggestive of local operations on each element of an 
array.
It is also possible to specify slightly more general array regions than 
are allowed by the basic array triplet notation.
Both single-statement and block FORALLs are defined in this proposal.

The FORALL statement, in both its single-statement and block forms, was
accepted by the High Performance Fortran Forum working group on its
second reading September 10, 1992.
This vote was contingent on a more complete definition of PURE
functions.
The idea of PURE functions was accepted by the HPFF working group at
its first reading on September 10, 1992.
However, the definition at that time was not completely acceptable due to
technical errors; the errors identified at that time have been
corrected in this draft.
The single-statement form of FORALL was accepted by the HPFF working
group as part of the official HPF subset in a first reading on
September 11, 1992; the block FORALL was excluded from the subset at
the same time.

The purpose of the INDEPENDENT directive is to allow the programmer to
give additional information to the compiler.
The user can assert that no data object is defined by one iteration of
a loop and used (read or written) by another; similar information can
be provided about the combinations of index values in a FORALL
statement.
A compiler may rely on this information to make optimizations, such as
parallelization or reorganizing communication.
If the assertion is true, the semantics of the program are not
changed; if it is false, the program is not standard-conforming and
has no defined meaning.
The ``Other Proposals'' section contains a number of additional
assertions with this flavor.

The INDEPENDENT assertion was accepted by the High Performance Fortran
Forum working group on its second reading on September 10, 1992.
The group also directed the FORALL subgroup to further explore methods for
allowing reduction operations to be accomplished in INDEPENDENT loops.

The following proposals are designed as a modification of the Fortran 90 
standard; all references to rule numbers and section numbers pertain to 
that document unless otherwise noted.


\section{Element Array Assignment - FORALL}
 

\label{forall-stmt}

\footnote{Version of September 21, 1992 - David
Loveman, Digital Equipment Corporation and Charles Koelbel, Rice
University.
Approved at second reading on September 10, 1992.}
The element array
assignment statement (FORALL statement) is used to specify an array
assignment in terms of array elements or groups of array sections.
The element array assignment may be
masked with a scalar logical expression.  
In functionality, it is similar to array assignment statements;
however, more general array sections can be assigned in FORALL.

Rule R215 for {\it
executable-construct} is extended to include the {\it forall-stmt}.

\subsection{General Form of Element Array Assignment}

                                                                       \BNF
forall-stmt          \IS FORALL (forall-triplet-spec-list
                       [,scalar-mask-expr ]) forall-assignment

forall-triplet-spec  \IS subscript-name = subscript : subscript 
                          [ : stride]
                                                                       \FNB

\noindent
Constraint:  {\it subscript-name} must be a {\it scalar-name} of type
integer.

\noindent
Constraint:  A {\it subscript} or a {\it stride} in a {\it
forall-triplet-spec} must not contain a reference to any {\it
subscript-name} in the {\it forall-triplet-spec-list}.

                                                                       \BNF
forall-assignment    \IS array-element = expr
                     \OR array-element => target
                     \OR array-section = expr
                                                                       \FNB

\noindent
Constraint:  The {\it array-section} or {\it array-element} in a {\it
forall-assignment} must reference all of the {\it forall-triplet-spec
subscript-names}.

\noindent
Constraint: In the cases of simple assignment, the {\it array-element} and 
{\it expr} have the same constraints as the {\it variable} and {\it expr} 
in an {\it assignment-stmt}.

\noindent
Constraint: In the case of pointer assignment, the {\it array-element} 
and {\it target} have the same constraints as the {\it pointer-object} 
and {\it target}, respectively, in a {\it pointer-assignment-stmt}.

\noindent
Constraint: In the cases of array section assignment, the {\it 
array-section} and 
{\it expr} have the same constraints as the {\it variable} and {\it expr} 
in an {\it assignment-stmt}.

\noindent Constraint: If any subexpression in {\it expr}, {\it 
array-element}, or {\it array-section} is a {\it function-reference}, 
then the {\it function-name} must be a ``pure'' function as defined in
Section~\ref{forall-pure}.


For each subscript name in the {\it forall-assignment}, the set of
permitted values is determined on entry to the statement and is
\[  m1 + (k-1) * m3, \mbox{ where } k = 1, 2, \ldots, \left\lfloor \frac{m2 - m1 +
1}{m3} \right\rfloor  \]
and where {\it m1}, {\it m2}, and {\it m3} are the values of the first
subscript, the second subscript, and the stride respectively in the
{\it forall-triplet-spec}.  If {\it stride} is missing, it is as if it
were present with a value of the integer 1.  The expression {\it
stride} must not have the value 0.  If for some subscript name 
\(\lfloor (m2 -m1 + 1) / m3 \rfloor \leq 0\), the {\it
forall-assignment} is not executed.
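
As a worked illustration (the specific bounds are chosen only for the
example), consider the {\it forall-triplet-spec} {\tt I = 2:6}, in
which the stride is omitted.
Then {\it m1} is 2, {\it m2} is 6, and {\it m3} is 1, so
\[ \left\lfloor \frac{6 - 2 + 1}{1} \right\rfloor = 5 \]
and the permitted values of {\tt I} are 2, 3, 4, 5, and 6.
For {\tt I = 5:4} the same expression gives
\( \lfloor (4 - 5 + 1) / 1 \rfloor = 0 \), so the {\it
forall-assignment} is not executed.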

A ``pure'' function is defined in Section~\ref{forall-pure}; the
intuition is that a pure function cannot have side effects.
The PURE declaration places syntactic constraints on the function to
ensure this.

Examples of element array assignments are:

                                                                  \CODE
REAL H(N,N), X(N,N), Y(N,N)
TYPE MONARCH
    INTEGER, POINTER :: P
END TYPE MONARCH
TYPE(MONARCH) :: A(N)
INTEGER B(N)
      ...
FORALL (I=1:N, J=1:N) H(I,J) = 1.0 / REAL(I + J - 1)

FORALL (I=1:N, J=1:N, Y(I,J) .NE. 0.0) X(I,J) = 1.0 / Y(I,J)

! Set up a butterfly pattern
FORALL (J=1:N)  A(J)%P => B(1+IEOR(J-1,2**K))
                                                                  \EDOC 

\subsection{Interpretation of Element Array Assignments}  

Execution of an element array assignment consists of the following steps:

\begin{enumerate}

\item Evaluation in any order of the subscript and stride expressions in 
the {\it forall-triplet-spec-list}.
The set of valid combinations of {\it subscript-name} values is then the 
cartesian product of the sets defined by these triplets.

\item Evaluation of the {\it scalar-mask-expr} for all valid combinations 
of {\em subscript-name} values.
The mask elements may be evaluated in any order.
The set of active combinations of {\it subscript-name} values is the 
subset of the valid combinations for which the mask evaluates to true.

\item Evaluation in any order of the {\it expr} or {\it target} and all 
subscripts contained in the 
{\it array-element} or {\it array-section} in the {\it forall-assignment} 
for all active combinations of {\em subscript-name} values.
In the case of pointer assignment where the {\it target} is not a 
pointer, the evaluation consists of identifying the object referenced 
rather than computing its value.

\item Assignment of the computed {\it expr} values to the corresponding 
elements specified by {\it array-element} or {\it array-section}.
The assignments may be made in any order.
In the case of a pointer assignment where the {\it target} is not a 
pointer, this assignment consists of associating the {\it array-element} 
with the object referenced.

\end{enumerate}

If the scalar mask expression is omitted, it is as if it were
present with the value true.

The scope of a
{\it subscript-name} is the FORALL statement itself. 


The {\it forall-assignment} must not cause any element of the array
being assigned to be assigned a value more than once.

Note that if a function called in a FORALL is declared PURE, then it 
is impossible for that function's evaluation to affect other expressions' 
evaluations, either for the same combination of 
{\it subscript-name} values or for a different combination.
In addition, it is possible that the compiler can perform 
more extensive optimizations when all functions are declared PURE.
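
The following sketch illustrates this interpretation; the program
name, array names, and data values are assumptions made only for the
example.
Because every right-hand side is evaluated before any element is
assigned, the FORALL statement shifts the array by one position,
whereas the superficially similar DO loop propagates the first element
through the whole array.

                                                                  \CODE
PROGRAM SHIFTDEMO
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 5
  INTEGER :: A(N), B(N), I

  A = (/ 1, 2, 3, 4, 5 /)
  B = A

  ! FORALL: all right-hand sides use the original values of A
  FORALL (I = 2:N) A(I) = A(I-1)
  ! A is now 1 1 2 3 4

  ! DO loop: each iteration sees the result of the previous one
  DO I = 2, N
    B(I) = B(I-1)
  END DO
  ! B is now 1 1 1 1 1

  PRINT *, A
  PRINT *, B
END PROGRAM SHIFTDEMO
                                                                  \EDOC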


\subsection{Scalarization of the FORALL Statement}

A {\it forall-stmt} of the general form:

                                                                   \CODE
FORALL (v1=l1:u1:s1, v2=l2:u2:s2, ..., vn=ln:un:sn , mask ) &
      a(e1,...,em) = rhs
                                                                   \EDOC

\noindent
is equivalent to the following standard Fortran 90 code:

\raggedbottom
                                                                   \CODE
!evaluate subscript and stride expressions in any order

templ1 = l1
tempu1 = u1
temps1 = s1
templ2 = l2
tempu2 = u2
temps2 = s2
  ...
templn = ln
tempun = un
tempsn = sn

!then evaluate the scalar mask expression

DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        tempmask(v1,v2,...,vn) = mask
      END DO
    ...
  END DO
END DO

!then evaluate the expr in the forall-assignment for all valid 
!combinations of subscript names for which the scalar mask 
!expression is true (it is safe to avoid saving the subscript 
!expressions because of the conditions on FORALL expressions)

DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        IF (tempmask(v1,v2,...,vn)) THEN
          temprhs(v1,v2,...,vn) = rhs
        END IF
      END DO
    ...
  END DO
END DO

!then perform the assignment of these values to the corresponding 
!elements of the array being assigned to

DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        IF (tempmask(v1,v2,...,vn)) THEN
          a(e1,...,em) = temprhs(v1,v2,...,vn)
        END IF
      END DO
    ...
  END DO
END DO
                                                                      \EDOC
\flushbottom
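
As a concrete instance of this scalarization (the program name, array
names, bounds, and data values are assumptions made only for the
example), the following sketch executes a masked FORALL statement and
its three-phase DO-loop expansion on copies of the same data.
Under the interpretation given above, the two copies should end up
identical.

                                                                   \CODE
PROGRAM SCALARDEMO
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 4
  INTEGER :: A(N), C(N), TEMPRHS(N), I
  LOGICAL :: TEMPMASK(N)

  A = (/ 3, -1, 4, -2 /)
  C = A

  ! the FORALL statement itself
  FORALL (I = 1:N, A(I) .GT. 0) A(I) = A(N+1-I)

  ! phase 1: evaluate the mask for all valid index values
  DO I = 1, N
    TEMPMASK(I) = C(I) .GT. 0
  END DO

  ! phase 2: evaluate the right-hand side where the mask is true
  DO I = 1, N
    IF (TEMPMASK(I)) TEMPRHS(I) = C(N+1-I)
  END DO

  ! phase 3: assign the saved values
  DO I = 1, N
    IF (TEMPMASK(I)) C(I) = TEMPRHS(I)
  END DO

  ! both arrays now hold -2 -1 -1 -2
  PRINT *, ALL(A .EQ. C)
END PROGRAM SCALARDEMO
                                                                   \EDOC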

\subsection{Consequences of the Definition of the FORALL Statement}

This section should be moved to the comments chapter in the final
draft.

\begin{itemize}

\item The {\it scalar-mask-expr} may depend on the {\it subscript-name}s.

\item Each of the {\it subscript-name}s must appear within the
subscript expression(s) on the left-hand-side.  
(This is a syntactic
consequence of the semantic rule that no two execution instances of the
body may assign to the same array element.)

\item Right-hand sides and subscripts on the left hand side of a {\it 
forall-assignment} are
evaluated only for valid combinations of subscript names for which the
scalar mask expression is true.

\item The intent of ``pure'' functions is to provide a class of
functions without side-effects, and to allow this side-effect freedom
to be checked syntactically.

\end{itemize}


\section{FORALL Construct}

\label{forall-construct}

\footnote{Version of August 20, 1992 -
David Loveman, Digital Equipment Corporation and 
Charles Koelbel, Rice University.  Approved at second reading on
September 10, 1992.}
The FORALL construct is a generalization of the element array
assignment statement allowing multiple assignments, masked array 
assignments, and nested FORALL statements to be
controlled by a single {\it forall-triplet-spec-list}.  Rule R215 for
{\it executable-construct} is extended to include the {\it
forall-construct}.

\subsection{General Form of the FORALL Construct}

                                                                    \BNF
forall-construct   \IS FORALL (forall-triplet-spec-list [,scalar-mask-expr ])
                               forall-body-stmt-list
                            END FORALL

forall-body-stmt     \IS forall-assignment
                     \OR where-stmt
                     \OR forall-stmt
                     \OR forall-construct
                                                                    \FNB

\noindent
Constraint:  {\it subscript-name} must be a {\it scalar-name} of type
integer.

\noindent
Constraint:  A {\it subscript} or a {\it stride} in a {\it
forall-triplet-spec} must not contain a reference to any {\it
subscript-name} in the {\it forall-triplet-spec-list}.

\noindent
Constraint:  Any left-hand side {\it array-section} or {\it 
array-element} in any {\it forall-body-stmt}
must reference all of the {\it forall-triplet-spec
subscript-names}.

\noindent
Constraint: If a {\it forall-stmt} or {\it forall-construct} is nested 
within a {\it forall-construct}, then the inner FORALL may not redefine 
any {\it subscript-name} used in the outer {\it forall-construct}.
This rule applies recursively in the event of multiple nesting levels.

For each subscript name in the {\it forall-assignment}s, the set of
permitted values is determined on entry to the construct and is
\[  m1 + (k-1) * m3, \mbox{ where } k = 1, 2, \ldots, \left\lfloor \frac{m2 - m1 +
1}{m3} \right\rfloor  \]
and where {\it m1}, {\it m2}, and {\it m3} are the values of the first
subscript, the second subscript, and the stride respectively in the
{\it forall-triplet-spec}.  If {\it stride} is missing, it is as if it
were present with a value of the integer 1.  The expression {\it
stride} must not have the value 0.  If for some subscript name 
\(\lfloor (m2 -m1 + 1) / m3 \rfloor \leq 0\), the {\it
forall-assignment}s are not  executed.

Examples of the FORALL construct are:

                                                                 \CODE
FORALL ( I = 2:N-1, J = 2:N-1 )
  A(I,J) = A(I,J-1) + A(I,J+1) + A(I-1,J) + A(I+1,J)
  B(I,J) = A(I,J)
END FORALL

FORALL ( I = 1:N-1 )
  FORALL ( J = I+1:N )
    A(I,J) = A(J,I)
  END FORALL
END FORALL

FORALL ( I = 1:N, J = 1:N )
  A(I,J) = MERGE( A(I,J), A(I,J)**2, I.EQ.J )
  WHERE ( .NOT. DONE(I,J,1:M) )
    B(I,J,1:M) = B(I,J,1:M)*X
  END WHERE
END FORALL
                                                                \EDOC


\subsection{Interpretation of the FORALL Construct}

Execution of a FORALL construct consists of the following steps:

\begin{enumerate}

\item Evaluation in any order of the subscript and stride expressions in 
the {\it forall-triplet-spec-list}.
The set of valid combinations of {\it subscript-name} values is then the 
cartesian product of the sets defined by these triplets.

\item Evaluation of the {\it scalar-mask-expr} for all valid combinations 
of {\em subscript-name} values.
The mask elements may be evaluated in any order.
The set of active combinations of {\it subscript-name} values is the 
subset of the valid combinations for which the mask evaluates to true.

\item Execution of the {\it forall-body-stmts} in the order they appear.
Each statement is executed completely (that is, for all active 
combinations of {\it subscript-name} values) according to the following 
interpretation:

\begin{enumerate}

\item Assignment statements, pointer assignment statements, and array
assignment statements (i.e.
statements in the {\it forall-assignment} category) evaluate the 
right-hand side {\it expr} and any left-hand side subscripts for all 
active {\it subscript-name} values,
then assign those results to the corresponding left-hand side references.

\item WHERE statements evaluate their {\it mask-expr} for all active 
combinations of values of {\it subscript-name}s.
All elements of all masks may be evaluated in any order. 
The assignments within the WHERE branch of the statement are then 
executed in order using the above interpretation of array assignments 
within the FORALL, but the only array elements assigned are those 
selected by both the active {\it subscript-names} and the WHERE mask.
Finally, the assignments in the ELSEWHERE branch are executed if that 
branch is present.
The assignments here are also treated as array assignments, but elements 
are only assigned if they are selected by both the active combinations 
and by the negation of the WHERE mask.

\item FORALL statements and FORALL constructs first evaluate the 
subscript and stride expressions in 
the {\it forall-triplet-spec-list} for all active combinations of the 
outer FORALL constructs.
The set of valid combinations of {\it subscript-names} for the inner 
FORALL is then the union of the sets defined by these bounds and strides 
for each active combination of the outer {\it subscript-names}.
For example, the valid set of the inner FORALL in the second example in 
the last section is the upper triangle (not including the main diagonal) 
of the \(n \times n\) matrix a.
The scalar mask expression is then evaluated for all valid combinations 
of the inner FORALL's {\it subscript-names} to produce the set of active 
combinations.
If there is no scalar mask expression, it is assumed to be always true.
Each statement in the inner FORALL is then executed for each active 
combination (of the inner FORALL), recursively following the 
interpretations given in this section.

\end{enumerate}

\end{enumerate}

If the scalar mask expression is omitted, it is as if it were
present with the value true.

The scope of a
{\it subscript-name} is the FORALL construct itself. 

A single assignment or array assignment statement in a {\it 
forall-construct} must obey the same restrictions as a {\it 
forall-assignment} in a simple {\it forall-stmt}.
(Note that the lowest level of nested statements must always be an 
assignment statement.)
For example, an assignment may not cause the same array element to be 
assigned more than once.
Different statements may, however, assign to the 
same array element, and assignments made in one
statement may affect the execution of a later statement.
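
The following sketch shows this ordering on a small case; the program
name, array names, and data values are assumptions made only for the
example.
The first statement is completed for every active value of {\tt I}
before the second statement begins, so the second statement sees the
updated values of {\tt A}.

                                                                 \CODE
PROGRAM BLOCKDEMO
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 5
  INTEGER :: A(N), B(N), I

  A = (/ 1, 2, 3, 4, 5 /)
  B = 0

  FORALL (I = 2:N-1)
    ! all right-hand sides here use the original values of A
    A(I) = A(I-1) + A(I+1)
    ! this statement starts only after the first has finished
    B(I) = A(I)
  END FORALL

  PRINT *, A     ! 1 4 6 8 5
  PRINT *, B     ! 0 4 6 8 0
END PROGRAM BLOCKDEMO
                                                                 \EDOC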

\subsection{Scalarization of the FORALL Construct}

A {\it forall-construct} of the form:

                                                                \CODE
FORALL (... e1 ... e2 ... en ...)
    s1
    s2
     ...
    sn
END FORALL
                                                                \EDOC

where each si is an assignment, is equivalent to the following code, in
which each assignment becomes a single-statement FORALL:

                                                                \CODE
temp1 = e1
temp2 = e2
 ...
tempn = en
FORALL (... temp1 ... temp2 ... tempn ...) s1
FORALL (... temp1 ... temp2 ... tempn ...) s2
   ...
FORALL (... temp1 ... temp2 ... tempn ...) sn
                                                                \EDOC

A similar statement can be made using FORALL constructs when the 
si may be WHERE or FORALL constructs.

A {\it forall-construct} of the form:

                                                                \CODE
FORALL ( v1=l1:u1:s1, mask )
  WHERE ( mask2(l2:u2) )
    a(v1,l2:u2) = rhs1
  ELSEWHERE
    a(v1,l2:u2) = rhs2
  END WHERE
END FORALL
                                                                \EDOC

is equivalent to the following standard Fortran 90 code:

                                                                \CODE
!evaluate subscript and stride expressions in any order

templ1 = l1
tempu1 = u1
temps1 = s1

!then evaluate the FORALL mask expression

DO v1=templ1,tempu1,temps1
 tempmask(v1) = mask
END DO

!then evaluate the masks for the WHERE

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    tmpl2(v1) = l2
    tmpu2(v1) = u2
    tempmask2(v1,tmpl2(v1):tmpu2(v1)) = mask2(tmpl2(v1):tmpu2(v1))
  END IF
END DO

!then evaluate the WHERE branch

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    WHERE ( tempmask2(v1,tmpl2(v1):tmpu2(v1)) )
      temprhs1(v1,tmpl2(v1):tmpu2(v1)) = rhs1
    END WHERE
  END IF
END DO
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    WHERE ( tempmask2(v1,tmpl2(v1):tmpu2(v1)) )
      a(v1,tmpl2(v1):tmpu2(v1)) = temprhs1(v1,tmpl2(v1):tmpu2(v1))
    END WHERE
  END IF
END DO

!then evaluate the ELSEWHERE branch

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    WHERE ( .not. tempmask2(v1,tmpl2(v1):tmpu2(v1)) )
      temprhs2(v1,tmpl2(v1):tmpu2(v1)) = rhs2
    END WHERE
  END IF
END DO
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    WHERE ( .not. tempmask2(v1,tmpl2(v1):tmpu2(v1)) )
      a(v1,tmpl2(v1):tmpu2(v1)) = temprhs2(v1,tmpl2(v1):tmpu2(v1))
    END WHERE
  END IF
END DO
                                                                   \EDOC


A {\it forall-construct} of the form:

                                                                   \CODE
FORALL ( v1=l1:u1:s1, mask )
  FORALL ( v2=l2:u2:s2, mask2 )
    a(e1,e2) = rhs1
    b(e3,e4) = rhs2
  END FORALL
END FORALL
                                                                  \EDOC

is equivalent to the following standard Fortran 90 code:


                                                                   \CODE
!evaluate subscript and stride expressions in any order

templ1 = l1
tempu1 = u1
temps1 = s1

!then evaluate the FORALL mask expression

DO v1=templ1,tempu1,temps1
 tempmask(v1) = mask
END DO

!then evaluate the inner FORALL bounds, etc

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    templ2(v1) = l2
    tempu2(v1) = u2
    temps2(v1) = s2
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      tempmask2(v1,v2) = mask2
    END DO
  END IF
END DO

!first statement

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      IF ( tempmask2(v1,v2) ) THEN
        temprhs1(v1,v2) = rhs1
      END IF
    END DO
  END IF
END DO
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      IF ( tempmask2(v1,v2) ) THEN
        a(e1,e2) = temprhs1(v1,v2)
      END IF
    END DO
  END IF
END DO

!second statement

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      IF ( tempmask2(v1,v2) ) THEN
        temprhs2(v1,v2) = rhs2
      END IF
    END DO
  END IF
END DO
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      IF ( tempmask2(v1,v2) ) THEN
        b(e3,e4) = temprhs2(v1,v2)
      END IF
    END DO
  END IF
END DO
                                                                   \EDOC


\subsection{Consequences of the Definition of the FORALL Construct}

This section should be moved to the comments chapter of the final
draft.

\begin{itemize}

\item A block FORALL means roughly the same as replicating the FORALL
header in front of each array assignment statement in the block, except
that any expressions in the FORALL header are evaluated only once,
rather than being re-evaluated before each of the statements in the body.
(The exceptions to this rule are nested FORALL statements and WHERE
statements, which introduce syntactic and functional complications
into the copying.)

\item One may think of a block FORALL as synchronizing twice per
contained assignment statement: once after handling the rhs and other
expressions but before performing assignments, and once after all
assignments have been performed but before commencing the next
statement.  (In practice, appropriate dependence analysis will often
permit the compiler to eliminate unnecessary synchronizations.)

\item The {\it scalar-mask-expr} may depend on the {\it subscript-name}s.

\item In general, any expression in a FORALL is evaluated only for valid 
combinations of all surrounding subscript names for which all the
scalar mask expressions are true.

\item Nested FORALL bounds and strides can depend on outer FORALL {\it 
subscript-names}.  They cannot redefine those names, even temporarily (if 
they did, there would be no way to avoid multiple assignments to the same 
array element).

\item Dependences are allowed from one statement to later statements, but 
never from an assignment statement to itself.

\end{itemize}
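
As an illustration of the last point and of the two synchronizations per
statement, consider the following sketch.  The arrays \verb@a@ and
\verb@b@ and their initial values are chosen here purely for
illustration; they do not come from the definition above.

                                                                   \CODE
INTEGER, PARAMETER :: n = 5
REAL a(n), b(n)
INTEGER i

a = (/ 1.0, 2.0, 3.0, 4.0, 5.0 /)
b = 0.0

FORALL ( i = 2:n-1 )
  a(i) = a(i-1) + a(i+1)    ! every rhs uses the old values of a
  b(i) = a(i)               ! sees the values of a assigned above
END FORALL

! The construct has the same effect as the array assignments
!   a(2:n-1) = a(1:n-2) + a(3:n)
!   b(2:n-1) = a(2:n-1)
! so that afterwards a = (/ 1.0, 4.0, 6.0, 8.0, 5.0 /)
! and                b = (/ 0.0, 4.0, 6.0, 8.0, 0.0 /)
                                                                   \EDOC

The first statement carries no dependence on itself because all of its
right hand sides are evaluated before any element of \verb@a@ is
assigned; the second statement may depend on the first because the first
has completed for every value of \verb@i@ before the second begins.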


\section{Pure Procedures and Elemental Reference}

\label{forall-pure}

\footnote{Version of October 14, 
1992 - John Merlin, University of Southampton, 
and Charles Koelbel, Rice University. Approved at first reading on
September 10, 1992, subject to technical revisions for correctness.
The suggestions made there have been incorporated in this draft.}
A {\it pure function\/} is one that produces no side effects.  This
means that the only effect of a pure function reference on the state 
of a program is to return a result---it does not modify the values, 
pointer associations or data mapping of any of its arguments or global 
data, and performs no I/O.
A {\em pure subroutine\/} is one that produces no side effects
except for modifying the values and/or pointer associations of certain
arguments.  

A pure procedure (i.e.\ function or subroutine) may be used in any way 
that a normal procedure can.
In addition, a procedure is required to be pure if it is used in any 
of the following contexts:
\begin{itemize}
        \item a FORALL statement or construct;
        \item an elemental reference (see section \ref{elem-ref-of-pure-procs});
        \item within the body of a pure procedure;
        \item as an actual argument in a pure procedure reference.
\end{itemize}

The side-effect freedom of a pure function ensures that it can be invoked
concurrently in a FORALL or elemental reference without undesirable
consequences such as non-determinism, and additionally assists the efficient
implementation of concurrent execution.  A pure subroutine can be
invoked concurrently in an elemental reference, and since its side effects
are limited to a known subset of its arguments (as we shall see later), 
an implementation can check that a reference obeys Fortran~90's restrictions 
on argument association and is consequently deterministic.


\subsection{Pure procedure declaration and interface}

If a non-intrinsic procedure is used in a context that requires it to be 
pure, then its interface must be explicit in the scope of that use, 
and both its interface body (if provided) and its definition must contain 
the PURE declaration.  The form of this declaration is 
a directive immediately after the {\it function-stmt\/} or {\it
subroutine-stmt\/} of the procedure interface body or definition:
                                                                 \BNF
pure-directive \IS !HPF$ PURE [procedure-name]
                                                                 \FNB

Intrinsic functions, including HPF intrinsic functions, are always pure 
and require no explicit declaration of this fact;  intrinsic subroutines 
are pure if they are elemental (e.g.\ MVBITS) but not otherwise.
A statement function is pure if and only if all functions that it
references are pure.

\subsubsection{Pure function definition}

To define pure functions, Rule~R1215 of the Fortran~90 standard is changed 
to:
                                                                 \BNF
function-subprogram \IS         function-stmt
                                [pure-directive]
                                [specification-part]
                                [execution-part]
                                [internal-subprogram-part]
                                end-function-stmt
                                                                \FNB
with the following additional constraints in Section~12.5.2.2 of the 
Fortran~90 standard:
\begin{constraints}

        \item If a {\it procedure-name\/} is present in the 
{\it pure-directive\/}, it must match the {\it function-name\/} in the 
{\it function-stmt\/}.

        \item In a pure function, a local variable must not have the 
SAVE attribute. (Note that this means that a local variable cannot be 
initialised in a {\it type-declaration-stmt\/} or a
{\it data-stmt\/}, which imply the SAVE attribute.)

        \item A pure function must not use a dummy argument, a global 
variable, or an object that is storage associated with a global variable,
or a subobject thereof, in the following contexts:
        \begin{itemize}
                \item as the assignment variable of an {\it assignment-stmt\/}
or {\it forall-assignment\/};
                \item as a DO variable or implied DO variable, or as a 
{\it subscript-name\/} in a {\it forall-triplet-spec\/};
                \item in an {\it assign-stmt\/};
                \item as the {\it pointer-object\/} or {\it target\/}
of a {\it pointer-assignment-stmt\/};
                \item as the {\it expr\/} of an {\it assignment-stmt\/}
or {\it forall-assignment\/} whose assignment variable is of a derived 
type, or is a pointer to a derived type, that has a pointer component 
at any level of component selection;
                \item as an {\it allocate-object\/} or {\it stat-variable\/}
in an {\it allocate-stmt\/} or {\it deallocate-stmt\/}, or as a
{\it pointer-object\/} in a {\it nullify-stmt\/};
                \item as an actual argument associated with a dummy 
argument with INTENT (OUT) or (INOUT) or with the POINTER attribute.
        \end{itemize}

        \item Any procedure referenced in a pure function, including
one referenced via a defined operation or assignment, must be pure.

        \item In a pure function, a dummy argument or local variable 
must not appear in an {\it align-directive\/}, {\it realign-directive\/},
{\it distribute-directive\/}, {\it redistribute-directive\/},
{\it realignable-directive\/}, {\it redistributable-directive\/} or
{\it combined-directive}.

        \item In a pure function, a global variable must not appear in
a {\it realign-directive\/} or {\it redistribute-directive\/}.

        \item A pure function must not contain a {\it pause-stmt\/},
{\it stop-stmt\/} or I/O statement (including a file operation).

\end{constraints}
To declare that a function is pure, a {\it pure-directive\/} must be given.

The above constraints are designed to guarantee that a pure function
is free from side effects (i.e.\ modifications of data visible outside
the function), which means that it is safe to reference concurrently, 
as explained earlier.

The second constraint ensures that a pure function does not retain
an internal state between calls, which would allow side-effects between 
calls to the same procedure.

The third constraint ensures that dummy arguments and global variables
are not modified by the function.
In the case of a dummy or global pointer, this applies to both its 
pointer association and its target value, so it cannot be subject to 
a pointer assignment or to an ALLOCATE, DEALLOCATE or NULLIFY
statement.
Incidentally, these constraints imply that only local variables and the
dummy result variable can be subject to assignment or pointer assignment.

In addition, a dummy or global data object cannot be the {\it target\/}
of a pointer assignment (i.e.\ it cannot be used as the right hand side
of a pointer assignment to a local pointer or to the result variable), 
for then its value could be modified via the pointer.

In connection with the last point, it should be noted that an ordinary 
(as opposed to pointer) assignment to a variable of derived type that has 
a pointer component at any level of component selection may result in a 
{\em pointer\/} assignment to the pointer component of the variable.
That is certainly the case for an intrinsic assignment.  In that case
the expression on the right hand side of the assignment has the same type 
as the assignment variable, and the assignment results in a pointer 
assignment of the pointer components of the expression result to the
corresponding components of the variable (see section 7.5.1.5 of the 
Fortran~90 standard).  However, it may also be the case for a 
{\em defined\/} assignment to such a variable, even if the data type of 
the expression has no pointer components;  the defined assignment may still 
involve pointer assignment of part or all of the expression result to the 
pointer components of the assignment variable.  Therefore, a dummy or 
global object cannot be used as the right hand side of any assignment to 
a variable of derived type with pointer components, for then it, or part 
of it, might be the target of a pointer assignment, in violation of the 
restriction mentioned above.

(Incidentally, the last two paragraphs only prevent the reference of 
a dummy or global object as the {\em only\/} object on the right hand
side of a pointer assignment or an assignment to a variable with pointer
components.  There are no constraints on its reference as an operand, 
actual argument, subscript expression, etc.\ in these circumstances.)

Finally, a dummy or global data object cannot be used in a procedure 
reference as an actual argument associated with a dummy argument of
INTENT (OUT) or (INOUT) or with a dummy pointer, for then it may be
modified by the procedure reference.  
This constraint, like the others, can be statically checked, since any
procedure referenced within a pure function must be either a pure 
function, which does not modify its arguments, or a pure subroutine, 
whose interface must specify the INTENT or POINTER attributes of its 
arguments (see below).
Incidentally, notice that in this context it is assumed that an actual 
argument associated with a dummy pointer is modified, since Fortran~90 
does not allow its intent to be specified.

Constraint 4 ensures that all procedures called from a pure function 
are themselves pure and hence side effect free, except, in the case of
subroutines, for modifying actual arguments associated with dummy pointers 
or dummy arguments with INTENT(OUT) or (INOUT).  As we have just 
explained, it can be checked that global or dummy objects are not used
in such arguments, which would violate the required side-effect freedom.

Constraints 5 and 6 protect dummy and global data objects from realignment 
and redistribution (another type of side effect).  
In addition, constraint 5 prevents explicit declaration of the mapping 
(i.e.\ alignment and distribution) of dummy arguments and local variables.  
This is because the function may be invoked concurrently, with each 
invocation operating on a segment of data whose distribution is specific 
to that invocation.  Thus, the distribution of a dummy object must be 
`assumed' from the corresponding actual argument.  
Also, it is left to the implementation to determine a suitable mapping 
of the local variables, which would typically depend on the mapping of 
the dummy arguments.

Constraint 7 prevents I/O, whose order would be non-deterministic in 
the context of concurrent execution.  A PAUSE statement requires input
and so is disallowed for the same reason.
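
The following is a minimal sketch of a function that is intended to
satisfy the constraints above.  The name \verb@sum_of_squares@ and its
arguments are hypothetical and serve only as an illustration.  The
commented-out lines show statements that would each violate one of the
constraints.

                                                              \CODE
REAL FUNCTION sum_of_squares (x, n)
  !HPF$ PURE sum_of_squares
  INTEGER, INTENT(IN) :: n
  REAL, INTENT(IN)    :: x(n)
  REAL    :: acc          ! local variable: no SAVE, no initialiser
  INTEGER :: k            ! local DO variable

  acc = 0.0
  DO k = 1, n
    acc = acc + x(k)*x(k)
  END DO
  sum_of_squares = acc    ! the result variable may be assigned

  ! Each of the following would violate a constraint:
  !   REAL :: acc = 0.0   ! initialisation implies SAVE  (constraint 2)
  !   x(1) = 0.0          ! assigns to a dummy argument  (constraint 3)
  !   PRINT *, acc        ! I/O statement                (constraint 7)
END FUNCTION sum_of_squares
                                                              \EDOC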


\subsubsection{Pure subroutine definition}

To define pure subroutines, Rule~R1219 is changed to:
                                                                 \BNF
subroutine-subprogram \IS       subroutine-stmt
                                [pure-directive]
                                [specification-part]
                                [execution-part]
                                [internal-subprogram-part]
                                end-subroutine-stmt
                                                                \FNB
with the following additional constraints in Section~12.5.2.3 of the 
Fortran~90 standard:
\begin{constraints}

        \item If a {\it procedure-name\/} is present in the 
{\it pure-directive\/}, it must match the {\it sub\-rou\-tine-name\/} in the 
{\it subroutine-stmt\/}.

        \item The {\it specification-part\/} of a pure subroutine must 
specify the intents of all non-pointer and non-procedure dummy arguments.

        \item In a pure subroutine, a local variable must not have the 
SAVE attribute. (Note that this means that a local variable cannot be 
initialised in a {\it type-declaration-stmt\/} or a {\it data-stmt\/}, 
which imply the SAVE attribute.)

        \item A pure subroutine must not use a dummy argument with 
INTENT(IN), a global variable, or an 
object that is storage associated with a global variable, or a subobject 
thereof, in the following contexts:
        \begin{itemize}
                \item as the assignment variable of an {\it assignment-stmt\/}
or {\it forall-assignment\/};
                \item as a DO variable or implied DO variable, or as a 
{\it subscript-name\/} in a {\it forall-triplet-spec\/};
                \item in an {\it assign-stmt\/};
                \item as the {\it pointer-object\/} or {\it target\/}
of a {\it pointer-assignment-stmt\/};
                \item as the {\it expr\/} of an {\it assignment-stmt\/}
or {\it forall-assignment\/} whose assignment variable is of a derived 
type, or is a pointer to a derived type, that has a pointer component 
at any level of component selection;
                \item as an {\it allocate-object\/} or {\it stat-variable\/}
in an {\it allocate-stmt\/} or {\it deallocate-stmt\/}, or as a
{\it pointer-object\/} in a {\it nullify-stmt\/};
                \item as an actual argument associated with a dummy 
argument with INTENT (OUT) or (INOUT) or with the POINTER attribute.
        \end{itemize}

        \item Any procedure referenced in a pure subroutine, including
one referenced via a defined operation or assignment, must be pure.

        \item In a pure subroutine, a dummy argument or local variable 
must not appear in an {\it align-directive\/}, {\it realign-directive\/},
{\it distribute-directive\/}, {\it redistribute-directive\/},
{\it realignable-directive\/}, {\it redistributable-directive\/} or
{\it combined-directive}.

        \item In a pure subroutine, a global variable must not appear in
a {\it realign-directive\/} or {\it redistribute-directive\/}.

        \item A pure subroutine must not contain a {\it pause-stmt\/},
{\it stop-stmt\/} or I/O statement (including a file operation).

\end{constraints}
To declare that a subroutine is pure, a {\it pure-directive\/} must be
given.

The constraints for pure subroutines are based on the same principles 
as for pure functions, except that now side effects to dummy arguments 
are permitted.  
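
As a minimal sketch, the following subroutine is intended to satisfy
these constraints.  The name \verb@swap_if_greater@ is hypothetical and
is used only for illustration.  Its only side effects are on dummy
arguments whose intent is declared as INTENT(INOUT).

                                                              \CODE
SUBROUTINE swap_if_greater (x, y)
  !HPF$ PURE swap_if_greater
  REAL, INTENT(INOUT) :: x, y   ! intents are specified (constraint 2)
  REAL :: t                     ! local variable without SAVE

  IF (x > y) THEN
    t = x
    x = y
    y = t
  END IF
END SUBROUTINE swap_if_greater
                                                              \EDOC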


\subsubsection{Pure procedure interfaces}
\label{pure-proc-interface}

To define interface specifications for pure procedures, Rule~R1204 is 
changed to:
                                                                \BNF
interface-body \IS      function-stmt
                        [pure-directive]
                        [specification-part]
                        end-function-stmt
                \OR     subroutine-stmt
                        [pure-directive]
                        [specification-part]
                        end-subroutine-stmt
                                                                \FNB
with the following constraint in addition to those in
Section~12.3.2.1 of the Fortran~90 standard:
\begin{constraints}

        \item An {\it interface-body\/} of a pure subroutine must specify
the intents of all non-pointer and non-procedure dummy arguments.

\end{constraints}

The procedure characteristics defined by an interface body must be
consistent with the procedure's definition.
Regarding pure procedures, this is interpreted as follows:
\begin{enumerate}
        \item A procedure that is declared pure at its definition may be
declared pure in an interface block, but this is not required.
        \item A procedure that is not declared pure at its definition must 
not be declared pure in an interface block.
\end{enumerate}
That is, if an interface body contains a {\it pure-directive\/}, then the 
corresponding procedure definition must also contain it, though the 
reverse is not true.
When a procedure definition with a {\it pure-directive\/}
is compiled, the compiler may check that it satisfies the necessary 
constraints.


\subsection{Pure procedure reference}
To define pure procedure references, the following extra constraint is 
added to Section~12.4.1 of the Fortran~90 standard:
\begin{constraints}

        \item In a reference to a pure procedure, a {\it procedure-name\/} 
{\it actual-arg\/} must be the name of a pure procedure.

\end{constraints}
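
The constraint can be illustrated by the following sketch.  All of the
procedure names (\verb@apply_twice@, \verb@square@ and \verb@logged@)
are hypothetical; \verb@logged@ is assumed to be a function that is not
declared pure, for instance because it performs I/O.

                                                              \CODE
INTERFACE
  REAL FUNCTION apply_twice (g, x)
    !HPF$ PURE apply_twice
    REAL, INTENT(IN) :: x
    INTERFACE                   ! interface for the dummy function 'g'
      REAL FUNCTION g (t)
        !HPF$ PURE g
        REAL, INTENT(IN) :: t
      END FUNCTION g
    END INTERFACE
  END FUNCTION apply_twice
  REAL FUNCTION square (t)
    !HPF$ PURE square
    REAL, INTENT(IN) :: t
  END FUNCTION square
  REAL FUNCTION logged (t)      ! no pure-directive
    REAL, INTENT(IN) :: t
  END FUNCTION logged
END INTERFACE

REAL y

y = apply_twice (square, 2.0)     ! conforming: 'square' is pure
! y = apply_twice (logged, 2.0)   ! not conforming: 'logged' is not pure
                                                              \EDOC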


\subsection{Elemental reference of pure procedures}
\label{elem-ref-of-pure-procs}

Fortran~90 introduces the concept of `elemental procedures', which are 
defined for scalar arguments but may also be applied to conforming 
array-valued arguments.  The latter type of reference to an elemental 
procedure is called an `elemental' reference.    For an elemental function, 
each element of the result, if any, is as would have been obtained by
applying the function to corresponding elements of the arguments.
Examples are the mathematical intrinsics, e.g.\ SIN(X).  For an elemental 
subroutine, the effect on each element of an INTENT(OUT) or INTENT(INOUT) 
array argument is as would be obtained by calling the subroutine with 
the corresponding elements of the arguments.  An example is the intrinsic 
subroutine MVBITS.

However, Fortran~90 restricts elemental reference to a subset of 
the intrinsic procedures --- programmers cannot define their own 
elemental procedures.  Obviously, elemental invocation is equivalent 
to concurrent invocation, so extra constraints beyond those for normal 
Fortran procedures are required to allow this to be done safely
(e.g.\ deterministically).  Appropriate constraints in this case are
the same as for function calls in FORALL;  indeed, the latter are 
virtually equivalent to elemental reference of the function in an 
array assignment, given the close correspondence between FORALL and 
array assignment.  Hence, pure procedures may also be referenced 
elementally, subject to certain additional constraints given below.
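
The correspondence can be sketched as follows, assuming a hypothetical
pure function \verb@g@ with a scalar dummy argument and a scalar result.
Both assignments below are intended to define the same values of
\verb@a@.

                                                              \CODE
INTERFACE
  REAL FUNCTION g (x)
    !HPF$ PURE g
    REAL, INTENT(IN) :: x
  END FUNCTION g
END INTERFACE

REAL a(100), b(100)
INTEGER i

FORALL ( i = 1:100 )  a(i) = g (b(i))    ! function call in a FORALL
a = g (b)                                ! elemental reference of g
                                                              \EDOC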

\subsubsection{Elemental reference of pure functions}

A non-intrinsic pure function may be referenced {\em elementally\/} 
in array expressions, with a similar interpretation to the elemental
reference of Fortran~90 elemental intrinsic functions, provided it
satisfies the additional constraints that:
\begin{enumerate}
        \item Its non-procedure dummy arguments and dummy result are 
scalar and do not have the POINTER attribute.
        \item The length of any character dummy argument or result is 
independent of argument values (though it may be assumed, or depend on the 
lengths of other character arguments and/or a character result).
\end{enumerate}
We call non-intrinsic pure functions that satisfy these constraints 
`elemental non-intrinsic functions'.

The interpretation of an elemental reference of such a function is as 
follows (adapted from Section 12.4.3 of the Fortran~90 standard):
\begin{quotation}

A reference to an elemental non-intrinsic function is an elemental
reference if one or more non-procedure actual arguments are arrays
and all array arguments have the same shape.  If any actual argument 
is a function, its result must have the same shape as that of the 
corresponding function dummy procedure.  A reference to an elemental 
intrinsic function is an elemental reference if one or more actual 
arguments are arrays and all arrays have the same shape.

The result of such a reference has the same shape as the array arguments,
and the value of each element of the result, if any, is obtained by 
evaluating the function using the scalar and procedure arguments and
the corresponding elements of the array arguments.  The elements of
the result may be evaluated in any order.

For example, if \verb@foo@ is a pure function with the following interface:
                                                \CODE
    INTERFACE
      REAL FUNCTION foo (x, y, z, dummy_func)
        !HPF$ PURE foo
        REAL, INTENT(IN) :: x, y, z
        INTERFACE        ! interface for 'dummy_func'
          REAL FUNCTION dummy_func (x)
            !HPF$ PURE dummy_func
            REAL, INTENT(IN) :: x
          END FUNCTION dummy_func
        END INTERFACE
      END FUNCTION foo
    END INTERFACE
                                                \EDOC
and \verb@a@ and \verb@b@ are arrays of shape \verb@(m,n)@ and \verb@sin@
is the Fortran~90 elemental intrinsic function, then:
                                                \CODE
    foo (a, 0.0, b, sin)
                                                \EDOC
is an array expression of shape \verb@(m,n)@ whose \verb@(i,j)@ element
has the value:
                                                \CODE
    foo (a(i,j), 0.0, b(i,j), sin)
                                                \EDOC
\end{quotation}

To define elemental references of elemental non-intrinsic functions, 
the following extra constraints are added after Rule~R1209 
({\it function-reference\/}):
\begin{constraints}

        \item A non-intrinsic function that is referenced elementally 
must be a pure function with an explicit interface, and must satisfy 
the following additional constraints:
        \begin{itemize}
                \item Its non-procedure dummy arguments and dummy result
must be scalar and must not have the POINTER attribute.
                \item The length of any character dummy argument or a 
character dummy result must not depend on argument values (though it may 
be assumed, or depend on the lengths of other character arguments and/or a 
character result).
        \end{itemize}

        \item In an elemental reference of a non-intrinsic function,
a {\it function-name\/} {\it actual-arg\/} must have a result whose shape 
agrees with that of the corresponding function dummy procedure.

\end{constraints}

The reasons for these constraints are explained in the next section.


\subsubsection{Elemental reference of pure subroutines}

A non-intrinsic pure subroutine may be referenced {\em elementally\/}, 
with a similar interpretation to the elemental reference of Fortran~90 
elemental intrinsic subroutines, provided it satisfies the additional 
constraints that:
\begin{enumerate}
        \item Its non-procedure dummy arguments are scalar and do not 
have the POINTER attribute.
        \item The length of any character dummy argument is independent 
of argument values (though it may be assumed, or depend on the lengths of 
other character arguments).
\end{enumerate}
We call non-intrinsic pure subroutines that satisfy these constraints 
`elemental non-intrinsic subroutines'.

The interpretation of an elemental reference of such a subroutine 
is as follows (adapted from Section 12.4.5 of the Fortran~90 standard):
\begin{quotation}

A reference to an elemental non-intrinsic subroutine is an elemental
reference if all actual arguments corresponding to INTENT(OUT) and
INTENT(INOUT) dummy arguments are arrays that have the same shape 
and the remaining non-procedure actual arguments are conformable with 
them.  If any actual argument is a function, its result must have the 
same shape as that of the corresponding function dummy procedure.
A reference to an elemental intrinsic subroutine is an elemental 
reference if all actual arguments corresponding to INTENT(OUT) and 
INTENT(INOUT) dummy arguments are arrays that have the same shape and 
the remaining actual arguments are conformable with them.

The values of the elements of the arrays that correspond to INTENT(OUT)
and INTENT(INOUT) dummy arguments are the same as if the subroutine were 
invoked separately, in any order, using the scalar and procedure arguments 
and corresponding elements of the array arguments.

\end{quotation}

To define elemental references of elemental non-intrinsic subroutines, 
the following constraints are added after Rule~R1210 ({\it call-stmt\/}):
\begin{constraints}

        \item A non-intrinsic subroutine that is referenced elementally 
must be a pure subroutine with an explicit interface, and must satisfy 
the following additional constraints:
        \begin{itemize}
                \item Its non-procedure dummy arguments must be scalar 
and must not have the POINTER attribute.
                \item The length of any character dummy argument must 
not depend on argument values (though it may be assumed, or depend on 
the lengths of other character arguments).
        \end{itemize}

        \item In an elemental reference of a non-intrinsic subroutine,
a {\it function-name\/} {\it actual-arg\/} must have a result whose shape 
agrees with that of the corresponding function dummy procedure.

\end{constraints}
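
The interpretation above can be sketched as follows, assuming a
hypothetical pure subroutine \verb@swap_if_greater@ that satisfies the
constraints.  The elemental reference is intended to leave \verb@a@ and
\verb@b@ with the same values as the loop shown in the comment,
whatever order the loop iterations are taken in.

                                                              \CODE
INTERFACE
  SUBROUTINE swap_if_greater (x, y)
    !HPF$ PURE swap_if_greater
    REAL, INTENT(INOUT) :: x, y
  END SUBROUTINE swap_if_greater
END INTERFACE

REAL a(100), b(100)

CALL swap_if_greater (a, b)       ! elemental reference

! Same effect as the scalar references, in any order:
!   DO i = 1, 100
!     CALL swap_if_greater (a(i), b(i))
!   END DO
                                                              \EDOC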

It is perhaps worth outlining the reasons for the extra constraints 
imposed on pure procedures in order for them to be referenced elementally.  

The dummy result of a function or `output' arguments of a subroutine
are not allowed to have the POINTER attribute because of a Fortran~90
technicality, namely, that under elemental reference the corresponding 
actual arguments must be array variables, and Fortran~90 does not permit 
an array of pointers to be referenced.\footnote{
        See the final constraint after Rule~R613 of the Fortran~90 standard.
Note the difference between an {\em array of pointers\/}, which cannot 
be declared or referenced in Fortran~90, and a {\em pointer array\/},
which can.
}
The `input' arguments of an elemental reference are prohibited from 
having the POINTER attribute for consistency with the output arguments 
or result.  However, this last constraint does not impose 
any real restrictions on an elemental reference, as the corresponding 
actual arguments {\em can\/} be pointers, in which case they are 
`de-referenced' and their targets are associated with the dummy arguments.  
In fact, the only reason for a dummy argument to be a pointer is so that
its pointer association can be changed, which is not allowed for `input'
arguments.  (Incidentally, since a pure function has only `input' 
arguments, there would be no loss of generality in disallowing dummy 
pointers in pure functions generally.)  Note that the prohibition of 
dummy pointers in pure subroutines that are elementally referenced means 
that all their non-procedure dummy arguments can have their intent 
explicitly specified (and indeed this is required by the constraints for 
pure subroutine interfaces---see Section \ref{pure-proc-interface}) which 
assists the checking of argument usage.

In an elemental reference, any actual argument that is a function
must have a result whose shape agrees with that of the corresponding 
function dummy procedure.  That is, elemental usage does not extend to 
function arguments, as Fortran~90 does not support the concept of an `array' 
of functions.
Naively it might appear that a function actual argument that is associated 
with a scalar dummy function could return an array result provided it 
conforms with the other array arguments of the elemental reference.  
However, this is not meaningful under elemental reference, as an 
array-valued function cannot be decomposed into an `array' of scalar 
function references, as would be required in this context.

Finally, the length of any character dummy argument or a character
dummy result cannot depend on argument {\em values\/} (though it can
be assumed, or depend on the lengths of other character arguments and/or
a character result).  This ensures that under elemental reference, all 
elements of an array argument or result of character type will have the 
same length, as required by Fortran~90.
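
The distinction can be sketched with two hypothetical functions,
\verb@left_justify@ and \verb@blanks@, which are introduced here only
as an illustration.  Both may be pure, but only the first may be
referenced elementally.

                                                              \CODE
! May be referenced elementally: the result length depends only on
! the length of s, which is the same for every element of an array.
FUNCTION left_justify (s)
  !HPF$ PURE left_justify
  CHARACTER (LEN=*), INTENT(IN) :: s
  CHARACTER (LEN=LEN(s))        :: left_justify
  left_justify = ADJUSTL (s)
END FUNCTION left_justify

! May not be referenced elementally: the result length depends on
! the value of n, so an array argument such as (/ 1, 2, 3 /) would
! require result elements of different lengths.
FUNCTION blanks (n)
  !HPF$ PURE blanks
  INTEGER, INTENT(IN) :: n
  CHARACTER (LEN=n)   :: blanks
  blanks = ' '
END FUNCTION blanks
                                                              \EDOC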


\subsection{Examples of pure procedure usage}

\subsubsection{FORALL statements and constructs}

Pure functions may be used in expressions in FORALL statements and 
constructs, unlike general functions.  
Because a {\it forall-assignment} may be an {\it array-assignment},
the pure function can have an array result.  
For example:
                                                              \CODE
INTERFACE
  FUNCTION f (x)
    !HPF$ PURE f
    REAL, DIMENSION(3) :: f, x
  END FUNCTION f
END INTERFACE
REAL  v (3,10,10)
...
FORALL (i=1:10, j=1:10)  v(:,i,j) = f (v(:,i,j)) 
                                                              \EDOC


\subsubsection{Elemental references}
Examples of elemental function usage are
                                                              \CODE
INTERFACE 
  REAL FUNCTION foo (x, y, z)
    !HPF$ PURE foo
    REAL, INTENT(IN) :: x, y, z
  END FUNCTION foo
END INTERFACE

REAL a(100), b(100), c(100)
REAL p, q, r

a(1:n) = foo (a(1:n), b(1:n), c(1:n))
a(1:n) = foo (a(1:n), q, r)
a = sin(b)
                                                              \EDOC
An example involving a WHERE-ELSEWHERE construct is
                                                              \CODE
INTERFACE
  REAL FUNCTION f_edge (x)
    !HPF$ PURE
    REAL x
  END FUNCTION f_edge
  REAL FUNCTION f_interior (x)
    !HPF$ PURE
    REAL x
  END FUNCTION f_interior
END INTERFACE

REAL a (10,10)
LOGICAL edges (10,10)

WHERE (edges)
  a = f_edge (a)
ELSE WHERE
  a = f_interior (a)
END WHERE
                                                          \EDOC

Examples of elemental subroutine usage are
                                                                \CODE
INTERFACE 
  SUBROUTINE solve_simul(tol, y, z)
    !HPF$ PURE solve_simul
    REAL, INTENT(IN) :: tol
    REAL, INTENT(INOUT) :: y, z
  END SUBROUTINE
END INTERFACE

REAL a(100), b(100), c(100)
INTEGER bits(10)

CALL solve_simul( 0.1, a, b )
CALL solve_simul( c, a, b )
CALL mvbits( bits, 0, 4, bits, 4) ! Fortran 90 elemental intrinsic
                                                                \EDOC

User-defined elemental procedures have several potential advantages.
They are a convenient programming tool, as the same procedure 
can be applied to actual arguments of any rank.

In addition, the implementation of an elemental function returning an
array-valued result in an array expression is likely to be more 
efficient than that of an equivalent array function.  One reason is 
that it requires less temporary storage for the result (i.e.\ storage 
for a single result versus storage for the entire array of results).  
Another is that it saves on looping if an array expression is 
implemented by sequential iteration over the component elemental 
expressions (as may be done for the `segment' of the array expression 
local to each process).  This is because, in the sequential version, 
the elemental function can be invoked elementally in situ within the 
expression.  The array function, on the other hand, must be executed 
before the expression is evaluated, storing its result in a temporary 
array for use within the expression.  Looping is then required during 
the execution of the array function body as well as the expression 
evaluation.


\subsection{MIMD parallelism via pure procedures}

We have seen that a pure procedure may be invoked concurrently at each
`element' of an array if it is referenced elementally or in a FORALL 
statement or construct (where an `element' may itself be an array in
a non-elemental reference).  In these cases, a limited form of MIMD 
parallelism can be obtained by means of branches within the pure procedure 
which depend on arguments associated with array elements or their 
subscripts (the latter especially in a FORALL context).  For example:
                                                              \CODE
    FUNCTION f (x, i)
      !HPF$ PURE f
      REAL x       ! associated with array element
      INTEGER i    ! associated with array subscript
      IF (x > 0.0) THEN     ! content-based conditional
        ...
      ELSE IF (i==1 .OR. i==n) THEN    ! subscript-based conditional
        ...
      ENDIF
    END FUNCTION

    ...
    REAL a(n)
    INTEGER i
    ...
    FORALL (i=1:n)  a(i) = f( a(i), i)
    ...
    a = f( a, (/ (i, i=1,n) /) )  ! an elemental reference equivalent
                                ! to the above FORALL

                                                              \EDOC
This may sometimes provide an alternative to using
WHERE-ELSEWHERE constructs or sequences of masked FORALLs with their 
potential synchronisation overhead. 


\subsection{Comments}

This section should be moved to the comments chapter of the final draft.

\subsubsection{Pure procedures}

\begin{itemize}

\item The constraints for a pure procedure guarantee
freedom from side-effects, thus ensuring that it can be invoked
concurrently at each
`element' of an array (where an `element' may itself be a data-structure, 
including an array).

\item All constraints can be statically checked, thus providing safety
for the programmer.

Of course, a price that must be paid for this additional security is
that the constraints must be quite rigorous, which means that it
is possible to write a function that is side-effect free in behaviour
but which nevertheless fails to satisfy the constraints 
(e.g.\ a function that contains an assignment to a global variable,
but in a branch that is not executed in any invocation of the function
during a particular program execution).


\item It is expected that most High Performance Fortran library 
procedures will conform to the constraints required of pure procedures
(by the very nature of library procedures), and so can be declared pure 
and referenced in FORALL statements and constructs (if they are functions) 
and within user-defined pure procedures.  It is also anticipated that 
most library procedures will not reference global data, whose use may 
sometimes inhibit concurrent execution (see below).

The constraints on pure procedures are limited to those necessary 
for statically checkable side-effect freedom and the elimination 
of saved internal state.  Subject to these restrictions, maximum 
functionality has been preserved in the definition of pure procedures.
This has been done to make elemental reference and function calls in 
FORALL as widely available as possible, and so that quite general library 
procedures can be classified as pure.  

A drawback of this flexibility is that pure procedures permit certain 
features whose use may hinder, and in the worst case prevent, concurrent 
execution in FORALL and elemental references (that is, such references 
may have to be implemented by sequentialisation).  
Foremost among these features are the access of global data, particularly 
distributed global data, and the fact that the arguments and, for a pure 
function, the result may be pointers or data structures with pointer 
components, including recursive data structures such as lists and trees.
The programmer should be aware of the potential performance penalties 
of using such features.


\item An earlier draft of this proposal contained a constraint disallowing 
pure procedures from accessing global data objects, particularly
distributed data objects.
This constraint has been dropped as inessential to the side-effect freedom 
that the HPF committee requested.
However, it may well be that some machines will have great difficulty 
implementing FORALL without this constraint.


\item One of us (JHM) is still in favour of disallowing access to global 
variables for a number of reasons: 
\begin{enumerate}
\item Aesthetically, it is in keeping with the
nature of a `pure' function, i.e. a function in the mathematical
sense, and in practical terms it imposes no real restrictions on the 
programmer, as global data can be passed-in via the argument list; 
\item Without this constraint HPF programs can no longer be implemented 
by pure message-passing, or at least not efficiently, i.e. without
sequentialising FORALL statements containing function calls and greatly
complicating their implementation; 
\item Absence of this restriction may inhibit optimisation of FORALLs
and array assignments, as the optimisation of assigning the {\it expr\/}
directly to the assignment variable rather than to a temporary intermediate
array now requires interprocedural analysis rather than just local 
analysis.
\end{enumerate}

\end{itemize}

\subsubsection{Elemental references}

\begin{itemize}

\item The original draft proposed allowing pure procedures 
to be invoked elementally even if their dummy arguments or results 
were array-valued.  These provisions have been dropped to avoid 
promoting storage order to a higher level in Fortran~90
(i.e.\ to avoid introducing the concept of `arrays-of-arrays', 
which Fortran~90 seems to strenuously avoid!)   In practical terms,
the current proposal provides the same functionality as the original 
one for functions, though not for subroutines.  If a programmer wants 
elemental function behaviour, but also wants the `elements' to be
array-valued, this can be achieved using FORALL.

\item In a typical FORALL or elemental implementation, a pure procedure 
would be called independently in each process, and its dummy arguments 
would be associated with `elements' local to that process. 
This is the reason for disallowing data mapping directives for
local and dummy variables within the bodies of such procedures.
Note that, particularly in elemental invocations, the actual arguments
can be distributed arrays which need not be `co-distributed'; if not,
a typical implementation would in general perform all data communications 
prior to calling the procedure, and would then pass-in the required 
elements locally via its argument list.

However, access to large global data structures such as look-up tables
is often useful within functions that are otherwise mathematically pure,
and these are allowed to be distributed.

\end{itemize}


\section{The INDEPENDENT Directive}

\label{do-independent}

\footnote{Version of August 20, 1992
 - Guy Steele, Thinking Machines Corporation, and 
Charles Koelbel, Rice University.  Approved at second reading on
September 10, 1992; however, the INDEPENDENT subgroup was directed to
examine methods of allowing reductions to be performed within
INDEPENDENT constructs.}
The INDEPENDENT directive can precede a DO loop or FORALL statement or
construct.
Intuitively, it asserts to the compiler that the operations in the
following construct
may be executed independently--that is, in any order, or
interleaved, or concurrently--without changing the semantics
of the program.

The syntax of the INDEPENDENT directive is
                                                  \BNF
independent-dir	\IS	!HPF$INDEPENDENT [ (integer-variable-list) ]
                                                  \FNB

\noindent
Constraint: An {\it independent-dir\/} must immediately precede a DO or FORALL
statement.

\noindent
Constraint: If the {\it integer-variable-list\/} is present, then the
variables named must be the index variables of a set of perfectly nested
DO loops or indices from the same FORALL header.

The directive is said to apply to the indices named in its {\it
integer-variable-list}, or equivalently to the loops or FORALL indexed
by those variables.
If no {\it integer-variable-list\/} is present, then it is as if it
were present and contained the index variable for the DO or FORALL
immediately following the directive.


When applied to a nest of DO loops, an INDEPENDENT directive is an
assertion by the programmer that no iteration may affect any other
iteration, either directly or indirectly.
This implies that there are no exits from the construct other than
normal loop termination, and no I/O is performed by the loop.
A sufficient condition for ensuring this is that
during
the execution of the loop(s), no iteration assigns to any scalar
data object which is 
accessed (i.e.\ read or written) by any other iteration.
The directive is purely advisory, and a compiler is free
to ignore it if it cannot make use of the information.


For example:
                                                  \CODE
!HPF$INDEPENDENT
      DO I=1,100
        A(P(I)) = B(I)
      END DO
                                                  \EDOC
asserts that the array P does not have any repeated entries (else they
would cause interference when A was assigned).
It also limits how A and B may be storage associated.
(The remaining examples in this
section assume that no variables are storage or sequence associated.)
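
As a contrasting sketch (not taken from the proposal itself), the
program below pairs the loop above with a loop that could not
legitimately carry the directive.
The first loop is independent because P is filled with a permutation;
in the second, every iteration reads and writes the scalar S, so the
sufficient condition stated earlier is violated.
The program name, the array size and the variable S are assumptions
made for illustration only.
                                                  \CODE
PROGRAM independent_check
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 100
  INTEGER :: p(n), i
  REAL :: a(n), b(n), s

  p = (/ (n + 1 - i, i = 1, n) /)   ! a permutation: no repeated entries
  b = 1.0

!HPF$ INDEPENDENT
  DO i = 1, n
    a(p(i)) = b(i)        ! each iteration writes a distinct element of a
  END DO

  s = 0.0
  DO i = 1, n             ! no directive: every iteration reads and writes s
    s = s + a(i)
  END DO
  PRINT *, s
END PROGRAM independent_check
                                                  \EDOC
Because the directive is a comment to a Fortran~90 compiler, the
program can also be compiled and run sequentially.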

Another example:
                                                  \CODE
!HPF$INDEPENDENT (I1,I2,I3)
      DO I1 = 1,N1
        DO I2 = 1,N2
          DO I3 = 1,N3
            DO I4 = 1,N4   !The inner loop is not independent!
              A(I1,I2,I3) = A(I1,I2,I3) + B(I1,I2,I4)*C(I2,I3,I4)
            END DO
          END DO
        END DO
      END DO
                                                  \EDOC
The inner loop is not independent because each element of A is
assigned repeatedly.
However, the three outer loops are independent because they access
different elements of A.
It is not relevant that the outer loops read the same elements from B
and C, because those arrays are not assigned.

The interpretation of INDEPENDENT for FORALL is similar to that for
DO: it asserts that no combination of the indices that INDEPENDENT
applies to may affect another combination.
This is only possible if one combination of index values assigns to a
scalar data object accessed by another
combination.
A DO and a FORALL with the same body are equivalent if they both
have the INDEPENDENT directive.
In the case of a FORALL, any of the variables may be mentioned in the
INDEPENDENT directive:
                                                                \CODE
!HPF$INDEPENDENT (I1,I3)
    FORALL(I1=1:N1,I2=1:N2,I3=1:N3) 
      A(I1,I2,I3) = A(I1,I2-1,I3)
    END FORALL
                                                                \EDOC
This means that for any given values for I1 and I3,
all the right-hand sides for all values of I2 must
be computed before any assignments are done for that
specific pair of (I1,I3) values; but assignments for
one pair of (I1,I3) values need not wait for rhs
evaluation for a different pair of (I1,I3) values.
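
One admissible execution order can be sketched with ordinary DO loops.
This is an illustration under stated assumptions, not a prescribed
implementation: the second dimension of A is assumed to be declared
with lower bound 0 so that A(I1,I2-1,I3) is defined for I2=1, and the
copy C and the temporary T are hypothetical names introduced here.
For each (I1,I3) pair, all right-hand sides are gathered into T before
any element is stored; distinct pairs do not interact.
                                                                \CODE
PROGRAM forall_order
  IMPLICIT NONE
  INTEGER, PARAMETER :: n1 = 2, n2 = 4, n3 = 3
  REAL :: a(n1, 0:n2, n3), c(n1, 0:n2, n3), t(n2)
  INTEGER :: i1, i2, i3

  CALL RANDOM_NUMBER(a)
  c = a                              ! c starts as a copy of a

!HPF$ INDEPENDENT (i1,i3)
  FORALL (i1 = 1:n1, i2 = 1:n2, i3 = 1:n3)
    a(i1, i2, i3) = a(i1, i2-1, i3)
  END FORALL

  DO i1 = 1, n1                      ! the (i1,i3) pairs may run in any order
    DO i3 = 1, n3
      t = c(i1, 0:n2-1, i3)          ! all right-hand sides for this pair
      c(i1, 1:n2, i3) = t            ! then all assignments for this pair
    END DO
  END DO

  PRINT *, ALL(a == c)               ! the two forms are expected to agree
END PROGRAM forall_order
                                                                \EDOC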

Graphically, the INDEPENDENT directive can be visualized as
eliminating edges from a precedence graph representing the program.
Figure~\ref{fig-dep} shows the dependences that may normally be
present in a DO and a FORALL.
An arrow from a left-hand-side node (for example, ``lhsa(1)'') 
to a right-hand-side node (e.g.\ ``rhsb(1)'') means that the RHS
computation may use values assigned in the LHS node; thus the
right-hand side must be computed after the left-hand side completes
its store.
Similarly, an arrow from a RHS node to a LHS node means that the LHS
may overwrite a value needed by the RHS computation, again forcing an
ordering.
Edges from the ``BEGIN'' and to the ``END'' nodes represent control
dependences.
The INDEPENDENT directive asserts that the only dependences that a
compiler need enforce are those in Figure~\ref{fig-indep}.
That is, the programmer who uses INDEPENDENT is certifying that if the
compiler only enforces these edges, then the resulting program will be
equivalent to the one in which all the edges are present.
Note that the set of asserted dependences is identical for INDEPENDENT
DO and FORALL constructs.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Here come the pictures!
%

{

%length for use in pictures
\setlength{\unitlength}{0.03in}

%nodes used in all pictures
\newsavebox{\nodes}
\savebox{\nodes}{
    \small\sf
    \begin{picture}(80,105)(10,-2.5)
    \put(50.0,100){\makebox(0,0){BEGIN}}
    \put(20.0,80.0){\makebox(0,0){rhsa(1)}}
    \put(50.0,80.0){\makebox(0,0){rhsa(2)}}
    \put(80.0,80.0){\makebox(0,0){rhsa(3)}}
    \put(20.0,60.0){\makebox(0,0){lhsa(1)}}
    \put(50.0,60.0){\makebox(0,0){lhsa(2)}}
    \put(80.0,60.0){\makebox(0,0){lhsa(3)}}
    \put(20.0,40.0){\makebox(0,0){rhsb(1)}}
    \put(50.0,40.0){\makebox(0,0){rhsb(2)}}
    \put(80.0,40.0){\makebox(0,0){rhsb(3)}}
    \put(20.0,20.0){\makebox(0,0){lhsb(1)}}
    \put(50.0,20.0){\makebox(0,0){lhsb(2)}}
    \put(80.0,20.0){\makebox(0,0){lhsb(3)}}
    \put(50.0,0){\makebox(0,0){END}}
    \put(50.0,100){\oval(25,5)}
    \put(20.0,80.0){\oval(20,5)}
    \put(50.0,80.0){\oval(20,5)}
    \put(80.0,80.0){\oval(20,5)}
    \put(20.0,60.0){\oval(20,5)}
    \put(50.0,60.0){\oval(20,5)}
    \put(80.0,60.0){\oval(20,5)}
    \put(20.0,40.0){\oval(20,5)}
    \put(50.0,40.0){\oval(20,5)}
    \put(80.0,40.0){\oval(20,5)}
    \put(20.0,20.0){\oval(20,5)}
    \put(50.0,20.0){\oval(20,5)}
    \put(80.0,20.0){\oval(20,5)}
    \put(50.0,0){\oval(25,5)}
    \put(50,97.5){\vector(-2,-1){30}}
    \put(50,97.5){\vector(0,-1){15}}
    \put(50,97.5){\vector(2,-1){30}}
    \put(20,17.5){\vector(2,-1){30}}
    \put(50,17.5){\vector(0,-1){15}}
    \put(80,17.5){\vector(-2,-1){30}}
    \end{picture}
}

\begin{figure}

\begin{minipage}{2.70in}
\CODE
FORALL ( i = 1:3 )
  lhsa(i) = rhsa(i)
  lhsb(i) = rhsb(i)
END FORALL
\EDOC

\centering
\begin{picture}(80,105)(10,-2.5)
\small\sf
%save the messy part of the picture & reuse it
\newsavebox{\web}
\savebox{\web}{
    \begin{picture}(60,15)(0,0)
    \put(0,15){\vector(0,-1){15}}
    \put(0,15){\vector(2,-1){30}}
    \put(0,15){\vector(4,-1){60}}
    \put(30,15){\vector(-2,-1){30}}
    \put(30,15){\vector(0,-1){15}}
    \put(30,15){\vector(2,-1){30}}
    \put(60,15){\vector(0,-1){15}}
    \put(60,15){\vector(-2,-1){30}}
    \put(60,15){\vector(-4,-1){60}}
    \end{picture}
}
\put(10,-2.5){\usebox\nodes}
\put(20,62.5){\usebox\web}
\put(20,42.5){\usebox\web}
\put(20,22.5){\usebox\web}
\end{picture}
\end{minipage}
%
\hfill
%
\begin{minipage}{2.70in}
\CODE
DO i = 1, 3
  lhsa(i) = rhsa(i)
  lhsb(i) = rhsb(i)
END DO
\EDOC

\centering
\begin{picture}(80,105)(10,-2.5)
\small\sf
%save the messy part of the picture & reuse it
\newsavebox{\chain}
\savebox{\chain}{
    \begin{picture}(20,70)(0,0)
    \put(2.5,2.5){\oval(5,5)[bl]}
    \put(2.5,0){\vector(1,0){5}}
    \put(7.5,2.5){\oval(5,5)[br]}
    \put(10,2.5){\vector(0,1){32.5}}
    \put(10,35){\line(0,1){32.5}}
    \put(12.5,67.5){\oval(5,5)[tl]}
    \put(12.5,70){\vector(1,0){5}}
    \put(17.5,67.5){\oval(5,5)[tr]}
    \end{picture}
}
\put(10,-2.5){\usebox\nodes}
\put(20,77.5){\vector(0,-1){15}}
\put(20,57.5){\vector(0,-1){15}}
\put(20,37.5){\vector(0,-1){15}}
\put(25,15){\usebox\chain}
\put(50,77.5){\vector(0,-1){15}}
\put(50,57.5){\vector(0,-1){15}}
\put(50,37.5){\vector(0,-1){15}}
\put(55,15){\usebox\chain}
\put(80,77.5){\vector(0,-1){15}}
\put(80,57.5){\vector(0,-1){15}}
\put(80,37.5){\vector(0,-1){15}}
\end{picture}
\end{minipage}

\caption{Dependences in DO and FORALL without
INDEPENDENT assertions}
\label{fig-dep}
\end{figure}

\begin{figure}

%Draw the picture once, use it twice
\newsavebox{\easy}
\savebox{\easy}{
    \small\sf
    \begin{picture}(80,105)(10,-2.5)
    \put(10,-2.5){\usebox\nodes}
    \put(20,77.5){\vector(0,-1){15}}
    \put(20,57.5){\vector(0,-1){15}}
    \put(20,37.5){\vector(0,-1){15}}
    \put(50,77.5){\vector(0,-1){15}}
    \put(50,57.5){\vector(0,-1){15}}
    \put(50,37.5){\vector(0,-1){15}}
    \put(80,77.5){\vector(0,-1){15}}
    \put(80,57.5){\vector(0,-1){15}}
    \put(80,37.5){\vector(0,-1){15}}
    \end{picture}
}

\begin{minipage}{2.70in}
\CODE
!HPF$ INDEPENDENT
FORALL ( i = 1:3 )
  lhsa(i) = rhsa(i)
  lhsb(i) = rhsb(i)
END FORALL
\EDOC

\centering
\usebox\easy
\end{minipage}
%
\hfill
%
\begin{minipage}{2.70in}
\CODE
!HPF$ INDEPENDENT
DO i = 1, 3
  lhsa(i) = rhsa(i)
  lhsb(i) = rhsb(i)
END DO
\EDOC

\centering
\usebox\easy
\end{minipage}

\caption{Dependences in DO and FORALL with
INDEPENDENT assertions}
\label{fig-indep}
\end{figure}

}

%
%
% End of pictures
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


The compiler is justified in producing
a warning if it can prove that one of these assertions is incorrect.
It is not required to do so, however.
A program containing any false assertion of this type is not
standard-conforming, and the compiler may take any action it deems necessary.


This directive is of course similar to the DOSHARED directive
of Cray MPP Fortran.  A different name is offered here to avoid
even the hint of commitment to execution by a shared memory machine.
Also, the ``mechanism'' syntax is omitted here, though we might want
to adopt it as further advice to the compiler about appropriate
implementation strategies, if we can agree on a desirable set
of options.


\section{Other Proposals}

The following are proposals made for modification or replacement of the 
above sections.

\subsection{A Proposal for MIMD Support in HPF}

\label{mimd-support}
	          

\subsubsection{Abstract}

\footnote{Version of July 18, 1992 - Clemens-August Thole, GMD I1.T.
In the interest of time, these features were not considered for inclusion 
in the first round of HPFF.}
This proposal tries to supply sufficient language support 
to deal with loosely synchronous programs, some of which have been 
identified in my ``A vote for explicit MIMD support''.
This is a proposal for the support of MIMD parallelism, which extends
Section~\ref{do-independent}. 
It is more oriented
towards the CRAY MPP Fortran Programming Model and the PCF proposal. 
The fine-grain synchronization of PCF is not proposed for implementation.
Instead of the CRAY mechanisms for assigning work to processors, an 
extension of the ON clause is used.
Because fine-grain synchronization is absent, the constructs can be
executed
on SIMD or sequential architectures just by ignoring the additional
information.


\subsubsection{Summary of the current situation of MIMD support as part of
HPF}

According to Charles Koelbel's (Rice) mail dated March 20th, ``Working
Group 4 - Issues for discussion'', MIMD support is a topic for
discussion within working group 4. 

Dave Loveman (DEC) has produced a document on FORALL statements 
(incorporated in Sections~\ref{forall-stmt} and \ref{forall-construct})
which summarizes the discussion. Marc Snir proposed some extensions.
These constructs allow SIMD operations to be described more generally
than with array assignments. 

A topic for working papers is the interface of HPF Fortran to program units
which execute in SPMD mode. Proposals for ``Local Subroutines'' have been
made by Marc Snir and Guy Steele
(Chapter~\ref{foreign}). Both proposals
define local subroutines as program units which are executed by all
processors independently of each other. Each processor has access only
to the data contained in its local memory. Parts of distributed data
objects can be accessed and updated by calls to a special library. Any
message-passing
library might be used for synchronization and communication.
This approach does not really integrate MIMD support into HPF programming.

The MPP Fortran proposal by Douglas M. Pase, Tom MacDonald, Andrew Meltzer
(CRAY)
contained the following features in order to support integrated MIMD
features:
\begin{itemize}
   \item  parallel directive
   \item  shared loops 
   \item  private variables
   \item  barrier synchronization
   \item  no-barrier directive for removing synchronization
   \item  locks, events, critical sections and atomic update
   \item  functions, to examine the mapping of data objects.
\end{itemize}

Steele's ``Proposal for loops in HPF'' (02.04.92) included a proposal for a 
directive ``!HPF\$ INDEPENDENT( integer\_variable\_list)'', which specifies
for the next set of nested loops, that the loops with the specified
loop variables can be executed independently of each other.
(Section~\ref{do-independent} is a short version of this proposal.) 

Charles Koelbel gave an overview of different styles for parallel loops
in ``Parallel Loops Position Paper''. No specific proposal was made.

Min-You Wu's ``Proposal for FORALL, May 1992'' extended Guy Steele's 
``!HPF\$ INDEPENDENT'' proposal to use the directive in a block style.

Clemens-August Thole's ``A vote for explicit MIMD support'' contains 3 examples
from different application areas, which seem to require MIMD support for
efficient execution. 

\paragraph{Summary}

In contrast to FORALL extensions, MIMD support is currently not
well established
as part of HPF Fortran. The examples in ``A vote for explicit MIMD support''
show clearly the need for such features. Local subroutines do not fulfill
the requirements because they force the use of a distributed-memory
programming model,
which should not be necessary in most cases.

With the exception of parallel sections, all interesting features
are contained in the MPP proposal. I would like to split the discussion
on specifying parallelism, synchronization and mapping into three different
topics. Furthermore, I would like to see corresponding features
expressed
in the style of the current X3H5 proposal, if possible, in order to
be in line with upcoming standards.


\subsubsection{Proposal for MIMD support}

In order to support the specification of MIMD-type parallelism, the
following
features are taken from the ``Fortran 77 Binding of X3H5 Model for 
Parallel Programming Constructs'': 
\begin{itemize}
    \item   PARALLEL DO construct/directive
    \item   PARALLEL SECTIONS worksharing construct/directive
    \item   NEW statement/directive
\end{itemize}

These constructs are not used with PCF-like options for mapping or 
synchronisation but are combined with the ON clause for mapping operations
onto the parallel architecture. 

\paragraph{PARALLEL DO}

\subparagraph{Explicit Syntax}

The PARALLEL DO construct specifies parallelism among the 
iterations of a block of code. The PARALLEL DO construct has the same
syntax as a DO statement. For a directive approach the directive
!HPF\$ PARALLEL can be used in front of a DO statement.
After the PARALLEL DO statement a new-declaration may be inserted.

A PARALLEL DO construct might be nested with other parallel constructs. 

\subparagraph{Interpretation}

The PARALLEL DO is used to specify parallel execution of the iterations of
a block of code. Each iteration of a PARALLEL DO is an independent unit
of work. The iterations of PARALLEL DO must be data independent. Iterations
are data independent if the storage sequence associated with each variable
or array element that is assigned a value by an iteration is not
referenced
by any other iteration. 

A program is not HPF conforming if, in any iteration, a statement is
executed
that causes a transfer of control out of the block defined by the PARALLEL
DO construct. 

The value of the loop index of a PARALLEL DO is undefined outside the scope
of the PARALLEL DO construct. 


\paragraph{PARALLEL SECTIONS}

The parallel sections construct is used to specify parallelism among
sections
of code.

\subparagraph{Explicit Syntax}


                                                              \CODE
        !HPF$ PARALLEL SECTIONS
        !HPF$ SECTION
        !HPF$ END PARALLEL SECTIONS
                                                              \EDOC
structured as
                                                              \CODE
        !HPF$ PARALLEL SECTIONS
        [new-declaration-stmt-list]
        [section-block]
        [section-block-list]
        !HPF$ END PARALLEL SECTIONS
                                                              \EDOC
where [section-block] is
                                                              \CODE
        !HPF$ SECTION
        [execution-part]
                                                              \EDOC

\subparagraph{Interpretation}

The parallel sections construct is used to specify parallelism among
sections
of code. Each section of the code is an independent unit of work. A program
is not standard conforming if during the execution of any parallel sections
construct a transfer of control out of the blocks defined by the Parallel
Sections construct is performed. 
In a standard conforming program the sections of code shall be data 
independent. Sections are data independent if the storage sequence
associated 
with each variable or array element that is assigned a value by a
section
is not referenced by any other section. 
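
A minimal sketch of the directive form, assuming the syntax given
above, is shown below.
The program and array names are illustrative only.
The two sections are data independent because the first assigns only
U and the second assigns only V, and neither references the other's
array.
                                                              \CODE
PROGRAM sections_sketch
  IMPLICIT NONE
  REAL :: u(100), v(100)

  u = 1.0
  v = 2.0

!HPF$ PARALLEL SECTIONS
!HPF$ SECTION
  u = 2.0 * u          ! this section assigns only u
!HPF$ SECTION
  v = v + 1.0          ! this section assigns only v
!HPF$ END PARALLEL SECTIONS

  PRINT *, u(1), v(1)
END PROGRAM sections_sketch
                                                              \EDOC
A compiler that ignores the directives executes the two sections one
after the other, with the same result.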


\paragraph{Data scoping}

Data objects that are local to a subroutine are different between 
distinct units of work, even if they execute the same subroutine.


\paragraph{NEW statement/directive}

The NEW statement/directive allows the user to generate new instances of 
objects with the same name as an object that can currently be referenced.


\subparagraph{Explicit Syntax}

A [new-declaration-stmt] is
                                                                \CODE
       !HPF$ NEW variable-name-list
                                                                \EDOC

\subparagraph{Coding rules}

A [variable-name] shall not be
\begin{itemize} 
\item    the name of an assumed size array, dummy argument, common block, 
function or entry point
\item    of type character with an assumed length
\item    specified in a SAVE or DATA statement
\item    associated with any object that is shared for this parallel
construct.
\end{itemize}

\subparagraph{Interpretation}
 
Listing a variable on a NEW statement causes the object to be explicitly
private for the parallel construct. For each unit of work of the parallel 
construct a new instance of the object is created and referenced with the
specific name. 
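
The following sketch combines the directive form of PARALLEL DO with
a NEW declaration, assuming that the new-declaration is placed
directly after the DO statement as described above.
The program name and the arrays X and Y are hypothetical.
The scalar T is listed in NEW, so each iteration works on its own
instance; without NEW, every iteration would assign the same T and
the iterations would not be data independent.
                                                                \CODE
PROGRAM parallel_do_sketch
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 8
  REAL :: x(n+1), y(n), t
  INTEGER :: i

  x = (/ (REAL(i), i = 1, n+1) /)

!HPF$ PARALLEL
  DO i = 1, n
!HPF$ NEW t
    t = 0.5 * (x(i) + x(i+1))   ! t is private to each unit of work
    y(i) = t * t                ! each iteration assigns a distinct y(i)
  END DO

  PRINT *, y
END PROGRAM parallel_do_sketch
                                                                \EDOC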


\subsection{Nested WHERE statements}

\label{nested-where}

\footnote{Version of September 15, 1992 - Guy Steele, Thinking Machines 
Corporation.  This section has not been discussed.}
Here is the text of a proposal once sent to X3J3:
\begin{quote}
Briefly put, the less WHERE is like IF, the more difficult it is to
translate existing serial codes into array notation.  Such codes tend to
have the general structure of one or more DO loops iterating over array
indices and surrounding a body of code to be applied to array elements.
Conversion to array notation frequently involves simply deleting the DO
loops and changing array element references to array sections or whole
array references.  If the loop body contains logical IF statements, these
are easily converted to WHERE statements.  The same is true for translating
IF-THEN constructs to WHERE constructs, except in two cases.  If the IF
constructs are nested (or contain IF statements), or if ELSE IF is used,
then conversion suddenly becomes disproportionately complex, requiring the
user to create temporary variables or duplicate mask expressions and to use
explicit .AND. operators to simulate the effects of nesting.

Users also find it confusing that ELSEWHERE is syntactically and
semantically analogous to ELSE rather than to ELSE IF.

We propose that the syntax of WHERE constructs be extended and
changed to have the form
                                                                \BNF
where-construct       \IS  where-construct-stmt
 				    [ where-body-construct ]...
 				  [ elsewhere-stmt
 				    [ where-body-construct ]... ]...
 				  [ where-else-stmt
 				    [ where-body-construct ]... ]
 				  end-where-stmt
 
 	where-construct-stmt  \IS  WHERE ( mask-expr )
 
 	elsewhere-stmt        \IS  ELSE WHERE ( mask-expr )
 
 	where-else-stmt       \IS  ELSE WHERE
 
 	end-where-stmt        \IS  END WHERE
 
 	mask-expr             \IS  logical-expr
 
 	where-body-construct  \IS  assignment-stmt
 			      \IS  where-stmt
 			      \IS  where-construct
                                                                \FNB                                                     	

\noindent Constraint: In each assignment-stmt, the mask-expr and the variable
being defined must be arrays of the same shape.  If a
where-construct contains a where-stmt, an elsewhere-stmt,
or another where-construct, then the two mask-expr's must
be arrays of the same shape.
 
The meaning of such statements may be understood by rewrite rules.  First
one may eliminate all occurrences of ELSE WHERE:
                                                                \CODE
WHERE (m1)		
    xxx			
ELSE WHERE (m2)		
    yyy				
END WHERE
	                                                            \EDOC
becomes
                                                                \CODE
WHERE (m1)
    xxx
ELSE
    WHERE (m2)
        yyy
    END WHERE
END WHERE
                                                                \EDOC
where xxx and yyy represent any sequences of statements, so long as the
original WHERE, ELSE WHERE, and END WHERE match, and the ELSE WHERE is the
first ELSE WHERE of the construct (that is, yyy may include additional ELSE
WHERE or ELSE statements of the construct).  Next one eliminates ELSE:
                                                                \CODE
WHERE (m)
    xxx
ELSE
    yyy
END WHERE
                                                                \EDOC
becomes
                                                                \CODE
temp = m
WHERE (temp)
    xxx
END WHERE
WHERE (.NOT. temp)
    yyy
END WHERE
                                                                \EDOC

Finally one eliminates nested WHERE constructs:
                                                                \CODE
WHERE (m1)
    xxx
    WHERE (m2)
        yyy
    END WHERE
    zzz
END WHERE
                                                                \EDOC
becomes
                                                                \CODE
temp = m1
WHERE (temp)
    xxx
END WHERE
WHERE (temp .AND. (m2))
    yyy
END WHERE
WHERE (temp)
    zzz
END WHERE
                                                                \EDOC
and similarly for nested WHERE statements.

The effects of these rules will surely be a familiar or obvious possibility
to all the members of the committee; I enumerate them explicitly here only
so that there can be no doubt as to the meaning I intend to support.

Such rewriting rules are simple for a compiler to apply, or the code may
easily be compiled even more directly.  But such transformations are
tedious for our users to make by hand and result in code that is
unnecessarily clumsy and difficult to maintain.

One might propose to make WHERE and IF even more similar by making two
other changes.  First, require the noise word THERE to appear in a WHERE
and ELSE WHERE statement after the parenthesized mask-expr, in exactly the
same way that the noise word THEN must appear in IF and ELSE IF statements.
(Read aloud, the results might sound a trifle old-fashioned--"Where knights
dare not go, there be dragons!"--but technically would be as grammatically
correct English as the results of reading an IF construct aloud.)  Second,
allow a WHERE construct to be named, and allow the name to appear in ELSE
WHERE, ELSE, and END WHERE statements.  I do not feel very strongly one way
or the other about these no doubt obvious points, but offer them for your
consideration lest the possibilities be overlooked.
\end{quote}

Now, for compatibility with Fortran 90, HPF should continue to
use ELSEWHERE instead of ELSE, but this causes no ambiguity:

      WHERE(...)
	...
      ELSE WHERE(...)
	...
      ELSEWHERE
	...
      END WHERE

is perfectly unambiguous, even when blanks are not significant.
Since X3J3 declined to adopt the keyword THERE, it should not be
used in HPF either (alas).
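
As a hedged illustration of the nested-WHERE rewriting rule quoted above,
the following free-form Fortran sketch applies the flattened form to
concrete data.  The array contents, the masks (a > -3 and a > 0), and the
three assignments are assumptions chosen only for this example; they stand
in for m1, m2, xxx, yyy, and zzz.

```fortran
program nested_where_flattened
  implicit none
  integer :: a(6) = (/ -3, -2, -1, 1, 2, 3 /)   ! illustrative data
  logical :: temp(6)

  temp = a > -3                  ! m1, captured once before anything changes
  where (temp)
    a = a * 10                   ! xxx
  end where
  where (temp .and. (a > 0))     ! m2 is evaluated after xxx has executed
    a = a + 1                    ! yyy
  end where
  where (temp)
    a = -a                       ! zzz
  end where

  print *, a
end program nested_where_flattened
```

As in the nested form, the inner mask sees the effect of xxx, while the
outer mask is fixed in temp and is unaffected by later assignments to a.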

\alternative A
\subsection{EXECUTE-ON-HOME Directive}
\label{on-clause}

\footnote{Version of 
October 14, 1992
--
Tin-Fook Ngai,
Hewlett-Packard Laboratories.
This section has not been discussed.}

The EXECUTE-ON-HOME directive is used to suggest where an
iteration of a DO construct or an indexed parallel assignment should be
executed.  The directive informs the compiler which data access should be
local and which data access may be remote.
 
                                                                       \BNF
execute-on-home-directive  \IS  EXECUTE (subscript-list) ON_HOME align-spec 
[; LOCAL array-name-list]
                                                                       \FNB

The EXECUTE-ON-HOME directive must immediately precede the corresponding
DO loop body, array assignment, FORALL statement, FORALL construct or
individual assignment statement in a FORALL construct.

The scope of an EXECUTE-ON-HOME directive is the entire loop body of the
enclosing DO construct, or the following array assignment, FORALL
statement, FORALL construct or assignment statement in a FORALL construct.

The {\em subscript-list} identifies a distinct iteration index or an indexed
parallel assignment.  The {\em align-spec} identifies a template node.  Every
iteration index or indexed assignment must be associated with one
and only one template node.  The EXECUTE-ON-HOME directive suggests that
the iteration or parallel assignment should be executed on the processor
to which the template node is mapped.  When the EXECUTE-ON-HOME directive
is applied to a subroutine call, it affects only the execution location of
the caller but not the execution location of the called subroutine.

The optional LOCAL directive informs the compiler that all data accesses
to the specified {\em array-name-list} can be handled as local data
accesses if the related HPF data mapping directives are honored.

EXECUTE-ON-HOME directives can be nested, but only the immediately
preceding EXECUTE-ON-HOME directive is effective.


\paragraph{Example 1}
                                                                         \CODE 
      REAL A(N), B(N), C(N)
!HPF$ TEMPLATE T(N)
!HPF$ ALIGN WITH T:: A, B, C
!HPF$ DISTRIBUTE T(CYCLIC(2))

!HPF$ INDEPENDENT            
      DO I = 1, N/2 
!HPF$ EXECUTE (I) ON_HOME T(2*I); LOCAL A, B, C
      ! we know that P(2*I-1) and P(2*I) are a permutation 
      ! of 2*I-1 and 2*I
        A(P(2*I - 1)) = B(2*I - 1) + C(2*I - 1)    
        A(P(2*I)) = B(2*I) + C(2*I)
      END DO
                                                                         \EDOC 
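
To see why the two elements touched by iteration I in Example 1 can both be
local, it may help to compute the owning processor of each template element
under CYCLIC(2).  The sketch below assumes the usual block-cyclic rule
(element j goes to processor MOD((j-1)/k, NP), with processors numbered
from 0).  The template extent, the processor count, and the helper name
owner are hypothetical.

```fortran
program cyclic_owner_demo
  implicit none
  integer, parameter :: n  = 16   ! hypothetical template extent
  integer, parameter :: np = 4    ! hypothetical number of processors
  integer, parameter :: k  = 2    ! block size, as in CYCLIC(2)
  integer :: i

  ! For each iteration I, show who owns T(2*I-1) and T(2*I).
  do i = 1, n/2
    print '(a,i3,a,i2,a,i2)', 'I =', i, &
          '  owner(2I-1) =', owner(2*i - 1), &
          '  owner(2I) =', owner(2*i)
  end do

contains

  ! Assumed block-cyclic ownership rule, processors numbered 0..np-1.
  integer function owner(j)
    integer, intent(in) :: j
    owner = mod((j - 1)/k, np)
  end function owner

end program cyclic_owner_demo
```

Under this assumed rule the two owners printed on each line are equal, which
is what allows the references to elements 2*I-1 and 2*I to be treated as
local to the home of T(2*I).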


\paragraph{Example 2}
                                                                         \CODE
      REAL A(N,N), B(N,N)
!HPF$ TEMPLATE T(N,N)
!HPF$ ALIGN WITH T:: A, B
!HPF$ EXECUTE (I,J) ON_HOME T(I+1,J-1)
      FORALL (I=1:N-1, J=2:N)   A(I,J) = A(I+1,J-1) + B(I+1,J-1)
                                                                         \EDOC


\paragraph{Example 3}
                                                                         \CODE
      REAL A(N,N), B(N,N)
!HPF$ TEMPLATE T(N,N)
!HPF$ ALIGN WITH T:: A, B
!HPF$ EXECUTE (I,J) ON_HOME T(I,J)  
      ! apply to the entire FORALL construct
      FORALL (I=1:N-1, J=2:N) 
        A(I,J) = A(I+1,J-1) + B(I+1,J-1)
        B(I,J) = A(I,J) + B(I+1,J-1)
      END FORALL
                                                                         \EDOC


\paragraph{Example 4}
                                                                         \CODE
      REAL A(N,N), B(N,N)
!HPF$ TEMPLATE T(N,N)
!HPF$ ALIGN WITH T:: A, B
      FORALL (I=1:N-1, J=2:N) 
!HPF$ EXECUTE (I,J) ON_HOME T(I,J)  
      ! applies only to the following assignment
        A(I,J) = A(I+1,J-1) + B(I+1,J-1)
        B(I,J) = A(I,J) + B(I+1,J-1)
      END FORALL
                                                                         \EDOC


\paragraph{Example 5}
                                                                         \CODE
      REAL A(N,N), B(N,N)
!HPF$ TEMPLATE T(N,N)
!HPF$ ALIGN WITH T:: A, B

!HPF$ EXECUTE (I,J) ON_HOME T(I+1,J-1)
      A(1:N-1,2:N) = A(2:N,1:N-1) + B(2:N,1:N-1)
                                                                         \EDOC

   
\paragraph{Example 6}

The original program for this example is due to Michael Wolfe of Oregon 
Graduate Institute.

This program performs matrix multiplication \(C = A \times B\).
In each step, array B is rotated by row-blocks and multiplied
diagonal-block-wise in parallel with A, and the results are accumulated in C.

Note that without the EXECUTE-ON-HOME and LOCAL directives, the compiler
will have a hard time determining that all A, B and C accesses are actually
local, and thus cannot generate the most efficient code (i.e.
communication-free, with no runtime checking in the parallel loop body).
 
                                                                         \CODE
      REAL A(N,N), B(N,N), C(N,N)

      PARAMETER(NOP = NUMBER_OF_PROCESSORS())
!HPF$ REALIGNABLE B
!HPF$ TEMPLATE T(2*N,N)               ! to allow wrap around mapping
!HPF$ ALIGN (I,J) WITH T(I,J):: A, C      
!HPF$ ALIGN B(I,J) WITH T(N+I,J)
!HPF$ DISTRIBUTE T(CYCLIC(N/NOP),*)   ! distributed by row blocks

      IB = N/NOP

      DO IT = 0, NOP-1

      ! rotate B by row-blocks
!HPF$ REALIGN B(I,J) WITH T(N-IT*IB+I,J)  

!HPF$ INDEPENDENT                     ! data parallel loop
        DO IP = 0, NOP-1     
!HPF$ EXECUTE (IP) ON_HOME T(IP*IB+1,1); LOCAL A, B, C
          ITP = MOD( IT+IP, NOP )

          DO I = 1, IB
            DO J = 1, N
              DO K = 1, IB
                C(IP*IB+I,J) = C(IP*IB+I,J) +                      
     1                         A(IP*IB+I,ITP*IB+K)*B(ITP*IB+K,J)
              ENDDO  ! K 
            ENDDO  ! J 
          ENDDO  ! I 
        ENDDO  ! IP

      ENDDO  ! IT
                                                                        \EDOC
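
The index arithmetic in Example 6 can be checked serially.  The following
free-form Fortran sketch drops all HPF directives, picks small hypothetical
sizes, initializes C to zero (which the example above leaves implicit), and
compares the result with the MATMUL intrinsic.  It only exercises the loop
structure; it says nothing about data mapping or communication.

```fortran
program block_rotation_check
  implicit none
  integer, parameter :: n = 6, nop = 3   ! hypothetical sizes; nop divides n
  integer, parameter :: ib = n/nop       ! rows per block
  real :: a(n,n), b(n,n), c(n,n)
  integer :: it, ip, itp, i, j, k

  call random_number(a)
  call random_number(b)
  c = 0.0

  do it = 0, nop-1                 ! rotation step
    do ip = 0, nop-1               ! one iteration per (virtual) processor
      itp = mod(it+ip, nop)        ! block of B paired with row-block ip
      do i = 1, ib
        do j = 1, n
          do k = 1, ib
            c(ip*ib+i,j) = c(ip*ib+i,j) + a(ip*ib+i,itp*ib+k)*b(itp*ib+k,j)
          end do
        end do
      end do
    end do
  end do

  ! Expected to be zero up to floating-point rounding.
  print *, 'max abs difference from MATMUL:', maxval(abs(c - matmul(a,b)))
end program block_rotation_check
```

For a fixed IP, the value of ITP runs through every block index exactly once
as IT goes from 0 to NOP-1, so each row-block of C accumulates the full sum
over K.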

\subsubsection{Commentary}

** Note that the following discussion is on an earlier proposal. The
current proposal has already addressed most of the issues raised in the
discussion. **

The following is a discussion between Henk Sips and Tin-Fook Ngai from 
the mailing list.  It is included to clarify some issues in the preceding section.

\begin{enumerate}
\item Sips: The execution model of HPFF (not completely approved yet) states: 
\begin{quote}
The code compiled by an HPF compiler ought do no worse than code
compiled using the owner compute rule.
\end{quote}
This is more relaxed than saying ``it uses the owner compute rule''. Your
EXECUTE ON is much more specific towards fixing execution on a specified
processor. 

Ngai: What are you objecting to? The EXECUTE ON is a directive.

\item Sips: A template is not executed, so one can't say EXECUTE x ON T. 
Something like EXECUTE\_ON\_HOME T should be adopted (since templates are
currently not objects, one could even deny this).

Ngai: Agreed.  I never felt comfortable with the keywords I used.  I have no
objection to ``EXECUTE x ON\_HOME T''.  Any other suggestions are also
welcome.


\item Sips: Adopting EXECUTE x ON on DO loops without any independence requirements, as
your proposal seems to allow, can yield all kinds of intricate
synchronization schemes when iterations are not independent (or must be
assumed to be dependent). This seems to go further than the first simple
step that HPFF ought to be.

Ngai: Clearly, the proposed feature is primarily intended for INDEPENDENT DO,
FORALL and other parallel indexed assignments.  Before making the
proposal, I also thought about ordinary DO loops, as you point out
here. Code generation does not seem to be a problem: if the user chooses to
specify the execution location of an iteration of an ordinary DO loop, simple
compilation requires only one synchronization at the end of each iteration
to ensure the sequential DO semantics. This naive compilation looks dumb,
but the user may still gain from the existing data distribution.  (A
smarter compiler can of course do a better job, but that is definitely not
required.)  That is why I do not restrict EXECUTE ON to INDEPENDENT DOs,
which also makes the rule simpler.


\item Sips: Binding iterations to templates can currently only be done statically,
since the current draft does not allow dynamic templates. So iteration
boundaries must be known at compile time. One has to apply the subroutine
trick to allow this, which is not very neat.

Ngai: That is the intention of the proposal:  only static binding is allowed.
Even if the loop index is bounded by a runtime variable, the binding to a
template node is still static.


\item Sips: Allowing EXECUTE x ON on groups of statements raises a scoping issue, so
there should also be something like END EXECUTE x ON, to undo the
annotation. 

Ngai: The current proposal seems sufficient on this issue.  The scope for a single
statement (FORALL statement, single statement in a FORALL construct, and array
assignment statement) is clear.  For groups of statements, EXECUTE ON can
only apply to either the entire body of a FORALL construct or the
entire iteration of a DO loop.


\item Sips: Again we have complicated scoping problems. How about this example:
                                                                \CODE
!HPF$ TEMPLATE T1(N), T2(N)

DO I=1,N
    !HPF$ EXECUTE (I) ON T1(I)
    C(I) = D(I)
    DO J=1,N
      !HPF$ EXECUTE (J) ON T2(J)
      A(I,J) =  A(I,J) + B(I,J) 
    ENDDO
ENDDO
                                                                \EDOC
This example satisfies the constraint only if by entering the J-loop, the
I-index is dereferenced from the assertion just after the I-loop. Although
logical, it might be confusing to users. However, in the program
                                                                \CODE
!HPF$ TEMPLATE T1(N), T2(N)
DO I=1,N
    !HPF$ EXECUTE (I) ON T1(I)
    C(I) = D(I)
    DO J=1,N
        A(I,J) =  A(I,J) + B(I,J) 
    ENDDO
ENDDO
                                                                \EDOC
there is no such dereferencing. 

Ngai: Good examples.  This bug surely needs to be fixed.  Here is my solution:
\begin{itemize}
\item For nested EXECUTE ON directives, only the immediately enclosing EXECUTE ON
  directive is effective.
\end{itemize}
In the former example, the statement ``C(I) = D(I)'' will be executed on the
home of T1(I) while the statement ``A(I,J) =  A(I,J) + B(I,J)'' will be
executed on the home of T2(J) for all I.  In the latter case, the entire
I-loop body, including the DO J loop, is executed on the home of T1(I).

\item Sips: The wrap feature of templates will probably be deleted from the draft.
The same thing (shifting data each iteration) can be achieved by using CSHIFT
or the subroutine trick and making the template as large as the iteration
space.

Ngai: Wolfe and I discussed the example (Example 6 in the proposal) long before
our revision of the wrap feature.  Sorry for any confusion caused by this
example.  (However, this example also illustrates the use of wrap in data
distribution -- we should come up with a cleaner solution at the next meeting.)

\item Sips: We cannot do separate compilation in some examples:
                                                                \CODE
!HPF$ TEMPLATE T1(N)
  DO I=1,N
    !HPF$ EXECUTE (I) ON T1(I)
    C(I) = D(I)
    DO J=1,N
      A(I,J) = B(I,J)
    ENDDO
  ENDDO
                                                                \EDOC
Here A(I,J) is calculated on the home of T1(I). If we encapsulate the J-loop into a
subroutine we get something like:
                                                                \CODE
!HPF$ TEMPLATE T1(N)
  DO I=1,N
    !HPF$ EXECUTE (I) ON T1(I)
    C(I) = D(I)
    CALL FOO(A(I,:),B(I,:))
  ENDDO

...

SUBROUTINE FOO(AA,BB)
!HPF$ ALIGN AA,BB with *
    DO J=1,N
      AA(J) = BB(J)
    ENDDO
                                                                \EDOC
The dummy arguments AA and BB are aligned to the incoming array sections.
Normally, without any EXECUTE\_ON, the subroutine would execute AA(J), which
is equivalent to A(I,J), on the home of A(I,J). However, with an EXECUTE\_ON in the
main program there is no way the subroutine can know that AA(J) should be
executed on T1(I). The consequence is that any subroutine call has to {\em undo}
the EXECUTE\_ON on entry and {\em redo} the EXECUTE\_ON on return.

(Ngai did not respond publicly.)
\end{enumerate}

\alternative B
\subsection{Proposal for a Statement Grouping Syntax and ON Clause}

\footnote{Version of October 5, 1992 -- Clemens-August Thole, GMD-I1.T,
Sankt Augustin.
This section has not been discussed.}
I agree with Tin-Fook that something like the on-clause should be 
included in HPF. I brought a proposal with me to the last HPF meeting,
which was distributed by Chuck, but neither the FORALL working group
nor the plenary had time to discuss it.

I would appreciate comments on the various features.

\subsubsection{Introduction}

This proposal introduces an extension to HPF to group several 
statements in order to be able to specify properties for a whole block
of statements at once. A block of statements is called an HPF-section.
HPF-sections can be used to describe properties for independent execution
between blocks of statements as well as the mapping of their execution.

For the specification of a specific mapping of the execution of statements
or HPF-sections the ON-clause is introduced. A subset of a template is used
as a reference object onto which the statements are mapped in a canonical
manner. Careful selection of the reference template allows one to specify
how the execution of the code is mapped onto the parallel architecture.


\subsubsection{HPF-sections}

The HPF directives SECTIONS, SECTION, and END SECTIONS are used to specify
grouping of statements. SECTIONS and END SECTIONS specify the beginning
and end of a list of HPF-sections and SECTION the beginning of the next 
HPF-section. The syntax is as follows:
                                                                \BNF
hpf-block \IS        !HPF$ SECTIONS
                [hpf-section-list]
        !HPF$ END SECTIONS

hpf-section \IS        !HPF$ SECTION
                [execution-part]
                                                                \FNB
\noindent Constraint: During the execution of any {\em hpf-section}, control
must under no circumstances be transferred outside of its 
{\em execution-part}.

\paragraph{Example}
                                                                \CODE
        !HPF$ SECTIONS
        !HPF$ SECTION
                A = A + B
                B = C + D
        !HPF$ SECTION
                E = B
                IF (E.GT.F) GOTO 10
                        E = 0D0
         10     CONTINUE
        !HPF$ END SECTIONS
                                                                \EDOC
This example specifies a list of two HPF-sections. The control statement in
the second HPF-section is valid because after the transfer of control the
execution continues in the same HPF-section.


\subsubsection{ON-clause}

The ON-clause specifies a subsection of a template, which is used as a reference
object for the execution of the next statement, construct, or HPF-section.
If the left-hand-side of an assignment coincides in shape with the reference
object, the evaluation of the right-hand-side and the assignment for 
a specific element of the left-hand-side are performed on the processor onto
which the corresponding element of the reference object is mapped.

\paragraph{Syntax}

Add the following rules:
                                                                \BNF
executable-construct \IS        !HPF$ ON on-spec
                executable-construct

hpf-section \IS        !HPF$ ON on-spec
                hpf-section
        
on-spec \IS        align-spec
                                                                \FNB
The {\it executable-construct} or {\it hpf-section} is called the on-clause-target.

\paragraph{Constraints}
\begin{enumerate}
\item No {\it executable-construct} that generates any transfer of control out
   of the construct itself may be used as the target of an on-clause. This
   includes the entry-statement. 
\item {\it Statement-block}s used in constructs must fulfill the constraints of
   HPF-sections.
\item The shape of the {\it on-spec} must cover in each dimension the shape
   of any left-hand-side of an assignment statement that is the target of an
   on-clause. If a ``*'' is used in the {\it on-spec}, this dimension is skipped
   when constructing the shape of the {\it on-spec}.
\item If an on-clause is contained in the on-clause-target, the new {\it on-spec}
   must be a subsection of the {\it on-spec} of the outer on-clause.
\end{enumerate}

\paragraph{Example}
                                                                \CODE
                REAL, DIMENSION(n) :: a, b, c, d
        !HPF$   TEMPLATE grid(n)
        !HPF$   ALIGN WITH grid :: a, b, c, d

        !HPF$   ON grid(2:n)
                a(1:n-1) = a(2:n) + b(2:n) + c(2:n)
                                                                \EDOC
The on-clause indicates that the evaluation of the right-hand-side is 
performed on those processors which hold the data elements of the 
right-hand-side. For the assignment to the left-hand-side, data movement is
necessary.

\paragraph{Interpretation}

The interpretation of the on-clause depends on the type of the on-clause-target.

If the on-clause-target is an assignment statement, the {\it on-spec} is used to
determine where the assignment statement is executed. If the shape of the 
right-hand-side is identical to the shape of the {\it on-spec}, the computation for
a specific element of the assignment statement is performed where the 
corresponding element of the {\it on-spec} is mapped. If the shape of the 
{\it on-spec} is larger, the compiler may use any sufficiently large subsection.
The use of ``*'' in the {\it on-spec} specifies that the same computations are
mapped onto the corresponding line of processors and several processors
will do the same update. This may save communication operations.
In the case of the where-statement, the forall-statement, and the 
forall-construct, the same mapping is applied to the evaluation of the 
conditions and each assignment.

If the on-clause is placed in front of an if-construct, a case-construct,
or a do-construct, the {\it on-spec} is used for the evaluation of the 
conditions as well as the loop bounds and the execution of the statement-blocks
which are part of the construct. For the statement-blocks the interpretation 
rules for HPF-sections apply.

With respect to the allocate, deallocate, nullify, and I/O related statements
the {\it on-spec} is used for the evaluation of the parameters of the statements
and the evaluation of I/O objects. 

In the case of subroutine calls and functions the {\it on-spec} is used for the
evaluation of the parameters. It also determines the mapping of the resulting 
object and the set of processors which will be
used for the evaluation of the subroutine. 

In the case of HPF-sections the on-clause is applied to each statement of the
execution part. Control transfer statements are allowed in this case and the 
constraints ensure, that the context on the same {\it on-spec} is not lost.

\paragraph{Additional example}
                                                                \CODE
        REAL, DIMENSION(n,n) :: a, b, c, d
!HPF$   TEMPLATE grid(n,n)
!HPF$   ALIGN WITH grid :: a, b, c, d

!HPF$   ON grid(2:n,2:n)
        DO i=2,n
!HPF$       ON grid(i,2:n)
            DO j=2,n
!HPF$           ON grid(i,j)
                a(i-1,j-1) = a(i,j) + b(i,j)*c(i,j)
            ENDDO
        ENDDO
                                                                \EDOC

\paragraph{Comment}

The compiler should be able to adjust the span of the loops to the local 
extent due to the restrictions on the specifiers of the sections of the
{\it on-spec}.


\subsection{ALLOCATE in FORALL}

\label{forall-allocate}

\footnote{Version of July 28, 1992 
- Guy Steele, Thinking Machines Corporation.
At the September 10-11 meeting, this was not included as part of the
FORALL because it seemed too big a leap from the allowed assignment
statements.}
Proposal:  ALLOCATE, DEALLOCATE, and NULLIFY statements may appear
	in the body of a FORALL.

Rationale: these are just another kind of assignment.  They may have
	a kind of side effect (storage management), but it is a
	benign side effect (even milder than random number generation).

Example:
                                                            \CODE
      TYPE SCREEN
        INTEGER, POINTER :: P(:,:)
      END TYPE SCREEN
      TYPE(SCREEN) :: S(N)
      INTEGER IERR(N)
      ...
!  Lots of arrays with different aspect ratios
      FORALL (J=1:N)  ALLOCATE(S(J)%P(J,N/J),STAT=IERR(J))
      IF(ANY(IERR .NE. 0)) GO TO 99999
                                                            \EDOC
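
For comparison, here is a sketch of how the same allocations would be
written in Fortran 90 without the proposed extension, using an ordinary DO
loop.  The size N = 4 and the program name are assumptions for illustration;
the STAT values are compared with zero explicitly because ANY takes a
logical argument.

```fortran
program allocate_loop_demo
  implicit none
  integer, parameter :: n = 4          ! hypothetical size
  type screen
    integer, pointer :: p(:,:)
  end type screen
  type(screen) :: s(n)
  integer :: ierr(n), j

  ! One allocation per element of S, each with its own aspect ratio.
  do j = 1, n
    allocate(s(j)%p(j, n/j), stat=ierr(j))
  end do
  if (any(ierr /= 0)) stop 'allocation failed'

  do j = 1, n
    print *, 'S(', j, ')%P has shape', shape(s(j)%p)
  end do
end program allocate_loop_demo
```

The iterations of this loop do not depend on one another, which is the
property the FORALL form of the proposal would make explicit.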

\subsection{Generalized Data References}

\label{data-ref}

\footnote{Version of July 28, 1992 
- Guy Steele, Thinking Machines Corporation.
This was not acted on at the September 10-11 meeting because the
FORALL subgroup wanted to minimize changes to the Fortran~90 standard.}
Proposal:  Delete the constraint in section 6.1.2 of the Fortran 90
	standard (page 63, lines 7 and 8):
\begin{quote}
	Constraint: In a data-ref, there must not be more than one
		part-ref with nonzero rank.  A part-name to the right
		of a part-ref with nonzero rank must not have the
		POINTER attribute.
\end{quote}

Rationale: further opportunities for parallelism.

Example:
                                                                     \CODE
TYPE(MONARCH) :: C(N), W(N)
      ...
! Munch that butterfly
C = C + W * A%P		! Illegal in Fortran 90
                                                                      \EDOC


\subsection{FORALL with INDEPENDENT Directives}
\label{begin-independent}

\footnote{Version of July 21, 1992 - Min-You Wu.
This was rejected at the FORALL subgroup meeting on September 9, 1992,
because it only offered syntactic sugar for capabilities already in
the FORALL INDEPENDENT.  It was also suggested that the BEGIN
INDEPENDENT syntax
should be reserved for other uses, such as MIMD features.}
This proposal is an extension of Guy Steele's INDEPENDENT proposal.
We propose a block FORALL with the directives for independent 
execution of statements.  The INDEPENDENT directives are used
in a block style.  

The block FORALL is in the form of
                                                         \CODE
      FORALL (...) [ON (...)]
        a block of statements
      END FORALL
                                                         \EDOC
where the block can consist of a restricted class of statements 
and the following INDEPENDENT directives:
                                                         \CODE
!HPF$BEGIN INDEPENDENT
!HPF$END INDEPENDENT
                                                         \EDOC
The two directives must be used as a pair.  
A sub-block of statements enclosed by the two directives is called an
{\em asynchronous} sub-block or {\em independent} sub-block.  
The statements that are not in an asynchronous sub-block are in
{\em synchronized} sub-blocks or {\em non-independent} sub-blocks.  
The synchronized sub-block is the same as Guy Steele's synchronized FORALL
statement, and the asynchronous sub-block is the same as the FORALL with
the INDEPENDENT directive.  
Thus, the block FORALL
                                                          \CODE
      FORALL (e)
        b1
!HPF$BEGIN INDEPENDENT
        b2
!HPF$END INDEPENDENT
        b3
      END FORALL
                                                           \EDOC
means roughly the same as
                                                           \CODE
      FORALL (e)
        b1
      END FORALL
!HPF$INDEPENDENT
      FORALL (e)
        b2
      END FORALL
      FORALL (e)
        b3
      END FORALL
                                                          \EDOC
														  
Statements in a synchronized sub-block are tightly synchronized.
Statements in an asynchronous sub-block are completely independent.
The INDEPENDENT directives indicate to the compiler that there is no 
dependence and that, consequently, synchronizations are not necessary.
It is the user's responsibility to ensure that there is no dependence
between instances in an asynchronous sub-block.
A compiler can do dependence analysis for the asynchronous sub-blocks
and issue an error message when there is a dependence, or a warning
when it finds a possible dependence.

\subsubsection{What does ``no dependence between instances'' mean?}

It means that there is no true dependence, anti-dependence,
or output dependence between instances.
Examples of these dependences are shown below:
\begin{enumerate}
\item True dependence:
                                                            \CODE
      FORALL (i = 1:N)
        x(i) = ... 
        ...  = x(i+1)
      END FORALL
                                                            \EDOC
Notice that dependences in FORALL are different from those in a DO loop.
If the above example were a DO loop, it would be an anti-dependence.

\item Anti-dependence:
                                                            \CODE
      FORALL (i = 1:N)
        ...  = x(i+1)
        x(i) = ...
      END FORALL
                                                            \EDOC

\item Output dependence:
                                                            \CODE
      FORALL (i = 1:N)
        x(i+1) = ... 
        x(i) = ...
      END FORALL
                                                            \EDOC
\end{enumerate}
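
The difference between FORALL and DO semantics in the true-dependence
example can be observed directly.  The sketch below assumes a compiler that
accepts the FORALL construct with the statement-by-statement semantics
described in this chapter (the construct was later standardized in
Fortran 95); the array size and values are arbitrary.

```fortran
program forall_vs_do
  implicit none
  integer, parameter :: n = 4
  integer :: x(n+1), y(n), i

  ! FORALL construct: the first assignment completes for every i
  ! before the second assignment starts.
  x = 0
  forall (i = 1:n)
    x(i) = i
    y(i) = x(i+1)
  end forall
  print *, 'FORALL: y =', y

  ! DO loop: each iteration reads x(i+1) before a later iteration writes it.
  x = 0
  do i = 1, n
    x(i) = i
    y(i) = x(i+1)
  end do
  print *, 'DO:     y =', y
end program forall_vs_do
```

With these values the FORALL version stores 2, 3, 4, 0 in y, because every
x(i) is assigned before any y(i) is computed (a true dependence between
instances).  The DO version stores 0, 0, 0, 0, because each read of x(i+1)
precedes the write to it (an anti-dependence).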

Independent does not imply no communication.  One instance may access 
data in the other instances, as long as it does not cause a dependence.  
The following example is an independent block:
                                                            \CODE
      FORALL (i = 1:N)
!HPF$BEGIN INDEPENDENT
        x(i) = a(i-1)
        y(i-1) = a(i+1)
!HPF$END INDEPENDENT
      END FORALL
                                                            \EDOC

\subsubsection{Statements that can appear in FORALL}

FORALL statements, WHERE-ELSEWHERE statements, some intrinsic functions 
(and possibly elemental functions and subroutines) can appear in the
FORALL:
\begin{enumerate}
\item FORALL statement
                                                            \CODE
      FORALL (I = 1 : N)
        A(I,0) = A(I-1,0)
        FORALL (J = 1 : N)
!HPF$BEGIN INDEPENDENT
          A(I,J) = A(I,0) + B(I-1,J-1)
          C(I,J) = A(I,J)
!HPF$END INDEPENDENT
        END FORALL
      END FORALL
                                                            \EDOC

\item WHERE
                                                            \CODE
      FORALL(I = 1 : N)
!HPF$BEGIN INDEPENDENT
        WHERE(A(I,:) .EQ. B(I,:))
          A(I,:) = 0
        ELSEWHERE
          A(I,:) = B(I,:)
        END WHERE
!HPF$END INDEPENDENT
      END FORALL
                                                            \EDOC
\end{enumerate}


\subsubsection{Rationale}

\begin{enumerate}
\item A FORALL with a single asynchronous sub-block as shown below is 
the same as a do independent (or doall, or doeach, or parallel do, etc.).
                                                            \CODE
      FORALL (e)
!HPF$BEGIN INDEPENDENT
        b1
!HPF$END INDEPENDENT
      END FORALL
                                                            \EDOC
A FORALL without any INDEPENDENT directive is the same as a tightly 
synchronized FORALL.  We only need to define one type of parallel 
construct, which includes both synchronized and asynchronous blocks.  
Furthermore, by combining asynchronous and synchronized FORALLs, we 
have a loosely synchronized FORALL, which is more flexible for many 
loosely synchronous applications.

\item With INDEPENDENT directives, the user can indicate which block
need not be synchronized.  The INDEPENDENT directives can act 
as barrier synchronizations.  One may suggest a smart compiler 
that can recognize dependences and eliminate unnecessary 
synchronizations automatically.  However, it might be extremely 
difficult or impossible in some cases to identify all dependences.  
When the compiler cannot determine whether there is a dependence, 
it must assume there is one and use a synchronization for safety, which 
results in unnecessary synchronizations and, consequently, high 
communication overhead.
\end{enumerate}


\end{document}


From @ecs.soton.ac.uk,@camra.ecs.soton.ac.uk:jhm@ecs.southampton.ac.uk  Thu Oct 15 16:21:37 1992
Received: from sun2.nsfnet-relay.ac.uk by titan.cs.rice.edu (AB20067); Thu, 15 Oct 92 16:21:37 CDT
Via: uk.ac.southampton.ecs; Thu, 15 Oct 1992 20:35:55 +0100
Via: camra.ecs.soton.ac.uk; Thu, 15 Oct 92 20:00:32 BST
From: John Merlin <jhm@ecs.soton.ac.uk>
Received: from bacchus.ecs.soton.ac.uk by camra.ecs.soton.ac.uk;
          Thu, 15 Oct 92 20:08:19 BST
Date: Thu, 15 Oct 92 20:04:39 BST
Message-Id: <9012.9210151904@bacchus.ecs.soton.ac.uk>
To: hpff-forall@cs.rice.edu
Subject: comments on FORALL draft

I have some comments and corrections on sections 2 & 3 of the latest 
draft of the HPF FORALL chapter.  (I don't have any on the 'Pure 
procedures' section, of course, which is already perfect! :-)).  
Here goes...

-----------------------
>                                                                        \BNF
> forall-assignment    \IS array-element = expr
>                      \OR array-element => target
>                      \OR array-section = expr
>                                                                        \FNB

Rather than introduce the new grammar symbol 'forall-assignment'
(which should be 'forall-assignment-stmt' anyway) I wonder if it 
wouldn't be better just to use the existing F90 grammar symbols 
'assignment-stmt' and 'pointer-assignment-stmt', along with constraints 
that the 'variable' or 'pointer-object' must be subscripted, etc. etc.  
That's the way that F90 handles the assignments in a 'where-stmt' and 
'where-construct'.

There are problems with these rules as presented.  Firstly, they
disallow a 'forall-assignment' like:

	A(I) % B = ...

which I presume you want to allow (note that the variable here belongs
to the syntactic class 'structure-component', not 'array-element'
--see pp 62-64 of the standard).
Also,	

>                      \OR array-element => target

is illegal F90 (the lhs must be a pointer, but p62 of the std says: 
"an array element...never has the POINTER attribute").  Probably what 
you want is:

		structure-component => target

with the constraints that 'structure-component' must be subscripted,
blah, blah, which will permit the example given later:

> ! Set up a butterfly pattern
> FORALL (J=1:N)  A(J)%P => B(1+IEOR(J-1,2**K))

(Actually, I'm not convinced that pointer assignment should be here at 
all, since it has such limited use and probably very limited scope for
concurrency).

BTW, if you drop 'forall-assignment' in favour of 'assignment-stmt'
and 'pointer-assignment-stmt', you can remove all these constraints:

> \noindent
> Constraint: In the cases of simple assignment, the {\it array-element} and 
> {\it expr} have the same constraints as the {\it variable} and {\it expr} 
> in an {\it assignment-stmt}.
> 
> \noindent
> Constraint: In the case of pointer assignment, the {\it array-element} 
> and {\it target} have the same constraints as the {\it pointer-object} 
> and {\it target}, respectively, in a {\it pointer-assignment-stmt}.
> 
> \noindent
> Constraint: In the cases of array section assignment, the {\it 
> array-section} and 
> {\it expr} have the same constraints as the {\it variable} and {\it expr} 
> in an {\it assignment-stmt}.

---------------------

> For each subscript name in the {\it forall-assignment}, the set of
> permitted values is determined on entry to the statement and is
> \[  m1 + (k-1) * m3, where~k = 1, 2, ..., \lfloor \frac{m2 - m1 +
> 1}{m3} \rfloor  \]
> and where {\it m1}, {\it m2}, and {\it m3} are the values of the first
> subscript, the second subscript, and the stride respectively in the
> {\it forall-triplet-spec}.  If {\it stride} is missing, it is as if it
> were present with a value of the integer 1.  The expression {\it
> stride} must not have the value 0.  If for some subscript name 
> \(\lfloor (m2 -m1 + 1) / m3 \rfloor \leq 0\), the {\it
> forall-assignment} is not executed.

I believe the expression should be:

	\lfloor (m2 -m1 + m3) / m3 \rfloor
not
	\lfloor (m2 -m1 + 1) / m3 \rfloor
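
A quick way to compare the two expressions is to enumerate the index
values each one yields.  The following is only an illustrative Python
sketch, not text from the draft: the function names are invented for
this note, and Python's // operator stands in for the floor brackets.

```python
def forall_values(m1, m2, m3, offset):
    """Index values m1 + (k-1)*m3 for k = 1 .. floor((m2 - m1 + offset) / m3)."""
    if m3 == 0:
        raise ValueError("stride must not have the value 0")
    count = (m2 - m1 + offset) // m3  # // is floor division in Python
    # A count <= 0 gives an empty range, i.e. the assignment is not executed.
    return [m1 + (k - 1) * m3 for k in range(1, count + 1)]


def draft_values(m1, m2, m3):
    """Values under the draft's expression, floor((m2 - m1 + 1) / m3)."""
    return forall_values(m1, m2, m3, 1)


def corrected_values(m1, m2, m3):
    """Values under the suggested expression, floor((m2 - m1 + m3) / m3)."""
    return forall_values(m1, m2, m3, m3)


print(draft_values(1, 10, 3))      # the triplet 1:10:3 under the draft rule
print(corrected_values(1, 10, 3))  # the same triplet under the suggested rule
```

For the triplet 1:10:3 the draft expression stops at 7, while the
suggested one also reaches 10, matching the F90 section A(1:10:3).  The
two agree whenever the stride is 1.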

---------------------

> A ``pure'' function is defined in Section~\ref{forall-pure}; the
> intuition is that a pure function cannot have side effects.

I would prefer something like:

"A ``pure'' function is defined in Section.. and is guaranteed not to
have side effects".

(sounds more confident).

---------------------

> Note that if a function called in a FORALL is declared PURE, then it 
> is impossible for that function's evaluation to affect other expressions' 
> evaluations, either for the same combination of 
> {\it subscript-name} values or for a different combination.
> In addition, it is possible that the compiler can perform 
> more extensive optimizations when all functions are declared PURE.

This seems to suggest that it's optional for functions to be PURE
in FORALL.  If the pure proposal is accepted, it's better to say 
something like:

"Since all functions called in FORALL must be pure, ..."

---------------------

> \subsection{Scalarization of the FORALL Statement}
> 
>...

In the scalarisation of the 'forall-stmt', it's claimed that:

> !               ... (it is safe to avoid saving the subscript 
> !expressions because of the conditions on FORALL expressions)

but I thought that the following was allowed:

	FORALL (I=...)  J(J(I)) = ...

If so, then the subscript expressions must be saved.
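
To make the point concrete, here is a small Python model (a sketch only;
the helper names and the sample data are assumptions chosen for
illustration) of FORALL (I=1:3) J(J(I)) = I.  One version saves the
left-hand side subscripts before assigning, the other is a naive
sequential loop that re-reads J as it goes.

```python
def forall_saved(j, rhs):
    """Model FORALL (I=1:N) J(J(I)) = rhs(I), saving subscripts and rhs first."""
    n = len(j)
    subs = [j[i] for i in range(n)]        # every lhs subscript J(I), saved
    vals = [rhs(i + 1) for i in range(n)]  # every rhs value, saved
    out = list(j)
    for s, v in zip(subs, vals):
        out[s - 1] = v                     # Fortran subscripts are 1-based
    return out


def naive_loop(j, rhs):
    """Sequential loop that does not save the subscripts (incorrect model)."""
    out = list(j)
    for i in range(len(out)):
        out[out[i] - 1] = rhs(i + 1)       # reads J(I) after earlier updates
    return out


j = [2, 3, 1]                              # J(1)=2, J(2)=3, J(3)=1
print(forall_saved(j, lambda i: i))        # [3, 1, 2]
print(naive_loop(j, lambda i: i))          # [3, 1, 1]
```

The two results differ in the last element, because the naive loop
reads J(3) after it has already been overwritten.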

Also, if all functions in FORALL must be pure then a big simplification 
is possible, since the 'mask', 'rhs' and the subscript expressions can all 
be evaluated concurrently (except that for a given index, 'mask' must 
precede 'rhs' and 'e1,..,em').  This is because all expressions are 
guaranteed side-effect free; the only 'side-effect' of a forall is 
actually performing the assignment.

Therefore I suggest that the scalarisation could be simplified along the 
following lines:

>                                                                    \CODE
> ! In any order, evaluate subscript and stride expressions,
> ! the scalar mask expression, and, for all valid combinations of
> ! subscript names for which the scalar mask expression is true, 
> ! the 'rhs' expression and lhs subscript expressions.
> 
> templ1 = l1
> tempu1 = u1
> temps1 = s1
> templ2 = l2
> tempu2 = u2
> temps2 = s2
>   ...
> templn = ln
> tempun = un
> tempsn = sn
> DO v1=l1,u1,s1
>   DO v2=l2,u2,s2
>     ...
>       DO vn=ln,un,sn
>         tempmask(v1,v2,...,vn) = mask
>         IF (tempmask(v1,v2,...,vn)) THEN
>           temprhs(v1,v2,...,vn) = rhs
>           tempe1(v1,v2,...,vn) = e1
>           tempem(v1,v2,...,vn) = em
>         END IF
>       END DO
> 	  ...
>   END DO
> END DO
> 
> ! Then perform the assignment of these values to the corresponding 
> ! elements of the array being assigned to
> 
> DO v1=templ1,tempu1,temps1
>   DO v2=templ2,tempu2,temps2
>     ...
>       DO vn=templn,tempun,tempsn
>         IF (tempmask(v1,v2,...,vn)) THEN
>           a(tempe1(v1,..,vn), ..,tempem(v1,..,vn)) = temprhs(v1,v2,...,vn)
>         END IF
>       END DO
> 	  ...
>   END DO
> END DO
>                                                                       \EDOC

(In the first set of DO-loops I've used 'l1' rather than 'templ1', etc, 
for the bounds to avoid implying any order between the evaluation
of the subscript and stride expressions and the mask, rhs, etc,
although of course it's inefficient to evaluate 'l1', etc, twice.)

The same simplifications (and necessity of saving array element subscripts)
apply to subsequent sequentialisations.

-----------------------

> \subsection{General Form of the FORALL Construct}

contains the constraint:

> Constraint: If a {\it forall-stmt} or {\it forall-construct} is nested 
> within a {\it forall-construct}, then the inner FORALL may not redefine 
> any {\it subscript-name} used in the outer {\it forall-construct}.
> This rule applies recursively in the event of multiple nesting levels.

If functions in FORALL are pure and side-effect free, then I think
this constraint is redundant; the only 'side-effects' of the inner
FORALL are assignment to array elements, so the 'subscript-name's
cannot be redefined.
(Assuming the 'subscript-name' isn't equivalenced to an array element,
that is - on second thoughts, perhaps it's better to keep this constraint 
to cover this case!)

> For each subscript name in the {\it forall-assignment}s, the set of
> permitted values is determined on entry to the construct and is
> \[  m1 + (k-1) * m3, where~k = 1, 2, ..., \lfloor \frac{m2 - m1 +
> 1)}{m3} \rfloor  \]
> and where {\it m1}, {\it m2}, and {\it m3} are the values of the first
> subscript, the second subscript, and the stride respectively in the
> {\it forall-triplet-spec}.  If {\it stride} is missing, it is as if it
> were present with a value of the integer 1.  The expression {\it
> stride} must not have the value 0.  If for some subscript name 
> \(\lfloor (m2 -m1 + 1) / m3 \rfloor \leq 0\), the {\it
> forall-assignment}s are not  executed.

As before, I think the expression should be ((m2-m1+m3)/m3)

-----------------------

In
> \subsection{Interpretation of the FORALL Construct}

item 3(a):

> \item Assignment statements, pointer assignment statements, and array
> assignment statements (i.e.
> statements in the {\it forall-assignment} category) evaluate the 
> right-hand side {\it expr} and any left-and side subscripts for all 
                                          ^^^
> active {\it subscript-name} values,
> then assign those results to the corresponding left-hand side references.
              ^^^^^^^^^^^^^
              the results of {\it expr\/}

item 3(b):

> \item WHERE statements evaluate their {\it mask-expr} for all active 
                        ^
                        and constructs?

-----------------------

> A single assignment or array assignment statement in a {\it 
> forall-construct} must obey the same restrictions as a {\it 
> forall-assignment} in a simple {\it forall-stmt}.

Perhaps simpler to say:

A {\it forall-assignment} must obey the same restrictions in a 
{\it forall-construct} as in a simple {\it forall-stmt}.

-----------------------
In the example scalarisation:

> FORALL ( v1=l1:u1:s1, mask )
>   WHERE ( mask2(l2:u2) )
>     a(vi,l2:u2) = rhs1
        ^^
        'e1' would be more general (and I presume you meant 'v1').
>   ELSEWHERE
>     a(vi,l2:u2) = rhs2
        ^^
        likewise, 'e2' would be more general.
>   END WHERE
> END FORALL

-----------------------
Other tiny editorial matters:

> A {\it forall-construct} othe form:
                           ^

> where each si is an assignment is equivalent to the following scalar code:
             ^
             {\bf si}, or perhaps $s_{i}$ (and again later).

> \item In general, any expression in a FORALL is evaluated only for valid 
> combinations of all surrounding subscript names for which all the
> scalar mask eressions are true.
               ^

From chk@cs.rice.edu  Thu Oct 15 18:43:49 1992
Received: from  by titan.cs.rice.edu (AB23737); Thu, 15 Oct 92 18:43:49 CDT
Message-Id: <9210152343.AB23737@titan.cs.rice.edu>
Date: Thu, 15 Oct 1992 18:48:32 -0600
To: presberg@tc.cornell.edu, Tin-Fook Ngai <ngai@hpltfn.hpl.hp.com>
From: chk@cs.rice.edu
Subject: Re: "Revised proposal on EXECUTE-ON..." by Ngai, versus that in "nugatory"
Cc: chk@cs.rice.edu, presberg@tc.cornell.edu, hpff-core@cs.rice.edu,
        hpff-forall@cs.rice.edu

Let me just add that I'll be sending out a revised draft (to hpff-forall
and hpff-core) as soon as the last promised proposal reaches me.  My
apologies for the confusion as well.

                                                Chuck


From chk@cs.rice.edu  Thu Oct 15 19:17:41 1992
Received: from moe.rice.edu by titan.cs.rice.edu (AA24128); Thu, 15 Oct 92 19:17:41 CDT
Received: from titan.cs.rice.edu (cs.rice.edu) by moe.rice.edu (AA22531); Thu, 15 Oct 92 19:17:38 CDT
Received: from DialupEudora (charon.rice.edu) by titan.cs.rice.edu (AA24120); Thu, 15 Oct 92 19:17:03 CDT
Message-Id: <9210160017.AA24120@titan.cs.rice.edu>
Date: Thu, 15 Oct 1992 19:20:20 -0600
To: hpff-forall@rice.edu, John Merlin <jhm@ecs.soton.ac.uk>
From: chk@cs.rice.edu
Subject: Re: comments on FORALL draft


Will incorporate your suggestions, except as noted below.

At 20:04 10/15/92 +0000, John Merlin wrote:
>
>-----------------------
>>                                                                        \BNF
>> forall-assignment    \IS array-element = expr
>>                      \OR array-element => target
>>                      \OR array-section = expr
>>                                                                        \FNB
>
>Rather than introduce the new grammar symbol 'forall-assignment'
>(which should be 'forall-assignment-stmt' anyway) I wonder if it 
>wouldn't be better just to use the existing F90 grammar symbols 
>'assignment-stmt' and 'pointer-assignment-stmt', along with constraints 
>that the 'variable' or 'pointer-object' must be subscripted, etc. etc.  
>That's the way that F90 handles the assignments in a 'where-stmt' and 
>'where-construct'.

A more radical suggestion occurs to me: 

Constraint: The variable of an assignment-stmt must refer to distinct data
objects for all active combinations of subscript-name values.

Constraint: The pointer-object of a pointer-assignment-stmt must refer to
distinct data objects for all active combinations of subscript-name values.

Someone with a good cross-reference into the F90 standard check whether
"refer to distinct data objects" are the right words - my intent is that
the objects (or subobjects, like array sections) named on the left-hand
sides should be disjoint (nonoverlapping) sections of memory.

Reasoning - John's right that I was picking the wrong syntactic categories.
 This is an attempt to make the condition as general as possible, and tie
it to behavior rather than syntax.  For example, if we wanted to define
PRIVATE variables within the scope of nested FORALLs (JUST AN EXAMPLE, NOT
A PROPOSAL!), then presumably the variable would refer to a different
object for each "iteration" - and the condition allows that.  On the other
hand, a normal scalar certainly does not refer to different objects for
different FORALL indices and thus is disallowed on the left-hand side.

>(Actually, I'm not convinced that pointer assignment should be here at 
>all, since it has such limited use and probably very limited scope for
>concurrency).

Why didn't you say so earlier?  Your vote might have carried the election
(which was tied 0-0 going into the last meeting)!  Sorry, it's political
season here in the states...

>
>> For each subscript name in the {\it forall-assignment}, the set of
>> permitted values is determined on entry to the statement and is
>> \[  m1 + (k-1) * m3, where~k = 1, 2, ..., \lfloor \frac{m2 - m1 +
>> 1}{m3} \rfloor  \]
>> and where {\it m1}, {\it m2}, and {\it m3} are the values of the first
>> subscript, the second subscript, and the stride respectively in the
>> {\it forall-triplet-spec}.  If {\it stride} is missing, it is as if it
>> were present with a value of the integer 1.  The expression {\it
>> stride} must not have the value 0.  If for some subscript name 
>> \(\lfloor (m2 -m1 + 1) / m3 \rfloor \leq 0\), the {\it
>> forall-assignment} is not executed.
>
>I believe the expression should be:
>
>        \lfloor (m2 -m1 + m3) / m3 \rfloor
>not
>        \lfloor (m2 -m1 + 1) / m3 \rfloor

How did this make it through 3 drafts before someone caught it?
      
>> \subsection{Scalarization of the FORALL Statement}
>
>In the scalarisation of the 'forall-stmt', it's claimed that:
>
>> !               ... (it is safe to avoid saving the subscript 
>> !expressions because of the conditions on FORALL expressions)
>
>but I thought that the following was allowed:
>
>        FORALL (I=...)  J(J(I)) = ...
>
>If so, then the subscript expressions must be saved.

Correct; I oversimplified when side effects were eliminated.

>Also, if all functions in FORALL must be pure then a big simplification 
>is possible, since the 'mask', 'rhs' and the subscript expressions can all 
>be evaluated concurrently (except that for a given index, 'mask' must 
>precede 'rhs' and 'e1,..,em').  This is because all expressions are 
>guaranteed side-effect free; the only 'side-effect' of a forall is 
>actually performing the assignment.

Also correct.

However, the bounds and stride expressions must be evaluated first in any
case, since evaluation of the mask, rhs, and subscripts depends on them
(unless you can explain how to fill in an array section without knowing its
bounds?).  I'll therefore use the temporaries in the scalarization instead
of the actual expressions.
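
The ordering described here can be modelled in a few lines of Python.
This is only a sketch under stated assumptions: a one-dimensional
FORALL, a dict standing in for the Fortran array, and a helper name
(forall_1d) that is invented for this note and is not part of any
proposal.

```python
def forall_1d(a, lower, upper, stride, mask, subscript, rhs):
    """Model FORALL (I=lower:upper:stride, mask(I)) A(subscript(I)) = rhs(I).

    `a` is a dict from subscript to value, standing in for a Fortran array.
    """
    # Step 1: bounds and stride are evaluated once, before anything else.
    lo, hi, st = lower(), upper(), stride()
    if st == 0:
        raise ValueError("stride must not have the value 0")
    indices = range(lo, hi + 1, st) if st > 0 else range(lo, hi - 1, st)

    # Step 2: for each index the mask is evaluated first; the rhs and the
    # lhs subscript are evaluated only where the mask is true.  Nothing is
    # assigned yet, so the order over indices does not matter.
    pending = []
    for i in indices:
        if mask(i):
            pending.append((subscript(i), rhs(i)))

    # Step 3: perform the assignments from the saved values.
    for s, v in pending:
        a[s] = v
    return a


a = {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0}
# FORALL (I = 2:4, A(I-1) > 1.5) A(I) = A(I-1)
result = forall_1d(a, lambda: 2, lambda: 4, lambda: 1,
                   mask=lambda i: a[i - 1] > 1.5,
                   subscript=lambda i: i,
                   rhs=lambda i: a[i - 1])
print(result)  # {1: 1.0, 2: 2.0, 3: 2.0, 4: 3.0}
```

An in-place sequential loop would instead leave A(4) equal to 2.0,
since it would read A(3) after it had been overwritten.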

>> \subsection{General Form of the FORALL Construct}
>
>contains the constraint:
>
>> Constraint: If a {\it forall-stmt} or {\it forall-construct} is nested 
>> within a {\it forall-construct}, then the inner FORALL may not redefine 
>> any {\it subscript-name} used in the outer {\it forall-construct}.
>> This rule applies recursively in the event of multiple nesting levels.
>
>If functions in FORALL are pure and side-effect free, then I think
>this constraint is redundant; the only 'side-effects' of the inner
>FORALL are assignment to array elements, so the 'subscript-name's
>cannot be redefined.

The constraint was there to disallow
        FORALL ( i = 1 : n )
            FORALL ( i = 2 : m )
                A(i) = 0.0
            END FORALL
        END FORALL

                                                Chuck


From zrlp09@trc.amoco.com  Fri Oct 16 10:23:59 1992
Received: from moe.rice.edu by titan.cs.rice.edu (AA02401); Fri, 16 Oct 92 10:23:59 CDT
Received: from noc.msc.edu by moe.rice.edu (AA27939); Fri, 16 Oct 92 10:23:58 CDT
Received: from uc.msc.edu by noc.msc.edu (5.65/MSC/v3.0.1(920324))
	id AA21503; Fri, 16 Oct 92 10:23:56 -0500
Received: from [129.230.11.2] by uc.msc.edu (5.65/MSC/v3.0z(901212))
	id AA14470; Fri, 16 Oct 92 10:23:55 -0500
Received: from trc.amoco.com (apctrc.trc.amoco.com) by netserv2 (4.1/SMI-4.0)
	id AA28791; Fri, 16 Oct 92 10:23:10 CDT
Received: from backus.trc.amoco.com by trc.amoco.com (4.1/SMI-4.1)
	id AA09068; Fri, 16 Oct 92 10:23:08 CDT
Received: from mahler.trc.amoco.com by backus.trc.amoco.com (4.1/SMI-4.1)
	id AA08166; Fri, 16 Oct 92 10:23:07 CDT
Received: from localhost by mahler.trc.amoco.com (4.1/SMI-4.1)
	id AA01554; Fri, 16 Oct 92 10:23:07 CDT
Message-Id: <9210161523.AA01554@mahler.trc.amoco.com>
To: chk@cs.rice.edu
Cc: hpff-forall@rice.edu
Subject: Re: comments on FORALL draft 
In-Reply-To: Your message of Thu, 15 Oct 92 19:20:20 -0600.
             <9210160017.AA24120@titan.cs.rice.edu> 
Date: Fri, 16 Oct 92 10:23:06 -0500
From: "Rex Page" <zrlp09@trc.amoco.com>

About forall constraints, Chuck wrote:

> A more radical suggestion occurs to me: 

> Constraint: The variable of an assignment-stmt must refer to distinct data
> objects for all active combinations of subscript-name values.

> Constraint: The pointer-object of a pointer-assignment-stmt must refer to
> distinct data objects for all active combinations of subscript-name values.

> Someone with a good cross-reference into the F90 standard check whether
> "refer to distinct data objects" are the right words - my intent is that
> the objects (or subobjects, like array sections) named on the left-hand
> sides should be disjoint (nonoverlapping) sections of memory.

I would regard A(1:10) and A(2:10) as distinct data objects, but they
have some subobjects in common.  This, unfortunately, complicates the
wording of the first constraint.

Something like the following might work:

  Constraint: Variables defined in an assignment statement corresponding
  to different active values of subscript names from the forall triplet
  specification must be distinct and must have no subobjects in common.

(Note: The phrase "must be distinct and" is redundant, strictly speaking.)
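
As an illustration of the difference between "distinct" and "no
subobjects in common", the following Python sketch treats a section
A(lo:hi:stride) as the set of element subscripts it names.  The helper
names are hypothetical, and only positive strides on a rank-one array
are considered.

```python
from itertools import combinations


def section(lo, hi, stride=1):
    """Element subscripts named by the section A(lo:hi:stride), stride > 0."""
    return set(range(lo, hi + 1, stride))


def no_common_subobjects(sections):
    """True if no two of the given sections share an array element."""
    return all(s.isdisjoint(t) for s, t in combinations(sections, 2))


# A(1:10) and A(2:10) are different objects, yet they share elements.
print(section(1, 10) != section(2, 10))                   # True
print(no_common_subobjects([section(1, 10), section(2, 10)]))  # False

# FORALL (I=1:3) A(I:I+1) = ...      left-hand sides overlap
overlapping = [section(i, i + 1) for i in range(1, 4)]
# FORALL (I=1:3) A(2*I-1:2*I) = ...  left-hand sides are disjoint
disjoint = [section(2 * i - 1, 2 * i) for i in range(1, 4)]
print(no_common_subobjects(overlapping))                  # False
print(no_common_subobjects(disjoint))                     # True
```

Only the second FORALL would satisfy the constraint as worded above.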

The problem doesn't come up with pointer objects in pointer assignment
statements because they must be scalars of type POINTER, and such scalars
have no subobjects.  So, the constraint on pointer assignments is ok as
written.  Still, it might be a good idea to be clear about what pointer
object in the pointer assignment is being restricted.  (The target of the
assignment can also be a pointer object, but targets need not be distinct.)

Something like this, perhaps:

  Constraint:  Pointer objects defined in a pointer assignment statement
  corresponding to different active values of subscript names from the
  forall triplet specification must be distinct.


Rex Page

From chk@cs.rice.edu  Sun Oct 18 23:59:17 1992
Received: from moe.rice.edu by titan.cs.rice.edu (AA27379); Sun, 18 Oct 92 23:59:17 CDT
Received: from titan.cs.rice.edu (cs.rice.edu) by moe.rice.edu (AA22018); Sun, 18 Oct 92 23:59:13 CDT
Received: from DialupEudora (charon.rice.edu) by titan.cs.rice.edu (AA27059); Sun, 18 Oct 92 23:43:17 CDT
Message-Id: <9210190443.AA27059@titan.cs.rice.edu>
Date: Sun, 18 Oct 1992 23:47:00 -0600
To: hpff-forall@rice.edu, hpff-core@cs.rice.edu
From: chk@cs.rice.edu
Subject: New proposal
X-Attachments: :Macintosh HD:3737:haupt.tex:

I just received a new proposal from Syracuse for changing/expanding the
FORALL and INDEPENDENT parts of HPF.  For a number of reasons (starting
with wanting to get some sleep tonight), I can't incorporate them into the
HPF draft by the deadline tomorrow morning.  Instead, I am circulating them
to these lists.  Copies will be available at the HPFF meeting Wednesday,
but the "official" FORALL/INDEPENDENT chapter will be the one I send out
later tonight (or early tomorrow morning, depending on how fast I type).

                                                Chuck
%chapter-head.tex

%Version of August 5, 1992 - David Loveman, Digital Equipment Corporation

\documentstyle[twoside,11pt]{report}
\pagestyle{headings}
\pagenumbering{arabic}
\marginparwidth 0pt
\oddsidemargin=.25in
\evensidemargin  .25in
\marginparsep 0pt
\topmargin=-.5in
\textwidth=6.0in
\textheight=9.0in
\parindent=2em

%the file syntax-macs.tex is physically included below

%syntax-macs.tex

%Version of July 29, 1992 - Guy Steele, Thinking Machines

\newdimen\bnfalign         \bnfalign=2in
\newdimen\bnfopwidth       \bnfopwidth=.3in
\newdimen\bnfindent        \bnfindent=.2in
\newdimen\bnfsep           \bnfsep=6pt
\newdimen\bnfmargin        \bnfmargin=0.5in
\newdimen\codemargin       \codemargin=0.5in
\newdimen\intrinsicmargin  \intrinsicmargin=3em
\newdimen\casemargin       \casemargin=0.75in
\newdimen\argumentmargin   \argumentmargin=1.8in

\def\IT{\it}
\def\RM{\rm}
\let\CHAR=\char
\let\CATCODE=\catcode
\let\DEF=\def
\let\GLOBAL=\global
\let\RELAX=\relax
\let\BEGIN=\begin
\let\END=\end


\def\FUNNYCHARACTIVE{\CATCODE`\a=13 \CATCODE`\b=13 \CATCODE`\c=13 \CATCODE`\d=13
                     \CATCODE`\e=13 \CATCODE`\f=13 \CATCODE`\g=13 \CATCODE`\h=13
                     \CATCODE`\i=13 \CATCODE`\j=13 \CATCODE`\k=13 \CATCODE`\l=13
                     \CATCODE`\m=13 \CATCODE`\n=13 \CATCODE`\o=13 \CATCODE`\p=13
                     \CATCODE`\q=13 \CATCODE`\r=13 \CATCODE`\s=13 \CATCODE`\t=13
                     \CATCODE`\u=13 \CATCODE`\v=13 \CATCODE`\w=13 \CATCODE`\x=13
                     \CATCODE`\y=13 \CATCODE`\z=13 \CATCODE`\[=13 \CATCODE`\]=13
                     \CATCODE`\-=13}

\def\RETURNACTIVE{\CATCODE`\
=13}

\makeatletter
\def\section{\@startsection {section}{1}{\z@}{-3.5ex plus -1ex minus 
 -.2ex}{2.3ex plus .2ex}{\large\sf}}
\def\subsection{\@startsection{subsection}{2}{\z@}{-3.25ex plus -1ex minus 
 -.2ex}{1.5ex plus .2ex}{\large\sf}}
\def\alternative#1 #2#3{\def\@tempa{#1}\def\@tempb{A}\ifx\@tempa\@tempb\else
    \expandafter\@altbumpdown\string#2\@foo\fi
    #2{Version #1: #3}}
\def\@altbumpdown#1#2\@foo{\global\expandafter\advance\csname c@#2\endcsname-1}

\def\@ifpa#1#2{\let\@tempe\par \def\@tempa{#1}\def\@tempb{#2}\futurelet
    \@tempc\@ifnch}

\def\?#1.{\begingroup\def\@tempq{#1}\list{}{\leftmargin\intrinsicmargin}\relax
  \item[]{\bf\@tempq.} \@intrinsictest}
\def\@intrinsictest{\@ifpar{\@intrinsicpar\@intrinsicdesc}{\@intrinsicpar\relax}}
\long\def\@intrinsicdesc#1{\list{}{\relax
  \def\@tempb{ Arguments}\ifx\@tempq\@tempb
                          \leftmargin\argumentmargin
                          \else \leftmargin\casemargin \fi
  \labelwidth\leftmargin  \advance\labelwidth -\labelsep
  \parsep 4pt plus 2pt minus 1pt
  \let\makelabel\@intrinsiclabel}#1\endlist}
\long\def\@intrinsicpar#1#2\\{#1{#2}\@ifstar{\@intrinsictest}{\endlist\endgroup}}
\def\@intrinsiclabel#1{\setbox0=\hbox{\rm #1}\ifnum\wd0>\labelwidth
  \box0 \else \hbox to \labelwidth{\box0\hfill}\fi}
\def\Case(#1):{\item[{\it Case (#1):}]}
\def\ {\@ifnextchar({\def\@tempq{#1}\@intrinsicopt}{\item[#1]}}
\def\@intrinsicopt(#1){\item[{\@tempq} (#1)]}

\def\MATRIX#1{\relax
    \@ifnextchar,{\@MATRIXTABS{}#1,\@FOO, \hskip0pt plus
1filll\penalty-1\@gobble
  }{\@ifnextchar;{\@MATRIXTABS{}#1,\@FOO; \hskip0pt plus
1filll\penalty-1\@gobble
  }{\@ifnextchar:{\@MATRIXTABS{}#1,\@FOO: \hskip0pt plus
1filll\penalty-1\@gobble
  }{\@ifnextchar.{\hfill\penalty1\null\penalty10000\hskip0pt plus 1filll
                  \@MATRIXTABS{}#1,\@FOO.\penalty-50\@gobble
  }{\@MATRIXTABS{}#1,\@FOO{ }\hskip0pt plus 1filll\penalty-1}}}}}

\def\@MATRIXTABS#1#2,{\@ifnextchar\@FOO{\@MATRIX{#1#2}}{\@MATRIXTABS{#1#2&}}}
\def\@MATRIX#1\@FOO{\(\left[\begin{array}{rrrrrrrrrr}#1\end{array}\right]\)}

\def\@IFSPACEORRETURNNEXT#1#2{\def\@tempa{#1}\def\@tempb{#2}\futurelet\@tempc\@ifspnx}

{
\FUNNYCHARACTIVE
\GLOBAL\DEF\FUNNYCHARDEF{\RELAX
    \DEFa{{\IT\CHAR"61}}\DEFb{{\IT\CHAR"62}}\DEFc{{\IT\CHAR"63}}\RELAX
    \DEFd{{\IT\CHAR"64}}\DEFe{{\IT\CHAR"65}}\DEFf{{\IT\CHAR"66}}\RELAX
    \DEFg{{\IT\CHAR"67}}\DEFh{{\IT\CHAR"68}}\DEFi{{\IT\CHAR"69}}\RELAX
    \DEFj{{\IT\CHAR"6A}}\DEFk{{\IT\CHAR"6B}}\DEFl{{\IT\CHAR"6C}}\RELAX
    \DEFm{{\IT\CHAR"6D}}\DEFn{{\IT\CHAR"6E}}\DEFo{{\IT\CHAR"6F}}\RELAX
    \DEFp{{\IT\CHAR"70}}\DEFq{{\IT\CHAR"71}}\DEFr{{\IT\CHAR"72}}\RELAX
    \DEFs{{\IT\CHAR"73}}\DEFt{{\IT\CHAR"74}}\DEFu{{\IT\CHAR"75}}\RELAX
    \DEFv{{\IT\CHAR"76}}\DEFw{{\IT\CHAR"77}}\DEFx{{\IT\CHAR"78}}\RELAX
    \DEFy{{\IT\CHAR"79}}\DEFz{{\IT\CHAR"7A}}\DEF[{{\RM\CHAR"5B}}\RELAX
    \DEF]{{\RM\CHAR"5D}}\DEF-{\@IFSPACEORRETURNNEXT{{\CHAR"2D}}{{\IT\CHAR"2D}}}}
}

%%% Warning!  Devious return-character machinations in the next several lines!
%%%           Don't even *breathe* on these macros!
{\RETURNACTIVE\global\def\RETURNDEF{\def
{\@ifnextchar\FNB{}{\@stopline\@ifnextchar
{\@NEWBNFRULE}{\penalty\@M\@startline\ignorespaces}}}}\global\def\@NEWBNFRULE
{\vskip\bnfsep\@startline\ignorespaces}\global\def\@ifspnx{\ifx\@tempc\@sptoken \let\@tempd\@tempa \else \ifx\@tempc
\let\@tempd\@tempa \else \let\@tempd\@tempb \fi\fi \@tempd}}
%%% End of bizarro return-character machinations.

\def\IS{\@stopfield\global\setbox\@curfield\hbox\bgroup
  \hskip-\bnfindent \hskip-\bnfopwidth  \hskip-\bnfalign
  \hbox to \bnfalign{\unhbox\@curfield\hfill}\hbox to \bnfopwidth{\bf is
\hfill}}
\def\OR{\@stopfield\global\setbox\@curfield\hbox\bgroup
  \hskip-\bnfindent \hskip-\bnfopwidth \hbox to \bnfopwidth{\bf or \hfill}}
\def\R#1 {\hbox to 0pt{\hskip-\bnfmargin R#1\hfill}}
\def\XBNF{\FUNNYCHARDEF\FUNNYCHARACTIVE\RETURNDEF\RETURNACTIVE
  \def\@underbarchar{{\char"5F}}\tt\frenchspacing
  \advance\@totalleftmargin\bnfmargin \tabbing
  \hskip\bnfalign\hskip\bnfopwidth\hskip\bnfindent\=\kill\>\+\@gobblecr}
\def\endXBNF{\-\endtabbing}

\def\BNF{\BEGIN{XBNF}}
\def\FNB{\END{XBNF}}

\begingroup \catcode `|=0 \catcode`\\=12
|gdef|@XCODE#1\EDOC{#1|endtrivlist|end{tt}}
|endgroup

\def\CODE{\begin{tt}\advance\@totalleftmargin\codemargin \@verbatim
   \def\@underbarchar{{\char"5F}}\frenchspacing \@vobeyspaces \@XCODE}
\def\ICODE{\begin{tt}\advance\@totalleftmargin\codemargin \@verbatim
   \def\@underbarchar{{\char"5F}}\frenchspacing \@vobeyspaces
   \FUNNYCHARDEF\FUNNYCHARACTIVE \UNDERBARACTIVE\UNDERBARDEF \@XCODE}

\def\@underbarsub#1{{\ifmmode _{#1}\else {$_{#1}$}\fi}}
\let\@underbarchar\_
\def\@underbar{\let\@tempq\@underbarsub\if\@tempz A\let\@tempq\@underbarchar\fi
  \if\@tempz B\let\@tempq\@underbarchar\fi\if\@tempz
C\let\@tempq\@underbarchar\fi
  \if\@tempz D\let\@tempq\@underbarchar\fi\if\@tempz
E\let\@tempq\@underbarchar\fi
  \if\@tempz F\let\@tempq\@underbarchar\fi\if\@tempz
G\let\@tempq\@underbarchar\fi
  \if\@tempz H\let\@tempq\@underbarchar\fi\if\@tempz
I\let\@tempq\@underbarchar\fi
  \if\@tempz J\let\@tempq\@underbarchar\fi\if\@tempz
K\let\@tempq\@underbarchar\fi
  \if\@tempz L\let\@tempq\@underbarchar\fi\if\@tempz
M\let\@tempq\@underbarchar\fi
  \if\@tempz N\let\@tempq\@underbarchar\fi\if\@tempz
O\let\@tempq\@underbarchar\fi
  \if\@tempz P\let\@tempq\@underbarchar\fi\if\@tempz
Q\let\@tempq\@underbarchar\fi
  \if\@tempz R\let\@tempq\@underbarchar\fi\if\@tempz
S\let\@tempq\@underbarchar\fi
  \if\@tempz T\let\@tempq\@underbarchar\fi\if\@tempz
U\let\@tempq\@underbarchar\fi
  \if\@tempz V\let\@tempq\@underbarchar\fi\if\@tempz
W\let\@tempq\@underbarchar\fi
  \if\@tempz X\let\@tempq\@underbarchar\fi\if\@tempz
Y\let\@tempq\@underbarchar\fi
  \if\@tempz Z\let\@tempq\@underbarchar\fi\@tempq}
\def\@under{\futurelet\@tempz\@underbar}

\def\UNDERBARACTIVE{\CATCODE`\_=13}
\UNDERBARACTIVE
\def\UNDERBARDEF{\def_{\protect\@under}}
\UNDERBARDEF

\catcode`\$=11  

%the following line would allow derived-type component references 
%FOO%BAR in running text, but not allow LaTeX comments
%without this line, write FOO\%BAR
%\catcode`\%=11 

\makeatother

%end of file syntax-macs.tex


\title{{\em D R A F T} \\High Performance Fortran \\ FORALL Proposal}
\author{G. Fox, T. Haupt, A. Choudhary, S. Ranka \\
Northeast Parallel Architectures Center\\
at Syracuse University, Syracuse, New York}
\date{October 15, 1992}

\hyphenation{RE-DIS-TRIB-UT-ABLE sub-script Wil-liam-son}

\begin{document}

\maketitle

\newpage

\pagenumbering{roman}

\vspace*{4.5in}

This is the result of a LaTeX run of a draft of a single chapter of 
the HPFF Final Report document.

\vspace*{3.0in}

\copyright 1992 Rice University, Houston Texas.  Permission to copy 
without fee all or part of this material is granted, provided the 
Rice University copyright notice and the title of this document 
appear, and notice is given that copying is by permission of Rice 
University.

\tableofcontents

\newpage

\pagenumbering{arabic}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%put text of chapter here


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%put \end{document} here
%statements.tex


%Revision history:
%August 2, 1992 - Original version of David Loveman, Digital Equipment
%       Corporation and Charles Koelbel, Rice University
%August 19, 1992 - chk - cleaned up discrepancies with Fortran 90 array 
%       expressions
%August 20, 1992 - chk - added DO INDEPENDENT section, Guy Steele's 
%       pointer proposals
%August 24, 1992 - chk - ELEMENTAL functions proposal
%August 31, 1992 - chk - PURE functions proposal
%September 3, 1992 - chk - reorganized sections
%September 21, 1992 - chk - began incorporating updates from Sept
%       10-11 meeting
%October 14, 1992 - chk - Incorporated ON and revised PURE 


\newenvironment{constraints}{
        \begin{list}{Constraint:}{
                \settowidth{\labelwidth}{Constraint:}
                \settowidth{\labelsep}{w}
                \settowidth{\leftmargin}{Constraint:w}
                \setlength{\rightmargin}{0cm}
        }
}{
        \end{list}
}


\chapter{Statements}
\label{statements}

\section{Overview}

\footnote{Version of October 15, 1992 
- G. Fox, T. Haupt, A. Choudhary, S. Ranka, Syracuse University.}
Programs written in the current Fortran standard, Fortran 90, do not
provide enough information to exploit fully the capabilities of modern
architectures. In particular, detecting parallelism in Fortran codes is
usually very difficult. As a consequence, compilers are unable to
generate efficient code for machines whose architecture supports
concurrent execution. Although Fortran 90 includes some features that
allow codes to be run concurrently on some types of architectures,
notably array assignments on SIMD machines, these features are in
general not sufficient.

\subsection{Parallel Statements in Fortran 90}

Natural candidates for parallel execution are DO loops that have no
loop-carried data dependencies. The simplest example is an array 
assignment, introduced in Fortran 90. An array assignment such as the
one shown below

                                                                     \CODE
    REAL, ARRAY(N) :: A
    ...
    A(2:N)=A(1:N-1)
    ...
                                                                     \EDOC
can be interpreted in terms of Fortran 77 syntax as a sequence of two
DO loops: 

                                                                     \CODE
    DO I=2,N
      TEMP(I)=A(I-1)
    ENDDO

    DO I=2,N
      A(I)=TEMP(I)
    ENDDO
                                                                     \EDOC
The temporary array TEMP is an implementation detail that is not
actually needed on many existing or future processor and system 
architectures. Nevertheless, it helps in understanding the semantics of
the assignment. In particular, it makes clear that the resulting values
of the elements of array A do not depend on the order in which they are 
assigned. This property of array assignment makes it possible to
perform the assignments in parallel.

Masked array assignments, which assign only selected elements of one
array to another array, are provided in Fortran 90 by the WHERE
statement and the WHERE construct. For example,


                                                                      \CODE
    REAL, ARRAY(N) :: A
    ...
    WHERE (A/=0) 
     RECIP_A=1.0/A
    ELSEWHERE
     RECIP_A=1.0
    ENDWHERE
                                                                     \EDOC
Like a regular array assignment, a masked array assignment can be
executed in parallel.

The idea of array assignment is further extended by Fortran 90 elemental 
intrinsic functions. The simplest example of their use is

                                                                    \CODE
   REAL, ARRAY(N) :: A,X
   ...
   A=F(X)
   ...
                                                                    \EDOC
where F is one of 67 predefined intrinsic functions such as ABS, COS, 
or LOG. The semantics of this assignment are that each array element
A(I) is assigned the value of F evaluated for the actual argument X(I).
Each invocation of F is independent of the others, regardless of how
complicated the evaluation of the function value is, and the function
can therefore be called concurrently for different values of the actual
argument.

It has been demonstrated that a class of algorithms can be implemented
efficiently on parallel machines, SIMD machines in particular, using
only the Fortran 90 parallel statements described above. Nevertheless,
Fortran 90 does not provide sufficient syntactic means to express
parallelism. We therefore propose to extend the Fortran 90 standard
with a generalization of the array assignment statement, namely the
FORALL statement and the FORALL construct, together with the ability to
invoke user-defined procedures in an elemental fashion, in close
analogy to the Fortran 90 elemental intrinsic functions.

\subsection{Overview of Proposed Extensions}

\subsubsection{FORALL statement}
\footnote{Proposed earlier by D. Loveman (DEC) and Ch. Koelbel (Rice), and
approved at second reading on September 10, 1992}
Fortran 90 introduces several restrictions on array assignments; 
in particular, it requires that operands of the right hand side expressions
be conformable with the left hand side array. These restrictions can be
relaxed by introducing an explicit parallel loop construct, FORALL. FORALL
essentially preserves the semantics of Fortran 90 array assignments, and it
allows for convenient assignments like

                                                                  \CODE
   REAL, ARRAY(N,M) :: A
   ...
   FORALL(I=1:N,J=1:M) A(I,J)=I+J

                                                                  \EDOC
as opposed to standard Fortran 90 
                                                                  \CODE
   REAL, ARRAY(N,M)    :: A,A1,A2
   INTEGER, ARRAY(N)   :: IMN
   INTEGER, ARRAY(M)   :: IMM
   INTEGER             :: I

   IMN=(/ (I,I=1,N) /)
   A1=SPREAD(IMN,DIM=2,NCOPIES=M)
   IMM=(/ (I,I=1,M) /)
   A2=SPREAD(IMM,DIM=1,NCOPIES=N)
   A=A1+A2
                                                                \EDOC
or, to give more examples of the convenience of FORALL statements,
                                                                  \CODE
   REAL, ARRAY(N,M) :: B
   LOGICAL, ARRAY(N):: MASK
   REAL, ARRAY(N)   :: X
   REAL, ARRAY(M)   :: Y
   ...
   FORALL(I=1:N,J=1:M,MASK(I)) B(I,J)=B(I,J)-X(I)*Y(J)
   ...
   
   REAL, ARRAY(N,N) :: C
   REAL, ARRAY(N)   :: D
   ...
   C=0
   FORALL(I=1:N,J=1:N, I.EQ.J) C(I,J)=D(I)
   ...  
                                                                  \EDOC
which are difficult or tedious to express in Fortran 90.
   
Determinism of results requires that each array element on the left hand
side be assigned at most once. Consequently, each FORALL index must appear
within the subscript expression(s) on the left hand side of the assignment.
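
The following sketch illustrates this rule. It assumes a processor that
accepts the FORALL statement in the form proposed here; the program name
LHSDEMO and the array contents are hypothetical. The second statement is
left as a comment because it breaks the rule: the index J does not appear
on the left hand side, so each A(I) would be assigned N times with
different values.

                                                                  \CODE
      PROGRAM LHSDEMO
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 3
      REAL, DIMENSION(N,N) :: B
      REAL, DIMENSION(N)   :: A
      INTEGER :: I, J

      A = (/ (REAL(I), I = 1, N) /)

!     Acceptable: both indices appear on the left hand side, so
!     each element B(I,J) is assigned exactly once.
      FORALL (I=1:N, J=1:N) B(I,J) = A(I) + A(J)

!     Not acceptable: J is missing on the left hand side, so the
!     value finally stored in A(I) would depend on the order.
!     FORALL (I=1:N, J=1:N) A(I) = B(I,J)

      PRINT *, B
      END PROGRAM LHSDEMO
                                                                  \EDOC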

\subsubsection{FORALL-ELSEFORALL construct}


\footnote{This is an extension to previously proposed FORALL construct, as
defined in the draft 0.2 of the HPF proposal}
The FORALL-ELSEFORALL construct is a natural generalization of the Fortran 90 
WHERE-ELSEWHERE construct. With it, the last example can be rewritten as

                                                                  \CODE
   REAL, ARRAY(N,N) :: C
   REAL, ARRAY(N)   :: D
   ...
   FORALL(I=1:N,J=1:N, I.EQ.J) 
         C(I,J)=D(I)
   ELSEFORALL
         C(I,J)=0
   ENDFORALL
   ...
                                                                  \EDOC
The intended semantics can be inferred by comparing these example codes,
and a detailed description is given later. A construct proposed in previous
drafts of the HPF proposal: 
                                                                  \CODE
   FORALL(I=1:N,J=1:N)
    WHERE(MASK)
      assignment
    ELSEWHERE
      assignment
    ENDWHERE
   ENDFORALL
                                                                  \EDOC
seems to introduce unnecessary limitations inherited from the WHERE
construct: the mask array must conform with the variables on the 
right hand side in all of the array assignment statements in the construct.

\subsubsection{Parallel Loops}
Next, the limitation that only a single array assignment statement can appear
in a FORALL may be relaxed. This, however, implies some assertions made by
the programmer about data dependencies in the body of parallel statements,
nicely illustrated by G. Steele (TMC) in figures 1.1 and 1.2. 

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Here come the pictures!
%

{

%length for use in pictures
\setlength{\unitlength}{0.03in}

%nodes used in all pictures
\newsavebox{\nodes}
\savebox{\nodes}{
    \small\sf
    \begin{picture}(80,105)(10,-2.5)
    \put(50.0,100){\makebox(0,0){BEGIN}}
    \put(20.0,80.0){\makebox(0,0){rhsa(1)}}
    \put(50.0,80.0){\makebox(0,0){rhsa(2)}}
    \put(80.0,80.0){\makebox(0,0){rhsa(3)}}
    \put(20.0,60.0){\makebox(0,0){lhsa(1)}}
    \put(50.0,60.0){\makebox(0,0){lhsa(2)}}
    \put(80.0,60.0){\makebox(0,0){lhsa(3)}}
    \put(20.0,40.0){\makebox(0,0){rhsb(1)}}
    \put(50.0,40.0){\makebox(0,0){rhsb(2)}}
    \put(80.0,40.0){\makebox(0,0){rhsb(3)}}
    \put(20.0,20.0){\makebox(0,0){lhsb(1)}}
    \put(50.0,20.0){\makebox(0,0){lhsb(2)}}
    \put(80.0,20.0){\makebox(0,0){lhsb(3)}}
    \put(50.0,0){\makebox(0,0){END}}
    \put(50.0,100){\oval(25,5)}
    \put(20.0,80.0){\oval(20,5)}
    \put(50.0,80.0){\oval(20,5)}
    \put(80.0,80.0){\oval(20,5)}
    \put(20.0,60.0){\oval(20,5)}
    \put(50.0,60.0){\oval(20,5)}
    \put(80.0,60.0){\oval(20,5)}
    \put(20.0,40.0){\oval(20,5)}
    \put(50.0,40.0){\oval(20,5)}
    \put(80.0,40.0){\oval(20,5)}
    \put(20.0,20.0){\oval(20,5)}
    \put(50.0,20.0){\oval(20,5)}
    \put(80.0,20.0){\oval(20,5)}
    \put(50.0,0){\oval(25,5)}
    \put(50,97.5){\vector(-2,-1){30}}
    \put(50,97.5){\vector(0,-1){15}}
    \put(50,97.5){\vector(2,-1){30}}
    \put(20,17.5){\vector(2,-1){30}}
    \put(50,17.5){\vector(0,-1){15}}
    \put(80,17.5){\vector(-2,-1){30}}
    \end{picture}
}

\begin{figure}

\begin{minipage}{2.70in}
\CODE
FORALL ( i = 1:3 )
  lhsa(i) = rhsa(i)
  lhsb(i) = rhsb(i)
END FORALL
\EDOC

\centering
\begin{picture}(80,105)(10,-2.5)
\small\sf
%save the messy part of the picture & reuse it
\newsavebox{\web}
\savebox{\web}{
    \begin{picture}(60,15)(0,0)
    \put(0,15){\vector(0,-1){15}}
    \put(0,15){\vector(2,-1){30}}
    \put(0,15){\vector(4,-1){60}}
    \put(30,15){\vector(-2,-1){30}}
    \put(30,15){\vector(0,-1){15}}
    \put(30,15){\vector(2,-1){30}}
    \put(60,15){\vector(0,-1){15}}
    \put(60,15){\vector(-2,-1){30}}
    \put(60,15){\vector(-4,-1){60}}
    \end{picture}
}
\put(10,-2.5){\usebox\nodes}
\put(20,62.5){\usebox\web}
\put(20,42.5){\usebox\web}
\put(20,22.5){\usebox\web}
\end{picture}
\end{minipage}
%
\hfill
%
\begin{minipage}{2.70in}
\CODE
DO i = 1, 3
  lhsa(i) = rhsa(i)
  lhsb(i) = rhsb(i)
END DO
\EDOC

\centering
\begin{picture}(80,105)(10,-2.5)
\small\sf
%save the messy part of the picture & reuse it
\newsavebox{\chain}
\savebox{\chain}{
    \begin{picture}(20,70)(0,0)
    \put(2.5,2.5){\oval(5,5)[bl]}
    \put(2.5,0){\vector(1,0){5}}
    \put(7.5,2.5){\oval(5,5)[br]}
    \put(10,2.5){\vector(0,1){32.5}}
    \put(10,35){\line(0,1){32.5}}
    \put(12.5,67.5){\oval(5,5)[tl]}
    \put(12.5,70){\vector(1,0){5}}
    \put(17.5,67.5){\oval(5,5)[tr]}
    \end{picture}
}
\put(10,-2.5){\usebox\nodes}
\put(20,77.5){\vector(0,-1){15}}
\put(20,57.5){\vector(0,-1){15}}
\put(20,37.5){\vector(0,-1){15}}
\put(25,15){\usebox\chain}
\put(50,77.5){\vector(0,-1){15}}
\put(50,57.5){\vector(0,-1){15}}
\put(50,37.5){\vector(0,-1){15}}
\put(55,15){\usebox\chain}
\put(80,77.5){\vector(0,-1){15}}
\put(80,57.5){\vector(0,-1){15}}
\put(80,37.5){\vector(0,-1){15}}
\end{picture}
\end{minipage}

\caption{Dependences in DO and FORALL without
INDEPENDENT assertions}
\label{fig-dep}
\end{figure}

\begin{figure}

%Draw the picture once, use it twice
\newsavebox{\easy}
\savebox{\easy}{
    \small\sf
    \begin{picture}(80,105)(10,-2.5)
    \put(10,-2.5){\usebox\nodes}
    \put(20,77.5){\vector(0,-1){15}}
    \put(20,57.5){\vector(0,-1){15}}
    \put(20,37.5){\vector(0,-1){15}}
    \put(50,77.5){\vector(0,-1){15}}
    \put(50,57.5){\vector(0,-1){15}}
    \put(50,37.5){\vector(0,-1){15}}
    \put(80,77.5){\vector(0,-1){15}}
    \put(80,57.5){\vector(0,-1){15}}
    \put(80,37.5){\vector(0,-1){15}}
    \end{picture}
}

\begin{minipage}{2.70in}
\CODE
!HPF$ INDEPENDENT
FORALL ( i = 1:3 )
  lhsa(i) = rhsa(i)
  lhsb(i) = rhsb(i)
END FORALL
\EDOC

\centering
\usebox\easy
\end{minipage}
%
\hfill
%
\begin{minipage}{2.70in}
\CODE
!HPF$ INDEPENDENT
DO i = 1, 3
  lhsa(i) = rhsa(i)
  lhsb(i) = rhsb(i)
END DO
\EDOC

\centering
\usebox\easy
\end{minipage}

\caption{Dependences in DO and FORALL with
INDEPENDENT assertions}
\label{fig-indep}
\end{figure}

}

%
%
% End of pictures
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


In essence, three types of data dependencies are relevant in the context of
this proposal.

\begin{enumerate}
\item  `SEQUENTIAL DO LOOP': the right hand side values depend on 
       computations made in other iterations of the loop. These are often 
       referred to as loop carried dependencies, and they are shown on the
       right hand side of figure 1.1. In this case parallel execution is not
       possible, at least within the computational model adopted by HPF.
\item `FORALL LOOP': there are no loop carried dependencies, and each
        assignment can be made simultaneously for all iterations. That is,
        processors may be synchronized after each assignment in the block. 
        This situation is shown on the left hand side of fig. 1.1.
\item `INDEPENDENT LOOP': there are no loop carried dependencies, but 
       operands of right hand side expressions may depend on results obtained
       earlier in the same iteration, as shown in fig. 1.2. In other words, 
       iterations of the loop can be performed independently of each other;
       however, the sequence of assignments within an iteration must not be 
       changed, to preserve the semantics of the loop. As a consequence, 
       parallel execution is possible by assigning processors distinct 
       subsets of loop instantiations, and synchronization is forced only
       after all iterations are completed.   

\end{enumerate}
The semantics of these constructs cannot be adequately expressed in terms
of Fortran 90 syntax. It can be done using HPF extensions; however, a further
generalization of FORALL is necessary.

\subsubsection{General FORALL Construct}

\footnote{This is an extension of previously proposed FORALL construct, as
defined in the draft 0.2 of the HPF proposal}
The `FORALL LOOP' (case 2 above) can be expressed by the FORALL construct
                                                                    \CODE
    FORALL (I=1:M)
      lhs1=rhs1   
      ...
      lhsN=rhsN
    ENDFORALL
                                                                    \EDOC
which is just a shorthand for a tedious
                                                                    \CODE
    FORALL (I=1:M) lhs1=rhs1
    ...
    FORALL (I=1:M) lhsN=rhsN
                                                                    \EDOC
with one important semantic modification of FORALL: the set of valid
combinations of loop indices is evaluated once, before the actual execution
of the body of the FORALL construct. The construct also implies an assertion
by the programmer that the data dependencies allow for parallel execution
with synchronization of processors after each assignment.
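
To make the statement-by-statement semantics concrete, here is a small
sketch in plain Fortran 90. It is an illustration under assumptions, not a
prescribed implementation: the program name BLOCKDEMO, the explicit array
TEMP and the two body statements are chosen for this example. The FORALL
construct it mimics is shown in the comment.

                                                                    \CODE
      PROGRAM BLOCKDEMO
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 5
      REAL, DIMENSION(N) :: A, B, TEMP
      INTEGER :: I

      A = (/ (REAL(I), I = 1, N) /)
      B = 0.0

!     Intended meaning of
!        FORALL (I=2:N)
!          A(I)=A(I-1)
!          B(I)=A(I)
!        ENDFORALL
!     First statement: all right hand sides are evaluated before
!     any element of A is changed.
      TEMP(2:N) = A(1:N-1)
      A(2:N)    = TEMP(2:N)
!     Synchronization point: the second statement sees the new A.
      B(2:N)    = A(2:N)

      PRINT *, A
      PRINT *, B
      END PROGRAM BLOCKDEMO
                                                                    \EDOC
A sequential DO loop with the same body would instead propagate A(1)
through the whole array, because each iteration would read the value
written by the previous one.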
   
The full form of the FORALL construct
                                                                    \CODE
    FORALL (I=1:M, MASK)
      statement1
      ...
      statementN
    ELSEFORALL
      statementN+1
      ...
      statementN+K
    ENDFORALL
                                                                    \EDOC
can be interpreted as the following sequence of operations:
\begin{enumerate}
\item determine the set of valid combinations of indices
\item evaluate the mask expression for all valid combinations of indices
\item execute the block of statements (from statement1 through statementN)
      masked by MASK
\item execute the block of statements (from statementN+1 through 
      statementN+K) masked by .NOT.MASK  
\end{enumerate}
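
The four steps above can be sketched sequentially for the diagonal example
given earlier (C(I,J)=D(I) where I.EQ.J, and C(I,J)=0 elsewhere). This is
only one possible reading, written in plain Fortran 90; the program name
DIAGDEMO and the explicit logical array MASK are assumptions made for the
illustration.

                                                                    \CODE
      PROGRAM DIAGDEMO
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 4
      REAL, DIMENSION(N,N)    :: C
      REAL, DIMENSION(N)      :: D
      LOGICAL, DIMENSION(N,N) :: MASK
      INTEGER :: I, J

      D = (/ (REAL(I), I = 1, N) /)

!     Steps 1 and 2: enumerate the valid index combinations and
!     evaluate the mask once for each of them.
      DO J = 1, N
        DO I = 1, N
          MASK(I,J) = (I .EQ. J)
        END DO
      END DO

!     Step 3: the FORALL block, executed where the mask is true.
      DO J = 1, N
        DO I = 1, N
          IF (MASK(I,J)) C(I,J) = D(I)
        END DO
      END DO

!     Step 4: the ELSEFORALL block, executed where it is false.
      DO J = 1, N
        DO I = 1, N
          IF (.NOT. MASK(I,J)) C(I,J) = 0.0
        END DO
      END DO

      PRINT *, C
      END PROGRAM DIAGDEMO
                                                                    \EDOC
Storing the mask before either block runs reflects the rule that the mask is
evaluated once; statements in the blocks cannot change which elements are
selected.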

\subsubsection{Procedures in FORALL statement/construct}

`INDEPENDENT LOOP' has been addressed in previous drafts of the HPF proposal
by G. Steele (TMC), Ch. Koelbel (Rice), and others. They proposed to
annotate a loop with that kind of data dependencies with the
                             \CODE
     !HPF$  INDEPENDENT
                             \EDOC
directive. Indeed, such a solution reflects the intended semantics. On the 
other hand, it introduces some syntactic challenges, and it invites
violations of the HPF computational model by requesting private instances of 
variables. 

All that can be avoided by noticing that the body of the loop may be defined
as a procedure, with the loop index, say I, or an array as (one of) its dummy 
argument(s). By construction, each invocation of the function is independent 
of the others (for other values of I or other elements of the array). Since
each instantiation of the function creates its own, distinct set of local 
variables, the problems of variable scope and determinism are 
solved. Thus the proposed solution for the independent loop is to use 
elementally invoked procedures as operands of the right hand side 
expression of the FORALL assignments. This can be 
illustrated by an example given by D. Loveman (DEC) and C. Koelbel (Rice) 
\footnote{draft 0.2 of the HPF proposal, page 34}:
                                              \CODE                     
                                        
       REAL A(1000)    
!HPF$  TEMPLATE T(1000)
!HPF$  PROCESSORS P(10)
!HPF$  DISTRIBUTE T(BLOCK) ONTO P
!HPF$  ALIGN A(:) WITH T(:)

       ...
       FORALL(I=1:1000) A(I)=FOO(I)
       ...
       END

       FUNCTION FOO(I)
       INTEGER I,J,K
       J=1
       K=I
       DO WHILE(K.GT.1)
          J=J+1
          IF(MOD(K,2).EQ.0) THEN
           K=K/2
          ELSE
           K=K*3+1
          ENDIF
       ENDDO
       FOO=K
       END
                                                                 \EDOC
as opposed to
                                                                 \CODE

       REAL A(1000)
       INTEGER I,J,K

       ...
!HPF$ INDEPENDENT
       DO I=1,1000
         J=1
         K=I
         DO WHILE(K.GT.1)
            J=J+1
            IF(MOD(K,2).EQ.0) THEN
             K=K/2
            ELSE
             K=K*3+1
            ENDIF
         ENDDO
         A(I)=K
       ENDDO
                                                                     \EDOC

By using FORALL with a function, a whole chunk of Fortran 90 code is 
preserved (in practice, when dealing with various `dusty decks', many parts
of Fortran 77 codes can be saved this way), and local variables are used
in a way consistent with the HPF computational model.

In this way, it is possible to express conveniently the majority of
embarrassingly parallel algorithms without introducing new syntactic
features. However, as in the case of the FORALL construct, this implies
assertions on data dependencies. It is desirable to annotate the classes of
functions and subroutines which can be used in FORALL statements/constructs
by dedicated HPF directives.

Procedures which can be invoked elementally in array assignments, besides
being explicitly annotated, must satisfy some restrictions. The basic idea
is that any instantiation of the procedure must not influence other
instantiations, that is, all invocations may be made in arbitrary order
without violating determinism of results. As a consequence, local variables
must not have the SAVE attribute, and the procedure must not (re)distribute
or (re)align global data. Access to global data must be restricted, too.
There are two possibilities for imposing restrictions of that kind,
described in the following two sections.

\subsubsection{INDEPENDENT procedure}

\footnote{This is an extension to previously proposed PURE procedures by
 J. Merlin (Southampton) and Ch. Koelbel (Rice)}
An INDEPENDENT procedure does not assign to global data. In that case, there
are no restrictions on `read only' access to global data passed to it as
arguments with explicitly specified intent IN. Basically, it is equivalent
to the PURE procedure defined in draft v0.2 of the HPF proposal. Here,
an extension of that idea is proposed.

An INDEPENDENT procedure may have scalar arguments declared with intent OUT. 
Apart from that, an INDEPENDENT procedure must not produce side effects, in
the sense that its execution is equivalent to execution of a single
iteration of an INDEPENDENT LOOP (cf. fig. 1.2).

                                                                  \CODE     
        REAL, ARRAY(N) :: A,X       
        ...
        FORALL (I=1:N) A(I)=FOO(I,A,X(I))
        ...
                                                                  \EDOC
In this example, the global array A is passed to the function FOO, and the
function returns two scalar values, to be assigned to the global arrays
A and X. According to the owner-computes rule, evaluation of the function
is carried out by the logical processor owning the element of the template
to which the array element A(I) is aligned. The sequentialization of
this FORALL assignment is

                                                                  \CODE
       DO I=1,N
        TEM1(I)=FOO(I,A,TEM2(I)) 
       ENDDO

       DO I=1,N
        A(I)=TEM1(I)
        X(I)=TEM2(I)
       ENDDO
                                                                 \EDOC

A directive !HPF$INDEPENDENT is used to annotate  INDEPENDENT procedures.

An example of usage of an INDEPENDENT function is given below:
                                                                 \CODE
        ...
      REAL, ARRAY(N,M) :: A,B
!HPF$ TEMPLATE FRED(N,M)
!HPF$ DISTRIBUTE FRED(BLOCK,BLOCK)
!HPF$ ALIGN A with FRED
!HPF$ ALIGN B with FRED
      ...
      INTERFACE
       REAL FUNCTION FOO(I,J,N,M,A,Y)
!HPF$   INDEPENDENT FOO
       INTENT (IN) :: I,J,N,M,A
       INTENT (OUT) :: Y
       REAL, ARRAY(N,M) :: A
!HPF$   TEMPLATE FELIX(N,M)
!HPF$   DISTRIBUTE FELIX *
!HPF$   ALIGN A * 
      END INTERFACE 
      ...
      DO ITER=1,NEVENTS
      FORALL (I=1:N,J=1:M) A(I,J)=FOO(I,J,N,M,A,B(I,J))
      IF(MAXVAL(B).LT.XLIMIT) EXIT
      ENDDO
     ...

      REAL FUNCTION FOO(I,J,N,M,A,Y)
!HPF$ INDEPENDENT FOO
      INTENT (IN) :: I,J,N,M,A
      INTENT (OUT) :: Y
      REAL, ARRAY(N,M) :: A
!HPF$ TEMPLATE FELIX(N,M)
!HPF$ DISTRIBUTE FELIX *
!HPF$ ALIGN A * 
      ...
      COUNT=1.0
      X=A(I,J)
      IF(I.GT.1) THEN
         X=X+A(I-1,J)
         COUNT=COUNT+1.0
      ENDIF
      IF(I.LT.N) THEN
         X=X+A(I+1,J)
         COUNT=COUNT+1.0
      ENDIF
      IF(J.GT.1) THEN
         X=X+A(I,J-1)
         COUNT=COUNT+1.0
      ENDIF
      IF(J.LT.M) THEN
         X=X+A(I,J+1)
         COUNT=COUNT+1.0
      ENDIF
      FOO=X/COUNT  
      Y=ABS(A(I,J)-FOO)
      END
                                                                 \EDOC

\subsubsection{LOCAL procedures}

A LOCAL procedure has access only to that part of the global data which is
aligned to the template element owned by the logical processor executing the
procedure. For example

                                                                  \CODE
       REAL, ARRAY(N) :: A,B,C
!HPF$  TEMPLATE FRED(N)
!HPF$  ALIGN A with FRED
!HPF$  ALIGN B(I) with FRED(I+1)
!HPF$  ALIGN C with FRED
!HPF$  DISTRIBUTE FRED(CYCLIC)
       INTERFACE
         SUBROUTINE FOO(I,N,A,B,C)
!HPF$    LOCAL FOO
         REAL, ARRAY(N) :: A,B,C
!HPF$    TEMPLATE FRED(N)
!HPF$    ALIGN A *
!HPF$    ALIGN B(I) *
!HPF$    ALIGN C *
!HPF$    DISTRIBUTE FRED *
       END INTERFACE           

       FORALL (I=1:N) CALL FOO(I,N,A,B,C)
                                                                  \EDOC
Here the subroutine FOO inherits the distribution and alignment
of arrays A, B, and C (as described in Chapter 3 of the HPF 
proposal), and it is free to assign new values to the array elements 
aligned to the element of the template FRED(I), in this case A(I), B(I-1) 
and C(I). On the other hand, the subroutine cannot access the other 
elements of arrays A, B and C. To make this arrangement unambiguous, there 
must be exactly one template defined in the LOCAL procedure. The template
index (indices for templates of 2 or more dimensions) must be passed as
arguments to the LOCAL procedure.

This construct can be used for 
embarrassingly parallel evaluation of arrays of coefficients, Monte Carlo 
simulations, etc., followed by `regular' matrix algebra, HPF reduction 
functions or other computations conveniently expressible in HPF syntax.

The sequentialization of the FORALL statement with subroutine is simply:
                                                                   \CODE
      DO I=1,N
       CALL FOO(I,N,A,B,C)
      ENDDO
                                                                   \EDOC 
where subroutine FOO has access only to single elements of arrays A, B, C,
namely A(I), B(I-1), and C(I). All arrays are aligned to the same template,
and an instantiation of the subroutine is executed by the processor
owning the corresponding element of the template, here FRED(I).
     
A directive !HPF$LOCAL is used to annotate LOCAL procedures.

An example of usage of a LOCAL subroutine is given below:
                                                           \CODE
     ...
     
C    NP=NUMBER_OF_PROCESSORS     

      REAL,  ARRAY(NP) :: PX,PX2,SEED
CHPF$ TEMPLATE T(NP)
CHPF$ DISTRIBUTE T(CYCLIC)
CHPF$ ALIGN PX with T
CHPF$ ALIGN PX2 with T
CHPF$ ALIGN SEED with T
      INTERFACE
       SUBROUTINE FOO(I,SEED,X,X2)
!HPF$  LOCAL FOO
       REAL,  ARRAY(:) :: X,X2,SEED
CHPF$  ALIGN X with *
CHPF$  ALIGN X2 with *
CHPF$  ALIGN SEED with *
      END INTERFACE

      FORALL (I=1:NP) SEED(I)=something(I)
      X=0.0
      X2=0.0
      DO ITER=1,NEVENTS
       N=ITER*NP
       FORALL (I=1:NP) CALL FOO(I,SEED,PX,PX2)
       X=X+SUM(PX)
       X2=X2+SUM(PX2)
       IF(SQRT(X2/N-(X/N)**2).LT.0.01) EXIT
      ENDDO
      AVERAGE=X/N
      ...
      END

      SUBROUTINE FOO(I,SEED,X,X2)
!HPF$ LOCAL FOO
      REAL,  ARRAY(:) :: X,X2,SEED
CHPF$ ALIGN X with *
CHPF$ ALIGN X2 with *
CHPF$ ALIGN SEED with *
      Z=RANDOM(SEED(I))
      Y=RANDOM(SEED(I))
      X(I)=(Z**2+Y**2)/2.0
      X2(I)=X(I)**2
      END
                                         \EDOC

The difference between a LOCAL and a PURE procedure lies in the way they
access global data. 

The differences between a LOCAL procedure and a FOREIGN procedure are that:

\begin{itemize}
\item a LOCAL procedure supports the HPF computational model, while a
  FOREIGN procedure does not;
\item a reference to non-local data in a FOREIGN procedure is possible by
  explicit message passing, while in a LOCAL procedure it is prohibited;
\item a LOCAL procedure is invoked elementally.
\end{itemize}

\subsubsection{User Defined Elemental Functions}

\footnote{Introduced by J. Merlin (Southampton) and Ch. Koelbel (Rice).}
Fortran 90 introduces the concept of `elemental procedures', which are 
defined for scalar arguments but may also be applied to conforming 
array-valued arguments.  The latter type of reference to an elemental 
procedure is called an `elemental' reference.    For an elemental function, 
each element of the result, if any, is as would have been obtained by
applying the function to corresponding elements of the arguments.
Examples are the mathematical intrinsics, e.g.\ SIN(X).  For an elemental 
subroutine, the effect on each element of an INTENT(OUT) or INTENT(INOUT) 
array argument is as would be obtained by calling the subroutine with 
the corresponding elements of the arguments.  An example is the intrinsic 
subroutine MVBITS.

However, Fortran~90 restricts elemental reference to a subset of 
the intrinsic procedures --- programmers cannot define their own 
elemental procedures.  Obviously, elemental invocation is equivalent 
to concurrent invocation, so extra constraints beyond those for normal 
Fortran procedures are required to allow this to be done safely
(i.e.\ deterministically).  Appropriate constraints in this case are
introduced by requiring the function to be a pure function.

\section{PURE Procedures}

\footnote{This section is copied from the FORALL draft (Oct. 14, 1992). The
only modification is that PURE functions are allowed within the body of
INDEPENDENT procedures and as actual arguments in an INDEPENDENT procedure
reference.}
A {\it pure function} is one that produces no side effects. This means that 
the only effect of a pure function reference on the state of a program
is to return a result - it does not modify the values, pointer associations
or data mapping of any of its arguments or global data, and performs no
I/O. A {\it pure subroutine} is one that produces no side effects except for
modifying the values and/or pointer associations of certain arguments.

A PURE procedure (i.e.\ function or subroutine) may be used in any way that
a normal procedure can. In addition, a procedure is required to be PURE in
any of the following contexts:
\begin{itemize}
\item an elemental reference
\item within the body of a PURE procedure
\item as an actual argument in a PURE procedure reference
\end{itemize}
A PURE procedure may be used
\begin{itemize}
\item in a FORALL statement or construct
\item within the body of an INDEPENDENT procedure
\item as an actual argument in an INDEPENDENT procedure reference 
\end{itemize}

A PURE procedure must have an explicit interface, and it must be declared
PURE in both its definition and its interface. The form of this declaration
is a directive 
                                                                     \BNF
pure-directive  \IS   PURE function-name
                                                                     \FNB
\subsubsection{Pure function definition}

To define pure functions, Rule~R1215 of the Fortran~90 standard is changed 
to:
                                                                 \BNF
function-subprogram \IS         function-stmt
                                [pure-directive]
                                [specification-part]
                                [execution-part]
                                [internal-subprogram-part]
                                end-function-stmt
                                                                \FNB
with the following additional constraints in Section~12.5.2.2 of the 
Fortran~90 standard:
\begin{constraints}

        \item If a {\it procedure-name\/} is present in the 
{\it pure-directive\/}, it must match the {\it function-name\/} in the 
{\it function-stmt\/}.

        \item In a pure function, a local variable must not have the 
SAVE attribute. (Note that this means that a local variable cannot be 
initialised in a {\it type-declaration-stmt\/} or a
{\it data-stmt\/}, which imply the SAVE attribute.)

        \item A pure function must not use a dummy argument, a global 
variable, or an object that is storage associated with a global variable,
or a subobject thereof, in the following contexts:
        \begin{itemize}
                \item as the assignment variable of an {\it assignment-stmt\/}
or {\it forall-assignment\/};
                \item as a DO variable or implied DO variable, or as a 
{\it subscript-name\/} in a {\it forall-triplet-spec\/};
                \item in an {\it assign-stmt\/};
                \item as the {\it pointer-object\/} or {\it target\/}
of a {\it pointer-assignment-stmt\/};
                \item as the {\it expr\/} of an {\it assignment-stmt\/}
or {\it forall-assignment\/} whose assignment variable is of a derived 
type, or is a pointer to a derived type, that has a pointer component 
at any level of component selection;
                \item as an {\it allocate-object\/} or {\it stat-variable\/}
in an {\it allocate-stmt\/} or {\it deallocate-stmt\/}, or as a
{\it pointer-object\/} in a {\it nullify-stmt\/};
                \item as an actual argument associated with a dummy 
argument with INTENT (OUT) or (INOUT) or with the POINTER attribute.
        \end{itemize}

        \item Any procedure referenced in a pure function, including
one referenced via a defined operation or assignment, must be pure.

        \item In a pure function, a dummy argument or local variable 
must not appear in an {\it align-directive\/}, {\it realign-directive\/},
{\it distribute-directive\/}, {\it redistribute-directive\/},
{\it realignable-directive\/}, {\it redistributable-directive\/} or
{\it combined-directive}.

        \item In a pure function, a global variable must not appear in
{\it realign-directive\/} or {\it redistribute-directive\/}.

        \item A pure function must not contain a {\it pause-stmt\/},
{\it stop-stmt\/} or I/O statement (including a file operation).

\end{constraints}
To declare that a function is pure, a {\it pure-directive\/} must be given.

The above constraints are designed to guarantee that a pure function
is free from side effects (i.e.\ modifications of data visible outside
the function), which means that it is safe to reference concurrently, 
as explained earlier.

The second constraint ensures that a pure function does not retain
an internal state between calls, which would allow side-effects between 
calls to the same procedure.
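
As a sketch of how these constraints look in practice, the following
hypothetical function SCALE2 assigns only to its result variable, has no
SAVE variables and performs no I/O. The directive placement follows the
modified function-subprogram rule given above; the names PUREDEMO and SCALE2
and the calling loop are assumptions made for this illustration. Since the
directive is a comment to a Fortran~90 processor, the program is ordinary
Fortran~90.

                                                                 \CODE
      PROGRAM PUREDEMO
      IMPLICIT NONE
      INTERFACE
        REAL FUNCTION SCALE2(X)
!HPF$   PURE SCALE2
        REAL, INTENT(IN) :: X
        END FUNCTION SCALE2
      END INTERFACE
      REAL, DIMENSION(4) :: A, X
      INTEGER :: I

      X = (/ 1.0, 2.0, 3.0, 4.0 /)
!     The invocations do not interact, so any order is allowed.
      DO I = 4, 1, -1
        A(I) = SCALE2(X(I))
      END DO
      PRINT *, A
      END PROGRAM PUREDEMO

      REAL FUNCTION SCALE2(X)
!HPF$ PURE SCALE2
      REAL, INTENT(IN) :: X
!     Only the result variable is assigned: no SAVE, no I/O.
      SCALE2 = 2.0*X
      END FUNCTION SCALE2
                                                                 \EDOC
By contrast, a function that counted its own calls in a local variable
declared with the SAVE attribute would violate the second constraint, since
the value of the counter seen by one invocation would depend on how many
other invocations had already run.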

The third constraint ensures that dummy arguments and global variables
are not modified by the function.
In the case of a dummy or global pointer, this applies to both its 
pointer association and its target value, so it cannot be subject to 
a pointer assignment or to an ALLOCATE, DEALLOCATE or NULLIFY
statement.
Incidentally, these constraints imply that only local variables and the
dummy result variable can be subject to assignment or pointer assignment.

In addition, a dummy or global data object cannot be the {\it target\/}
of a pointer assignment (i.e.\ it cannot be used as the right hand side
of a pointer assignment to a local pointer or to the result variable), 
for then its value could be modified via the pointer.

In connection with the last point, it should be noted that an ordinary 
(as opposed to pointer) assignment to a variable of derived type that has 
a pointer component at any level of component selection may result in a 
{\em pointer\/} assignment to the pointer component of the variable.
That is certainly the case for an intrinsic assignment.  In that case
the expression on the right hand side of the assignment has the same type 
as the assignment variable, and the assignment results in a pointer 
assignment of the pointer components of the expression result to the
corresponding components of the variable (see section 7.5.1.5 of the 
Fortran~90 standard).  However, it may also be the case for a 
{\em defined\/} assignment to such a variable, even if the data type of 
the expression has no pointer components;  the defined assignment may still 
involve pointer assignment of part or all of the expression result to the 
pointer components of the assignment variable.  Therefore, a dummy or 
global object cannot be used as the right hand side of any assignment to 
a variable of derived type with pointer components, for then it, or part 
of it, might be the target of a pointer assignment, in violation of the 
restriction mentioned above.

(Incidentally, the last two paragraphs only prevent the reference of 
a dummy or global object as the {\em only\/} object on the right hand
side of a pointer assignment or an assignment to a variable with pointer
components.  There are no constraints on its reference as an operand, 
actual argument, subscript expression, etc.\ in these circumstances).

Finally, a dummy or global data object cannot be used in a procedure 
reference as an actual argument associated with a dummy argument of
INTENT (OUT) or (INOUT) or with a dummy pointer, for then it may be
modified by the procedure reference.  
This constraint, like the others, can be statically checked, since any
procedure referenced within a pure function must be either a pure 
function, which does not modify its arguments, or a pure subroutine, 
whose interface must specify the INTENT or POINTER attributes of its 
arguments (see below).
Incidentally, notice that in this context it is assumed that an actual 
argument associated with a dummy pointer is modified, since Fortran~90 
does not allow its intent to be specified.

Constraint 4 ensures that all procedures called from a pure function 
are themselves pure and hence side effect free, except, in the case of
subroutines, for modifying actual arguments associated with dummy pointers 
or dummy arguments with INTENT(OUT) or (INOUT).  As we have just 
explained, it can be checked that global or dummy objects are not used
in such arguments, which would violate the required side-effect freedom.

Constraints 5 and 6 protect dummy and global data objects from realignment 
and redistribution (another type of side effect).  
In addition, constraint 5 prevents explicit declaration of the mapping 
(i.e.\ alignment and distribution) of dummy arguments and local variables.  
This is because the function may be invoked concurrently, with each 
invocation operating on a segment of data whose distribution is specific 
to that invocation.  Thus, the distribution of a dummy object must be 
`assumed' from the corresponding actual argument.  
Also, it is left to the implementation to determine a suitable mapping 
of the local variables, which would typically depend on the mapping of 
the dummy arguments.

Constraint 7 prevents I/O, whose order would be non-deterministic in 
the context of concurrent execution.  A PAUSE statement requires input
and so is disallowed for the same reason.


\subsubsection{Pure subroutine definition}

To define pure subroutines, Rule~R1219 is changed to:
                                                                 \BNF
subroutine-subprogram \IS       subroutine-stmt
                                [pure-directive]
                                [specification-part]
                                [execution-part]
                                [internal-subprogram-part]
                                end-subroutine-stmt
                                                                \FNB
with the following additional constraints in Section~12.5.2.3 of the 
Fortran~90 standard:
\begin{constraints}

        \item If a {\it procedure-name\/} is present in the 
{\it pure-directive\/}, it must match the {\it sub\-rou\-tine-name\/} in the 
{\it subroutine-stmt\/}.

        \item The {\it specification-part\/} of a pure subroutine must 
specify the intents of all non-pointer and non-procedure dummy arguments.

        \item In a pure subroutine, a local variable must not have the 
SAVE attribute. (Note that this means they cannot be initialised in a 
{\it type-declaration-stmt\/} or a {\it data-stmt\/}.)

        \item A pure subroutine must not use a dummy argument with 
INTENT(IN), a global variable, or an 
object that is storage associated with a global variable, or a subobject 
thereof, in the following contexts:
        \begin{itemize}
                \item as the assignment variable of an {\it assignment-stmt\/}
or {\it forall-assignment\/};
                \item as a DO variable or implied DO variable, or as a 
{\it subscript-name\/} in a {\it forall-triplet-spec\/};
                \item in an {\it assign-stmt\/};
                \item as the {\it pointer-object\/} or {\it target\/}
of a {\it pointer-assignment-stmt\/};
                \item as the {\it expr\/} of an {\it assignment-stmt\/}
or {\it forall-assignment\/} whose assignment variable is of a derived 
type, or is a pointer to a derived type, that has a pointer component 
at any level of component selection;
                \item as an {\it allocate-object\/} or {\it stat-variable\/}
in an {\it allocate-stmt\/} or {\it deallocate-stmt\/}, or as a
{\it pointer-object\/} in a {\it nullify-stmt\/};
                \item as an actual argument associated with a dummy 
argument with INTENT (OUT) or (INOUT) or with the POINTER attribute.
        \end{itemize}

        \item Any procedure referenced in a pure subroutine, including
one referenced via a defined operation or assignment, must be pure.

        \item In a pure subroutine, a dummy argument or local variable 
must not appear in an {\it align-directive\/}, {\it realign-directive\/},
{\it distribute-directive\/}, {\it redistribute-directive\/},
{\it realignable-directive\/}, {\it redistributable-directive\/} or
{\it combined-directive}.

        \item In a pure subroutine, a global variable must not appear in
a {\it realign-directive\/} or {\it redistribute-directive\/}.

        \item A pure subroutine must not contain a {\it pause-stmt\/},
{\it stop-stmt\/} or I/O statement (including a file operation).

\end{constraints}
To declare that a subroutine is pure, a {\it pure-directive\/} must be
given.

The constraints for pure subroutines are based on the same principles 
as for pure functions, except that now side effects to dummy arguments 
are permitted.  
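
As an illustration only, a subroutine such as the following would satisfy 
these constraints.  The name {\tt swap} is hypothetical, and the directive 
line assumes that a {\it pure-directive\/} is written as an {\tt !HPF\$} 
comment naming the procedure; this section does not itself fix that 
spelling.
                                                                   \CODE
      SUBROUTINE swap(x, y)
!HPF$ PURE swap
      REAL, INTENT(INOUT) :: x, y  ! intents given for all dummy arguments
      REAL :: t                    ! local variable without SAVE
      t = x                        ! only INOUT dummies and locals are
      x = y                        ! modified; no global data, no I/O
      y = t
      END SUBROUTINE swap
                                                                   \EDOC
The side effects on {\tt x} and {\tt y} are what distinguish this from a 
pure function, which could not assign to its dummy arguments.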


\subsubsection{Pure procedure interfaces}
\label{pure-proc-interface}

To define interface specifications for pure procedures, Rule~R1204 is 
changed to:
                                                                \BNF
interface-body \IS      function-stmt
                        [pure-directive]
                        [specification-part]
                        end-function-stmt
                \OR     subroutine-stmt
                        [pure-directive]
                        [specification-part]
                        end-subroutine-stmt
                                                                \FNB
with the following constraint in addition to those in
Section~12.3.2.1 of the Fortran~90 standard:
\begin{constraints}

        \item An {\it interface-body\/} of a pure subroutine must specify
the intents of all non-pointer and non-procedure dummy arguments.

\end{constraints}

The procedure characteristics defined by an interface body must be
consistent with the procedure's definition.
Regarding pure procedures, this is interpreted as follows:
\begin{enumerate}
        \item A procedure that is declared pure at its definition may be
declared pure in an interface block, but this is not required.
        \item A procedure that is not declared pure at its definition must 
not be declared pure in an interface block.
\end{enumerate}
That is, if an interface body contains a {\it pure-directive\/}, then the 
corresponding procedure definition must also contain it, though the 
reverse is not true.
When a procedure definition with a {\it pure-directive\/}
is compiled, the compiler may check that it satisfies the necessary 
constraints.


\subsection{Pure procedure reference}
To define pure procedure references, the following extra constraint is 
added to Section~12.4.1 of the Fortran~90 standard:
\begin{constraints}

        \item In a reference to a pure procedure, a {\it procedure-name\/} 
{\it actual-arg\/} must be the name of a pure procedure.

\end{constraints}

\section{INDEPENDENT Procedures}

An {\it independent procedure\/} is one whose instantiations can all be
executed independently of each other, with deterministic results. 
This means that an INDEPENDENT function 
may access global data passed to it as arguments with explicit
intent IN, and that it returns scalar values to be assigned to global arrays
in a FORALL statement or construct.

An INDEPENDENT procedure (i.e.\ function or subroutine) may be used only in
a FORALL statement or construct.

An INDEPENDENT procedure must have an explicit interface, and it must be 
declared INDEPENDENT in both its definition and its interface. The form of 
this declaration is a directive:
                                                                     \BNF
independent-directive  \IS   INDEPENDENT function-name
                                                                     \FNB

\subsubsection{Independent function definition}

To define independent functions, Rule~R1215 of the Fortran~90 standard is 
changed to:
                                                                 \BNF
function-subprogram \IS         function-stmt
                                [independent-directive]
                                [specification-part]
                                [execution-part]
                                [internal-subprogram-part]
                                end-function-stmt
                                                                \FNB
with the following additional constraints in Section~12.5.2.2 of the 
Fortran~90 standard:
\begin{constraints}
        \item If a {\it procedure-name\/} is present in the 
{\it independent-directive\/}, it must match the {\it function-name\/} 
in the {\it function-stmt\/}.

        \item In an independent function, a local variable must not have 
the SAVE attribute. (Note that this means they cannot be initialised in a 
{\it type-declaration-stmt\/} or a {\it data-stmt\/}.)

        \item The {\it specification-part\/} of an independent function must 
specify the intents of all non-pointer and non-procedure dummy arguments.

        \item In an independent function, only {\it scalar variables} are
allowed with intent OUT.  

        \item An independent function must not contain assignments to
global data objects.

        \item Any procedure referenced in an independent function, including
one referenced via a defined operation or assignment, must be independent
or pure.

        \item In an independent function, a dummy argument or local 
variable must not appear in an {\it align-directive\/}, 
{\it realign-directive\/},
{\it distribute-directive\/}, {\it redistribute-directive\/},
{\it realignable-directive\/}, {\it redistributable-directive\/} or
{\it combined-directive}.

        \item In an independent function, a global variable must not appear 
in a {\it realign-directive\/} or {\it redistribute-directive\/}.

        \item An independent function must not contain a {\it pause-stmt\/},
{\it stop-stmt\/} or I/O statement (including a file operation).

\end{constraints}

To declare that a function is independent, an {\it independent-directive\/} 
must be given.
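
A minimal sketch of a function that meets these constraints is shown 
below.  The names {\tt scaled}, {\tt a} and {\tt b} are hypothetical, and 
the {\tt !HPF\$} prefix on the directive line is an assumption about how 
the {\it independent-directive\/} is spelled in source text.
                                                                   \CODE
      FUNCTION scaled(x, factor)
!HPF$ INDEPENDENT scaled
      REAL, INTENT(IN) :: x, factor  ! intents given for all dummy arguments
      REAL :: scaled                 ! scalar result
      scaled = x * factor            ! no global data, no SAVE, no I/O
      END FUNCTION scaled

! possible use, given an explicit interface and REAL arrays a(n), b(n):
!     FORALL (i = 1:n) b(i) = scaled(a(i), 2.0)
                                                                   \EDOC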


\subsubsection{Independent subroutine definition}

To define independent subroutines, Rule~R1219 is changed to:
                                                                 \BNF
subroutine-subprogram \IS       subroutine-stmt
                                [independent-directive]
                                [specification-part]
                                [execution-part]
                                [internal-subprogram-part]
                                end-subroutine-stmt
                                                                \FNB
with the following additional constraints in Section~12.5.2.3 of the 
Fortran~90 standard:
\begin{constraints}

        \item If a {\it procedure-name\/} is present in the 
{\it independent-directive\/}, it must match the {\it sub\-rou\-tine-name\/} 
in the {\it subroutine-stmt\/}.

        \item In an independent subroutine, a local variable must not have 
the SAVE attribute. (Note that this means they cannot be initialised in a 
{\it type-declaration-stmt\/} or a {\it data-stmt\/}.)

        \item The {\it specification-part\/} of an independent subroutine must 
specify the intents of all non-pointer and non-procedure dummy arguments.

        \item When an independent subroutine is called from a FORALL, all 
subscript names defined in the {\it forall-triplet-spec-list\/} must be 
passed to it as arguments with explicit intent IN.

        \item In an independent subroutine, only {\it scalar variables} are
allowed with intent OUT.  

        \item An independent subroutine must not contain assignments to
global data objects.

        \item Any procedure referenced in an independent subroutine, including
one referenced via a defined operation or assignment, must be independent
or pure.

        \item In an independent subroutine, a dummy argument or local 
variable must not appear in an {\it align-directive\/}, 
{\it realign-directive\/},
{\it distribute-directive\/}, {\it redistribute-directive\/},
{\it realignable-directive\/}, {\it redistributable-directive\/} or
{\it combined-directive}.

        \item In an independent subroutine, a global variable must not appear
in a {\it realign-directive\/} or {\it redistribute-directive\/}.

        \item An independent subroutine must not contain a {\it pause-stmt\/},
{\it stop-stmt\/} or I/O statement (including a file operation).

\end{constraints}
To declare that a subroutine is independent, an {\it independent-directive\/} 
must be given.
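
For illustration, an independent subroutine and a FORALL that calls it 
might look as follows.  The names {\tt halve}, {\tt a} and {\tt b} are 
hypothetical, and the {\tt !HPF\$} prefix on the directive line is an 
assumption about how the directive is spelled in source text.
                                                                   \CODE
      SUBROUTINE halve(i, a, r)
!HPF$ INDEPENDENT halve
      INTEGER, INTENT(IN) :: i       ! FORALL subscript, passed explicitly
      REAL, INTENT(IN)    :: a(:)    ! global array, read only
      REAL, INTENT(OUT)   :: r       ! scalar result
      r = 0.5 * a(i)
      END SUBROUTINE halve

! possible use, given an explicit interface and REAL arrays a(n), b(n):
!     FORALL (i = 1:n, a(i) > 0.0) CALL halve(i, a, b(i))
                                                                   \EDOC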


\subsubsection{Independent procedure interfaces}
\label{independent-proc-interface}

To define interface specifications for independent procedures, Rule~R1204 is 
changed to:
                                                                \BNF
interface-body \IS      function-stmt
                        [independent-directive]
                        [specification-part]
                        end-function-stmt
                \OR     subroutine-stmt
                        [independent-directive]
                        [specification-part]
                        end-subroutine-stmt
                                                                \FNB
with the following constraint in addition to those in
Section~12.3.2.1 of the Fortran~90 standard:
\begin{constraints}

        \item An {\it interface-body\/} of an independent subroutine must 
specify the intents of all non-pointer and non-procedure dummy arguments.

\end{constraints}

The procedure characteristics defined by an interface body must be
consistent with the procedure's definition.
Regarding independent procedures, this is interpreted as follows:
\begin{enumerate}
        \item A procedure that is declared independent at its definition 
may be declared independent in an interface block, but this is not required.
        \item A procedure that is not declared independent at its definition 
must not be declared independent in an interface block.
\end{enumerate}
That is, if an interface body contains an {\it independent-directive\/}, then 
the corresponding procedure definition must also contain it, though the 
reverse is not true. When a procedure definition with an 
{\it independent-directive\/}
is compiled, the compiler may check that it satisfies the necessary 
constraints.


\subsection{Independent procedure reference}
To define independent procedure references, the following extra constraint is 
added to Section~12.4.1 of the Fortran~90 standard:
\begin{constraints}

        \item In a reference to an independent procedure, a 
{\it procedure-name\/}  {\it actual-arg\/} must be the name of an
independent  procedure.

\end{constraints}


\section{LOCAL Procedures}
A {\it local procedure\/} is one that operates on local variables or on 
global variables aligned to a single template element. 

A LOCAL procedure (i.e.\ function or subroutine) may be used only in a FORALL
statement or construct.

A LOCAL procedure must have an explicit interface, and it must be declared 
LOCAL in both its definition and its interface. The form of this declaration 
is a directive:
                                                                     \BNF
local-directive  \IS   LOCAL function-name
                                                                     \FNB

\subsubsection{Local function definition}

To define local functions, Rule~R1215 of the Fortran~90 standard is 
changed to:
                                                                 \BNF
function-subprogram \IS         function-stmt
                                [local-directive]
                                [specification-part]
                                [execution-part]
                                [internal-subprogram-part]
                                end-function-stmt
                                                                \FNB
with the following additional constraints in Section~12.5.2.2 of the 
Fortran~90 standard:
\begin{constraints}
        \item If a {\it procedure-name\/} is present in the 
{\it local-directive\/}, it must match the {\it function-name\/} 
in the {\it function-stmt\/}.

        \item In a local function, a local variable must not have 
the SAVE attribute. (Note that this means they cannot be initialised in a 
{\it type-declaration-stmt\/} or a {\it data-stmt\/}.)

        \item The {\it specification-part\/} of a local function must 
specify the intents of all non-pointer and non-procedure dummy arguments.

        \item Replicated global data are not allowed as arguments.

        \item All global variables passed as arguments to a local function
must be aligned to the same template in both the calling procedure and
the called function, and the distribution of the global variables must
be the same as in the calling procedure. 

        \item The template indices that identify which element of the 
template is to be used by an invocation of the local function must be 
passed as arguments with explicit intent IN.

        \item A local function must not access global data objects that
are not aligned to the template element owned by the invocation of the
function.

        \item Any procedure referenced in a local function, including
one referenced via a defined operation or assignment, must be local.

        \item In a local function, a dummy argument or local 
variable must not appear in an {\it align-directive\/}, 
{\it realign-directive\/},
{\it distribute-directive\/}, {\it redistribute-directive\/},
{\it realignable-directive\/}, {\it redistributable-directive\/} or
{\it combined-directive}.

        \item In a local function, a global variable must not appear in
a {\it realign-directive\/} or {\it redistribute-directive\/}.

        \item A local function must not contain a {\it pause-stmt\/},
{\it stop-stmt\/} or I/O statement (including a file operation).

\end{constraints}

To declare that a function is local, a {\it local-directive\/} 
must be given.


\subsubsection{Local subroutine definition}

To define local subroutines, Rule~R1219 is changed to:
                                                                 \BNF
subroutine-subprogram \IS       subroutine-stmt
                                [local-directive]
                                [specification-part]
                                [execution-part]
                                [internal-subprogram-part]
                                end-subroutine-stmt
                                                                \FNB
with the following additional constraints in Section~12.5.2.3 of the 
Fortran~90 standard:
\begin{constraints}

        \item If a {\it procedure-name\/} is present in the 
{\it local-directive\/}, it must match the {\it sub\-rou\-tine-name\/} 
in the {\it subroutine-stmt\/}.

        \item In a local subroutine, a local variable must not have 
the SAVE attribute. (Note that this means they cannot be initialised in a 
{\it type-declaration-stmt\/} or a {\it data-stmt\/}.)

        \item The {\it specification-part\/} of a local subroutine must 
specify the intents of all non-pointer and non-procedure dummy arguments.

        \item When a local subroutine is called from a FORALL, all 
subscript names defined in the {\it forall-triplet-spec-list\/} must be 
passed to it as arguments with explicit intent IN.

        \item Replicated global data are not allowed as arguments.

        \item All global variables passed as arguments to a local subroutine
must be aligned to the same template in both the calling procedure and
the called subroutine, and the distribution of the global variables must
be the same as in the calling procedure. 

        \item The template indices that identify which element of the 
template is owned by an invocation of the subroutine must be passed as 
arguments with explicit intent IN.

        \item A local subroutine must not access global data objects that
are not aligned to the template element owned by the invocation of the
subroutine.


        \item In a local subroutine, a dummy argument or local 
variable must not appear in an {\it align-directive\/}, 
{\it realign-directive\/},
{\it distribute-directive\/}, {\it redistribute-directive\/},
{\it realignable-directive\/}, {\it redistributable-directive\/} or
{\it combined-directive}.

        \item In a local subroutine, a global variable must not appear
in a {\it realign-directive\/} or {\it redistribute-directive\/}.

        \item A local subroutine must not contain a {\it pause-stmt\/},
{\it stop-stmt\/} or I/O statement (including a file operation).

\end{constraints}
To declare that a subroutine is local, a {\it local-directive\/} 
must be given.


\subsubsection{Local procedure interfaces}
\label{local-proc-interface}

To define interface specifications for local procedures, Rule~R1204 is 
changed to:
                                                                \BNF
interface-body \IS      function-stmt
                        [local-directive]
                        [specification-part]
                        end-function-stmt
                \OR     subroutine-stmt
                        [local-directive]
                        [specification-part]
                        end-subroutine-stmt
                                                                \FNB
with the following constraint in addition to those in
Section~12.3.2.1 of the Fortran~90 standard:
\begin{constraints}

        \item An {\it interface-body\/} of a local subroutine must 
specify the intents of all non-pointer and non-procedure dummy arguments.

\end{constraints}

The procedure characteristics defined by an interface body must be
consistent with the procedure's definition.
Regarding local procedures, this is interpreted as follows:
\begin{enumerate}
        \item A procedure that is declared local at its definition 
may be declared local in an interface block, but this is not required.
        \item A procedure that is not declared local at its definition 
must not be declared local in an interface block.
\end{enumerate}
That is, if an interface body contains a {\it local-directive\/}, then 
the corresponding procedure definition must also contain it, though the 
reverse is not true. When a procedure definition with a 
{\it local-directive\/}
is compiled, the compiler may check that it satisfies the necessary 
constraints.


\subsection{Local procedure reference}
To define local procedure references, the following extra constraint is 
added to Section~12.4.1 of the Fortran~90 standard:
\begin{constraints}

        \item In a reference to a local procedure, a 
{\it procedure-name\/} {\it actual-arg\/} must be the name of a
local procedure.

\end{constraints}


\section{Element Array Assignment - FORALL}
\label{forall-stmt}

\footnote{Version of September 21, 1992 - David
Loveman, Digital Equipment Corporation and Charles Koelbel, Rice
University.
Approved at second reading on September 10, 1992. It is extended by allowing
independent or local subroutine calls in a FORALL body}
The element array
assignment statement (FORALL statement) is used to specify an array
assignment in terms of array elements or groups of array sections.
The element array assignment may be
masked with a scalar logical expression.  

Rule R215 for {\it executable-construct} is extended to include the 
{\it forall-stmt}.

\subsection{General Form of Element Array Assignment}

                                                                       \BNF
forall-stmt          \IS FORALL (forall-triplet-spec-list
                       [,scalar-mask-expr ]) forall-assignment

forall-triplet-spec  \IS subscript-name = subscript : subscript 
                          [ : stride]
                                                                       \FNB

\noindent
Constraint:  {\it subscript-name} must be a {\it scalar-name} of type
integer.

\noindent
Constraint:  A {\it subscript} or a {\it stride} in a {\it
forall-triplet-spec} must not contain a reference to any {\it
subscript-name} in the {\it forall-triplet-spec-list}.

                                                                       \BNF
forall-assignment    \IS array-element = expr
                     \OR array-element => target
                     \OR array-section = expr
                     \OR CALL independent-subroutine-subprogram
                     \OR CALL local-subroutine-subprogram
                                                                       \FNB

\noindent
Constraint:  The {\it array-section} or {\it array-element} in a {\it
forall-assignment} must reference all of the {\it forall-triplet-spec
subscript-names}.

\noindent
Constraint: In the case of simple assignment, the {\it array-element} and 
{\it expr} have the same constraints as the {\it variable} and {\it expr} 
in an {\it assignment-stmt}.

\noindent
Constraint: In the case of pointer assignment, the {\it array-element} 
and {\it target} have the same constraints as the {\it pointer-object} 
and {\it target}, respectively, in a {\it pointer-assignment-stmt}.

\noindent
Constraint: In the case of array section assignment, the {\it 
array-section} and 
{\it expr} have the same constraints as the {\it variable} and {\it expr} 
in an {\it assignment-stmt}.

\noindent 
Constraint\footnote{This constraint supersedes the one in the
original FORALL proposal.}: A procedure reference (Rule~R1209) in an 
{\it expr} (Rules~R723 and~R701) must be a {\it pure-function-reference}, 
an {\it independent-procedure-reference} or a 
{\it local-procedure-reference}.

\noindent
Constraint\footnote{This constraint is an extension to the original FORALL
proposal.}: When an independent or local subroutine is called in a FORALL
statement, all subscript names defined in the {\it forall-triplet-spec-list} 
must be passed to that subroutine as arguments with explicit intent IN.

For each subscript name in the {\it forall-assignment}, the set of
permitted values is determined on entry to the statement and is
\[  m1 + (k-1) \times m3, \quad \mbox{where } k = 1, 2, \ldots, 
\left\lfloor \frac{m2 - m1 + m3}{m3} \right\rfloor  \]
and where {\it m1}, {\it m2}, and {\it m3} are the values of the first
subscript, the second subscript, and the stride respectively in the
{\it forall-triplet-spec}.  If {\it stride} is missing, it is as if it
were present with a value of the integer 1.  The expression {\it
stride} must not have the value 0.  If for some subscript name 
\(\lfloor (m2 - m1 + m3) / m3 \rfloor \leq 0\), the {\it
forall-assignment} is not executed.
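
As a check on these bounds (an illustrative case chosen here, not part of
the proposal), take the triplet {\tt i = 1:10:2}, so that {\it m1} is 1,
{\it m2} is 10 and {\it m3} is 2.  The largest admissible $k$ is 5, and the
permitted values are
\[  1 + (k-1) \times 2, \quad k = 1, \ldots, 5, \quad 
\mbox{that is,} \quad 1, 3, 5, 7, 9,  \]
which are the values that the loop {\tt DO i = 1, 10, 2} would visit.  For 
the triplet {\tt i = 5:1}, with the default stride of 1, the bound on $k$ 
is not positive, so the {\it forall-assignment} is not executed.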


\subsection{Interpretation of Element Array Assignments}  

Execution of an element array assignment consists of the following steps:

\begin{enumerate}

\item Evaluation in any order of the subscript and stride expressions in 
the {\it forall-triplet-spec-list}.
The set of valid combinations of {\it subscript-name} values is then the 
cartesian product of the sets defined by these triplets.

\item Evaluation of the {\it scalar-mask-expr} for all valid combinations 
of {\em subscript-name} values.
The mask elements may be evaluated in any order.
The set of active combinations of {\it subscript-name} values is the 
subset of the valid combinations for which the mask evaluates to true.

\item Evaluation in any order of the {\it expr} or {\it target} and all 
subscripts contained in the 
{\it array-element} or {\it array-section} in the {\it forall-assignment} 
for all active combinations of {\em subscript-name} values.
In the case of pointer assignment where the {\it target} is not a 
pointer, the evaluation consists of identifying the object referenced 
rather than computing its value.

\item Assignment of the computed {\it expr} values to the corresponding 
elements specified by {\it array-element} or {\it array-section}.
The assignments may be made in any order.
In the case of a pointer assignment where the {\it target} is not a 
pointer, this assignment consists of associating the {\it array-element} 
with the object referenced.

\end{enumerate}

If the scalar mask expression is omitted, it is as if it were
present with the value true.

The scope of a
{\it subscript-name} is the FORALL statement itself. 


The {\it forall-assignment} must not cause any element of the array
being assigned to be assigned a value more than once.

A PURE, INDEPENDENT or LOCAL procedure referenced in a FORALL statement 
does not affect the evaluation of any other expression, either for the same 
combination of {\it subscript-name} values or for a different 
combination.\footnote{This is a modification of the original FORALL proposal.} 
In addition, the compiler may be able to perform more extensive 
optimizations when all functions are declared PURE, INDEPENDENT or LOCAL.
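
The practical consequence of steps 3 and 4 is that every right-hand side 
is evaluated before any element is assigned.  The small program below is a 
sketch in standard Fortran 90 that spells out the four steps for the 
statement {\tt FORALL (i = 2:5, a(i) > 2) a(i) = a(i-1)}.  The array 
{\tt a}, its initial values and the temporaries are chosen here purely for 
illustration.
                                                                   \CODE
      PROGRAM foralldemo
      INTEGER :: i
      INTEGER :: a(5), temprhs(5)
      LOGICAL :: tempmask(5)
      a = (/ 1, 2, 3, 4, 5 /)
! steps 1 and 2: the bounds are constants; evaluate the mask for each i
      DO i = 2, 5
        tempmask(i) = a(i) > 2
      END DO
! step 3: evaluate every right-hand side before any assignment
      DO i = 2, 5
        IF (tempmask(i)) temprhs(i) = a(i-1)
      END DO
! step 4: assign the saved values
      DO i = 2, 5
        IF (tempmask(i)) a(i) = temprhs(i)
      END DO
      PRINT *, a
      END PROGRAM foralldemo
                                                                   \EDOC
With these values {\tt a} ends up as 1, 2, 2, 3, 4.  A single DO loop that 
tested the mask and executed {\tt a(i) = a(i-1)} in increasing order of 
{\tt i} would instead produce 1, 2, 2, 2, 2, because each iteration would 
read an element already overwritten by the previous one.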

\subsection{Scalarization of the FORALL Statement}

Consider a {\it forall-stmt} of the general form:

                                                                   \CODE
FORALL (v1=l1:u1:s1, v2=l2:u2:s2, ..., vn=ln:un:sn , mask ) &
      forall-assignment
                                                                   \EDOC
If the {\it forall-assignment} has the form
                                                                   \CODE     
      a(e1,...,em) = rhs
                                                                   \EDOC

\noindent
then the {\it forall-stmt} is equivalent to the following standard 
Fortran 90 code:

\raggedbottom
                                                                   \CODE
!evaluate subscript and stride expressions in any order

templ1 = l1
tempu1 = u1
temps1 = s1
templ2 = l2
tempu2 = u2
temps2 = s2
  ...
templn = ln
tempun = un
tempsn = sn

!then evaluate the scalar mask expression

DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        tempmask(v1,v2,...,vn) = mask
      END DO
          ...
  END DO
END DO

!then evaluate the expr in the forall-assignment for all valid 
!combinations of subscript names for which the scalar mask 
!expression is true (it is safe to avoid saving the subscript 
!expressions because of the conditions on FORALL expressions)

DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
   ...
      DO vn=templn,tempun,tempsn
        IF (tempmask(v1,v2,...,vn)) THEN
          temprhs(v1,v2,...,vn) = rhs
        END IF
      END DO
          ...
  END DO
END DO

!then perform the assignment of these values to the corresponding 
!elements of the array being assigned to

DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        IF (tempmask(v1,v2,...,vn)) THEN
          a(e1,...,em) = temprhs(v1,v2,...,vn)
        END IF
      END DO
          ...
  END DO
END DO
                                                                      \EDOC
\flushbottom
If the {\it forall-assignment} instead has the form
                                                                   \CODE     
      CALL independent-subroutine(e1,...,em,A1,...,Ap, &
         B1(e1,...,em),...,Bq(e1,...,em))
  where 
      A1,...,Ap are arguments with intent IN
      B1(e1,...,em),...,Bq(e1,...,em) are scalar arguments with intent OUT

                                                                   \EDOC

\noindent
then it is equivalent to the following standard Fortran 90 code:

\raggedbottom
                                                                   \CODE
!evaluate subscript and stride expressions in any order

templ1 = l1
tempu1 = u1
temps1 = s1
templ2 = l2
tempu2 = u2
temps2 = s2
  ...
templn = ln
tempun = un
tempsn = sn

!then evaluate the scalar mask expression

DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        tempmask(v1,v2,...,vn) = mask
      END DO
          ...
  END DO
END DO

!then call the subroutine in the forall-assignment for all valid 
!combinations of subscript names for which the scalar mask  
!expression is true

DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        IF (tempmask(v1,v2,...,vn)) THEN
          call subroutine(v1,v2,...,vn,A1,...,Ap,  &
              temp1(v1,v2,...,vn),...,tempq(v1,v2,...,vn))
        END IF
      END DO
          ...
  END DO
END DO

!then perform the assignment of these values to the corresponding 
!elements of the array being assigned to

DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        IF (tempmask(v1,v2,...,vn)) THEN
          B1(e1,...,em) = temp1(v1,v2,...,vn)
          ...
          Bq(e1,...,em) = tempq(v1,v2,...,vn)
        END IF
      END DO
          ...
  END DO
END DO
                                                                      \EDOC
\flushbottom
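
\noindent
As a rough illustration of the three phases above (mask, calls into
temporaries, assignment), the following sketch works through one concrete
case in standard Fortran~90.  The subroutine \verb@scaleone@, the array
\verb@b@ and the mask are hypothetical choices made for this example only;
they are not part of the proposal.
                                                                   \CODE
PROGRAM forall_call_sketch
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 6
  INTEGER :: b(n), temp1(n), i
  LOGICAL :: tempmask(n)

  b = 0

  ! Models:  FORALL (i = 1:n, MOD(i,2) == 1) CALL scaleone(i, 10, b(i))

  ! evaluate the scalar mask expression
  DO i = 1, n
    tempmask(i) = MOD(i,2) == 1
  END DO

  ! call the subroutine, collecting each OUT value in a temporary
  DO i = 1, n
    IF (tempmask(i)) THEN
      CALL scaleone(i, 10, temp1(i))
    END IF
  END DO

  ! assign the collected values to the actual OUT arguments
  DO i = 1, n
    IF (tempmask(i)) THEN
      b(i) = temp1(i)
    END IF
  END DO

  PRINT *, b     ! b is now 10 0 30 0 50 0

CONTAINS

  ! hypothetical side-effect-free subroutine, assumed for illustration
  SUBROUTINE scaleone(i, factor, y)
    INTEGER, INTENT(IN)  :: i, factor
    INTEGER, INTENT(OUT) :: y
    y = factor * i
  END SUBROUTINE scaleone

END PROGRAM forall_call_sketch
                                                                   \EDOC
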
Finally, if the {\it forall-assignment} has a form
                                                                   \CODE     
      CALL local-subroutine(e1,...,em,A1,...,Ap,B1,...,Bq)
  where 
      A1,...,Ap are arguments with intent IN
      B1,...,Bq are arguments with intent OUT
                                                                   \EDOC

\noindent
then it is equivalent to the following standard Fortran 90 code:

\raggedbottom
                                                                   \CODE
!evaluate subscript and stride expressions in any order

templ1 = l1
tempu1 = u1
temps1 = s1
templ2 = l2
tempu2 = u2
temps2 = s2
  ...
templn = ln
tempun = un
tempsn = sn

!then evaluate the scalar mask expression

DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        tempmask(v1,v2,...,vn) = mask
      END DO
          ...
  END DO
END DO

!then call the subroutine in the forall-assignment for all valid 
!combinations of subscript names for which the scalar mask is true

DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        IF (tempmask(v1,v2,...,vn)) THEN
          call subroutine(v1,v2,...,vn,A1,...,Ap,B1,...,Bq)
        END IF
      END DO
          ...
  END DO
END DO

                                                                      \EDOC
\flushbottom

\section{FORALL Construct}

\label{forall-construct}

\footnote{Version of August 20, 1992 -
David Loveman, Digital Equipment Corporation and 
Charles Koelbel, Rice University.  Approved at second reading on
September 10, 1992.  It has since been modified by introducing ELSEFORALL
in place of the WHERE and WHERE-ELSEWHERE constructs within the FORALL body.}
The FORALL construct is a generalization of the element array
assignment statement allowing multiple assignments, masked array 
assignments, and nested FORALL statements to be
controlled by a single {\it forall-triplet-spec-list}.  Rule R215 for
{\it executable-construct} is extended to include the {\it
forall-construct}.

\subsection{General Form of the FORALL Construct}

                                                                    \BNF
forall-construct   \IS FORALL (forall-triplet-spec-list [,scalar-mask-expr ])
                           forall-body-stmt-list
                       ELSEFORALL
                           forall-body-stmt-list
                       END FORALL

forall-body-stmt     \IS forall-assignment
                     \OR forall-stmt
                     \OR forall-construct
                                                                    \FNB

\noindent
Constraint:  {\it subscript-name} must be a {\it scalar-name} of type
integer.

\noindent
Constraint:  A {\it subscript} or a {\it stride} in a {\it
forall-triplet-spec} must not contain a reference to any {\it
subscript-name} in the {\it forall-triplet-spec-list}.

\noindent
Constraint:  Any left-hand side {\it array-section} or {\it 
array-element} in any {\it forall-body-stmt}
must reference all of the {\it forall-triplet-spec
subscript-names}.

\noindent
Constraint: If a {\it forall-stmt} or {\it forall-construct} is nested 
within a {\it forall-construct}, then the inner FORALL may not redefine 
any {\it subscript-name} used in the outer {\it forall-construct}.
This rule applies recursively in the event of multiple nesting levels.

For each subscript name in the {\it forall-assignment}s, the set of
permitted values is determined on entry to the construct and is
\[  m1 + (k-1) * m3, \mbox{ where } k = 1, 2, ..., \lfloor \frac{m2 - m1 +
1}{m3} \rfloor  \]
and where {\it m1}, {\it m2}, and {\it m3} are the values of the first
subscript, the second subscript, and the stride respectively in the
{\it forall-triplet-spec}.  If {\it stride} is missing, it is as if it
were present with a value of the integer 1.  The expression {\it
stride} must not have the value 0.  If for some subscript name 
\(\lfloor (m2 -m1 + 1) / m3 \rfloor \leq 0\), the {\it
forall-assignment}s are not  executed.


\subsection{Interpretation of the FORALL Construct}

Execution of a FORALL construct consists of the following steps:

\begin{enumerate}

\item Evaluation in any order of the subscript and stride expressions in 
the {\it forall-triplet-spec-list}.
The set of valid combinations of {\it subscript-name} values is then the 
cartesian product of the sets defined by these triplets.

\item Evaluation of the {\it scalar-mask-expr} for all valid combinations 
of {\em subscript-name} values.
The mask elements may be evaluated in any order.
One set of active combinations of {\it subscript-name} values is the 
subset of the valid combinations for which the mask evaluates to true and
a second one is the subset of the valid combinations for which the mask
evaluates to false.

\item Execute the statements of the first {\it forall-body-stmt-list}, in 
the order they appear, for the first set of active combinations of 
{\it subscript-name} values (those for which the mask is true).
Each statement is executed completely (that is, for all active 
combinations of {\it subscript-name} values) according to the following 
interpretation:

\begin{enumerate}

\item Assignment statements, pointer assignment statements, and array
assignment statements (i.e., statements in the {\it forall-assignment} 
category) evaluate the right-hand side {\it expr} and any left-hand side 
subscripts for all active {\it subscript-name} values,
then assign those results to the corresponding left-hand side references.

\item FORALL statements and FORALL constructs first evaluate the 
subscript and stride expressions in 
the {\it forall-triplet-spec-list} for all active combinations of the 
outer FORALL constructs.
The set of valid combinations of {\it subscript-names} for the inner 
FORALL is then the union of the sets defined by these bounds and strides 
for each active combination of the outer {\it subscript-names}.
For example, the valid set of the inner FORALL in the second example in 
the last section is the upper triangle (not including the main diagonal) 
of the \(n \times n\) matrix a.
The scalar mask expression is then evaluated for all valid combinations 
of the inner FORALL's {\it subscript-names} to produce the set of active 
combinations.
If there is no scalar mask expression, it is assumed to be always true.
Each statement in the inner FORALL is then executed for each active 
combination (of the inner FORALL), recursively following the 
interpretations given in this section.

\end{enumerate}

\item Execute the statements of the second {\it forall-body-stmt-list} for 
the second set of active combinations of {\it subscript-name} values 
(those for which the mask is false), in the same way as in step 3.

\end{enumerate}

If the scalar mask expression is omitted, it is as if it were
present with the value true.

The scope of a
{\it subscript-name} is the FORALL construct itself. 

A single assignment or array assignment statement in a {\it 
forall-construct} must obey the same restrictions as a {\it 
forall-assignment} in a simple {\it forall-stmt}.
(Note that the lowest level of nested statements must always be an 
assignment statement.)
For example, an assignment may not cause the same array element to be 
assigned more than once.
Different statements may, however, assign to the 
same array element, and assignments made in one
statement may affect the execution of a later statement.
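
To make the last point concrete, here is a small sketch in standard
Fortran~90 of how two statements in one FORALL construct interact.  The
arrays, bounds and values are assumptions chosen for illustration; the
loops simply follow the interpretation rules above (each statement
evaluates all of its right-hand sides, assigns them, and only then does
the next statement begin).
                                                                \CODE
PROGRAM forall_order_sketch
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 5
  INTEGER :: a(n), b(n), temprhs(n), i

  a = (/ 1, 2, 3, 4, 5 /)
  b = 0

  ! Models:  FORALL (i = 2:n-1)
  !            a(i) = a(i-1) + a(i+1)
  !            b(i) = a(i)
  !          END FORALL

  ! first statement: evaluate every right-hand side, then assign
  DO i = 2, n-1
    temprhs(i) = a(i-1) + a(i+1)
  END DO
  DO i = 2, n-1
    a(i) = temprhs(i)
  END DO

  ! second statement: it sees the values stored by the first
  DO i = 2, n-1
    temprhs(i) = a(i)
  END DO
  DO i = 2, n-1
    b(i) = temprhs(i)
  END DO

  PRINT *, a     ! a is now 1 4 6 8 5
  PRINT *, b     ! b is now 0 4 6 8 0
END PROGRAM forall_order_sketch
                                                                \EDOC

Note that the first statement uses only the old values of \verb@a@ on its
right-hand side, which is why a temporary is needed, whereas the second
statement picks up the new values.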

\subsection{Scalarization of the FORALL Construct}

A {\it forall-construct} of the form:

                                                                \CODE
FORALL (... e1 ... e2 ... en ...)
    s1
    s2
     ...
    sn
END FORALL
                                                                \EDOC

where each si is an assignment, is equivalent to the following code:

                                                                \CODE
temp1 = e1
temp2 = e2
 ...
tempn = en
FORALL (... temp1 ... temp2 ... tempn ...) s1
FORALL (... temp1 ... temp2 ... tempn ...) s2
   ...
FORALL (... temp1 ... temp2 ... tempn ...) sn
                                                                \EDOC

A {\it forall-construct} of the form:

                                                                \CODE
FORALL ( v=l:u:s, mask )
    a(v) = rhs1 
ELSEFORALL
    a(v) = rhs2
END FORALL
                                                                \EDOC

is equivalent to the following standard Fortran 90 code:

                                                                \CODE
!evaluate subscript and stride expressions in any order

templ = l
tempu = u
temps = s

!then evaluate the FORALL mask expression

DO v=templ,tempu,temps
  tempmask(v) = mask
END DO

!then evaluate the first block of statements

DO v=templ,tempu,temps
  IF (tempmask(v)) THEN
      temprhs1(v) = rhs1
  END IF
END DO
DO v=templ,tempu,temps
  IF (tempmask(v)) THEN
   a(v)=temprhs1(v)  
  END IF
END DO

!then evaluate the second block of statements

DO v=templ,tempu,temps
  IF (.not.tempmask(v)) THEN
      temprhs2(v) = rhs2
  END IF
END DO
DO v=templ,tempu,temps
  IF (.not.tempmask(v)) THEN
   a(v)=temprhs2(v)
  END IF
END DO


                                                                   \EDOC
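
As a worked instance of this scalarization, the sketch below picks a mask
that depends on the array being assigned.  The array contents and the two
right-hand sides are assumptions made for the example.  Because the mask is
evaluated once, before either block runs, elements zeroed by the first
block are not picked up by the second.
                                                                \CODE
PROGRAM elseforall_sketch
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 6
  INTEGER :: a(n), temprhs1(n), temprhs2(n), v
  LOGICAL :: tempmask(n)

  a = (/ 1, 2, 3, 4, 5, 6 /)

  ! Models:  FORALL ( v=1:n, a(v) > 2 )
  !            a(v) = 0
  !          ELSEFORALL
  !            a(v) = a(v) + 10
  !          END FORALL

  ! evaluate the FORALL mask once, before any assignment
  DO v = 1, n
    tempmask(v) = a(v) > 2
  END DO

  ! first block: combinations for which the mask is true
  DO v = 1, n
    IF (tempmask(v)) THEN
      temprhs1(v) = 0
    END IF
  END DO
  DO v = 1, n
    IF (tempmask(v)) THEN
      a(v) = temprhs1(v)
    END IF
  END DO

  ! second block: combinations for which the mask is false
  DO v = 1, n
    IF (.NOT. tempmask(v)) THEN
      temprhs2(v) = a(v) + 10
    END IF
  END DO
  DO v = 1, n
    IF (.NOT. tempmask(v)) THEN
      a(v) = temprhs2(v)
    END IF
  END DO

  PRINT *, a     ! a is now 11 12 0 0 0 0
END PROGRAM elseforall_sketch
                                                                \EDOC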


A {\it forall-construct} of the form:

                                                                   \CODE
FORALL ( v1=l1:u1:s1, mask )
  FORALL ( v2=l2:u2:s2, mask2 )
    a(e1,e2) = rhs1
    b(e3,e4) = rhs2
  END FORALL
END FORALL
                                                                  \EDOC

is equivalent to the following standard Fortran 90 code:


                                                                   \CODE
!evaluate subscript and stride expressions in any order

templ1 = l1
tempu1 = u1
temps1 = s1

!then evaluate the FORALL mask expression

DO v1=templ1,tempu1,temps1
 tempmask(v1) = mask
END DO

!then evaluate the inner FORALL bounds, etc

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    templ2(v1) = l2
    tempu2(v1) = u2
    temps2(v1) = s2
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      tempmask2(v1,v2) = mask2
    END DO
  END IF
END DO

!first statement

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      IF ( tempmask2(v1,v2) ) THEN
        temprhs1(v1,v2) = rhs1
      END IF
    END DO
  END IF
END DO
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      IF ( tempmask2(v1,v2) ) THEN
        a(e1,e2) = temprhs1(v1,v2)
      END IF
    END DO
  END IF
END DO

!second statement

DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      IF ( tempmask2(v1,v2) ) THEN
        temprhs2(v1,v2) = rhs2
      END IF
    END DO
  END IF
END DO
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      IF ( tempmask2(v1,v2) ) THEN
        b(e3,e4) = temprhs2(v1,v2)
      END IF
    END DO
  END IF
END DO
                                                                   \EDOC
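
The sketch below instantiates this pattern for a simplified case: both
masks are omitted, there is a single statement, and the inner lower bound
depends on the outer subscript, so that the inner FORALL ranges over the
strict upper triangle.  The array, its size and its initial values are
assumptions made for illustration.
                                                                   \CODE
PROGRAM nested_forall_sketch
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 3
  INTEGER :: a(n,n), temprhs1(n,n)
  INTEGER :: templ2(n), tempu2(n), temps2(n)
  INTEGER :: v1, v2

  DO v2 = 1, n
    DO v1 = 1, n
      a(v1,v2) = 10*v1 + v2
    END DO
  END DO

  ! Models:  FORALL ( v1=1:n )
  !            FORALL ( v2=v1+1:n )
  !              a(v1,v2) = a(v2,v1)
  !            END FORALL
  !          END FORALL

  ! inner bounds and stride, one set per outer subscript value
  DO v1 = 1, n
    templ2(v1) = v1 + 1
    tempu2(v1) = n
    temps2(v1) = 1
  END DO

  ! evaluate all right-hand sides
  DO v1 = 1, n
    DO v2 = templ2(v1), tempu2(v1), temps2(v1)
      temprhs1(v1,v2) = a(v2,v1)
    END DO
  END DO

  ! then perform all assignments
  DO v1 = 1, n
    DO v2 = templ2(v1), tempu2(v1), temps2(v1)
      a(v1,v2) = temprhs1(v1,v2)
    END DO
  END DO

  ! rows are now (11 21 31), (21 22 32), (31 32 33)
  DO v1 = 1, n
    PRINT *, a(v1,:)
  END DO
END PROGRAM nested_forall_sketch
                                                                   \EDOC

For \verb@v1 = n@ the inner loop has an empty range and nothing is
assigned, matching the rule that an empty set of permitted values means
the assignments are not executed.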


\section{Pure Procedures and Elemental Reference}
\footnote{Version of October 14, 
1992 - John Merlin, University of Southampton, 
and Charles Koelbel, Rice University. Approved at first reading on
September 10, 1992, subject to technical revisions for correctness.
The suggestions made there have been incorporated in this draft.}
\subsubsection{Elemental reference of pure functions}

A non-intrinsic pure function may be referenced {\em elementally\/} 
in array expressions, with a similar interpretation to the elemental
reference of Fortran~90 elemental intrinsic functions, provided it
satisfies the additional constraints that:
\begin{enumerate}
        \item Its non-procedure dummy arguments and dummy result are 
scalar and do not have the POINTER attribute.
        \item The length of any character dummy argument or result is 
independent of argument values (though it may be assumed, or depend on the 
lengths of other character arguments and/or a character result).
\end{enumerate}
We call non-intrinsic pure functions that satisfy these constraints 
`elemental non-intrinsic functions'.

The interpretation of an elemental reference of such a function is as 
follows (adapted from Section 12.4.3 of the Fortran~90 standard):
\begin{quotation}

A reference to an elemental non-intrinsic function is an elemental
reference if one or more non-procedure actual arguments are arrays
and all array arguments have the same shape.  If any actual argument 
is a function, its result must have the same shape as that of the 
corresponding function dummy procedure.  A reference to an elemental 
intrinsic function is an elemental reference if one or more actual 
arguments are arrays and all arrays have the same shape.

The result of such a reference has the same shape as the array arguments,
and the value of each element of the result, if any, is obtained by 
evaluating the function using the scalar and procedure arguments and
the corresponding elements of the array arguments.  The elements of
the result may be evaluated in any order.

For example, if \verb@foo@ is a pure function with the following interface:
                                                \CODE
    INTERFACE
      REAL FUNCTION foo (x, y, z, dummy_func)
        !HPF$ PURE foo
        REAL, INTENT(IN) :: x, y, z
        INTERFACE        ! interface for 'dummy_func'
          REAL FUNCTION dummy_func (x)
            !HPF$ PURE dummy_func
            REAL, INTENT(IN) :: x
          END FUNCTION dummy_func
        END INTERFACE
      END FUNCTION foo
    END INTERFACE
                                                \EDOC
and \verb@a@ and \verb@b@ are arrays of shape \verb@(m,n)@ and \verb@sin@
is the Fortran~90 elemental intrinsic function, then:
                                                \CODE
    foo (a, 0.0, b, sin)
                                                \EDOC
is an array expression of shape \verb@(m,n)@ whose \verb@(i,j)@ element
has the value:
                                                \CODE
    foo (a(i,j), 0.0, b(i,j), sin)
                                                \EDOC
\end{quotation}
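
The meaning of such a reference can be spelled out with explicit loops in
standard Fortran~90.  In the sketch below the body of \verb@foo@ and the
function \verb@twice@ (standing in for \verb@sin@) are hypothetical, and
the PURE directives are omitted so that the code is plain Fortran~90; only
the element-by-element evaluation is the point.
                                                \CODE
MODULE elemental_sketch_procs
  IMPLICIT NONE
CONTAINS

  ! hypothetical scalar function passed as the procedure argument
  REAL FUNCTION twice (x)
    REAL, INTENT(IN) :: x
    twice = 2.0 * x
  END FUNCTION twice

  ! hypothetical body for foo; the interface follows the example above
  REAL FUNCTION foo (x, y, z, dummy_func)
    REAL, INTENT(IN) :: x, y, z
    INTERFACE
      REAL FUNCTION dummy_func (x)
        REAL, INTENT(IN) :: x
      END FUNCTION dummy_func
    END INTERFACE
    foo = dummy_func(x) + y + z
  END FUNCTION foo

END MODULE elemental_sketch_procs

PROGRAM elemental_sketch
  USE elemental_sketch_procs
  IMPLICIT NONE
  INTEGER, PARAMETER :: m = 2, n = 3
  REAL :: a(m,n), b(m,n), r(m,n)
  INTEGER :: i, j

  a = 1.0
  b = 3.0

  ! Models the elemental reference  r = foo (a, 0.0, b, twice)
  ! the scalar and procedure arguments are reused for every element
  DO j = 1, n
    DO i = 1, m
      r(i,j) = foo (a(i,j), 0.0, b(i,j), twice)
    END DO
  END DO

  PRINT *, r     ! every element of r is 5.0
END PROGRAM elemental_sketch
                                                \EDOC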

To define elemental references of elemental non-intrinsic functions, 
the following extra constraints are added after Rule~R1209 
({\it function-reference\/}):
\begin{constraints}

        \item A non-intrinsic function that is referenced elementally 
must be a pure function with an explicit interface, and must satisfy 
the following additional constraints:
        \begin{itemize}
                \item Its non-procedure dummy arguments and dummy result
must be scalar and must not have the POINTER attribute.
                \item The length of any character dummy argument or a 
character dummy result must not depend on argument values (though it may 
be assumed, or depend on the lengths of other character arguments and/or a 
character result).
        \end{itemize}

        \item In an elemental reference of a non-intrinsic function,
a {\it function-name\/} {\it actual-arg\/} must have a result whose shape 
agrees with that of the corresponding function dummy procedure.

\end{constraints}

The reasons for these constraints are explained in the next section.


\subsubsection{Elemental reference of pure subroutines}

A non-intrinsic pure subroutine may be referenced {\em elementally\/}, 
with a similar interpretation to the elemental reference of Fortran~90 
elemental intrinsic subroutines, provided it satisfies the additional 
constraints that:
\begin{enumerate}
        \item Its non-procedure dummy arguments are scalar and do not 
have the POINTER attribute.
        \item The length of any character dummy argument is independent 
of argument values (though it may be assumed, or depend on the lengths of 
other character arguments).
\end{enumerate}
We call non-intrinsic pure subroutines that satisfy these constraints 
`elemental non-intrinsic subroutines'.

The interpretation of an elemental reference of such a subroutine 
is as follows (adapted from Section 12.4.5 of the Fortran~90 standard):
\begin{quotation}

A reference to an elemental non-intrinsic subroutine is an elemental
reference if all actual arguments corresponding to INTENT(OUT) and
INTENT(INOUT) dummy arguments are arrays that have the same shape 
and the remaining non-procedure actual arguments are conformable with 
them.  If any actual argument is a function, its result must have the 
same shape as that of the corresponding function dummy procedure.
A reference to an elemental intrinsic subroutine is an elemental 
reference if all actual arguments corresponding to INTENT(OUT) and 
INTENT(INOUT) dummy arguments are arrays that have the same shape and 
the remaining actual arguments are conformable with them.

The values of the elements of the arrays that correspond to INTENT(OUT)
and INTENT(INOUT) dummy arguments are the same as if the subroutine were 
invoked separately, in any order, using the scalar and procedure arguments 
and corresponding elements of the array arguments.

\end{quotation}
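
As with functions, the effect can be sketched with an explicit loop in
standard Fortran~90.  The subroutine \verb@shift@ and the data below are
assumptions for illustration; the scalar actual argument is conformable
with the arrays and is reused for every element.
                                                \CODE
PROGRAM elemental_call_sketch
  IMPLICIT NONE
  INTEGER, PARAMETER :: n = 4
  REAL :: a(n), b(n)
  INTEGER :: i

  a = (/ 1.0, 2.0, 3.0, 4.0 /)

  ! Models the elemental reference  CALL shift (a, 0.5, b)
  DO i = 1, n
    CALL shift (a(i), 0.5, b(i))
  END DO

  PRINT *, b     ! b is now 1.5 2.5 3.5 4.5

CONTAINS

  ! hypothetical scalar subroutine with one INTENT(OUT) argument
  SUBROUTINE shift (x, delta, y)
    REAL, INTENT(IN)  :: x, delta
    REAL, INTENT(OUT) :: y
    y = x + delta
  END SUBROUTINE shift

END PROGRAM elemental_call_sketch
                                                \EDOC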

To define elemental references of elemental non-intrinsic subroutines, 
the following constraints are added after Rule~R1210 ({\it call-stmt\/}):
\begin{constraints}

        \item A non-intrinsic subroutine that is referenced elementally 
must be a pure subroutine with an explicit interface, and must satisfy 
the following additional constraints:
        \begin{itemize}
                \item Its non-procedure dummy arguments must be scalar 
and must not have the POINTER attribute.
                \item The length of any character dummy argument must 
not depend on argument values (though it may be assumed, or depend on 
the lengths of other character arguments).
        \end{itemize}

        \item In an elemental reference of a non-intrinsic subroutine,
a {\it function-name\/} {\it actual-arg\/} must have a result whose shape 
agrees with that of the corresponding function dummy procedure.

\end{constraints}

It is perhaps worth outlining the reasons for the extra constraints 
imposed on pure procedures in order for them to be referenced elementally.  

The dummy result of a function or `output' arguments of a subroutine
are not allowed to have the POINTER attribute because of a Fortran~90
technicality, namely, that under elemental reference the corresponding 
actual arguments must be array variables, and Fortran~90 does not permit 
an array of pointers to be referenced.\footnote{
        See the final constraint after Rule~R613 of the Fortran~90 standard.
Note the difference between an {\em array of pointers\/}, which cannot 
be declared or referenced in Fortran~90, and a {\em pointer array\/},
which can.
}
The `input' arguments of an elemental reference are prohibited from 
having the POINTER attribute for consistency with the output arguments 
or result.  However, this last constraint does not impose 
any real restrictions on an elemental reference, as the corresponding 
actual arguments {\em can\/} be pointers, in which case they are 
`de-referenced' and their targets are associated with the dummy arguments.  
In fact, the only reason for a dummy argument to be a pointer is so that
its pointer association can be changed, which is not allowed for `input'
arguments.  (Incidentally, since a pure function has only `input' 
arguments, there would be no loss of generality in disallowing dummy 
pointers in pure functions generally.)  Note that the prohibition of 
dummy pointers in pure subroutines that are elementally referenced means 
that all their non-procedure dummy arguments can have their intent 
explicitly specified (and indeed this is required by the constraints for 
pure subroutine interfaces---see Section \ref{pure-proc-interface}) which 
assists the checking of argument usage.

In an elemental reference, any actual argument that is a function
must have a result whose shape agrees with that of the corresponding 
function dummy procedure.  That is, elemental usage does not extend to 
function arguments, as Fortran~90 does not support the concept of an `array' 
of functions.
Naively it might appear that a function actual argument that is associated 
with a scalar dummy function could return an array result provided it 
conforms with the other array arguments of the elemental reference.  
However, this is not meaningful under elemental reference, as an 
array-valued function cannot be decomposed into an `array' of scalar 
function references, as would be required in this context.

Finally, the length of any character dummy argument or a character
dummy result cannot depend on argument {\em values\/} (though it can
be assumed, or depend on the lengths of other character arguments and/or
a character result).  This ensures that under elemental reference, all 
elements of an array argument or result of character type will have the 
same length, as required by Fortran~90.


\end{document}


From chk@cs.rice.edu  Mon Oct 19 02:48:33 1992
Received: from moe.rice.edu by titan.cs.rice.edu (AA28788); Mon, 19 Oct 92 02:48:33 CDT
Received: from titan.cs.rice.edu (cs.rice.edu) by moe.rice.edu (AA22984); Mon, 19 Oct 92 02:48:27 CDT
Received: from DialupEudora (charon.rice.edu) by titan.cs.rice.edu (AA28715); Mon, 19 Oct 92 02:33:50 CDT
Message-Id: <9210190733.AA28715@titan.cs.rice.edu>
Date: Mon, 19 Oct 1992 02:37:34 -0600
To: hpff-core@cs.rice.edu, hpff-forall@rice.edu
From: chk@cs.rice.edu
Subject: Final FORALL draft
X-Attachments: :Macintosh HD:3737:stmt-chapter.tex:

This is the "official" draft of FORALL and INDEPENDENT for the HPFF meeting
Oct 21-23.  It does not contain the recent Syracuse proposals.  It does
contain Tin-Fook Ngai's revised proposal, and several changes to the FORALL
and PURE sections based on John Merlin's and Rex Page's comments.

                                                Chuck

%chapter-head.tex

%Version of August 5, 1992 - David Loveman, Digital Equipment Corporation

\documentstyle[twoside,11pt]{report}
\pagestyle{headings}
\pagenumbering{arabic}
\marginparwidth 0pt
\oddsidemargin=.25in
\evensidemargin  .25in
\marginparsep 0pt
\topmargin=-.5in
\textwidth=6.0in
\textheight=9.0in
\parindent=2em

%the file syntax-macs.tex is physically included below

%syntax-macs.tex

%Version of July 29, 1992 - Guy Steele, Thinking Machines

\newdimen\bnfalign         \bnfalign=2in
\newdimen\bnfopwidth       \bnfopwidth=.3in
\newdimen\bnfindent        \bnfindent=.2in
\newdimen\bnfsep           \bnfsep=6pt
\newdimen\bnfmargin        \bnfmargin=0.5in
\newdimen\codemargin       \codemargin=0.5in
\newdimen\intrinsicmargin  \intrinsicmargin=3em
\newdimen\casemargin       \casemargin=0.75in
\newdimen\argumentmargin   \argumentmargin=1.8in

\def\IT{\it}
\def\RM{\rm}
\let\CHAR=\char
\let\CATCODE=\catcode
\let\DEF=\def
\let\GLOBAL=\global
\let\RELAX=\relax
\let\BEGIN=\begin
\let\END=\end


\def\FUNNYCHARACTIVE{\CATCODE`\a=13 \CATCODE`\b=13 \CATCODE`\c=13 \CATCODE`\d=13
		     \CATCODE`\e=13 \CATCODE`\f=13 \CATCODE`\g=13 \CATCODE`\h=13
		     \CATCODE`\i=13 \CATCODE`\j=13 \CATCODE`\k=13 \CATCODE`\l=13
		     \CATCODE`\m=13 \CATCODE`\n=13 \CATCODE`\o=13 \CATCODE`\p=13
		     \CATCODE`\q=13 \CATCODE`\r=13 \CATCODE`\s=13 \CATCODE`\t=13
		     \CATCODE`\u=13 \CATCODE`\v=13 \CATCODE`\w=13 \CATCODE`\x=13
		     \CATCODE`\y=13 \CATCODE`\z=13 \CATCODE`\[=13 \CATCODE`\]=13
                     \CATCODE`\-=13}

\def\RETURNACTIVE{\CATCODE`\
=13}

\makeatletter
\def\section{\@startsection {section}{1}{\z@}{-3.5ex plus -1ex minus 
 -.2ex}{2.3ex plus .2ex}{\large\sf}}
\def\subsection{\@startsection{subsection}{2}{\z@}{-3.25ex plus -1ex minus 
 -.2ex}{1.5ex plus .2ex}{\large\sf}}
\def\alternative#1 #2#3{\def\@tempa{#1}\def\@tempb{A}\ifx\@tempa\@tempb\else
    \expandafter\@altbumpdown\string#2\@foo\fi
    #2{Version #1: #3}}
\def\@altbumpdown#1#2\@foo{\global\expandafter\advance\csname c@#2\endcsname-1}

\def\@ifpar#1#2{\let\@tempe\par \def\@tempa{#1}\def\@tempb{#2}\futurelet
    \@tempc\@ifnch}

\def\?#1.{\begingroup\def\@tempq{#1}\list{}{\leftmargin\intrinsicmargin}\relax
  \item[]{\bf\@tempq.} \@intrinsictest}
\def\@intrinsictest{\@ifpar{\@intrinsicpar\@intrinsicdesc}{\@intrinsicpar\relax}}
\long\def\@intrinsicdesc#1{\list{}{\relax
  \def\@tempb{ Arguments}\ifx\@tempq\@tempb
			  \leftmargin\argumentmargin
			  \else \leftmargin\casemargin \fi
  \labelwidth\leftmargin  \advance\labelwidth -\labelsep
  \parsep 4pt plus 2pt minus 1pt
  \let\makelabel\@intrinsiclabel}#1\endlist}
\long\def\@intrinsicpar#1#2\\{#1{#2}\@ifstar{\@intrinsictest}{\endlist\endgroup}}
\def\@intrinsiclabel#1{\setbox0=\hbox{\rm #1}\ifnum\wd0>\labelwidth
  \box0 \else \hbox to \labelwidth{\box0\hfill}\fi}
\def\Case(#1):{\item[{\it Case (#1):}]}
\def\ {\@ifnextchar({\def\@tempq{#1}\@intrinsicopt}{\item[#1]}}
\def\@intrinsicopt(#1){\item[{\@tempq} (#1)]}

\def\MATRIX#1{\relax
    \@ifnextchar,{\@MATRIXTABS{}#1,\@FOO, \hskip0pt plus
1filll\penalty-1\@gobble
  }{\@ifnextchar;{\@MATRIXTABS{}#1,\@FOO; \hskip0pt plus
1filll\penalty-1\@gobble
  }{\@ifnextchar:{\@MATRIXTABS{}#1,\@FOO: \hskip0pt plus
1filll\penalty-1\@gobble
  }{\@ifnextchar.{\hfill\penalty1\null\penalty10000\hskip0pt plus 1filll
		  \@MATRIXTABS{}#1,\@FOO.\penalty-50\@gobble
  }{\@MATRIXTABS{}#1,\@FOO{ }\hskip0pt plus 1filll\penalty-1}}}}}

\def\@MATRIXTABS#1#2,{\@ifnextchar\@FOO{\@MATRIX{#1#2}}{\@MATRIXTABS{#1#2&}}}
\def\@MATRIX#1\@FOO{\(\left[\begin{array}{rrrrrrrrrr}#1\end{array}\right]\)}

\def\@IFSPACEORRETURNNEXT#1#2{\def\@tempa{#1}\def\@tempb{#2}\futurelet\@tempc\@ifspnx}

{
\FUNNYCHARACTIVE
\GLOBAL\DEF\FUNNYCHARDEF{\RELAX
    \DEFa{{\IT\CHAR"61}}\DEFb{{\IT\CHAR"62}}\DEFc{{\IT\CHAR"63}}\RELAX
    \DEFd{{\IT\CHAR"64}}\DEFe{{\IT\CHAR"65}}\DEFf{{\IT\CHAR"66}}\RELAX
    \DEFg{{\IT\CHAR"67}}\DEFh{{\IT\CHAR"68}}\DEFi{{\IT\CHAR"69}}\RELAX
    \DEFj{{\IT\CHAR"6A}}\DEFk{{\IT\CHAR"6B}}\DEFl{{\IT\CHAR"6C}}\RELAX
    \DEFm{{\IT\CHAR"6D}}\DEFn{{\IT\CHAR"6E}}\DEFo{{\IT\CHAR"6F}}\RELAX
    \DEFp{{\IT\CHAR"70}}\DEFq{{\IT\CHAR"71}}\DEFr{{\IT\CHAR"72}}\RELAX
    \DEFs{{\IT\CHAR"73}}\DEFt{{\IT\CHAR"74}}\DEFu{{\IT\CHAR"75}}\RELAX
    \DEFv{{\IT\CHAR"76}}\DEFw{{\IT\CHAR"77}}\DEFx{{\IT\CHAR"78}}\RELAX
    \DEFy{{\IT\CHAR"79}}\DEFz{{\IT\CHAR"7A}}\DEF[{{\RM\CHAR"5B}}\RELAX
    \DEF]{{\RM\CHAR"5D}}\DEF-{\@IFSPACEORRETURNNEXT{{\CHAR"2D}}{{\IT\CHAR"2D}}}}
}

%%% Warning!  Devious return-character machinations in the next several lines!
%%%           Don't even *breathe* on these macros!
{\RETURNACTIVE\global\def\RETURNDEF{\def
{\@ifnextchar\FNB{}{\@stopline\@ifnextchar
{\@NEWBNFRULE}{\penalty\@M\@startline\ignorespaces}}}}\global\def\@NEWBNFRULE
{\vskip\bnfsep\@startline\ignorespaces}\global\def\@ifspnx{\ifx\@tempc\@spto
ken \let\@tempd\@tempa \else \ifx\@tempc
\let\@tempd\@tempa \else \let\@tempd\@tempb \fi\fi \@tempd}}
%%% End of bizarro return-character machinations.

\def\IS{\@stopfield\global\setbox\@curfield\hbox\bgroup
  \hskip-\bnfindent \hskip-\bnfopwidth  \hskip-\bnfalign
  \hbox to \bnfalign{\unhbox\@curfield\hfill}\hbox to \bnfopwidth{\bf is
\hfill}}
\def\OR{\@stopfield\global\setbox\@curfield\hbox\bgroup
  \hskip-\bnfindent \hskip-\bnfopwidth \hbox to \bnfopwidth{\bf or \hfill}}
\def\R#1 {\hbox to 0pt{\hskip-\bnfmargin R#1\hfill}}
\def\XBNF{\FUNNYCHARDEF\FUNNYCHARACTIVE\RETURNDEF\RETURNACTIVE
  \def\@underbarchar{{\char"5F}}\tt\frenchspacing
  \advance\@totalleftmargin\bnfmargin \tabbing
  \hskip\bnfalign\hskip\bnfopwidth\hskip\bnfindent\=\kill\>\+\@gobblecr}
\def\endXBNF{\-\endtabbing}

\def\BNF{\BEGIN{XBNF}}
\def\FNB{\END{XBNF}}

\begingroup \catcode `|=0 \catcode`\\=12
|gdef|@XCODE#1\EDOC{#1|endtrivlist|end{tt}}
|endgroup

\def\CODE{\begin{tt}\advance\@totalleftmargin\codemargin \@verbatim
   \def\@underbarchar{{\char"5F}}\frenchspacing \@vobeyspaces \@XCODE}
\def\ICODE{\begin{tt}\advance\@totalleftmargin\codemargin \@verbatim
   \def\@underbarchar{{\char"5F}}\frenchspacing \@vobeyspaces
   \FUNNYCHARDEF\FUNNYCHARACTIVE \UNDERBARACTIVE\UNDERBARDEF \@XCODE}

\def\@underbarsub#1{{\ifmmode _{#1}\else {$_{#1}$}\fi}}
\let\@underbarchar\_
\def\@underbar{\let\@tempq\@underbarsub\if\@tempz A\let\@tempq\@underbarchar\fi
  \if\@tempz B\let\@tempq\@underbarchar\fi\if\@tempz
C\let\@tempq\@underbarchar\fi
  \if\@tempz D\let\@tempq\@underbarchar\fi\if\@tempz
E\let\@tempq\@underbarchar\fi
  \if\@tempz F\let\@tempq\@underbarchar\fi\if\@tempz
G\let\@tempq\@underbarchar\fi
  \if\@tempz H\let\@tempq\@underbarchar\fi\if\@tempz
I\let\@tempq\@underbarchar\fi
  \if\@tempz J\let\@tempq\@underbarchar\fi\if\@tempz
K\let\@tempq\@underbarchar\fi
  \if\@tempz L\let\@tempq\@underbarchar\fi\if\@tempz
M\let\@tempq\@underbarchar\fi
  \if\@tempz N\let\@tempq\@underbarchar\fi\if\@tempz
O\let\@tempq\@underbarchar\fi
  \if\@tempz P\let\@tempq\@underbarchar\fi\if\@tempz
Q\let\@tempq\@underbarchar\fi
  \if\@tempz R\let\@tempq\@underbarchar\fi\if\@tempz
S\let\@tempq\@underbarchar\fi
  \if\@tempz T\let\@tempq\@underbarchar\fi\if\@tempz
U\let\@tempq\@underbarchar\fi
  \if\@tempz V\let\@tempq\@underbarchar\fi\if\@tempz
W\let\@tempq\@underbarchar\fi
  \if\@tempz X\let\@tempq\@underbarchar\fi\if\@tempz
Y\let\@tempq\@underbarchar\fi
  \if\@tempz Z\let\@tempq\@underbarchar\fi\@tempq}
\def\@under{\futurelet\@tempz\@underbar}

\def\UNDERBARACTIVE{\CATCODE`\_=13}
\UNDERBARACTIVE
\def\UNDERBARDEF{\def_{\protect\@under}}
\UNDERBARDEF

\catcode`\$=11  

%the following line would allow derived-type component references 
%FOO%BAR in running text, but not allow LaTeX comments
%without this line, write FOO\%BAR
%\catcode`\%=11 

\makeatother

%end of file syntax-macs.tex


\title{{\em D R A F T} \\High Performance Fortran \\ FORALL Proposal}
\author{High Performance Fortran Forum}
\date{October 16, 1992}

\hyphenation{RE-DIS-TRIB-UT-ABLE sub-script Wil-liam-son}

\begin{document}

\maketitle

\newpage

\pagenumbering{roman}

\vspace*{4.5in}

This is the result of a LaTeX run of a draft of a single chapter of 
the HPFF Final Report document.

\vspace*{3.0in}

\copyright 1992 Rice University, Houston Texas.  Permission to copy 
without fee all or part of this material is granted, provided the 
Rice University copyright notice and the title of this document 
appear, and notice is given that copying is by permission of Rice 
University.

\tableofcontents

\newpage

\pagenumbering{arabic}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%put text of chapter here


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%put \end{document} here
%statements.tex


%Revision history:
%August 2, 1992 - Original version of David Loveman, Digital Equipment
%	Corporation and Charles Koelbel, Rice University
%August 19, 1992 - chk - cleaned up discrepancies with Fortran 90 array 
%	expressions
%August 20, 1992 - chk - added DO INDEPENDENT section, Guy Steele's 
%	pointer proposals
%August 24, 1992 - chk - ELEMENTAL functions proposal
%August 31, 1992 - chk - PURE functions proposal
%September 3, 1992 - chk - reorganized sections
%September 21, 1992 - chk - began incorporating updates from Sept
%	10-11 meeting
%October 14, 1992 - chk - Incorporated ON and revised PURE 


\newenvironment{constraints}{
        \begin{list}{Constraint:}{
                \settowidth{\labelwidth}{Constraint:}
                \settowidth{\labelsep}{w}
                \settowidth{\leftmargin}{Constraint:w}
                \setlength{\rightmargin}{0cm}
        }
}{
        \end{list}
}


\chapter{Statements}
\label{statements}

\section{Overview}

\footnote{Version of September 21, 1992 
- Charles Koelbel, Rice University.}
The purpose of the FORALL construct is to provide a convenient syntax for 
simultaneous assignments to large groups of array elements.
In this respect it is very similar to the functionality provided by array 
assignments and WHERE constructs.
FORALL differs from these constructs primarily in its syntax, which is 
intended to be more suggestive of local operations on each element of an 
array.
It is also possible to specify slightly more general array regions than 
are allowed by the basic array triplet notation.
Both single-statement and block FORALLs are defined in this proposal.

The FORALL statement, in both its single-statement and block forms, was
accepted by the High Performance Fortran Forum working group at its
second reading on September 10, 1992.
This vote was contingent on a more complete definition of PURE
functions.
The idea of PURE functions was accepted by the HPFF working group at
its first reading on September 10, 1992.
However, the definition presented then was not completely acceptable
because of technical errors; the errors identified at that time have been
corrected in this draft.
The single-statement form of FORALL was accepted by the HPFF working
group as part of the official HPF subset in a first reading on
September 11, 1992; the block FORALL was excluded from the subset at
the same time.

The purpose of the INDEPENDENT directive is to allow the programmer to
give additional information to the compiler.
The user can assert that no data object is defined by one iteration of
a loop and used (read or written) by another; similar information can
be provided about the combinations of index values in a FORALL
statement.
A compiler may rely on this information to make optimizations, such as
parallelization or reorganizing communication.
If the assertion is true, the semantics of the program are not
changed; if it is false, the program is not standard-conforming and
has no defined meaning.
The ``Other Proposals'' section contains a number of additional
assertions with this flavor.

The INDEPENDENT assertion was accepted by the High Performance Fortran
Forum working group at its second reading on September 10, 1992.
The group also directed the FORALL subgroup to further explore methods for
allowing reduction operations to be accomplished in INDEPENDENT loops.

The following proposals are designed as a modification of the Fortran 90 
standard; all references to rule numbers and section numbers pertain to 
that document unless otherwise noted.


\section{Element Array Assignment - FORALL}
 

\label{forall-stmt}

\footnote{Version of October 18, 1992 - David
Loveman, Digital Equipment Corporation and Charles Koelbel, Rice
University.
Approved at second reading on September 10, 1992.
Some rephrasings applied since then.}
The element array
assignment statement (FORALL statement) is used to specify an array
assignment in terms of array elements or groups of array sections.
The element array assignment may be
masked with a scalar logical expression.  
In functionality, it is similar to array assignment statements;
however, more general array sections can be assigned in FORALL.

Rule R215 for {\it
executable-construct} is extended to include the {\it forall-stmt}.

\subsection{General Form of Element Array Assignment}

                                                                \BNF
forall-stmt          \IS FORALL (forall-triplet-spec-list
                       [,scalar-mask-expr ]) forall-assignment
                                                                \FNB

\noindent Constraint: Any procedure referenced in a {\it forall-stmt\/}, 
   including one referenced by a defined operation or assignment in the 
   {\it forall-assignment}, 
   must be a ``pure'' function as defined in Section~\ref{forall-pure}.

                                                                \BNF
forall-triplet-spec  \IS subscript-name = subscript : subscript 
                          [ : stride]
                                                                \FNB

\noindent
Constraint:  {\it subscript-name} must be a {\it scalar-name} of type
integer.

\noindent
Constraint:  A {\it subscript} or a {\it stride} in a {\it
forall-triplet-spec} must not contain a reference to any {\it
subscript-name} in the {\it forall-triplet-spec-list}.

                                                                       \BNF
forall-assignment    \IS assignment-stmt
                     \OR pointer-assignment-stmt
                                                                       \FNB

  Constraint: Variables defined in an assignment statement corresponding
  to different active values of subscript names from the forall triplet
  specification must be distinct and must have no subobjects in common.

\noindent
Constraint: The {\it variable\/} of an {\it assignment-stmt\/} must 
be a distinct object for each active combination (\ref{forall-interp}) 
of {\it subscript-name\/} values.
In this context, two objects are considered distinct if they have no 
subobjects in common. 

\noindent
Constraint:  The {\it pointer-object} defined in a {\it 
pointer-assignment-stmt\/} must be a distinct object for every active 
combination (\ref{forall-interp}) of {\it subscript-name\/} values.
Note that the {\it pointer-object} has no subobjects.

For each subscript name in the {\it forall-assignment}, the set of
permitted values is determined on entry to the statement and is
\[  m1 + (k-1) \times m3, \mbox{ where } k = 1, 2, \ldots, \left\lfloor \frac{m2 - m1 +
m3}{m3} \right\rfloor  \]
and where {\it m1}, {\it m2}, and {\it m3} are the values of the first
subscript, the second subscript, and the stride respectively in the
{\it forall-triplet-spec}.  If {\it stride} is missing, it is as if it
were present with a value of the integer 1.  The expression {\it
stride} must not have the value 0.  If for some subscript name 
\(\lfloor (m2 -m1 + m3) / m3 \rfloor \leq 0\), the {\it
forall-assignment} is not executed.
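
As a worked illustration of the formula above (the triplets here are
chosen only for illustration and are not taken from the proposal):
\begin{itemize}
\item For the triplet {\tt 1:10:3}, \(m1 = 1\), \(m2 = 10\) and \(m3 = 3\),
so \(\lfloor (10 - 1 + 3) / 3 \rfloor = 4\) and the permitted values are
1, 4, 7 and 10.
\item For the triplet {\tt 10:1:-4},
\(\lfloor (1 - 10 - 4) / (-4) \rfloor = \lfloor 3.25 \rfloor = 3\)
and the permitted values are 10, 6 and 2.
\item For the triplet {\tt 5:1}, the stride defaults to 1, so
\(\lfloor (1 - 5 + 1) / 1 \rfloor = -3 \leq 0\) and the
{\it forall-assignment} is not executed.
\end{itemize}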

A ``pure'' function is defined in Section~\ref{forall-pure}
and is syntactically guaranteed not to have side effects.

Examples of element array assignments are:

                                                                  \CODE
REAL H(N,N), X(N,N), Y(N,N)
TYPE MONARCH
    INTEGER, POINTER :: P
END TYPE MONARCH
TYPE(MONARCH) :: A(N)
INTEGER, TARGET :: B(N)
      ...
FORALL (I=1:N, J=1:N) H(I,J) = 1.0 / REAL(I + J - 1)

FORALL (I=1:N, J=1:N, Y(I,J) .NE. 0.0) X(I,J) = 1.0 / Y(I,J)

! Set up a butterfly pattern
FORALL (J=1:N)  A(J)%P => B(1+IEOR(J-1,2**K))
                                                                  \EDOC 

\subsection{Interpretation of Element Array Assignments}
\label{forall-interp}  

Execution of an element array assignment consists of the following steps:

\begin{enumerate}

\item Evaluation in any order of the subscript and stride expressions in 
the {\it forall-triplet-spec-list}.
The set of valid combinations of {\it subscript-name} values is then the 
cartesian product of the sets defined by these triplets.

\item Evaluation of the {\it scalar-mask-expr} for all valid combinations 
of {\em subscript-name} values.
The mask elements may be evaluated in any order.
The set of active combinations of {\it subscript-name} values is the 
subset of the valid combinations for which the mask evaluates to true.

\item Evaluation in any order of the {\it expr} or {\it target} and all 
subscripts contained in the 
{\it array-element} or {\it array-section} in the {\it forall-assignment} 
for all active combinations of {\em subscript-name} values.
In the case of pointer assignment where the {\it target} is not a 
pointer, the evaluation consists of identifying the object referenced 
rather than computing its value.

\item Assignment of the computed {\it expr} values to the corresponding 
elements specified by {\it array-element} or {\it array-section}.
The assignments may be made in any order.
In the case of a pointer assignment where the {\it target} is not a 
pointer, this assignment consists of associating the {\it array-element} 
with the object referenced.

\end{enumerate}

If the scalar mask expression is omitted, it is as if it were
present with the value true.

The scope of a
{\it subscript-name} is the FORALL statement itself. 


The {\it forall-assignment} must not cause any element of the array
being assigned to be assigned a value more than once.

Since a function called from a FORALL construct must be pure, it 
is impossible for that function's evaluation to affect other expressions' 
evaluations, either for the same combination of 
{\it subscript-name} values or for a different combination.
In addition, it is possible that the compiler can perform 
more extensive optimizations when all functions are declared PURE.
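
The following sketch contrasts the interpretation above with a
sequential loop.
It is an illustration only; the program name and data values are
assumptions made for this example.
Because every right-hand side is evaluated before any assignment is
made, the FORALL reads only the original values of A, whereas the
sequential DO loop propagates each newly stored value.

                                                                  \CODE
PROGRAM SHIFTDEMO
  INTEGER, PARAMETER :: N = 5
  INTEGER :: I
  REAL :: A(N), C(N)

  A = (/ 1.0, 2.0, 3.0, 4.0, 5.0 /)
  C = A

  ! FORALL: all right-hand sides are evaluated first, then all
  ! assignments are made, so only the original values of A are read.
  FORALL (I=2:N) A(I) = A(I-1)
  ! A is now 1.0, 1.0, 2.0, 3.0, 4.0

  ! Sequential DO: each iteration sees the store made by the
  ! previous iteration.
  DO I = 2, N
    C(I) = C(I-1)
  END DO
  ! C is now 1.0, 1.0, 1.0, 1.0, 1.0

  PRINT *, A
  PRINT *, C
END PROGRAM SHIFTDEMO
                                                                  \EDOC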


\subsection{Scalarization of the FORALL Statement}

A {\it forall-stmt} of the general form:

                                                                   \CODE
FORALL (v1=l1:u1:s1, v2=l2:u2:s2, ..., vn=ln:un:sn , mask ) &
      a(e1,...,em) = rhs
                                                                   \EDOC

\noindent
is equivalent to the following standard Fortran 90 code:

\raggedbottom
                                                                   \CODE
! Evaluate subscript and stride expressions.
! These assignments may be executed in any order.
templ1 = l1
tempu1 = u1
temps1 = s1
templ2 = l2
tempu2 = u2
temps2 = s2
  ...
templn = ln
tempun = un
tempsn = sn

! Evaluate the scalar mask expression, and evaluate the
! forall-assignment subexpressions where the mask is true.
! The iterations of this loop nest may be executed in any order.
! The assignments in the loop body may be executed in any order,
! provided that the mask element is evaluated before any other 
! expression in the same iteration.
! The loop body need not be executed atomically.
DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        tempmask(v1,v2,...,vn) = mask
        IF (tempmask(v1,v2,...,vn)) THEN
          temprhs(v1,v2,...,vn) = rhs
          tempe1(v1,v2,...,vn) = e1
          tempe2(v1,v2,...,vn) = e2
          ....
          tempem(v1,v2,...,vn) = em
        END IF
      END DO
    ...
  END DO
END DO

! Perform the assignment of these values to the corresponding 
! elements of the array being assigned to
! The iterations of this loop nest may be executed in any order.
DO v1=templ1,tempu1,temps1
  DO v2=templ2,tempu2,temps2
    ...
      DO vn=templn,tempun,tempsn
        IF (tempmask(v1,v2,...,vn)) THEN
          a(tempe1(v1,v2,...,vn),...,tempem(v1,v2,...,vn)) = &
            temprhs(v1,v2,...,vn)
        END IF
      END DO
    ...
  END DO
END DO
                                                                      \EDOC
\flushbottom
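
As a concrete instance of the general scheme, the masked statement
{\tt FORALL (I=1:N, Y(I) .NE. 0.0) X(I) = 1.0 / Y(I)} might be
scalarized as sketched below.
This is a minimal sketch under stated assumptions: the program name,
the data values, and the one-dimensional temporaries are chosen for
illustration, and the bound temporaries are omitted because the bounds
here are constants.

                                                                  \CODE
PROGRAM SCALARDEMO
  INTEGER, PARAMETER :: N = 4
  INTEGER :: I, TEMPE1(N)
  REAL :: X(N), Y(N), TEMPRHS(N)
  LOGICAL :: TEMPMASK(N)

  Y = (/ 2.0, 0.0, 4.0, 0.0 /)
  X = -1.0

  ! First loop: evaluate the mask, then the right-hand side and the
  ! left-hand side subscript wherever the mask is true.
  DO I = 1, N
    TEMPMASK(I) = Y(I) .NE. 0.0
    IF (TEMPMASK(I)) THEN
      TEMPRHS(I) = 1.0 / Y(I)
      TEMPE1(I) = I
    END IF
  END DO

  ! Second loop: assign the saved values to the selected elements.
  DO I = 1, N
    IF (TEMPMASK(I)) THEN
      X(TEMPE1(I)) = TEMPRHS(I)
    END IF
  END DO

  PRINT *, X     ! X is now 0.5, -1.0, 0.25, -1.0
END PROGRAM SCALARDEMO
                                                                  \EDOC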

\subsection{Consequences of the Definition of the FORALL Statement}

This section should be moved to the comments chapter in the final
draft.

\begin{itemize}

\item The {\it scalar-mask-expr} may depend on the {\it subscript-name}s.

\item Each of the {\it subscript-name}s must appear within the
subscript expression(s) on the left-hand-side.  
(This is a syntactic
consequence of the semantic rule that no two execution instances of the
body may assign to the same array element.)

\item Right-hand sides and subscripts on the left hand side of a {\it 
forall-assignment} are
evaluated only for valid combinations of subscript names for which the
scalar mask expression is true.

\item The intent of ``pure'' functions is to provide a class of
functions without side-effects, and to allow this side-effect freedom
to be checked syntactically.

\end{itemize}
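
The second consequence in the list above can be illustrated as follows.
This is a sketch only; the program and array names are hypothetical,
and the disallowed statement is shown as a comment so that the
fragment remains valid.

                                                                  \CODE
PROGRAM LHSNAMES
  INTEGER, PARAMETER :: N = 3
  INTEGER :: I, J
  REAL :: A(N), B(N,N), C(N,N)

  B = 1.0

  ! Not permitted: J does not appear on the left-hand side, so the
  ! combinations (I,1), (I,2), ..., (I,N) would all assign to A(I).
  !   FORALL (I=1:N, J=1:N) A(I) = B(I,J)

  ! Permitted: each (I,J) combination names a distinct element of C.
  FORALL (I=1:N, J=1:N) C(I,J) = B(J,I)

  ! One way to obtain a row sum instead: a single-index FORALL that
  ! references the intrinsic function SUM.
  FORALL (I=1:N) A(I) = SUM(B(I,:))

  PRINT *, A     ! A is now 3.0, 3.0, 3.0
END PROGRAM LHSNAMES
                                                                  \EDOC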


\section{FORALL Construct}

\label{forall-construct}

\footnote{Version of October 18, 1992 -
David Loveman, Digital Equipment Corporation and 
Charles Koelbel, Rice University.  Approved at second reading on
September 10, 1992.
Small textual changes made since then.}
The FORALL construct is a generalization of the element array
assignment statement allowing multiple assignments, masked array 
assignments, and nested FORALL statements to be
controlled by a single {\it forall-triplet-spec-list}.  Rule R215 for
{\it executable-construct} is extended to include the {\it
forall-construct}.

\subsection{General Form of the FORALL Construct}

                                                                \BNF
forall-construct   \IS FORALL (forall-triplet-spec-list [,scalar-mask-expr ])
                               forall-body-stmt-list
                            END FORALL
                                                                \FNB

\noindent Constraint: Any procedure referenced in a {\it forall-construct\/}, 
   including one referenced by a defined operation or assignment in a 
   {\it forall-body-stmt}, 
   must be a ``pure'' function as defined in Section~\ref{forall-pure}.

                                                                \BNF
forall-body-stmt     \IS forall-assignment
                     \OR where-stmt
                     \OR forall-stmt
                     \OR forall-construct
                                                                \FNB

\noindent
Constraint:  {\it subscript-name} must be a {\it scalar-name} of type
integer.

\noindent
Constraint:  A {\it subscript} or a {\it stride} in a {\it
forall-triplet-spec} must not contain a reference to any {\it
subscript-name} in the {\it forall-triplet-spec-list}.

\noindent
Constraint:  Any left-hand side {\it array-section} or {\it 
array-element} in any {\it forall-body-stmt}
must reference all of the {\it forall-triplet-spec
subscript-names}.

\noindent
Constraint: If a {\it forall-stmt} or {\it forall-construct} is nested 
within a {\it forall-construct}, then the inner FORALL may not redefine 
any {\it subscript-name} used in the outer {\it forall-construct}.
This rule applies recursively in the event of multiple nesting levels.

For each subscript name in the {\it forall-assignment}s, the set of
permitted values is determined on entry to the construct and is
\[  m1 + (k-1) \times m3, \mbox{ where } k = 1, 2, \ldots, \left\lfloor \frac{m2 - m1 +
m3}{m3} \right\rfloor  \]
and where {\it m1}, {\it m2}, and {\it m3} are the values of the first
subscript, the second subscript, and the stride respectively in the
{\it forall-triplet-spec}.  If {\it stride} is missing, it is as if it
were present with a value of the integer 1.  The expression {\it
stride} must not have the value 0.  If for some subscript name 
\(\lfloor (m2 -m1 + m3) / m3 \rfloor \leq 0\), the {\it
forall-assignment}s are not  executed.

Examples of the FORALL construct are:

                                                                 \CODE
FORALL ( I = 2:N-1, J = 2:N-1 )
  A(I,J) = A(I,J-1) + A(I,J+1) + A(I-1,J) + A(I+1,J)
  B(I,J) = A(I,J)
END FORALL

FORALL ( I = 1:N-1 )
  FORALL ( J = I+1:N )
    A(I,J) = A(J,I)
  END FORALL
END FORALL

FORALL ( I = 1:N, J = 1:N )
  A(I,J) = MERGE( A(I,J), A(I,J)**2, I.EQ.J )
  WHERE ( .NOT. DONE(I,J,1:M) )
    B(I,J,1:M) = B(I,J,1:M)*X
  END WHERE
END FORALL
                                                                \EDOC


\subsection{Interpretation of the FORALL Construct}

Execution of a FORALL construct consists of the following steps:

\begin{enumerate}

\item Evaluation in any order of the subscript and stride expressions in 
the {\it forall-triplet-spec-list}.
The set of valid combinations of {\it subscript-name} values is then the 
cartesian product of the sets defined by these triplets.

\item Evaluation of the {\it scalar-mask-expr} for all valid combinations 
of {\em subscript-name} values.
The mask elements may be evaluated in any order.
The set of active combinations of {\it subscript-name} values is the 
subset of the valid combinations for which the mask evaluates to true.

\item Execution of the {\it forall-body-stmts} in the order they appear.
Each statement is executed completely (that is, for all active 
combinations of {\it subscript-name} values) according to the following 
interpretation:

\begin{enumerate}

\item Statements in the {\it forall-assignment} category (i.e.\ 
assignment statements and pointer assignment statements) evaluate the 
right-hand side {\it expr} and any left-hand side subscripts for all 
active {\it subscript-name} values,
then assign the right-hand side results to the corresponding left-hand 
side references.

\item WHERE statements evaluate their {\it mask-expr} for all active 
combinations of values of {\it subscript-name}s.
All elements of all masks may be evaluated in any order. 
The assignments within the WHERE branch of the statement are then 
executed in order using the above interpretation of array assignments 
within the FORALL, but the only array elements assigned are those 
selected by both the active {\it subscript-names} and the WHERE mask.
Finally, the assignments in the ELSEWHERE branch are executed if that 
branch is present.
The assignments here are also treated as array assignments, but elements 
are only assigned if they are selected by both the active combinations 
and by the negation of the WHERE mask.

\item FORALL statements and FORALL constructs first evaluate the 
subscript and stride expressions in 
the {\it forall-triplet-spec-list} for all active combinations of the 
outer FORALL constructs.
The set of valid combinations of {\it subscript-names} for the inner 
FORALL is then the union of the sets defined by these bounds and strides 
for each active combination of the outer {\it subscript-names}.
For example, the valid set of the inner FORALL in the second example in 
the preceding subsection is the upper triangle (not including the main 
diagonal) of the \(N \times N\) matrix A.
The scalar mask expression is then evaluated for all valid combinations 
of the inner FORALL's {\it subscript-names} to produce the set of active 
combinations.
If there is no scalar mask expression, it is assumed to be always true.
Each statement in the inner FORALL is then executed for each active 
combination (of the inner FORALL), recursively following the 
interpretations given in this section.

\end{enumerate}

\end{enumerate}
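
The nested case can be traced on a small instance.
In the sketch below (the program name and the initial values are
assumptions made for illustration), \(N = 3\), so the outer FORALL has
the active values 1 and 2, and the union of the inner index sets gives
the valid combinations (1,2), (1,3) and (2,3).

                                                                  \CODE
PROGRAM TRIANGLE
  INTEGER, PARAMETER :: N = 3
  INTEGER :: I, J
  INTEGER :: A(N,N)

  ! Give every element a recognizable value: A(I,J) = 10*I + J.
  FORALL (I=1:N, J=1:N) A(I,J) = 10*I + J

  ! The inner bounds depend on the outer subscript name I.
  FORALL ( I = 1:N-1 )
    FORALL ( J = I+1:N )
      A(I,J) = A(J,I)
    END FORALL
  END FORALL

  ! A(1,2), A(1,3) and A(2,3) now hold 21, 31 and 32 respectively;
  ! the diagonal and the lower triangle are unchanged.
  PRINT *, A(1,2), A(1,3), A(2,3)
END PROGRAM TRIANGLE
                                                                  \EDOC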

If the scalar mask expression is omitted, it is as if it were
present with the value true.

The scope of a
{\it subscript-name} is the FORALL construct itself. 

A single {\it forall-assignment\/}  must obey the same restrictions in a 
{\it forall-construct\/} as in a simple {\it forall-stmt}.
(Note that the lowest level of nested statements must always be an 
assignment statement.)
For example, an assignment may not cause the same array element to be 
assigned more than once.
Different statements may, however, assign to the 
same array element, and assignments made in one
statement may affect the execution of a later statement.
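
A small sketch of this behavior follows; the program name and data
values are assumptions made for illustration.
The first statement reads only the original values of A, while the
second statement sees the values stored by the first.

                                                                  \CODE
PROGRAM BLOCKDEMO
  INTEGER, PARAMETER :: N = 4
  INTEGER :: I
  REAL :: A(N), B(N)

  A = (/ 1.0, 2.0, 3.0, 4.0 /)
  B = 0.0

  FORALL ( I = 2:N-1 )
    ! Uses the old A throughout: A(2) = 1+3 = 4, A(3) = 2+4 = 6.
    A(I) = A(I-1) + A(I+1)
    ! Executed after the first statement has completed, so it
    ! reads the new A: B(2) = 4, B(3) = 6.
    B(I) = A(I)
  END FORALL

  PRINT *, A     ! A is now 1.0, 4.0, 6.0, 4.0
  PRINT *, B     ! B is now 0.0, 4.0, 6.0, 0.0
END PROGRAM BLOCKDEMO
                                                                  \EDOC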

\subsection{Scalarization of the FORALL Construct}

A {\it forall-construct} of the form:

                                                                \CODE
FORALL (... e1 ... e2 ... en ...)
    s1
    s2
     ...
    sn
END FORALL
                                                                \EDOC
where each si is a {\it forall-assignment}, is equivalent to the following sequence of FORALL statements:
                                                                \CODE
temp1 = e1
temp2 = e2
 ...
tempn = en
FORALL (... temp1 ... temp2 ... tempn ...) s1
FORALL (... temp1 ... temp2 ... tempn ...) s2
   ...
FORALL (... temp1 ... temp2 ... tempn ...) sn
                                                                \EDOC
A similar statement can be made using FORALL constructs when the 
si may be WHERE or FORALL constructs.

A {\it forall-construct} of the form:
                                                                \CODE
FORALL ( v1=l1:u1:s1, mask )
  WHERE ( mask2(l2:u2:s2) )
    a(l3:u3:s3) = rhs1
  ELSEWHERE
    a(l4:u4:s4) = rhs2
  END WHERE
END FORALL
                                                                \EDOC
is equivalent to the following standard Fortran 90 code:
                                                                \CODE
! Evaluate subscript and stride expressions.
! These assignments can be made in any order
templ1 = l1
tempu1 = u1
temps1 = s1

! Evaluate the FORALL mask expression.
! The iterations of this loop may be executed in any order.
DO v1=templ1,tempu1,temps1
 tempmask(v1) = mask
END DO

! Evaluate the bounds and masks for the WHERE
! The iterations of this loop may be executed in any order.
! The assignments in the loop body may be executed in any order,
! provided the mask bounds and stride are computed before the mask.
! The loop body need not be executed atomically.
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    tmpl2(v1) = l2
    tmpu2(v1) = u2
    tmps2(v1) = s2
    tmpl3(v1) = l3
    tmpu3(v1) = u3
    tmps3(v1) = s3
    tmpl4(v1) = l4
    tmpu4(v1) = u4
    tmps4(v1) = s4
    tempmask2(tmpl2(v1):tmpu2(v1):tmps2(v1)) = &
      mask2(tmpl2(v1):tmpu2(v1):tmps2(v1))
  END IF
END DO

! Evaluate the WHERE branch
! The iterations of this loop may be executed in any order
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    WHERE ( tempmask2(tmpl2(v1):tmpu2(v1):tmps2(v1)) )
      temprhs1(tmpl2(v1):tmpu2(v1):tmps2(v1)) = rhs1
    END WHERE
  END IF
END DO
! The iterations of this loop may be executed in any order
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    WHERE ( tempmask2(tmpl2(v1):tmpu2(v1):tmps2(v1)) )
      a(tmpl3(v1):tmpu3(v1):tmps3(v1)) = &
        temprhs1(tmpl2(v1):tmpu2(v1):tmps2(v1))
    END WHERE
  END IF
END DO

! Evaluate the ELSEWHERE branch
! The iterations of this loop may be executed in any order.
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    WHERE ( .not. tempmask2(tmpl2(v1):tmpu2(v1):tmps2(v1)) )
      temprhs2(tmpl2(v1):tmpu2(v1):tmps2(v1)) = rhs2
    END WHERE
  END IF
END DO
! The iterations of this loop may be executed in any order.
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    WHERE ( .not. tempmask2(tmpl2(v1):tmpu2(v1):tmps2(v1)) )
      a(tmpl4(v1):tmpu4(v1):tmps4(v1)) = &
        temprhs2(tmpl2(v1):tmpu2(v1):tmps2(v1))
    END WHERE
  END IF
END DO
                                                                   \EDOC
The extension to multiple dimensions (in either the FORALL index space or 
the array dimensions) is straightforward.

A {\it forall-construct} of the form:
                                                                   \CODE
FORALL ( v1=l1:u1:s1, mask )
  FORALL ( v2=l2:u2:s2, mask2 )
    a(e1,e2) = rhs1
    b(e3,e4) = rhs2
  END FORALL
END FORALL
                                                                  \EDOC
is equivalent to the following standard Fortran 90 code:
                                                                   \CODE
! Evaluate subscript and stride expressions and outer mask.
! These assignments may be executed in any order.
templ1 = l1
tempu1 = u1
temps1 = s1
! The iterations of this loop may be executed in any order.
DO v1=templ1,tempu1,temps1
 tempmask(v1) = mask
END DO

! Evaluate the inner FORALL bounds, etc
! The iterations of this loop may be executed in any order.
! The assignments in the loop body may be executed in any order,
! provided that the mask bounds are computed before the mask
! itself.
! The loop body need not be executed atomically.
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    templ2(v1) = l2
    tempu2(v1) = u2
    temps2(v1) = s2
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      tempmask2(v1,v2) = mask2
    END DO
  END IF
END DO

! Evaluate first statement
! The iterations of this loop may be executed in any order.
! The assignments in this loop body may be executed in any order.
! The loop body need not be executed atomically.
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      IF ( tempmask2(v1,v2) ) THEN
        temprhs1(v1,v2) = rhs1
        tmpe1(v1,v2) = e1
        tmpe2(v1,v2) = e2
      END IF
    END DO
  END IF
END DO
! The iterations of this loop may be executed in any order.
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      IF ( tempmask2(v1,v2) ) THEN
        a(tmpe1(v1,v2),tmpe2(v1,v2)) = temprhs1(v1,v2)
      END IF
    END DO
  END IF
END DO

! Evaluate second statement.
! Ordering constraints are as for the first statement.
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      IF ( tempmask2(v1,v2) ) THEN
        temprhs2(v1,v2) = rhs2
        tmpe3(v1,v2) = e3
        tmpe4(v1,v2) = e4
      END IF
    END DO
  END IF
END DO
DO v1=templ1,tempu1,temps1
  IF (tempmask(v1)) THEN
    DO v2 = templ2(v1),tempu2(v1),temps2(v1)
      IF ( tempmask2(v1,v2) ) THEN
        b(tmpe3(v1,v2),tmpe4(v1,v2)) = temprhs2(v1,v2)
      END IF
    END DO
  END IF
END DO
                                                                   \EDOC


\subsection{Consequences of the Definition of the FORALL Construct}

This section should be moved to the comments chapter of the final
draft.

\begin{itemize}

\item A block FORALL means roughly the same as replicating the FORALL
header in front of each array assignment statement in the block, except
that any expressions in the FORALL header are evaluated only once,
rather than being re-evaluated before each of the statements in the body.
(The exceptions to this rule are nested FORALL statements and WHERE
statements, which introduce syntactic and functional complications
into the copying.)

\item One may think of a block FORALL as synchronizing twice per
contained assignment statement: once after handling the rhs and other 
expressions
but before performing assignments, and once after all assignments have
been performed but before commencing the next statement.  (In practice,
appropriate dependence analysis will often permit the compiler to
eliminate unnecessary synchronizations.)

\item The {\it scalar-mask-expr} may depend on the {\it subscript-name}s.

\item In general, any expression in a FORALL is evaluated only for valid 
combinations of all surrounding subscript names for which all the
scalar mask expressions are true.

\item Nested FORALL bounds and strides can depend on outer FORALL {\it 
subscript-names}.  They cannot redefine those names, even temporarily (if 
they did, there would be no way to avoid multiple assignments to the same 
array element).

\item Dependences are allowed from one statement to later statements, but 
never from an assignment statement to itself.

\end{itemize}


\section{Pure Procedures and Elemental Reference}

\label{forall-pure}

\footnote{Version of October 18, 
1992 - John Merlin, University of Southampton, 
and Charles Koelbel, Rice University. Approved at first reading on
September 10, 1992, subject to technical revisions for correctness.
The suggestions made there have been incorporated in this draft.
Other changes have also been made to correct technical and aesthetic 
problems.}
A {\it pure function\/} is one that produces no side effects.  This
means that the only effect of a pure function reference on the state 
of a program is to return a result---it does not modify the values, 
pointer associations or data mapping of any of its arguments or global 
data, and performs no I/O.
A {\em pure subroutine\/} is one that produces no side effects
except for modifying the values and/or pointer associations of certain
arguments.  

A pure procedure (i.e.\ function or subroutine) may be used in any way 
that a normal procedure can.
In addition, a procedure is required to be pure if it is used in any 
of the following contexts:
\begin{itemize}
        \item a FORALL statement or construct;
        \item an elemental reference (see Section~\ref{elem-ref-of-pure-procs});
        \item within the body of a pure procedure;
        \item as an actual argument in a pure procedure reference.
\end{itemize}

The side-effect freedom of a pure function ensures that it can be invoked
concurrently in a FORALL or elemental reference without undesirable
consequences such as non-determinism, and additionally assists the efficient
implementation of concurrent execution.  A pure subroutine can be
invoked concurrently in an elemental reference, and since its side effects
are limited to a known subset of its arguments (as we shall see later), 
an implementation can check that a reference obeys Fortran~90's restrictions 
on argument association and is consequently deterministic.


\subsection{Pure procedure declaration and interface}

If a non-intrinsic procedure is used in a context that requires it to be 
pure, then its interface must be explicit in the scope of that use, 
and both its interface body (if provided) and its definition must contain 
the PURE declaration.  The form of this declaration is 
a directive immediately after the {\it function-stmt\/} or {\it
subroutine-stmt\/} of the procedure interface body or definition:
                                                                 \BNF
pure-directive \IS !HPF$ PURE [procedure-name]
                                                                 \FNB

Intrinsic functions, including HPF intrinsic functions, are always pure 
and require no explicit declaration of this fact;  intrinsic subroutines 
are pure if they are elemental (e.g.\ MVBITS) but not otherwise.
A statement function is pure if and only if all functions that it
references are pure.
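
A minimal sketch of the declaration follows.
The function HALF and the program around it are hypothetical, and the
example assumes that an internal function, whose interface is explicit
in its host, may carry the directive in the same way as any other
function definition.
A compiler that does not implement the directive will see it as an
ordinary comment.

                                                                 \CODE
PROGRAM PUREDEMO
  INTEGER, PARAMETER :: N = 4
  INTEGER :: I
  REAL :: X(N)

  ! HALF is referenced in a FORALL, so it is required to be pure.
  FORALL (I=1:N) X(I) = HALF(REAL(I))
  PRINT *, X
CONTAINS
  REAL FUNCTION HALF(V)
!HPF$ PURE HALF
    REAL, INTENT(IN) :: V
    ! The only effect of the reference is to return a result.
    HALF = 0.5 * V
  END FUNCTION HALF
END PROGRAM PUREDEMO
                                                                 \EDOC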

\subsubsection{Pure function definition}

To define pure functions, Rule~R1215 of the Fortran~90 standard is changed 
to:
                                                                 \BNF
function-subprogram \IS         function-stmt
                                [pure-directive]
                                [specification-part]
                                [execution-part]
                                [internal-subprogram-part]
                                end-function-stmt
                                                                \FNB
with the following additional constraints in Section~12.5.2.2 of the 
Fortran~90 standard:
\begin{constraints}

        \item If a {\it procedure-name\/} is present in the 
{\it pure-directive\/}, it must match the {\it function-name\/} in the 
{\it function-stmt\/}.

        \item In a pure function, a local variable must not have the 
SAVE attribute. (Note that this means that a local variable cannot be 
initialised in a {\it type-declaration-stmt\/} or a
{\it data-stmt\/}, which imply the SAVE attribute.)

        \item A pure function must not use a dummy argument, a global 
variable, or an object that is storage associated with a global variable,
or a subobject thereof, in the following contexts:
        \begin{itemize}
                \item as the assignment variable of an {\it assignment-stmt\/}
or {\it forall-assignment\/};
                \item as a DO variable or implied DO variable, or as a 
{\it subscript-name\/} in a {\it forall-triplet-spec\/};
                \item in an {\it assign-stmt\/};
                \item as the {\it pointer-object\/} or {\it target\/}
of a {\it pointer-assignment-stmt\/};
                \item as the {\it expr\/} of an {\it assignment-stmt\/}
or {\it forall-assignment\/} whose assignment variable is of a derived 
type, or is a pointer to a derived type, that has a pointer component 
at any level of component selection;
                \item as an {\it allocate-object\/} or {\it stat-variable\/}
in an {\it allocate-stmt\/} or {\it deallocate-stmt\/}, or as a
{\it pointer-object\/} in a {\it nullify-stmt\/};
                \item as an actual argument associated with a dummy 
argument with INTENT (OUT) or (INOUT) or with the POINTER attribute.
        \end{itemize}

        \item Any procedure referenced in a pure function, including
one referenced via a defined operation or assignment, must be pure.

        \item In a pure function, a dummy argument or local variable 
must not appear in an {\it align-directive\/}, {\it realign-directive\/},
{\it distribute-directive\/}, {\it redistribute-directive\/},
{\it realignable-directive\/}, {\it redistributable-directive\/} or
{\it combined-directive}.

        \item In a pure function, a global variable must not appear in a
{\it realign-directive\/} or {\it redistribute-directive\/}.

        \item A pure function must not contain a {\it pause-stmt\/},
{\it stop-stmt\/} or I/O statement (including a file operation).

\end{constraints}
To declare that a function is pure, a {\it pure-directive\/} must be given.

The above constraints are designed to guarantee that a pure function
is free from side effects (i.e.\ modifications of data visible outside
the function), which means that it is safe to reference concurrently, 
as explained earlier.

The second constraint ensures that a pure function does not retain
an internal state between calls, which would allow side-effects between 
calls to the same procedure.

The third constraint ensures that dummy arguments and global variables
are not modified by the function.
In the case of a dummy or global pointer, this applies to both its 
pointer association and its target value, so it cannot be subject to 
a pointer assignment or to an ALLOCATE, DEALLOCATE or NULLIFY
statement.
Incidentally, these constraints imply that only local variables and the
dummy result variable can be subject to assignment or pointer assignment.

In addition, a dummy or global data object cannot be the {\it target\/}
of a pointer assignment (i.e.\ it cannot be used as the right hand side
of a pointer assignment to a local pointer or to the result variable), 
for then its value could be modified via the pointer.

In connection with the last point, it should be noted that an ordinary 
(as opposed to pointer) assignment to a variable of derived type that has 
a pointer component at any level of component selection may result in a 
{\em pointer\/} assignment to the pointer component of the variable.
That is certainly the case for an intrinsic assignment.  In that case
the expression on the right hand side of the assignment has the same type 
as the assignment variable, and the assignment results in a pointer 
assignment of the pointer components of the expression result to the
corresponding components of the variable (see section 7.5.1.5 of the 
Fortran~90 standard).  However, it may also be the case for a 
{\em defined\/} assignment to such a variable, even if the data type of 
the expression has no pointer components;  the defined assignment may still 
involve pointer assignment of part or all of the expression result to the 
pointer components of the assignment variable.  Therefore, a dummy or 
global object cannot be used as the right hand side of any assignment to 
a variable of derived type with pointer components, for then it, or part 
of it, might be the target of a pointer assignment, in violation of the 
restriction mentioned above.

(Incidentally, the last two paragraphs only prevent the reference of 
a dummy or global object as the {\em only\/} object on the right hand
side of a pointer assignment or an assignment to a variable with pointer
components.  There are no constraints on its reference as an operand, 
actual argument, subscript expression, etc.\ in these circumstances.)

Finally, a dummy or global data object cannot be used in a procedure 
reference as an actual argument associated with a dummy argument of
INTENT (OUT) or (INOUT) or with a dummy pointer, for then it may be
modified by the procedure reference.  
This constraint, like the others, can be statically checked, since any
procedure referenced within a pure function must be either a pure 
function, which does not modify its arguments, or a pure subroutine, 
whose interface must specify the INTENT or POINTER attributes of its 
arguments (see below).
Incidentally, notice that in this context it is assumed that an actual 
argument associated with a dummy pointer is modified, since Fortran~90 
does not allow its intent to be specified.

Constraint 4 ensures that all procedures called from a pure function 
are themselves pure and hence side effect free, except, in the case of
subroutines, for modifying actual arguments associated with dummy pointers 
or dummy arguments with INTENT(OUT) or (INOUT).  As we have just 
explained, it can be checked that global or dummy objects are not used
in such arguments, which would violate the required side-effect freedom.

Constraints 5 and 6 protect dummy and global data objects from realignment 
and redistribution (another type of side effect).  
In addition, constraint 5 prevents explicit declaration of the mapping 
(i.e.\ alignment and distribution) of dummy arguments and local variables.  
This is because the function may be invoked concurrently, with each 
invocation operating on a segment of data whose distribution is specific 
to that invocation.  Thus, the distribution of a dummy object must be 
`assumed' from the corresponding actual argument.  
Also, it is left to the implementation to determine a suitable mapping 
of the local variables, which would typically depend on the mapping of 
the dummy arguments.

Constraint 7 prevents I/O, whose order would be non-deterministic in 
the context of concurrent execution.  A PAUSE statement requires input
and so is disallowed for the same reason.
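
The following sketch, which is illustrative only and not part of the 
proposal, shows a function that is intended to satisfy the constraints 
above.  The names \verb@scale_sum@, \verb@total@ and \verb@calls@ are 
hypothetical.  The commented-out statements indicate usages that would 
violate constraints 2, 3 and 7 respectively.
                                                \CODE
    REAL FUNCTION scale_sum (x, n, factor)
      !HPF$ PURE scale_sum
      INTEGER, INTENT(IN) :: n
      REAL, INTENT(IN) :: x(n), factor
      REAL total              ! local: not SAVEd, not initialised
      INTEGER k               ! local DO variable
      ! REAL, SAVE :: calls   ! illegal: SAVE attribute (constraint 2)
      total = 0.0
      DO k = 1, n
        total = total + x(k)
      END DO
      ! x(1) = 0.0            ! illegal: assigns to a dummy (constraint 3)
      ! PRINT *, total        ! illegal: I/O statement (constraint 7)
      scale_sum = factor * total
    END FUNCTION scale_sum
                                                \EDOC
Only the local variables \verb@total@ and \verb@k@ and the result 
variable \verb@scale_sum@ are assigned, so the function leaves no 
trace outside its result.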


\subsubsection{Pure subroutine definition}

To define pure subroutines, Rule~R1219 is changed to:
                                                                 \BNF
subroutine-subprogram \IS       subroutine-stmt
                                [pure-directive]
                                [specification-part]
                                [execution-part]
                                [internal-subprogram-part]
                                end-subroutine-stmt
                                                                \FNB
with the following additional constraints in Section~12.5.2.3 of the 
Fortran~90 standard:
\begin{constraints}

        \item If a {\it procedure-name\/} is present in the 
{\it pure-directive\/}, it must match the {\it sub\-rou\-tine-name\/} in the 
{\it subroutine-stmt\/}.

        \item The {\it specification-part\/} of a pure subroutine must 
specify the intents of all non-pointer and non-procedure dummy arguments.

        \item In a pure subroutine, a local variable must not have the 
SAVE attribute. (Note that this means that a local variable cannot be 
initialised in a {\it type-declaration-stmt\/} or a
{\it data-stmt\/}, which imply the SAVE attribute.)

        \item A pure subroutine must not use a dummy argument with 
INTENT(IN), a global variable, or an 
object that is storage associated with a global variable, or a subobject 
thereof, in the following contexts:
        \begin{itemize}
                \item as the assignment variable of an {\it assignment-stmt\/}
or {\it forall-assignment\/};
                \item as a DO variable or implied DO variable, or as a 
{\it subscript-name\/} in a {\it forall-triplet-spec\/};
                \item in an {\it assign-stmt\/};
                \item as the {\it pointer-object\/} or {\it target\/}
of a {\it pointer-assignment-stmt\/};
                \item as the {\it expr\/} of an {\it assignment-stmt\/}
or {\it forall-assignment\/} whose assignment variable is of a derived 
type, or is a pointer to a derived type, that has a pointer component 
at any level of component selection;
                \item as an {\it allocate-object\/} or {\it stat-variable\/}
in an {\it allocate-stmt\/} or {\it deallocate-stmt\/}, or as a
{\it pointer-object\/} in a {\it nullify-stmt\/};
                \item as an actual argument associated with a dummy 
argument with INTENT (OUT) or (INOUT) or with the POINTER attribute.
        \end{itemize}

        \item Any procedure referenced in a pure subroutine, including
one referenced via a defined operation or assignment, must be pure.

        \item In a pure subroutine, a dummy argument or local variable 
must not appear in an {\it align-directive\/}, {\it realign-directive\/},
{\it distribute-directive\/}, {\it redistribute-directive\/},
{\it realignable-directive\/}, {\it redistributable-directive\/} or
{\it combined-directive}.

        \item In a pure subroutine, a global variable must not appear in
a {\it realign-directive\/} or {\it redistribute-directive\/}.

        \item A pure subroutine must not contain a {\it pause-stmt\/},
{\it stop-stmt\/} or I/O statement (including a file operation).

\end{constraints}
To declare that a subroutine is pure, a {\it pure-directive\/} must be
given.

The constraints for pure subroutines are based on the same principles 
as for pure functions, except that now side effects on dummy arguments 
that do not have INTENT(IN) are permitted.  
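
As an illustrative sketch (the name \verb@clamp@ and its arguments are 
hypothetical, not part of the proposal), a pure subroutine may assign 
to its INTENT(OUT) and INTENT(INOUT) dummy arguments, but not to its 
INTENT(IN) ones:
                                                \CODE
    SUBROUTINE clamp (lo, hi, v, clipped)
      !HPF$ PURE clamp
      REAL, INTENT(IN) :: lo, hi          ! intents must all be specified
      REAL, INTENT(INOUT) :: v
      LOGICAL, INTENT(OUT) :: clipped
      clipped = (v < lo) .OR. (v > hi)    ! legal: INTENT(OUT) dummy
      v = MAX (lo, MIN (hi, v))           ! legal: INTENT(INOUT) dummy
      ! lo = 0.0                          ! illegal: INTENT(IN) dummy
    END SUBROUTINE clamp
                                                \EDOC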


\subsubsection{Pure procedure interfaces}
\label{pure-proc-interface}

To define interface specifications for pure procedures, Rule~R1204 is 
changed to:
                                                                \BNF
interface-body \IS      function-stmt
                        [pure-directive]
                        [specification-part]
                        end-function-stmt
                \OR     subroutine-stmt
                        [pure-directive]
                        [specification-part]
                        end-subroutine-stmt
                                                                \FNB
with the following constraint in addition to those in
Section~12.3.2.1 of the Fortran~90 standard:
\begin{constraints}

        \item An {\it interface-body\/} of a pure subroutine must specify
the intents of all non-pointer and non-procedure dummy arguments.

\end{constraints}

The procedure characteristics defined by an interface body must be
consistent with the procedure's definition.
Regarding pure procedures, this is interpreted as follows:
\begin{enumerate}
        \item A procedure that is declared pure at its definition may be
declared pure in an interface block, but this is not required.
        \item A procedure that is not declared pure at its definition must 
not be declared pure in an interface block.
\end{enumerate}
That is, if an interface body contains a {\it pure-directive\/}, then the 
corresponding procedure definition must also contain it, though the 
reverse is not true.
When a procedure definition with a {\it pure-directive\/}
is compiled, the compiler may check that it satisfies the necessary 
constraints.


\subsection{Pure procedure reference}
To define pure procedure references, the following extra constraint is 
added to Section~12.4.1 of the Fortran~90 standard:
\begin{constraints}

        \item In a reference to a pure procedure, a {\it procedure-name\/} 
{\it actual-arg\/} must be the name of a pure procedure.

\end{constraints}


\subsection{Elemental reference of pure procedures}
\label{elem-ref-of-pure-procs}

Fortran~90 introduces the concept of `elemental procedures', which are 
defined for scalar arguments but may also be applied to conforming 
array-valued arguments.  The latter type of reference to an elemental 
procedure is called an `elemental' reference.    For an elemental function, 
each element of the result, if any, is as would have been obtained by
applying the function to corresponding elements of the arguments.
Examples are the mathematical intrinsics, e.g.\ SIN(X).  For an elemental 
subroutine, the effect on each element of an INTENT(OUT) or INTENT(INOUT) 
array argument is as would be obtained by calling the subroutine with 
the corresponding elements of the arguments.  An example is the intrinsic 
subroutine MVBITS.

However, Fortran~90 restricts elemental reference to a subset of 
the intrinsic procedures --- programmers cannot define their own 
elemental procedures.  Obviously, elemental invocation is equivalent 
to concurrent invocation, so extra constraints beyond those for normal 
Fortran procedures are required to allow this to be done safely
(e.g.\ deterministically).  Appropriate constraints in this case are
the same as for function calls in FORALL;  indeed, the latter are 
virtually equivalent to elemental reference of the function in an 
array assignment, given the close correspondence between FORALL and 
array assignment.  Hence, pure procedures may also be referenced 
elementally, subject to certain additional constraints given below.

\subsubsection{Elemental reference of pure functions}

A non-intrinsic pure function may be referenced {\em elementally\/} 
in array expressions, with a similar interpretation to the elemental
reference of Fortran~90 elemental intrinsic functions, provided it
satisfies the additional constraints that:
\begin{enumerate}
        \item Its non-procedure dummy arguments and dummy result are 
scalar and do not have the POINTER attribute.
        \item The length of any character dummy argument or result is 
independent of argument values (though it may be assumed, or depend on the 
lengths of other character arguments and/or a character result).
\end{enumerate}
We call non-intrinsic pure functions that satisfy these constraints 
`elemental non-intrinsic functions'.

The interpretation of an elemental reference of such a function is as 
follows (adapted from Section 12.4.3 of the Fortran~90 standard):
\begin{quotation}

A reference to an elemental non-intrinsic function is an elemental
reference if one or more non-procedure actual arguments are arrays
and all array arguments have the same shape.  If any actual argument 
is a function, its result must have the same shape as that of the 
corresponding function dummy procedure.  A reference to an elemental 
intrinsic function is an elemental reference if one or more actual 
arguments are arrays and all arrays have the same shape.

The result of such a reference has the same shape as the array arguments,
and the value of each element of the result, if any, is obtained by 
evaluating the function using the scalar and procedure arguments and
the corresponding elements of the array arguments.  The elements of
the result may be evaluated in any order.

For example, if \verb@foo@ is a pure function with the following interface:
                                                \CODE
    INTERFACE
      REAL FUNCTION foo (x, y, z, dummy_func)
        !HPF$ PURE foo
        REAL, INTENT(IN) :: x, y, z
        INTERFACE        ! interface for 'dummy_func'
          REAL FUNCTION dummy_func (x)
            !HPF$ PURE dummy_func
            REAL, INTENT(IN) :: x
          END FUNCTION dummy_func
        END INTERFACE
      END FUNCTION foo
    END INTERFACE
                                                \EDOC
and \verb@a@ and \verb@b@ are arrays of shape \verb@(m,n)@ and \verb@sin@
is the Fortran~90 elemental intrinsic function, then:
                                                \CODE
    foo (a, 0.0, b, sin)
                                                \EDOC
is an array expression of shape \verb@(m,n)@ whose \verb@(i,j)@ element
has the value:
                                                \CODE
    foo (a(i,j), 0.0, b(i,j), sin)
                                                \EDOC
\end{quotation}

To define elemental references of elemental non-intrinsic functions, 
the following extra constraints are added after Rule~R1209 
({\it function-reference\/}):
\begin{constraints}

        \item A non-intrinsic function that is referenced elementally 
must be a pure function with an explicit interface, and must satisfy 
the following additional constraints:
        \begin{itemize}
                \item Its non-procedure dummy arguments and dummy result
must be scalar and must not have the POINTER attribute.
                \item The length of any character dummy argument or a 
character dummy result must not depend on argument values (though it may 
be assumed, or depend on the lengths of other character arguments and/or a 
character result).
        \end{itemize}

        \item In an elemental reference of a non-intrinsic function,
a {\it function-name\/} {\it actual-arg\/} must have a result whose shape 
agrees with that of the corresponding function dummy procedure.

\end{constraints}

The reasons for these constraints are explained in the next section.


\subsubsection{Elemental reference of pure subroutines}

A non-intrinsic pure subroutine may be referenced {\em elementally\/}, 
with a similar interpretation to the elemental reference of Fortran~90 
elemental intrinsic subroutines, provided it satisfies the additional 
constraints that:
\begin{enumerate}
        \item Its non-procedure dummy arguments are scalar and do not 
have the POINTER attribute.
        \item The length of any character dummy argument is independent 
of argument values (though it may be assumed, or depend on the lengths of 
other character arguments).
\end{enumerate}
We call non-intrinsic pure subroutines that satisfy these constraints 
`elemental non-intrinsic subroutines'.

The interpretation of an elemental reference of such a subroutine 
is as follows (adapted from Section 12.4.5 of the Fortran~90 standard):
\begin{quotation}

A reference to an elemental non-intrinsic subroutine is an elemental
reference if all actual arguments corresponding to INTENT(OUT) and
INTENT(INOUT) dummy arguments are arrays that have the same shape 
and the remaining non-procedure actual arguments are conformable with 
them.  If any actual argument is a function, its result must have the 
same shape as that of the corresponding function dummy procedure.
A reference to an elemental intrinsic subroutine is an elemental 
reference if all actual arguments corresponding to INTENT(OUT) and 
INTENT(INOUT) dummy arguments are arrays that have the same shape and 
the remaining actual arguments are conformable with them.

The values of the elements of the arrays that correspond to INTENT(OUT)
and INTENT(INOUT) dummy arguments are the same as if the subroutine were 
invoked separately, in any order, using the scalar and procedure arguments 
and corresponding elements of the array arguments.

\end{quotation}

To define elemental references of elemental non-intrinsic subroutines, 
the following constraints are added after Rule~R1210 ({\it call-stmt\/}):
\begin{constraints}

        \item A non-intrinsic subroutine that is referenced elementally 
must be a pure subroutine with an explicit interface, and must satisfy 
the following additional constraints:
        \begin{itemize}
                \item Its non-procedure dummy arguments must be scalar 
and must not have the POINTER attribute.
                \item The length of any character dummy argument must 
not depend on argument values (though it may be assumed, or depend on 
the lengths of other character arguments).
        \end{itemize}

        \item In an elemental reference of a non-intrinsic subroutine,
a {\it function-name\/} {\it actual-arg\/} must have a result whose shape 
agrees with that of the corresponding function dummy procedure.

\end{constraints}

It is perhaps worth outlining the reasons for the extra constraints 
imposed on pure procedures in order for them to be referenced elementally.  

The dummy result of a function or `output' arguments of a subroutine
are not allowed to have the POINTER attribute because of a Fortran~90
technicality, namely, that under elemental reference the corresponding 
actual arguments must be array variables, and Fortran~90 does not permit 
an array of pointers to be referenced.\footnote{
        See the final constraint after Rule~R613 of the Fortran~90 standard.
Note the difference between an {\em array of pointers\/}, which cannot 
be declared or referenced in Fortran~90, and a {\em pointer array\/},
which can.
}
The `input' arguments of an elemental reference are prohibited from 
having the POINTER attribute for consistency with the output arguments 
or result.  However, this last constraint does not impose 
any real restrictions on an elemental reference, as the corresponding 
actual arguments {\em can\/} be pointers, in which case they are 
`de-referenced' and their targets are associated with the dummy arguments.  
In fact, the only reason for a dummy argument to be a pointer is so that
its pointer association can be changed, which is not allowed for `input'
arguments.  (Incidentally, since a pure function has only `input' 
arguments, there would be no loss of generality in disallowing dummy 
pointers in pure functions generally.)  Note that the prohibition of 
dummy pointers in pure subroutines that are elementally referenced means 
that all their non-procedure dummy arguments can have their intent 
explicitly specified (and indeed this is required by the constraints for 
pure subroutine interfaces---see Section \ref{pure-proc-interface}) which 
assists the checking of argument usage.

In an elemental reference, any actual argument that is a function
must have a result whose shape agrees with that of the corresponding 
function dummy procedure.  That is, elemental usage does not extend to 
function arguments, as Fortran~90 does not support the concept of an `array' 
of functions.
Naively it might appear that a function actual argument that is associated 
with a scalar dummy function could return an array result provided it 
conforms with the other array arguments of the elemental reference.  
However, this is not meaningful under elemental reference, as an 
array-valued function cannot be decomposed into an `array' of scalar 
function references, as would be required in this context.

Finally, the length of any character dummy argument or a character
dummy result cannot depend on argument {\em values\/} (though it can
be assumed, or depend on the lengths of other character arguments and/or
a character result).  This ensures that under elemental reference, all 
elements of an array argument or result of character type will have the 
same length, as required by Fortran~90.


\subsection{Examples of pure procedure usage}

\subsubsection{FORALL statements and constructs}

Pure functions may be used in expressions in FORALL statements and 
constructs, unlike general functions.  
Because a {\it forall-assignment\/}
may be an {\it array-assignment\/}, the pure function can have an array
result.  
For example:
                                                              \CODE
INTERFACE
  FUNCTION f (x)
    !HPF$ PURE f
    REAL, DIMENSION(3) :: f, x
  END FUNCTION f
END INTERFACE
REAL  v (3,10,10)
...
FORALL (i=1:10, j=1:10)  v(:,i,j) = f (v(:,i,j)) 
                                                              \EDOC


\subsubsection{Elemental references}
Examples of elemental function usage are
                                                              \CODE
INTERFACE 
  REAL FUNCTION foo (x, y, z)
    !HPF$ PURE foo
    REAL, INTENT(IN) :: x, y, z
  END FUNCTION foo
END INTERFACE

REAL a(100), b(100), c(100)
REAL p, q, r

a(1:n) = foo (a(1:n), b(1:n), c(1:n))
a(1:n) = foo (a(1:n), q, r)
a = sin(b)
                                                              \EDOC
An example involving a WHERE-ELSEWHERE construct is
                                                              \CODE
INTERFACE
  REAL FUNCTION f_edge (x)
    !HPF$ PURE
    REAL x
  END FUNCTION f_edge
  REAL FUNCTION f_interior (x)
    !HPF$ PURE
    REAL x
  END FUNCTION f_interior
END INTERFACE

REAL a (10,10)
LOGICAL edges (10,10)

WHERE (edges)
  a = f_edge (a)
ELSE WHERE
  a = f_interior (a)
END WHERE
                                                          \EDOC

Examples of elemental subroutine usage are
                                                                \CODE
INTERFACE 
  SUBROUTINE solve_simul(tol, y, z)
    !HPF$ PURE solve_simul
    REAL, INTENT(IN) :: tol
    REAL, INTENT(INOUT) :: y, z
  END SUBROUTINE
END INTERFACE

REAL a(100), b(100), c(100)
INTEGER bits(10)

CALL solve_simul( 0.1, a, b )
CALL solve_simul( c, a, b )
CALL mvbits( bits, 0, 4, bits, 4) ! Fortran 90 elemental intrinsic
                                                                \EDOC

User-defined elemental procedures have several potential advantages.
They are a convenient programming tool, as the same procedure 
can be applied to actual arguments of any rank.

In addition, the implementation of an elemental function returning an
array-valued result in an array expression is likely to be more 
efficient than that of an equivalent array function.  One reason is 
that it requires less temporary storage for the result (i.e.\ storage 
for a single result versus storage for the entire array of results).  
Another is that it saves on looping if an array expression is 
implemented by sequential iteration over the component elemental 
expressions (as may be done for the `segment' of the array expression 
local to each process).  This is because, in the sequential version, 
the elemental function can be invoked elementally in situ within the 
expression.  The array function, on the other hand, must be executed 
before the expression is evaluated, storing its result in a temporary 
array for use within the expression.  Looping is then required during 
the execution of the array function body as well as the expression 
evaluation.


\subsection{MIMD parallelism via pure procedures}

We have seen that a pure procedure may be invoked concurrently at each
`element' of an array if it is referenced elementally or in a FORALL 
statement or construct (where an `element' may itself be an array in
a non-elemental reference).  In these cases, a limited form of MIMD 
parallelism can be obtained by means of branches within the pure procedure 
which depend on arguments associated with array elements or their 
subscripts (the latter especially in a FORALL context).  For example:
                                                              \CODE
    FUNCTION f (x, i)
      !HPF$ PURE f
      REAL x       ! associated with array element
      INTEGER i    ! associated with array subscript
      IF (x > 0.0) THEN     ! content-based conditional
        ...
      ELSE IF (i==1 .OR. i==n) THEN    ! subscript-based conditional
        ...
      ENDIF
    END FUNCTION

    ...
    REAL a(n)
    INTEGER i
    ...
    FORALL (i=1:n)  a(i) = f( a(i), i)
    ...
    a = f( a, (/ (i,i=1,n) /) ) ! an elemental reference equivalent
                                ! to the above FORALL

                                                              \EDOC
This may sometimes provide an alternative to using
WHERE-ELSEWHERE constructs or sequences of masked FORALLs with their 
potential synchronisation overhead. 


\subsection{Comments}

This section should be moved to the comments chapter of the final draft.

\subsubsection{Pure procedures}

\begin{itemize}

\item The constraints for a pure procedure guarantee
freedom from side-effects, thus ensuring that it can be invoked
concurrently at each
`element' of an array (where an `element' may itself be a data-structure, 
including an array).

\item All constraints can be statically checked, thus providing safety
for the programmer.

Of course, a price that must be paid for this additional security is
that the constraints must be quite rigorous, which means that it
is possible to write a function that is side-effect free in behaviour
but which nevertheless fails to satisfy the constraints 
(e.g.\ a function that contains an assignment to a global variable,
but in a branch that is not executed in any invocation of the function
during a particular program execution).


\item It is expected that most High Performance Fortran library 
procedures will conform to the constraints required of pure procedures
(by the very nature of library procedures), and so can be declared pure 
and referenced in FORALL statements and constructs (if they are functions) 
and within user-defined pure procedures.  It is also anticipated that 
most library procedures will not reference global data, whose use may 
sometimes inhibit concurrent execution (see below).

The constraints on pure procedures are limited to those necessary 
for statically checkable side-effect freedom and the elimination 
of saved internal state.  Subject to these restrictions, maximum 
functionality has been preserved in the definition of pure procedures.
This has been done to make elemental reference and function calls in 
FORALL as widely available as possible, and so that quite general library 
procedures can be classified as pure.  

A drawback of this flexibility is that pure procedures permit certain 
features whose use may hinder, and in the worst case prevent, concurrent 
execution in FORALL and elemental references (that is, such references 
may have to be implemented by sequentialisation).  
Foremost among these features are the access of global data, particularly 
distributed global data, and the fact that the arguments and, for a pure 
function, the result may be pointers or data structures with pointer 
components, including recursive data structures such as lists and trees.
The programmer should be aware of the potential performance penalties 
of using such features.


\item An earlier draft of this proposal contained a constraint disallowing 
pure procedures from accessing global data objects, particularly
distributed data objects.
This constraint has been dropped as inessential to the side-effect freedom 
that the HPF committee requested.
However, it may well be that some machines will have great difficulty 
implementing FORALL without this constraint.


\item One of us (JHM) is still in favour of disallowing access to global 
variables for a number of reasons: 
\begin{enumerate}
\item Aesthetically, it is in keeping with the
nature of a `pure' function, i.e.\ a function in the mathematical
sense, and in practical terms it imposes no real restrictions on the 
programmer, as global data can be passed in via the argument list; 
\item Without this constraint HPF programs can no longer be implemented 
by pure message-passing, or at least not efficiently, i.e.\ without
sequentialising FORALL statements containing function calls and greatly
complicating their implementation; 
\item Absence of this restriction may inhibit optimisation of FORALLs
and array assignments, as the optimisation of assigning the {\it expr\/}
directly to the assignment variable rather than to a temporary intermediate
array now requires interprocedural analysis rather than just local 
analysis.
\end{enumerate}

\end{itemize}

\subsubsection{Elemental references}

\begin{itemize}

\item The original draft proposed allowing pure procedures 
to be invoked elementally even if their dummy arguments or results 
were array-valued.  These provisions have been dropped to avoid 
promoting storage order to a higher level in Fortran~90
(i.e.\ to avoid introducing the concept of `arrays-of-arrays', 
which Fortran~90 seems to strenuously avoid!).  In practical terms,
the current proposal provides the same functionality as the original 
one for functions, though not for subroutines.  If a programmer wants 
elemental function behaviour, but also wants the `elements' to be
array-valued, this can be achieved using FORALL.

\item In a typical FORALL or elemental implementation, a pure procedure 
would be called independently in each process, and its dummy arguments 
would be associated with `elements' local to that process. 
This is the reason for disallowing data mapping directives for
local and dummy variables within the bodies of such procedures.
Note that, particularly in elemental invocations, the actual arguments
can be distributed arrays which need not be `co-distributed'; if not,
a typical implementation would in general perform all data communications 
prior to calling the procedure, and would then pass in the required 
elements locally via its argument list.

However, access to large global data structures such as look-up tables
is often useful within functions that are otherwise mathematically pure,
and these are allowed to be distributed.

\end{itemize}


\section{The INDEPENDENT Directive}

\label{do-independent}

\footnote{Version of August 20, 1992
 - Guy Steele, Thinking Machines Corporation, and 
Charles Koelbel, Rice University.  Approved at second reading on
September 10, 1992; however, the INDEPENDENT subgroup was directed to
examine methods of allowing reductions to be performed within
INDEPENDENT constructs.}
The INDEPENDENT directive can precede a DO loop or FORALL statement or
construct.
Intuitively, it asserts to the compiler that the operations in the
following construct
may be executed independently--that is, in any order, or
interleaved, or concurrently--without changing the semantics
of the program.

The syntax of the INDEPENDENT directive is
                                                  \BNF
independent-dir	\IS	!HPF$INDEPENDENT [ (integer-variable-list) ]
                                                  \FNB

\noindent
Constraint: An {\it independent-dir\/} must immediately precede a DO or FORALL
statement.

\noindent
Constraint: If the {\it integer-variable-list\/} is present, then the
variables named must be the index variables of a set of perfectly nested
DO loops or indices from the same FORALL header.

The directive is said to apply to the indices named in its {\it
integer-variable-list}, or equivalently to the loops or FORALL indexed
by those variables.
If no {\it integer-variable-list\/} is present, then it is as if it
were present and contained the index variable for the DO or FORALL
immediately following the directive.


When applied to a nest of DO loops, an INDEPENDENT directive is an
assertion by the programmer that no iteration may affect any other
iteration, either directly or indirectly.
This implies that there are no exits from the construct other than
normal loop termination, and no I/O is performed by the loop.
A sufficient condition for ensuring this is that
during
the execution of the loop(s), no iteration assigns to any scalar
data object which is 
accessed (i.e.\ read or written) by any other iteration.
The directive is purely advisory, and a compiler is free
to ignore it if it cannot make use of the information.


For example:
                                                  \CODE
!HPF$INDEPENDENT
      DO I=1,100
        A(P(I)) = B(I)
      END DO
                                                  \EDOC
asserts that the array P does not have any repeated entries (else they
would cause interference when A was assigned).
It also limits how A and B may be storage associated.
(The remaining examples in this
section assume that no variables are storage or sequence associated.)
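
A program that cannot guarantee this property statically might check it
before taking the asserted path.
The following is only a sketch of that idea, not part of the proposal;
the program name CHECKPERM, the mask SEEN, the flag OK, and the data values
are assumptions made for illustration, and the entries of P are assumed to
lie in the range 1 to N.
\CODE
      PROGRAM CHECKPERM
        INTEGER, PARAMETER :: N = 5
        INTEGER :: P(N), I
        LOGICAL :: SEEN(N), OK
        REAL :: A(N), B(N)
        P = (/ 3, 1, 5, 2, 4 /)
        B = (/ 10., 20., 30., 40., 50. /)
        A = 0.0
        ! The assertion is valid only if no index occurs twice in P.
        SEEN = .FALSE.
        OK = .TRUE.
        DO I = 1, N
          IF (SEEN(P(I))) OK = .FALSE.
          SEEN(P(I)) = .TRUE.
        END DO
        IF (OK) THEN
!HPF$INDEPENDENT
          DO I = 1, N
            A(P(I)) = B(I)
          END DO
        ELSE
          ! Repeated entries: fall back to the ordinary sequential loop.
          DO I = 1, N
            A(P(I)) = B(I)
          END DO
        END IF
        PRINT *, OK, A
      END PROGRAM CHECKPERM
\EDOC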

Another example:
                                                  \CODE
!HPF$INDEPENDENT (I1,I2,I3)
      DO I1 = 1,N1
        DO I2 = 1,N2
          DO I3 = 1,N3
            DO I4 = 1,N4   !The inner loop is not independent!
              A(I1,I2,I3) = A(I1,I2,I3) + B(I1,I2,I4)*C(I2,I3,I4)
            END DO
          END DO
        END DO
      END DO
                                                  \EDOC
The inner loop is not independent because each element of A is
assigned repeatedly.
However, the three outer loops are independent because they access
different elements of A.
It is not relevant that the outer loops read the same elements from B
and C, because those arrays are not assigned.

The interpretation of INDEPENDENT for FORALL is similar to that for
DO: it asserts that no combination of the indices that INDEPENDENT
applies to may affect another combination.
This is only possible if one combination of index values assigns to a
scalar data object accessed by another
combination.
A DO and a FORALL with the same body are equivalent if they both
have the INDEPENDENT directive.
In the case of a FORALL, any of the variables may be mentioned in the
INDEPENDENT directive:
                                                                \CODE
!HPF$INDEPENDENT (I1,I3)
    FORALL(I1=1:N1,I2=1:N2,I3=1:N3) 
      A(I1,I2,I3) = A(I1,I2-1,I3)
    END FORALL
                                                                \EDOC
This means that for any given values for I1 and I3,
all the right-hand sides for all values of I2 must
be computed before any assignments are done for that
specific pair of (I1,I3) values; but assignments for
one pair of (I1,I3) values need not wait for rhs
evaluation for a different pair of (I1,I3) values.
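
One way to read this is as the following sequential sketch, in which the
two outer loops stand for the independent (I1,I3) pairs and the two inner
loops separate right-hand-side evaluation from assignment.
The temporary T, the array extents, and the lower bound 0 in the second
dimension of A (needed so that A(I1,I2-1,I3) is defined for I2 = 1) are
assumptions made for illustration.
\CODE
      PROGRAM PARTIALINDEP
        INTEGER, PARAMETER :: N1 = 2, N2 = 3, N3 = 2
        REAL :: A(N1,0:N2,N3), T(N2)
        INTEGER :: I1, I2, I3
        A = 0.0
        A(:,0,:) = 1.0
        ! The (I1,I3) pairs may run in any order or concurrently;
        ! a concurrent version would need a private T per pair.
        DO I1 = 1, N1
          DO I3 = 1, N3
            ! Within one pair, evaluate every right-hand side first ...
            DO I2 = 1, N2
              T(I2) = A(I1,I2-1,I3)
            END DO
            ! ... and only then perform the assignments.
            DO I2 = 1, N2
              A(I1,I2,I3) = T(I2)
            END DO
          END DO
        END DO
        PRINT *, A(1,:,1)
      END PROGRAM PARTIALINDEP
\EDOC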

Graphically, the INDEPENDENT directive can be visualized as
eliminating edges from a precedence graph representing the program.
Figure~\ref{fig-dep} shows the dependences that may normally be
present in a DO and a FORALL.
An arrow from a left-hand-side node (for example, ``lhsa(1)'') 
to a right-hand-side node (e.g.\ ``rhsb(1)'') means that the RHS
computation may use values assigned in the LHS node; thus the
right-hand side must be computed after the left-hand side completes
its store.
Similarly, an arrow from a RHS node to a LHS node means that the LHS
may overwrite a value needed by the RHS computation, again forcing an
ordering.
Edges from the ``BEGIN'' and to the ``END'' nodes represent control
dependences.
The INDEPENDENT directive asserts that the only dependences that a
compiler need enforce are those in Figure~\ref{fig-indep}.
That is, the programmer who uses INDEPENDENT is certifying that if the
compiler only enforces these edges, then the resulting program will be
equivalent to the one in which all the edges are present.
Note that the set of asserted dependences is identical for INDEPENDENT
DO and FORALL constructs.
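
The difference between the two graphs in Figure~\ref{fig-dep} can be seen
in a small program whose iterations do interfere.
This is an illustrative sketch only; the arrays X and Y and the program
name are assumptions, and neither loop could legitimately carry an
INDEPENDENT directive.
\CODE
      PROGRAM DOVSFORALL
        INTEGER :: X(4), Y(4), I
        X = (/ 1, 2, 3, 4 /)
        Y = X
        ! DO: iteration I sees the value stored by iteration I-1.
        DO I = 2, 4
          X(I) = X(I-1)
        END DO
        ! FORALL: all right-hand sides are read before any store.
        FORALL (I = 2:4) Y(I) = Y(I-1)
        PRINT *, X
        PRINT *, Y
      END PROGRAM DOVSFORALL
\EDOC
The DO leaves X holding 1 1 1 1, while the FORALL leaves Y holding
1 1 2 3; the two constructs agree only when the cross-iteration edges
carry no real dependence, which is what INDEPENDENT asserts.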

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Here come the pictures!
%

{

%length for use in pictures
\setlength{\unitlength}{0.03in}

%nodes used in all pictures
\newsavebox{\nodes}
\savebox{\nodes}{
    \small\sf
    \begin{picture}(80,105)(10,-2.5)
    \put(50.0,100){\makebox(0,0){BEGIN}}
    \put(20.0,80.0){\makebox(0,0){rhsa(1)}}
    \put(50.0,80.0){\makebox(0,0){rhsa(2)}}
    \put(80.0,80.0){\makebox(0,0){rhsa(3)}}
    \put(20.0,60.0){\makebox(0,0){lhsa(1)}}
    \put(50.0,60.0){\makebox(0,0){lhsa(2)}}
    \put(80.0,60.0){\makebox(0,0){lhsa(3)}}
    \put(20.0,40.0){\makebox(0,0){rhsb(1)}}
    \put(50.0,40.0){\makebox(0,0){rhsb(2)}}
    \put(80.0,40.0){\makebox(0,0){rhsb(3)}}
    \put(20.0,20.0){\makebox(0,0){lhsb(1)}}
    \put(50.0,20.0){\makebox(0,0){lhsb(2)}}
    \put(80.0,20.0){\makebox(0,0){lhsb(3)}}
    \put(50.0,0){\makebox(0,0){END}}
    \put(50.0,100){\oval(25,5)}
    \put(20.0,80.0){\oval(20,5)}
    \put(50.0,80.0){\oval(20,5)}
    \put(80.0,80.0){\oval(20,5)}
    \put(20.0,60.0){\oval(20,5)}
    \put(50.0,60.0){\oval(20,5)}
    \put(80.0,60.0){\oval(20,5)}
    \put(20.0,40.0){\oval(20,5)}
    \put(50.0,40.0){\oval(20,5)}
    \put(80.0,40.0){\oval(20,5)}
    \put(20.0,20.0){\oval(20,5)}
    \put(50.0,20.0){\oval(20,5)}
    \put(80.0,20.0){\oval(20,5)}
    \put(50.0,0){\oval(25,5)}
    \put(50,97.5){\vector(-2,-1){30}}
    \put(50,97.5){\vector(0,-1){15}}
    \put(50,97.5){\vector(2,-1){30}}
    \put(20,17.5){\vector(2,-1){30}}
    \put(50,17.5){\vector(0,-1){15}}
    \put(80,17.5){\vector(-2,-1){30}}
    \end{picture}
}

\begin{figure}

\begin{minipage}{2.70in}
\CODE
FORALL ( i = 1:3 )
  lhsa(i) = rhsa(i)
  lhsb(i) = rhsb(i)
END FORALL
\EDOC

\centering
\begin{picture}(80,105)(10,-2.5)
\small\sf
%save the messy part of the picture & reuse it
\newsavebox{\web}
\savebox{\web}{
    \begin{picture}(60,15)(0,0)
    \put(0,15){\vector(0,-1){15}}
    \put(0,15){\vector(2,-1){30}}
    \put(0,15){\vector(4,-1){60}}
    \put(30,15){\vector(-2,-1){30}}
    \put(30,15){\vector(0,-1){15}}
    \put(30,15){\vector(2,-1){30}}
    \put(60,15){\vector(0,-1){15}}
    \put(60,15){\vector(-2,-1){30}}
    \put(60,15){\vector(-4,-1){60}}
    \end{picture}
}
\put(10,-2.5){\usebox\nodes}
\put(20,62.5){\usebox\web}
\put(20,42.5){\usebox\web}
\put(20,22.5){\usebox\web}
\end{picture}
\end{minipage}
%
\hfill
%
\begin{minipage}{2.70in}
\CODE
DO i = 1, 3
  lhsa(i) = rhsa(i)
  lhsb(i) = rhsb(i)
END DO
\EDOC

\centering
\begin{picture}(80,105)(10,-2.5)
\small\sf
%save the messy part of the picture & reuse it
\newsavebox{\chain}
\savebox{\chain}{
    \begin{picture}(20,70)(0,0)
    \put(2.5,2.5){\oval(5,5)[bl]}
    \put(2.5,0){\vector(1,0){5}}
    \put(7.5,2.5){\oval(5,5)[br]}
    \put(10,2.5){\vector(0,1){32.5}}
    \put(10,35){\line(0,1){32.5}}
    \put(12.5,67.5){\oval(5,5)[tl]}
    \put(12.5,70){\vector(1,0){5}}
    \put(17.5,67.5){\oval(5,5)[tr]}
    \end{picture}
}
\put(10,-2.5){\usebox\nodes}
\put(20,77.5){\vector(0,-1){15}}
\put(20,57.5){\vector(0,-1){15}}
\put(20,37.5){\vector(0,-1){15}}
\put(25,15){\usebox\chain}
\put(50,77.5){\vector(0,-1){15}}
\put(50,57.5){\vector(0,-1){15}}
\put(50,37.5){\vector(0,-1){15}}
\put(55,15){\usebox\chain}
\put(80,77.5){\vector(0,-1){15}}
\put(80,57.5){\vector(0,-1){15}}
\put(80,37.5){\vector(0,-1){15}}
\end{picture}
\end{minipage}

\caption{Dependences in DO and FORALL without
INDEPENDENT assertions}
\label{fig-dep}
\end{figure}

\begin{figure}

%Draw the picture once, use it twice
\newsavebox{\easy}
\savebox{\easy}{
    \small\sf
    \begin{picture}(80,105)(10,-2.5)
    \put(10,-2.5){\usebox\nodes}
    \put(20,77.5){\vector(0,-1){15}}
    \put(20,57.5){\vector(0,-1){15}}
    \put(20,37.5){\vector(0,-1){15}}
    \put(50,77.5){\vector(0,-1){15}}
    \put(50,57.5){\vector(0,-1){15}}
    \put(50,37.5){\vector(0,-1){15}}
    \put(80,77.5){\vector(0,-1){15}}
    \put(80,57.5){\vector(0,-1){15}}
    \put(80,37.5){\vector(0,-1){15}}
    \end{picture}
}

\begin{minipage}{2.70in}
\CODE
!HPF$ INDEPENDENT
FORALL ( i = 1:3 )
  lhsa(i) = rhsa(i)
  lhsb(i) = rhsb(i)
END FORALL
\EDOC

\centering
\usebox\easy
\end{minipage}
%
\hfill
%
\begin{minipage}{2.70in}
\CODE
!HPF$ INDEPENDENT
DO i = 1, 3
  lhsa(i) = rhsa(i)
  lhsb(i) = rhsb(i)
END DO
\EDOC

\centering
\usebox\easy
\end{minipage}

\caption{Dependences in DO and FORALL with
INDEPENDENT assertions}
\label{fig-indep}
\end{figure}

}

%
%
% End of pictures
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


The compiler is justified in producing
a warning if it can prove that one of these assertions is incorrect.
It is not required to do so, however.
A program containing any false assertion of this type is not
standard-conforming, and the compiler may take any action it deems necessary.
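
For illustration, the following fragment is a sketch of a false assertion
under the wording of this section; the names S, A, and N are assumptions.
Every iteration both reads and assigns the scalar S, so the iterations
affect one another, and the program is not standard-conforming even though
the loop is an ordinary sum reduction.
(This is the kind of case the INDEPENDENT subgroup was asked to examine.)
\CODE
      S = 0.0
!HPF$INDEPENDENT
      DO I = 1, N
        S = S + A(I)   ! every iteration reads and assigns S
      END DO
\EDOC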


This directive is of course similar to the DOSHARED directive
of Cray MPP Fortran.  A different name is offered here to avoid
even the hint of commitment to execution by a shared memory machine.
Also, the ``mechanism'' syntax is omitted here, though we might want
to adopt it as further advice to the compiler about appropriate
implementation strategies, if we can agree on a desirable set
of options.


\section{Other Proposals}

The following are proposals made for modification or replacement of the 
above sections.

\subsection{A Proposal for MIMD Support in HPF}

\label{mimd-support}
	          

\subsubsection{Abstract}

\footnote{Version of July 18, 1992 - Clemens-August Thole, GMD I1.T.
In the interest of time, these features were not considered for inclusion 
in the first round of HPFF.}
This proposal tries to supply sufficient language support 
to deal with loosely synchronous programs, some of which have been 
identified in my ``A vote for explicit MIMD support''.
This is a proposal for the support of MIMD parallelism, which extends
Section~\ref{do-independent}. 
It is oriented more
towards the CRAY MPP Fortran Programming Model and the PCF proposal. 
The fine-grain synchronization of PCF is not proposed for implementation.
Instead of the CRAY mechanisms for assigning work to processors, an 
extension of the ON-clause is used.
Because fine-grain synchronization is absent, the constructs can be
executed
on SIMD or sequential architectures just by ignoring the additional
information.


\subsubsection{Summary of the current situation of MIMD support as part of
HPF}

According to Charles Koelbel's (Rice) mail of March 20th, ``Working
Group 4 - Issues for discussion'', MIMD support is a topic for discussion
within working group 4. 

Dave Loveman (DEC) has produced a document on FORALL statements 
(incorporated in Sections~\ref{forall-stmt} and \ref{forall-construct})
which
summarizes the discussion. Marc Snir proposed some extensions. These
constructs allow SIMD-style operations to be described more generally
than with array assignments. 

A topic for working papers is the interface of HPF Fortran to program units
which execute in SPMD mode. Proposals for ``Local Subroutines'' have been
made by Marc Snir and Guy Steele
(Chapter~\ref{foreign}). Both proposals
define local subroutines as program units which are executed by all
processors independently of each other. Each processor has access only
to the data contained in its local memory. Parts of distributed data
objects
can be accessed and updated by calls to a special library. Any
message-passing
library might be used for synchronization and communication.
This approach does not really integrate MIMD support into HPF programming.

The MPP Fortran proposal by Douglas M. Pase, Tom MacDonald, Andrew Meltzer
(CRAY)
contained the following features in order to support integrated MIMD
features:
\begin{itemize}
   \item  parallel directive
   \item  shared loops 
   \item  private variables
   \item  barrier synchronization
   \item  no-barrier directive for removing synchronization
   \item  locks, events, critical sections and atomic update
   \item  functions, to examine the mapping of data objects.
\end{itemize}

Steele's ``Proposal for loops in HPF'' (02.04.92) included a proposal for a 
directive ``!HPF\$ INDEPENDENT(integer\_variable\_list)'', which specifies
for the next set of nested loops that the loops with the specified
loop variables can be executed independently of each other.
(Section~\ref{do-independent} is a short version of this proposal.) 

Charles Koelbel gave an overview of different styles for parallel loops
in ``Parallel Loops Position Paper''. No specific proposal was made.

Min-You Wu's ``Proposal for FORALL, May 1992'' extended Guy Steele's 
``!HPF\$ INDEPENDENT'' proposal to use the directive in a block style.

Clemens-August Thole's ``A vote for explicit MIMD support'' contains three
examples
from different application areas, which seem to require MIMD support for
efficient execution. 

\paragraph{Summary}

In contrast to the FORALL extensions, MIMD support is currently not
well established
as part of HPF Fortran. The examples in ``A vote for explicit MIMD support''
show clearly the need for such features. Local subroutines do not fulfill
the requirements because they force the use of a distributed-memory
programming model,
which should not be necessary in most cases.

With the exception of parallel sections, all interesting features
are contained in the MPP proposal. I would like to split the discussion
on specifying parallelism, synchronization and mapping into three different
topics. Furthermore, I would like to see corresponding features
expressed
in the style of the current X3H5 proposal, if possible, in order to
be in line with upcoming standards.


\subsubsection{Proposal for MIMD support}

In order to support the specification of MIMD-type parallelism, the
following
features are taken from the ``Fortran 77 Binding of X3H5 Model for 
Parallel Programming Constructs'': 
\begin{itemize}
    \item   PARALLEL DO construct/directive
    \item   PARALLEL SECTIONS worksharing construct/directive
    \item   NEW statement/directive
\end{itemize}

These constructs are not used with PCF-like options for mapping or 
synchronization but are combined with the ON clause for mapping operations
onto the parallel architecture. 

\paragraph{PARALLEL DO}

\subparagraph{Explicit Syntax}

The PARALLEL DO construct specifies parallelism among the 
iterations of a block of code. The PARALLEL DO construct has the same
syntax as a DO statement. For a directive approach, the directive
!HPF\$ PARALLEL can be used in front of a DO statement.
After the PARALLEL DO statement a new-declaration may be inserted.

A PARALLEL DO construct might be nested with other parallel constructs. 

\subparagraph{Interpretation}

The PARALLEL DO is used to specify parallel execution of the iterations of
a block of code. Each iteration of a PARALLEL DO is an independent unit
of work. The iterations of a PARALLEL DO must be data independent. Iterations
are data independent if the storage sequence associated with each variable
or array element that is assigned a value by an iteration is not
referenced
by any other iteration. 

A program is not HPF conforming if any iteration executes a statement
that causes a transfer of control out of the block defined by the PARALLEL
DO construct. 

The value of the loop index of a PARALLEL DO is undefined outside the scope
of the PARALLEL DO construct. 


\paragraph{PARALLEL SECTIONS}

The parallel sections construct is used to specify parallelism among
sections
of code.

\subparagraph{Explicit Syntax}


                                                              \CODE
        !HPF$ PARALLEL SECTIONS
        !HPF$ SECTION
        !HPF$ END PARALLEL SECTIONS
                                                              \EDOC
structured as
                                                              \CODE
        !HPF$ PARALLEL SECTIONS
        [new-declaration-stmt-list]
        [section-block]
        [section-block-list]
        !HPF$ END PARALLEL SECTIONS
                                                              \EDOC
where [section-block] is
                                                              \CODE
        !HPF$ SECTION
        [execution-part]
                                                              \EDOC

\subparagraph{Interpretation}

The parallel sections construct is used to specify parallelism among
sections
of code. Each section of the code is an independent unit of work. A program
is not standard conforming if, during the execution of any parallel sections
construct, a transfer of control out of the blocks defined by the parallel
sections construct is performed. 
In a standard-conforming program the sections of code shall be data 
independent. Sections are data independent if the storage sequence
associated 
with each variable or array element that is assigned a value by a
section
is not referenced by any other section. 
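
A minimal sketch of the proposed directive form follows, assuming
conformable arrays A through F (hypothetical names).
The two sections are data independent in the sense above: the first
assigns only A and the second only D, and neither references what the
other assigns.
\CODE
!HPF$ PARALLEL SECTIONS
!HPF$ SECTION
      A = B + C
!HPF$ SECTION
      D = E * F
!HPF$ END PARALLEL SECTIONS
\EDOC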


\paragraph{Data scoping}

Data objects that are local to a subroutine are different between 
distinct units of work, even if they execute the same subroutine.


\paragraph{NEW statement/directive}

The NEW statement/directive allows the user to generate new instances of 
objects with the same name as an object that can currently be referenced.


\subparagraph{Explicit Syntax}

A [new-declaration-stmt] is
                                                                \CODE
       !HPF$ NEW variable-name-list
                                                                \EDOC

\subparagraph{Coding rules}

A [variable-name] shall not be
\begin{itemize} 
\item    the name of an assumed size array, dummy argument, common block, 
function or entry point
\item    of type character with an assumed length
\item    specified in a SAVE or DATA statement
\item    associated with any object that is shared for this parallel
construct.
\end{itemize}

\subparagraph{Interpretation}
 
Listing a variable on a NEW statement causes the object to be explicitly
private for the parallel construct. For each unit of work of the parallel 
construct a new instance of the object is created and referenced with the
specific name. 
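
As a sketch of how NEW might combine with the directive form of
PARALLEL DO, consider the fragment below.
The arrays A, B, C and the scalar T are assumptions made for illustration.
Without the NEW directive every iteration would assign the same T, and the
iterations would not be data independent; with it, each iteration works on
its own instance of T.
\CODE
      REAL A(N), B(N), C(N), T
!HPF$ PARALLEL
      DO I = 1, N
!HPF$ NEW T
        T = B(I) + C(I)
        A(I) = T * T
      END DO
\EDOC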


\subsection{Nested WHERE statements}

\label{nested-where}

\footnote{Version of September 15, 1992 - Guy Steele, Thinking Machines 
Corporation.  This section has not been discussed.}
Here is the text of a proposal once sent to X3J3:
\begin{quote}
Briefly put, the less WHERE is like IF, the more difficult it is to
translate existing serial codes into array notation.  Such codes tend to
have the general structure of one or more DO loops iterating over array
indices and surrounding a body of code to be applied to array elements.
Conversion to array notation frequently involves simply deleting the DO
loops and changing array element references to array sections or whole
array references.  If the loop body contains logical IF statements, these
are easily converted to WHERE statements.  The same is true for translating
IF-THEN constructs to WHERE constructs, except in two cases.  If the IF
constructs are nested (or contain IF statements), or if ELSE IF is used,
then conversion suddenly becomes disproportionately complex, requiring the
user to create temporary variables or duplicate mask expressions and to use
explicit .AND. operators to simulate the effects of nesting.

Users also find it confusing that ELSEWHERE is syntactically and
semantically analogous to ELSE rather than to ELSE IF.

We propose that the syntax of WHERE constructs be extended and
changed to have the form
                                                                \BNF
where-construct       \IS  where-construct-stmt
 				    [ where-body-construct ]...
 				  [ elsewhere-stmt
 				    [ where-body-construct ]... ]...
 				  [ where-else-stmt
 				    [ where-body-construct ]... ]
 				  end-where-stmt
 
 	where-construct-stmt  \IS  WHERE ( mask-expr )
 
 	elsewhere-stmt        \IS  ELSE WHERE ( mask-expr )
 
 	where-else-stmt       \IS  ELSE WHERE
 
 	end-where-stmt        \IS  END WHERE
 
 	mask-expr             \IS  logical-expr
 
 	where-body-construct  \IS  assignment-stmt
 			      \IS  where-stmt
 			      \IS  where-construct
                                                                \FNB       
                                             	

\noindent Constraint: In each assignment-stmt, the mask-expr and the variable
being defined must be arrays of the same shape.  If a
where-construct contains a where-stmt, an elsewhere-stmt,
or another where-construct, then the two mask-expr's must
be arrays of the same shape.
 
The meaning of such statements may be understood by rewrite rules.  First
one may eliminate all occurrences of ELSE WHERE:
                                                                \CODE
WHERE (m1)		
    xxx			
ELSE WHERE (m2)		
    yyy				
END WHERE
	                                                            \EDOC
becomes
                                                                \CODE
WHERE (m1)
    xxx
ELSE
    WHERE (m2)
        yyy
    END WHERE
END WHERE
                                                                \EDOC
where xxx and yyy represent any sequences of statements, so long as the
original WHERE, ELSE WHERE, and END WHERE match, and the ELSE WHERE is the
first ELSE WHERE of the construct (that is, yyy may include additional ELSE
WHERE or ELSE statements of the construct).  Next one eliminates ELSE:
                                                                \CODE
WHERE (m)
    xxx
ELSE
    yyy
END WHERE
                                                                \EDOC
becomes
                                                                \CODE
temp = m
WHERE (temp)
    xxx
END WHERE
WHERE (.NOT. temp)
    yyy
END WHERE
                                                                \EDOC

Finally one eliminates nested WHERE constructs:
                                                                \CODE
WHERE (m1)
    xxx
    WHERE (m2)
        yyy
    END WHERE
    zzz
END WHERE
                                                                \EDOC
becomes
                                                                \CODE
temp = m1
WHERE (temp)
    xxx
END WHERE
WHERE (temp .AND. (m2))
    yyy
END WHERE
WHERE (temp)
    zzz
END WHERE
                                                                \EDOC
and similarly for nested WHERE statements.

The effects of these rules will surely be a familiar or obvious possibility
to all the members of the committee; I enumerate them explicitly here only
so that there can be no doubt as to the meaning I intend to support.

Such rewriting rules are simple for a compiler to apply, or the code may
easily be compiled even more directly.  But such transformations are
tedious for our users to make by hand and result in code that is
unnecessarily clumsy and difficult to maintain.

One might propose to make WHERE and IF even more similar by making two
other changes.  First, require the noise word THERE to appear in a WHERE
and ELSE WHERE statement after the parenthesized mask-expr, in exactly the
same way that the noise word THEN must appear in IF and ELSE IF statements.
(Read aloud, the results might sound a trifle old-fashioned--"Where knights
dare not go, there be dragons!"--but technically would be as grammatically
correct English as the results of reading an IF construct aloud.)  Second,
allow a WHERE construct to be named, and allow the name to appear in ELSE
WHERE, ELSE, and END WHERE statements.  I do not feel very strongly one way
or the other about these no doubt obvious points, but offer them for your
consideration lest the possibilities be overlooked.
\end{quote}

Now, for compatibility with Fortran 90, HPF should continue to
use ELSEWHERE instead of ELSE, but this causes no ambiguity:

                                                                \CODE
      WHERE(...)
        ...
      ELSE WHERE(...)
        ...
      ELSEWHERE
        ...
      END WHERE
                                                                \EDOC

is perfectly unambiguous, even when blanks are not significant.
Since X3J3 declined to adopt the keyword THERE, it should not be
used in HPF either (alas).
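
The nesting rewrite rule can be tried on concrete data.
The program below is a sketch that assumes a compiler accepting nested
WHERE constructs (they later became standard in Fortran~95); the arrays
X, R1, R2 and the mask values are illustrative assumptions.
It evaluates the nested form and the flattened form produced by the rule
and compares the results.
\CODE
      PROGRAM NESTEDWHERE
        INTEGER, PARAMETER :: N = 6
        REAL :: X(N), R1(N), R2(N)
        LOGICAL :: TEMP(N)
        X = (/ -2., -1., 0., 1., 2., 3. /)
        R1 = 0.0
        R2 = 0.0
        ! Nested form, as proposed.
        WHERE (X .GT. 0.0)
          R1 = 1.0
          WHERE (X .GT. 1.5)
            R1 = 2.0
          END WHERE
        END WHERE
        ! Flattened form produced by the rewrite rule.
        TEMP = X .GT. 0.0
        WHERE (TEMP)
          R2 = 1.0
        END WHERE
        WHERE (TEMP .AND. (X .GT. 1.5))
          R2 = 2.0
        END WHERE
        ! Prints T when the two forms agree element by element.
        PRINT *, ALL(R1 .EQ. R2)
      END PROGRAM NESTEDWHERE
\EDOC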

\alternative A
\subsection{EXECUTE-ON-HOME Directive}
\label{on-clause}

\footnote{Version of 
October 14, 1992
--
Tin-Fook Ngai,
Hewlett-Packard Laboratories.
This section has not been discussed.}

The EXECUTE-ON-HOME directive is used to suggest where an
iteration of a DO construct or an indexed parallel assignment should be
executed.  The directive informs the compiler which data access should be
local and which data access may be remote.
 
                                                                       \BNF
execute-on-home-directive  \IS  EXECUTE (subscript-list) ON_HOME align-spec 
[; LOCAL array-name-list]
                                                                       \FNB

The EXECUTE-ON-HOME directive must immediately precede the corresponding
DO loop body, array assignment, FORALL statement, FORALL construct or
individual assignment statement in a FORALL construct.

The scope of an EXECUTE-ON-HOME directive is the entire loop body of the
enclosing DO construct, or the following array assignment, FORALL
statement, FORALL construct or assignment statement in a FORALL construct.

The {\em subscript-list} identifies a distinct iteration index or an indexed
parallel assignment.  The {\em align-spec} identifies a template node.  Every
iteration index or indexed assignment must be associated with one
and only one template node.  The EXECUTE-ON-HOME directive suggests that
the iteration or parallel assignment should be executed on the processor
to which the template node is mapped.  When the EXECUTE-ON-HOME directive
is applied to a subroutine call, it affects only the execution location of
the caller but not the execution location of the called subroutine.

The optional LOCAL directive informs the compiler that all data accesses
to the specified {\em array-name-list} can be handled as local data
accesses if the related HPF data mapping directives are honored.

EXECUTE-ON-HOME directives can be nested, but only the immediately
preceding EXECUTE-ON-HOME directive is effective.


\paragraph{Example 1}
                                                                \CODE 
      REAL A(N), B(N), C(N)
      INTEGER P(N)
!HPF$ TEMPLATE T(N)
!HPF$ ALIGN WITH T:: A, B, C
!HPF$ DISTRIBUTE T(CYCLIC(2))

!HPF$ INDEPENDENT            
      DO I = 1, N/2 
!HPF$ EXECUTE (I) ON_HOME T(2*I); LOCAL A, B, C
      ! we know that P(2*I-1) and P(2*I) are a permutation 
      ! of 2*I-1 and 2*I
        A(P(2*I - 1)) = B(2*I - 1) + C(2*I - 1)    
        A(P(2*I)) = B(2*I) + C(2*I)
      END DO
                                                                \EDOC 


\paragraph{Example 2}
                                                                \CODE
      REAL A(N,N), B(N,N)
!HPF$ TEMPLATE T(N,N)
!HPF$ ALIGN WITH T:: A, B
!HPF$ EXECUTE (I,J) ON_HOME T(I+1,J-1)
      FORALL (I=1:N-1, J=2:N)   A(I,J) = A(I+1,J-1) + B(I+1,J-1)
                                                                \EDOC


\paragraph{Example 3}
                                                                \CODE
      REAL A(N,N), B(N,N)
!HPF$ TEMPLATE T(N,N)
!HPF$ ALIGN WITH T:: A, B
!HPF$ EXECUTE (I,J) ON_HOME T(I,J)  
      ! apply to the entire FORALL construct
      FORALL (I=1:N-1, J=2:N) 
        A(I,J) = A(I+1,J-1) + B(I+1,J-1)
        B(I,J) = A(I,J) + B(I+1,J-1)
      END FORALL
                                                                \EDOC


\paragraph{Example 4}
                                                                \CODE
      REAL A(N,N), B(N,N)
!HPF$ TEMPLATE T(N,N)
!HPF$ ALIGN WITH T:: A, B
      FORALL (I=1:N-1, J=2:N) 
!HPF$ EXECUTE (I,J) ON_HOME T(I,J)  
      ! applies only to the following assignment
        A(I,J) = A(I+1,J-1) + B(I+1,J-1)
        B(I,J) = A(I,J) + B(I+1,J-1)
      END FORALL
                                                                \EDOC


\paragraph{Example 5}
                                                                \CODE
      REAL A(N,N), B(N,N)
!HPF$ TEMPLATE T(N,N)
!HPF$ ALIGN WITH T:: A, B

!HPF$ EXECUTE (I,J) ON_HOME T(I+1,J-1)
      A(1:N-1,2:N) = A(2:N,1:N-1) + B(2:N,1:N-1)
                                                                \EDOC

   
\paragraph{Example 6}

The original program for this example is due to Michael Wolfe of Oregon 
Graduate Institute.

This program performs matrix multiplication \(C = A \times B\).
In each step, array B is rotated by row-blocks and multiplied
diagonal-block-wise in parallel with A, and the results are accumulated in C.

Note that without the EXECUTE-ON-HOME and LOCAL directives, the compiler
will have a hard time determining that all A, B and C accesses are actually
local, and so cannot generate the most efficient code (i.e.\ 
communication-free and with no runtime checking in the parallel loop body).
 
                                                                \CODE
      REAL A(N,N), B(N,N), C(N,N)

      PARAMETER(NOP = NUMBER_OF_PROCESSORS())
!HPF$ REALIGNABLE B
!HPF$ TEMPLATE T(2*N,N)               ! to allow wrap around mapping
!HPF$ ALIGN (I,J) WITH T(I,J):: A, C      
!HPF$ ALIGN B(I,J) WITH T(N+I,J)
!HPF$ DISTRIBUTE T(CYCLIC(N/NOP),*)   ! distributed by row blocks

      IB = N/NOP

      DO IT = 0, NOP-1

      ! rotate B by row-blocks
!HPF$ REALIGN B(I,J) WITH T(N-IT*IB+I,J)  

!HPF$ INDEPENDENT                     ! data parallel loop
        DO IP = 0, NOP-1     
!HPF$ EXECUTE (IP) ON_HOME T(IP*IB+1,1); LOCAL A, B, C
          ITP = MOD( IT+IP, NOP )

          DO I = 1, IB
            DO J = 1, N
              DO K = 1, IB
                C(IP*IB+I,J) = C(IP*IB+I,J) +                      
     1                         A(IP*IB+I,ITP*IB+K)*B(ITP*IB+K,J)
              ENDDO  ! K 
            ENDDO  ! J 
          ENDDO  ! I 
        ENDDO  ! IP

      ENDDO  ! IT
                                                                \EDOC

\alternative B
\subsection{Proposal for a Statement Grouping Syntax and ON Clause}

\footnote{Version of October 5, 1992 -- Clemens-August Thole, GMD-I1.T,
Sankt Augustin.
This section has not been discussed.}
I agree with Tin-Fook that something like the ON-clause should be 
contained in HPF. I brought a proposal with me to the last HPF meeting 
which was distributed by Chuck, but neither the FORALL working group
nor the plenary had time to discuss the proposal.

I would appreciate comments on the various features.

\subsubsection{Introduction}

This proposal introduces an extension to HPF to group several 
statements in order to be able to specify properties for a whole block
of statements at once. A block of statements is called an HPF-section.
HPF-sections can be used to describe properties for independent execution
between blocks of statements as well as the mapping of their execution.

For the specification of a specific mapping of the execution of statements
or HPF-sections the ON-clause is introduced. A subset of a template is used
as a reference object onto which the statements are mapped in a canonical
manner. Careful selection of the reference template makes it possible to
specify how the execution of the code is mapped onto the parallel architecture.


\subsubsection{HPF-sections}

The HPF directives SECTIONS, SECTION, and END SECTIONS are used to specify
grouping of statements. SECTIONS and END SECTIONS specify the beginning
and end of a list of HPF-sections and SECTION the beginning of the next 
HPF-section. The syntax is as follows:
                                                                \BNF
hpf-block \IS        !HPF$ SECTIONS
                [HPF-section-list]
        !HPF$ END SECTIONS

hpf-section \IS        !HPF$ SECTION
                [execution-part]
                                                                \FNB
\noindent Constraint: During the execution of an {\em hpf-section}, no
transfer of control to code outside of its {\em execution-part} may be
performed.

\paragraph{Example}
                                                                \CODE
        !HPF$ SECTIONS
        !HPF$ SECTION
                A = A + B
                B = C + D
        !HPF$ SECTION
                E = B
                IF (E.GT.F) GOTO 10
                        E = 0D0
         10     CONTINUE
        !HPF$ END SECTIONS
                                                                \EDOC
This example specifies a list of two HPF-sections. The control statement in
the second HPF-section is valid because after the transfer of control the
execution continues in the same HPF-section.


\subsubsection{ON-clause}

The ON-clause specifies a subsection of a template, which is used as a reference
object for the execution of the next statement, construct, or HPF-section.
If the left-hand-side of an assignment coincides in shape with the reference
object, the evaluation of the right-hand-side and the assignment for 
a specific element of the left-hand-side are performed on the processor onto
which the corresponding element of the reference object is mapped.

\paragraph{Syntax}

Add the following rules:
                                                                \BNF
executable-construct \IS        !HPF$ ON on-spec
                executable-construct

hpf-section \IS        !HPF$ ON on-spec
                hpf-section
        
on-spec \IS        align-spec
                                                                \FNB
The {\it executable-construct} or {\it hpf-section} is called the on-clause-target.

\paragraph{Constraints}
\begin{enumerate}
\item No {\it executable-construct} that generates a transfer of control
   out of the construct itself may be used as the object of the on-clause.
   This includes the entry-statement. 
\item {\it Statement-block}s used in constructs must fulfill the constraints of
   HPF-sections.
\item The shape of the {\it on-spec} must cover in each dimension the shape
   of any left-hand-side of an assignment statement that is the target of an
   on-clause. If a "*" is used in the {\it on-spec}, this dimension is skipped
   when constructing the shape of the {\it on-spec}.
\item If an on-clause is contained in the on-clause-target, the new {\it
on-spec}
   must be a subsection of the {\it on-spec} of the outer on-clause.
\end{enumerate}

\paragraph{Example}
                                                                \CODE
                REAL, DIMENSION(n) :: a, b, c, d
        !HPF$   TEMPLATE grid(n)
        !HPF$   ALIGN WITH grid :: a, b, c, d

        !HPF$   ON grid(2:n)
                a(1:n-1) = a(2:n) + b(2:n) + c(2:n)
                                                                \EDOC
The on-clause indicates that the evaluation of the right-hand-side is 
performed on those processors that hold the data elements of the 
right-hand-side. Data movement is then necessary for the assignment to the
left-hand-side.
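
To make the data movement concrete, here is a small Python simulation (not
HPF). It assumes a BLOCK distribution of grid over p processors, which the
proposal does not prescribe; the helper names owner and count_moves are
illustrative only. It counts how many operand fetches and result stores
cross processor boundaries with and without the on-clause.

```python
def owner(i, n, p):
    """Processor owning template element i (1-based), assuming BLOCK."""
    block = -(-n // p)                 # ceiling division
    return (i - 1) // block


def count_moves(n, p, on_offset):
    """Communication for a(1:n-1) = a(2:n) + b(2:n) + c(2:n).

    on_offset = 1 models ON grid(2:n): element k is computed where
    grid(k+1) lives.  on_offset = 0 models execution on grid(1:n-1),
    i.e. on the owner of the left-hand side.
    Returns (operand_fetches, result_stores) that cross processors.
    """
    fetches = stores = 0
    for k in range(1, n):
        exec_proc = owner(k + on_offset, n, p)
        if owner(k + 1, n, p) != exec_proc:
            fetches += 3               # a(k+1), b(k+1), c(k+1)
        if owner(k, n, p) != exec_proc:
            stores += 1                # result a(k)
    return fetches, stores


print(count_moves(16, 4, 1))   # with ON grid(2:n)
print(count_moves(16, 4, 0))   # left-hand-side owner computes
```

Under these assumptions the on-clause trades three operand fetches at each
block boundary for a single result store.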

\paragraph{Interpretation}

The interpretation of the on-clause depends on the type of the on-clause-target.

If the on-clause-target is an assignment statement, the {\it on-spec} is used to
determine where the assignment statement is executed. If the shape of the 
right-hand-side is identical to the shape of the {\it on-spec}, the
computation for
a specific element of the assignment statement is performed where the 
corresponding element of the {\it on-spec} is mapped. If the shape of the 
{\it on-spec} is larger, the compiler may use any sufficiently large subsection.
The use of "*" in the {\it on-spec} specifies that the same computations are
mapped onto the corresponding line of processors and several processors
will do the same update. This may save communication operations.
In the case of the where-statement, the forall-statement, and the 
forall-construct, the same mapping is applied to the evaluation of the 
conditions and each assignment.

If the on-clause is placed in front of the if-construct, the case-construct,
or the do-construct, the {\it on-spec} is used for the evaluations of the 
conditions as well as the loop bounds and the execution of the statement-blocks,
which are part of the construct. For the statement-blocks the interpretation 
rules for HPF-sections apply.

With respect to the allocate, deallocate, nullify, and I/O related statements
the {\it on-spec} is used for the evaluation of the parameters of the statements
and the evaluation of I/O objects. 

In the case of subroutine calls and functions the {\it on-spec} is used for the
evaluation of the parameters. It determines also the mapping of the resulting 
object. The {\it on-spec} determines also the set of processors, which will be
used for the evaluation of the subroutine. 

In the case of HPF-sections the on-clause is applied to each statement of the
execution part. Control transfer statements are allowed in this case and the 
constraints ensure, that the context on the same {\it on-spec} is not lost.

\paragraph{Additional example}
                                                                \CODE
        REAL, DIMENSION(n,n) :: a, b, c, d
!HPF$   TEMPLATE grid(n,n)
!HPF$   ALIGN WITH grid :: a, b, c, d

!HPF$   ON grid(2:n,2:n)
        DO i=2,n
!HPF$       ON grid(i,2:n)
            DO j=2,n
!HPF$           ON grid(i,j)
                a(i-1,j-1) = a(i,j) + b(i,j)*c(i,j)
            ENDDO
        ENDDO
                                                                \EDOC

\paragraph{Comment}

The compiler should be able to adjust the span of the loops to the local 
extent 
due to the restrictions on the specifiers of the sections of the {\it 
on-spec}.
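
As an illustration of what adjusting the span to the local extent could
mean, the following Python sketch (not HPF) intersects a loop range with
the part of the template a processor owns. The BLOCK distribution and the
name local_span are assumptions made for the example; the proposal itself
does not fix them.

```python
def local_span(lo, hi, n, p, me):
    """Part of the loop range lo..hi (inclusive, 1-based) that falls in
    the block of template indices owned by processor me (0-based) of p,
    assuming a BLOCK distribution of n elements.  None if empty."""
    block = -(-n // p)                 # ceiling division
    first = me * block + 1
    last = min((me + 1) * block, n)
    start, stop = max(lo, first), min(hi, last)
    return (start, stop) if start <= stop else None


# DO i = 2, n with n = 10 on 4 processors
spans = [local_span(2, 10, 10, 4, me) for me in range(4)]
print(spans)
```

Each processor then iterates only over its own span instead of testing
ownership inside the loop body.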


\subsection{ALLOCATE in FORALL}

\label{forall-allocate}

\footnote{Version of July 28, 1992 
- Guy Steele, Thinking Machines Corporation.
At the September 10-11 meeting, this was not included as part of the
FORALL because it seemed too big a leap from the allowed assignment
statements.}
Proposal:  ALLOCATE, DEALLOCATE, and NULLIFY statements may appear
	in the body of a FORALL.

Rationale: these are just another kind of assignment.  They may have
	a kind of side effect (storage management), but it is a
	benign side effect (even milder than random number generation).

Example:
                                                            \CODE
      TYPE SCREEN
        INTEGER, POINTER :: P(:,:)
      END TYPE SCREEN
      TYPE(SCREEN) :: S(N)
      INTEGER IERR(N)
      ...
!  Lots of arrays with different aspect ratios
      FORALL (J=1:N)  ALLOCATE(S(J)%P(J,N/J),STAT=IERR(J))
      IF(ANY(IERR)) GO TO 99999
                                                            \EDOC

\subsection{Generalized Data References}

\label{data-ref}

\footnote{Version of July 28, 1992 
- Guy Steele, Thinking Machines Corporation.
This was not acted on at the September 10-11 meeting because the
FORALL subgroup wanted to minimize changes to the Fortran~90 standard.}
Proposal:  Delete the constraint in section 6.1.2 of the Fortran 90
	standard (page 63, lines 7 and 8):
\begin{quote}
	Constraint: In a data-ref, there must not be more than one
		part-ref with nonzero rank.  A part-name to the right
		of a part-ref with nonzero rank must not have the
		POINTER attribute.
\end{quote}

Rationale: further opportunities for parallelism.

Example:
                                                                     \CODE
TYPE(MONARCH) :: C(N), W(N)
      ...
! Munch that butterfly
C = C + W * A%P		! Illegal in Fortran 90
                                                                      \EDOC


\subsection{FORALL with INDEPENDENT Directives}
\label{begin-independent}

\footnote{Version of July 21, 1992 - Min-You Wu.
This was rejected at the FORALL subgroup meeting on September 9, 1992,
because it only offered syntactic sugar for capabilities already in
the FORALL INDEPENDENT.  It was also suggested that the BEGIN
INDEPENDENT syntax
should be reserved for other uses, such as MIMD features.}
This proposal is an extension of Guy Steele's INDEPENDENT proposal.
We propose a block FORALL with the directives for independent 
execution of statements.  The INDEPENDENT directives are used
in a block style.  

The block FORALL is in the form of
                                                         \CODE
      FORALL (...) [ON (...)]
        a block of statements
      END FORALL
                                                         \EDOC
where the block can consist of a restricted class of statements 
and the following INDEPENDENT directives:
                                                         \CODE
!HPF$BEGIN INDEPENDENT
!HPF$END INDEPENDENT
                                                         \EDOC
The two directives must be used as a pair.  
A sub-block of statements 
parenthesized in the two directives is called an {\em asynchronous} 
sub-block or {\em independent} sub-block.  
The statements that are 
not in an asynchronous sub-block are in {\em synchronized} sub-blocks
or {\em non-independent} sub-block.  
The synchronized sub-block is 
the same as Guy Steele's synchronized FORALL statement, and the 
asynchronous sub-block is the same as the FORALL with the INDEPENDENT 
directive.  
Thus, the block FORALL
                                                          \CODE
      FORALL (e)
        b1
!HPF$BEGIN INDEPENDENT
        b2
!HPF$END INDEPENDENT
        b3
      END FORALL
                                                           \EDOC
means roughly the same as
                                                           \CODE
      FORALL (e)
        b1
      END FORALL
!HPF$INDEPENDENT
      FORALL (e)
        b2
      END FORALL
      FORALL (e)
        b3
      END FORALL
                                                          \EDOC
														  
Statements in a synchronized sub-block are tightly synchronized.
Statements in an asynchronous sub-block are completely independent.
The INDEPENDENT directives indicate to the compiler that there is no 
dependence and, consequently, that synchronizations are not necessary.
It is the user's responsibility to ensure there is no dependence
between instances in an asynchronous sub-block.
A compiler can do dependence analysis for the asynchronous sub-blocks
and issue an error message when there exists a dependence or a warning
when it finds a possible dependence.

\subsubsection{What does ``no dependence between instances'' mean?}

It means that there is no true dependence, anti-dependence,
or output dependence between instances.
Examples of these dependences are shown below:
\begin{enumerate}
\item True dependence:
                                                            \CODE
      FORALL (i = 1:N)
        x(i) = ... 
        ...  = x(i+1)
      END FORALL
                                                            \EDOC
Notice that dependences in FORALL are different from those in a DO loop.
If the above example were a DO loop, it would be an anti-dependence.

\item Anti-dependence:
                                                            \CODE
      FORALL (i = 1:N)
        ...  = x(i+1)
        x(i) = ...
      END FORALL
                                                            \EDOC

\item Output dependence:
                                                            \CODE
      FORALL (i = 1:N)
        x(i+1) = ... 
        x(i) = ...
      END FORALL
                                                            \EDOC
\end{enumerate}
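
The difference between FORALL and DO in the first example can be seen in a
small Python simulation (not Fortran). It assumes statement-by-statement
FORALL semantics, uses 10*i as an arbitrary stand-in for the elided
right-hand side, and runs i over 1:n-1 so that x(i+1) stays in bounds; the
function names are illustrative.

```python
def forall_block(x, n):
    """FORALL (i=1:n-1): x(i) = 10*i ; y(i) = x(i+1), statement by statement.
    x is 1-based: x[0] is unused padding."""
    x = list(x)
    # Statement 1 for every i: all right-hand sides first, then the stores.
    rhs = [10 * i for i in range(1, n)]
    for i, value in zip(range(1, n), rhs):
        x[i] = value
    # Statement 2 runs afterwards and sees the NEW x(i+1): true dependence.
    return [x[i + 1] for i in range(1, n)]


def do_loop(x, n):
    """DO i = 1, n-1 with the same body, iteration by iteration."""
    x = list(x)
    y = []
    for i in range(1, n):
        x[i] = 10 * i
        # x(i+1) is written only in the next iteration, so this reads the
        # OLD value: an anti-dependence on that later write.
        y.append(x[i + 1])
    return y


x0 = [0, 1, 2, 3, 4]               # x(1:4) = 1, 2, 3, 4
print(forall_block(x0, 4))
print(do_loop(x0, 4))
```

The two results differ in every position where statement 2 reads an
element that statement 1 also writes.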

Independent does not imply no communication.  One instance may access 
data in the other instances, as long as it does not cause a dependence.  
The following example is an independent block:
                                                            \CODE
      FORALL (i = 1:N)
!HPF$BEGIN INDEPENDENT
        x(i) = a(i-1)
        y(i-1) = a(i+1)
!HPF$END INDEPENDENT
      END FORALL
                                                            \EDOC
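
One informal way to probe such a block is to run its instances in different
orders and compare the results. The Python sketch below (not HPF) does this
for the block above and for a deliberately dependent body; all names are
illustrative. Agreement under two orders is only a necessary condition for
independence, not a proof of it.

```python
def run_instances(order, body, state):
    """Execute body(i, state) for each i in the given order on a copy."""
    state = {name: list(values) for name, values in state.items()}
    for i in order:
        body(i, state)
    return state


def independent_body(i, s):
    s["x"][i] = s["a"][i - 1]          # x(i)   = a(i-1)
    s["y"][i - 1] = s["a"][i + 1]      # y(i-1) = a(i+1)


def dependent_body(i, s):
    s["x"][i] = s["x"][i - 1] + 1      # reads what instance i-1 writes


n = 5
state = {"a": list(range(n + 2)), "x": [0] * (n + 2), "y": [0] * (n + 2)}
forward = list(range(1, n + 1))
backward = forward[::-1]

same = (run_instances(forward, independent_body, state)
        == run_instances(backward, independent_body, state))
differs = (run_instances(forward, dependent_body, state)
           != run_instances(backward, dependent_body, state))
print(same, differs)
```

Each instance of the independent body reads only a, which no instance
writes, and writes elements that no other instance touches.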

\subsubsection{Statements that can appear in FORALL}

FORALL statements, WHERE-ELSEWHERE statements, some intrinsic functions 
(and possibly elemental functions and subroutines) can appear in the
FORALL:
\begin{enumerate}
\item FORALL statement
                                                            \CODE
      FORALL (I = 1 : N)
        A(I,0) = A(I-1,0)
        FORALL (J = 1 : N)
!HPF$BEGIN INDEPENDENT
          A(I,J) = A(I,0) + B(I-1,J-1)
          C(I,J) = A(I,J)
!HPF$END INDEPENDENT
        END FORALL
      END FORALL
                                                            \EDOC

\item WHERE
                                                            \CODE
      FORALL(I = 1 : N)
!HPF$BEGIN INDEPENDENT
        WHERE(A(I,:) .EQ. B(I,:))
          A(I,:) = 0
        ELSEWHERE
          A(I,:) = B(I,:)
        END WHERE
!HPF$END INDEPENDENT
      END FORALL
                                                            \EDOC
\end{enumerate}


\subsubsection{Rationale}

\begin{enumerate}
\item A FORALL with a single asynchronous sub-block as shown below is 
the same as a do independent (or doall, or doeach, or parallel do, etc.).
                                                            \CODE
      FORALL (e)
!HPF$BEGIN INDEPENDENT
        b1
!HPF$END INDEPENDENT
      END FORALL
                                                            \EDOC
A FORALL without any INDEPENDENT directive is the same as a tightly 
synchronized FORALL.  We only need to define one type of parallel 
constructs including both synchronized and asynchronous blocks.  
Furthermore, combining asynchronous and synchronized FORALLs, we 
have a loosely synchronized FORALL which is more flexible for many 
loosely synchronous applications.

\item With INDEPENDENT directives, the user can indicate which block
need not be synchronized.  The INDEPENDENT directives can act 
as barrier synchronizations.  One may suggest a smart compiler 
that can recognize dependences and eliminate unnecessary 
synchronizations automatically.  However, it might be extremely 
difficult or impossible in some cases to identify all dependences.  
When the compiler cannot determine whether there is a dependence, 
it must assume so and use a synchronization for safety, which 
results in unnecessary synchronizations and consequently, high 
communication overhead.
\end{enumerate}


\end{document}


From @ecs.soton.ac.uk,@camra.ecs.soton.ac.uk:jhm@ecs.southampton.ac.uk  Mon Oct 19 03:45:44 1992
Received: from sun2.nsfnet-relay.ac.uk by titan.cs.rice.edu (AA29117); Mon, 19 Oct 92 03:45:44 CDT
Via: uk.ac.southampton.ecs; Mon, 19 Oct 1992 09:45:01 +0100
Via: camra.ecs.soton.ac.uk; Sun, 18 Oct 92 20:45:26 BST
From: John Merlin <jhm@ecs.soton.ac.uk>
Received: from bacchus.ecs.soton.ac.uk by camra.ecs.soton.ac.uk;
          Sun, 18 Oct 92 20:53:16 BST
Date: Sun, 18 Oct 92 20:49:36 BST
Message-Id: <11549.9210181949@bacchus.ecs.soton.ac.uk>
To: hpff-forall@cs.rice.edu
Subject: Proposed additions to pure proposal!

The purpose of the updated 'Pure Procedures' draft distributed by 
Chuck last week was to fix technical bugs in the earlier draft, as
requested by the HPF FORALL group at the last meeting.  Hopefully
it is now technically correct and complete, in that the definition 
given should guarantee side-effect freedom, and takes full account 
of Fortran 90 issues such as pointers, procedure arguments, and 
defined operations and assignments.

However, there are a couple of small additions that I'd like to propose 
for the 'Pure procedures' proposal.  I didn't put them in the latest 
draft as they haven't been discussed yet by the HPF group and I didn't want 
to confuse the issue.  Therefore I'll propose and justify them here.  
I'd be grateful if they could be given a 'first reading' at this 
week's meeting and, if passed (provisionally at least) I'll put
them into the 'Pure procedures' document before the next meeting, 
in time for a 'second reading'.

(Note -- the description here is just to convey the general ideas and
isn't carefully worded - it will be carefully considered and
'polished' before insertion into the draft.  I first describe the
two new features, then explain and justify them.)


Proposed additions to 'Pure Procedures' proposal
------------------------------------------------
Addition #1:
------------
Introduce a couple of new directives, for use in the specification-part
of a pure procedure interface and definition:

	!HPF$ READS_GLOBAL
	!HPF$ READS_DISTRIB_GLOBAL

(or something similar!).

Constraint: If a pure procedure references any global data object
(i.e. variable in a common block or module) with explicit, non trivial
distribution, or references a procedure that may do so, then its 
'specification-part' must contain the 'READS_DISTRIB_GLOBAL' directive.


[ N.B. By 'explicit, non trivial distribution' I mean a declared
alignment or distribution which may result in the variable being
distributed over multiple processors.  I'll define this more carefully 
in the final proposal - it obviously depends on the exact form finally 
chosen for the mapping directives! ]


Constraint: If a pure procedure references global data objects,
none of which has explicit, non trivial distribution, or references
a procedure that may do so, then its 'specification-part' must contain 
a 'READS_GLOBAL' directive.


[ This means it reads global data which is not distributed -- e.g. it
may be replicated, stored in global memory, etc, depending on implementation.
     The clauses about procedure refs mean that, if a pure procedure
calls any procedure whose interface contains a 'READS_DISTRIB_GLOBAL' 
directive, then it must contain one;  otherwise, if it calls any 
procedure whose interface contains a 'READS_GLOBAL' directive, then it 
must contain one.
     Incidentally, as an alternative to having separate directives like
this, I'd be equally happy to have 3 forms of the pure-directive, e.g.:

	pure-directive   \IS  !HPF$ PURE  [procedure-name]
	                 \OR  !HPF$ PURE_GLOBAL  [procedure-name]
	                 \OR  !HPF$ PURE_DISTRIB_GLOBAL  [procedure-name]

to achieve the same effect.  I'm open to suggestions!
     Note that these declarations make no difference to where a
pure procedure may be used - it may still be used in FORALL, elementally,
etc, regardless of whether it accesses global data.  The latter only
concerns implementation and optimisation issues - i.e. they guide the
compiler in the same way as 'SEQUENCE' does.  The implication is
that all side-effect free procedures are pure, but some are purer than
others!]
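
As an illustration of how the two constraints propagate through a call
tree, here is a Python sketch. It is not part of the proposal: the
procedure names, the dictionaries and the label "none" (meaning no extra
directive is needed) are all hypothetical.

```python
LEVELS = ["none", "READS_GLOBAL", "READS_DISTRIB_GLOBAL"]


def required_directive(proc, own_access, calls):
    """Return the directive proc needs under the two constraints.

    own_access maps a procedure to the strongest kind of global data it
    reads itself; calls maps a procedure to the procedures it references.
    """
    level, stack, seen = 0, [proc], set()
    while stack:
        p = stack.pop()
        if p in seen:                  # also guards against call cycles
            continue
        seen.add(p)
        level = max(level, LEVELS.index(own_access.get(p, "none")))
        stack.extend(calls.get(p, []))
    return LEVELS[level]


own_access = {"table_lookup": "READS_GLOBAL",
              "halo_sum": "READS_DISTRIB_GLOBAL"}
calls = {"f": ["g"], "g": ["table_lookup"], "h": ["g", "halo_sum"]}

for name in ("f", "h", "sq"):
    print(name, required_directive(name, own_access, calls))
```

The strongest directive found anywhere below a procedure is the one it
must carry, which is what the clauses about procedure references amount to.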

Addition #2:
------------
Allow some limited forms of data mapping directives for dummy arguments
and local variables within pure procedures (the current draft totally
excludes them).  The exact rules have yet to be decided (based partly on
the outcome of the next meeting) but would be something like:

	A dummy argument may be subject to an  "ALIGN WITH *"  directive;

	A local variable may be subject to an  "ALIGN WITH data-object"
directive, where the 'data-object' is a dummy argument or another local
variable (*not* a template).

Guidance in the use of ALIGN directives for dummy arguments: 
it is recommended that, if a dummy argument may be associated with an
array-valued actual argument that has explicit, non trivial distribution,
then the dummy argument is specified by an  "ALIGN WITH *"  directive.

[This is only a recommendation - correct code will be generated anyway.
However, the generated code may be more efficient if this guideline is
followed.]


Addition 1 - explanation
------------------------
The whole motivation of pure procedures is that they may be invoked
concurrently (e.g. in a FORALL or an elemental reference), each 
invocation operating on a segment of data, with the procedure only 
necessarily being invoked on the processors that own the segment of 
data.  Pure procedures are constrained so that concurrent invocation is 
(i) secure for the programmer and (ii) can be implemented efficiently 
and securely.

The current constraints ensure side-effect freedom, satisfying 
requirement (i).

However, pure procedures can access arbitrarily distributed global
(i.e. common block or module) data.  For such procedures to be
executed concurrently requires global memory (at least for reading), 
either provided by hardware or simulated by software.  
A 'naive' message-passing implementation (i.e. one that has neither 
hardware nor software global memory accesses) would be unable
to support concurrent execution in these circumstances.  Concurrent 
references could still be implemented, but by sequentialisation.

If such a 'naive' HPF implementation does not know whether a procedure 
that is referenced elementally or in FORALL uses arbitrarily distributed 
data, it must make a worst case assumption, and sequentialise the FORALL 
or elemental reference.  Thus, one is led to the conclusion that all 
elemental references and FORALLs containing function calls must be 
sequentialised.  To avoid this, I would like the procedure interface 
to contain a "!HPF$ READS_DISTRIB_GLOBAL" directive to flag this case 
-- if the directive isn't present, the procedure can be executed 
concurrently.  (This is analogous to the  "SEQUENCE" directive, whose 
presence restricts data distribution).

(Some might argue that the implementation must perform inter-procedural 
analysis to establish whether a procedure reads distributed data.  
However, in my opinion it would be a very bad language design to rely 
on inter-procedural analysis in Fortran.  It may not always be possible, 
and even if it is, it may slow down the compilation severely
- the whole source file must be analysed before any procedure can be
compiled.  Indeed, the whole point of interface blocks in F90 is to 
avoid the necessity for inter-procedural analysis!)

The directive:
	!HPF$ READS_GLOBAL
asserts that a pure procedure accesses common block or module data,
but it isn't explicitly distributed.  (For a 'naive' message-passing
implementation, this would typically mean that it's replicated
and thus available everywhere; systems with hardware or software
global memory support could of course just keep one global copy;
either way, it means that the accessed data is available to all processes
and hence does not inhibit concurrent execution.)  In this case,
concurrent execution is possible, but the global data accesses might
result in dependences which may inhibit certain optimisations.
For example, consider:

	FORALL (I=1:N) A(I) = FUNC ( B(I) )  ! A and B distributed

where FUNC() accesses A through a common block:

	FUNCTION FUNC (X)
	  !HPF$ PURE FUNC
          REAL, INTENT(IN) :: X
	  COMMON /N/ A(N)

	  ... reads elements of A
	END FUNCTION FUNC

The references to A in FUNC() cause a dependence which means that
the rhs of the FORALL must be fully evaluated before any assignment
takes place, i.e. the FORALL must effectively be implemented as:

	FORALL (I=1:N) TEMP(I) = FUNC ( B(I) )
	FORALL (I=1:N) A(I) = TEMP(I)

- the optimisation of assigning the rhs directly to A(I) cannot be
performed.  If FUNC's interface contains neither a "READS_DISTRIB_GLOBAL"
nor a "READS_GLOBAL" directive, this means it doesn't mask any
data dependences, so this optimisation can be performed (subject to
dependence analysis of the FORALL stmt itself).
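
The effect of the hidden dependence can be shown with a small Python
simulation (not Fortran). The body of func is a made-up stand-in, since the
text only says that FUNC reads elements of A; here it is assumed to read
A(1).

```python
def func(x, a):
    """Stand-in for FUNC: reads the 'global' array A (here its first element)."""
    return x + a[0]


def forall_with_temp(a, b):
    """Evaluate every right-hand side first, then assign A(I) = TEMP(I)."""
    temp = [func(bi, a) for bi in b]
    return temp


def forall_direct(a, b):
    """Unsafe shortcut: store each result into A immediately."""
    a = list(a)
    for i, bi in enumerate(b):
        a[i] = func(bi, a)             # later calls already see the new A
    return a


a = [1, 2, 3]
b = [10, 20, 30]
print(forall_with_temp(a, b))
print(forall_direct(a, b))
```

The two results differ as soon as one invocation reads an element that an
earlier invocation has already overwritten, which is why the temporary is
needed when the compiler cannot rule out such reads.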

By the way, I've toyed with ideas for giving more precise information
than these proposed directives provide, e.g. by requiring that pure
procedure interface blocks also specify the global variables or common 
blocks accessed, or by providing information in the directives about 
which global variables are accessed.  I dismissed these as too complicated,
particularly considering the possibility of pure procedure call trees.


Addition 2 - explanation
------------------------
The motivation for this proposal is that a pure function in FORALL 
can have distributed actual arguments, e.g.

	REAL A (N,N)
	!HPF$ DISTRIBUTE A (BLOCK, BLOCK)

	FORALL (I=1:N) ... FUNC (A(I,:), I) ...

Here FUNC operates on rows of A concurrently, perhaps doing a different 
thing to each row, depending on its index I (MIMD parallelism!).  
For each call to FUNC, its first argument is a distributed vector, 
which is ok, since FUNC is invoked in all processes holding segments 
of the vector.

The current proposal disallows any data mapping directives for
dummy arguments and local data.  This is because the dummy arguments
may have a different distribution for each invocation, so their
distribution must be 'assumed' from the actual arguments.
The distribution of the local data must be tied to the processors 
on which the procedure is invoked -- it's no good specifying that a 
local variable is owned by processor (1) if the function is only invoked 
on processor (5)!  Thus it is left to the implementation to determine 
a suitable mapping of the local variables, which must be aligned 
somehow with the dummy arguments (rather than just dumped anywhere!).

The problem with this is that when a pure procedure is compiled,
the compiler has no way of knowing whether an array dummy argument is
associated with a distributed or non-distributed actual.  In the former 
case it must generate the most general message-passing code, in the
latter the code can be much more compact and efficient.  An implementation
could choose to do either -- it could always generate the most general 
code, or it could always assume that the actual argument is undistributed, 
in which case it must arrange for the whole of each actual argument to
be present on each process that calls the procedure -- which may 
be inefficient on the caller's side.  If it assumes that the dummies 
can be distributed, then it must choose how to align the local variables
with the dummies, which can require complex analysis (which the HPF
mapping directives are meant to avoid).

Therefore, I'd like a way for the programmer to be able to hint to the
compiler which dummies may be distributed, and how locals are to
be aligned.  The latter is obvious -- locals can be explicitly
aligned to dummies (but not to anything else, e.g. templates, which 
might cause locals to be allocated to processors that aren't involved
in the procedure call).  The former is more difficult -- I suggest that

	ALIGN dummy WITH *

could be used to give this hint (but I'd like to think about it some
more!).  An implementation could assume that dummies with this
declaration may be arbitrarily distributed, and others are 
undistributed.  Of course, no error is caused if the user gives 'wrong'
information, since when a call to the procedure is compiled, the 
implementation knows what assumptions were made (via the interface block)
and so can arrange the distribution or non-distribution of the actuals 
accordingly.

-- John Merlin.

From haupt@npac.syr.edu  Mon Oct 19 14:23:28 1992
Received: from erato.cs.rice.edu by titan.cs.rice.edu (AA11688); Mon, 19 Oct 92 14:23:28 CDT
Received: from gemini.npac.syr.edu by erato.cs.rice.edu (AA16729); Mon, 19 Oct 92 14:23:18 CDT
Received: by gemini.npac.syr.edu id AA10144
  (5.65c/IDA-1.4.4 for hpff-forall@erato.cs.rice.edu); Mon, 19 Oct 1992 15:23:08 -0400
Date: Mon, 19 Oct 1992 15:23:08 -0400
From: Tomasz Haupt <haupt@npac.syr.edu>
Message-Id: <199210191923.AA10144@gemini.npac.syr.edu>
To: hpff-forall@erato.cs.rice.edu
Subject: new FORALL proposal
Cc: haupt@npac.syr.edu


The present draft, (v.02)  has several serious drawbacks. The most important 
are:

- The same directive (PURE) has a different meaning if applied to functions
  and subroutines. We found it unacceptable, so we propose using two
  distinct directives: PURE and INDEPENDENT

- directive INDEPENDENT (to annotate independent do loops, as opposed to our 
  INDEPENDENT procedures) is not a mature concept. First of all, since it
  is a semantic extension to 'regular' DO loops, the same way as FORALL is,
  it deserves a similar treatment to FORALL. It is not consistent to have
  a part of the G. Steele pictures being served by a new statement (FORALL)
  and others by HPF directive. It is confusing. Moreover, semantics of
  independent loops allows for private, temporary, scalar variables. On the
  other hand it is next to impossible to introduce such features in HPF 
  without significant revisions of the HPF idea or without introducing  
  a whole bunch of new syntactic features. We want to avoid any of them. 
  The solution is surprisingly simple: independent procedures. In this way 
  both the HPF computational model and the 30-year-old Fortran tradition of
  variable scope are preserved.   

- We want both user defined elemental functions and independent loops. 
  We found it wrong, however, that restrictions imposed on user
  defined elemental function are the same as restrictions imposed on 
  functions to be executed as an independent loop. Some of the restrictions on
  independent procedures can be relaxed, notably mandatory intent IN for
  arguments.  

- finally, we found that some opportunities for parallel execution have been
  overlooked in the current HPF proposal. This is why we introduced local
  functions (which are part of HPF in contrast to FOREIGN procedures). 
  The local function provides some support for reductions which we were 
  talking about. We have not found a way to introduce more powerful solutions
  without explicit reference to the machine architecture, and we want to 
  avoid it.
  
Although the proposal of a new version of FORALL chapter is made using the
previous one (partially approved) as a template, it is essentially a new
document. It is meant to be discussed as a first reading to get acceptance 
of HPFF as a general idea. I do not believe that it is flawless, 
sufficiently precise and comprehensive. Some omissions, inaccuracies, 
misspellings, etc. are unavoidable at this stage. We believe that at this 
stage the first part of it, the overview of our proposal is the most 
important. The detailed definitions of the proposed features given in the 
other part of the proposal serve as a prototype of what we expect to see in 
the final version.

Finally, please note that although we call our proposal 'an alternative
version of the chapter 4', it is actually an extension of the original one,
addressing our concerns explained above. Thus, it is our vision of how our
ideas may be incorporated in the original proposal. 

The outline of our proposal is as follows:

1) three categories of procedures (on top of 'regular' and external/foreign):
    PURE (as before)
    INDEPENDENT (less restrictions; PURE is a special case of INDEPENDENT)
    LOCAL (but not FOREIGN)

    the LOCAL and INDEPENDENT functions may be referenced in FORALL   

2) FORALL statement, as it has been originally introduced with a
   modification that INDEPENDENT and LOCAL procedures are allowed 

3) FORALL construct with FORALL-ELSEFORALL option instead of FORALL+WHERE

4) No INDEPENDENT DO, ASYNCH DO, INDEPENDENT FORALL, etc. This functionality
   is achieved by using FORALL + INDEPENDENT procedure

5) LOCAL functions in FORALL to address embarrassingly parallel problems
   and reductions

6) Elemental PURE functions (as before)


I believe that it is a good idea to circulate our proposal as a separate
document.

We hope that discussion of our proposal will lead to the best possible 
conclusion: the HPF definition that HPFF wants.

                                  Tom Haupt, Syracuse


From chk@erato.cs.rice.edu  Tue Jan 26 22:41:40 1993
Received: from erato.cs.rice.edu by titan.cs.rice.edu (AA01848); Tue, 26 Jan 93 22:41:40 CST
Received: from localhost.cs.rice.edu by erato.cs.rice.edu (AA08481); Tue, 26 Jan 93 22:41:23 CST
Message-Id: <9301270441.AA08481@erato.cs.rice.edu>
To: hpff@erato.cs.rice.edu
Cc: hpff-core@erato.cs.rice.edu, hpff-distribute@erato.cs.rice.edu,
        hpff-forall@erato.cs.rice.edu, hpff-io@erato.cs.rice.edu,
        hpff-f90@erato.cs.rice.edu, hpff-intrinsics@erato.cs.rice.edu
Word-Of-The-Day: salariat : (n) the class of salaried workers
Subject: HPF Language Specification, version 1.0
Date: Tue, 26 Jan 93 22:41:22 -0600
From: chk@erato.cs.rice.edu


It's available!  (For sure from titan.cs.rice.edu; availability from
other sites will depend on how fast e-mail travels and how dedicated
administrators at other sites are.)  Below are the "standard"
announcement and call for comments.

Many thanks to everyone involved in producing this document, including
(but not limited to!):
	The HPFF working group.
	People who commented on version 0.4 of the spec.
	People who attended (and asked questions at) many
		presentations, including the Supercomputing '92 workshop.
	Our friendly funding agencies: DARPA, NSF, ESPRIT, and the
		employers who bankrolled most of the HPFF committee
		members.
Special thanks to David Loveman, who edited the document.

						Chuck Koelbel
						Executive Director, HPFF

----------------------------------------------------------------

The most recent draft of the High Performance Fortran Language
Specification is version 1.0 Draft, dated January 25, 1993.  See
"Version History" below for a description of the changes.

How to Get the High Performance Fortran Language Specification
==============================================================

There are three ways to get a copy of the draft:

	1. Anonymous FTP: The most recent draft is available on 
	   titan.cs.rice.edu in the directory public/HPFF/draft.
	   Several files are kept there, including compressed
	   Postscript files of previous versions of the draft.  The
	   most current version of this draft is 1.0, which can be
	   retrieved as a tar file containing LaTeX source
	   (hpf-v10.tar) or in Postscript format (hpf-v10.ps);
	   both of these are also available as compressed files.
	   Several other sites also have the draft available in one or
           more formats, including think.com, ftp.gmd.de,
	   theory.tc.cornell.edu, and minerva.npac.syr.edu.

	2. Electronic mail: The most recent draft is available from
	   the Softlib server at Rice University and from two netlib
	   servers.  It can be requested in four ways:
	     A. Send electronic mail to softlib@cs.rice.edu with "send
		hpf-v10.ps" in the message body. The report is sent as a
		Postscript file.
	     B. Send electronic mail to softlib@cs.rice.edu with "send
		hpf-v10.tar.Z" in the message body. The report is
		sent as a uuencoded compressed tar file containing
		LaTeX source.
	     C. Send electronic mail to netlib@ornl.gov with "send
		hpf-v10.ps from hpf" in the message body.  The report
		is sent as a Postscript file.  This site also has the
		LaTeX source of the draft; use "send index from hpf"
		to see the file names.
	     D. Send electronic mail to netlib@research.att.com with
		"send hpf-v10.ps from hpff" in the message body.  The
		report is sent as a Postscript file.
	   (In all cases, the reply is sent as several messages to
	   avoid mailer restrictions; edit the message bodies together
	   to obtain the whole file.)  The same files can be obtained
	   from David Loveman (loveman@mpsg.enet.dec.com) and Chuck
	   Koelbel (chk@cs.rice.edu), but replies will take longer
	   because real people have to answer the mail.

	3. Hardcopy: The most recent draft is available as technical report 
	   CRPC-TR 92225 from the Center for Research on Parallel
	   Computation at Rice University.  Send requests to
		Theresa Chatman
		CITI/CRPC, Box 1892
		Rice University
		Houston, TX 77251
	   There is a charge of $50.00 for this report to cover copying and 
	   mailing costs.

Disclaimers
===========

A few caveats about the HPF draft:

	A. The current version contains some material that is still
	   under active discussion.  Changes will be fairly frequent
	   until at least December 1992.  New versions will be
           announced on the HPFF mailing list and in the newsgroups
	   comp.parallel, comp.lang.misc, and comp.lang.fortran.

	B. The HPF Language Specification does not necessarily
	   represent the official view of any individual, company,
	   university, government, or other agency.

	C. Please address any questions, comments, or possible
	   inconsistencies in the draft to hpff-comments@cs.rice.edu.
	   Include the chapter number you are commenting on in the
	   "Subject:" line of the message.


Version History
===============

Version 0.1:
August 14, 1992
EXTREMELY preliminary version.  

First collection of the proposals active in the High Performance Fortran 
Forum.  Established much of the outline for later documents, and 
represented most decisions made through the July HPFF meeting.


Version 0.2:
September 9, 1992
Version discussed at the September 10-11 HPFF meeting

Changes:
General cleaning up of version 0.1.
Inclusion of most new proposals at that time.


Version 0.3:
October 12, 1992
Version discussed at the October 22-23 HPFF meeting

Changes:
Numerous minor and major changes due to discussions at the September meeting.
Added a section on "Model of Computation".
Presented alternate chapters for data distribution with and without
templates.
Added two proposals for ON clauses specifying where computation is to
be executed.
Added distribution inquiry intrinsics.
Total rewrite of I/O material, sending most previous material to the
Journal of Development.


Version 0.4:
November 6, 1992
Version to be presented at Supercomputing '92

Changes:
Numerous minor and major changes due to discussions at the October
meeting.
"Acknowledgements" section now much more accurate.
"The HPF Model" (replacing "Model of Computation") substantially
simplified and improved.
"Distribution without Templates" chapter removed.
Many proposals not adopted moved to "Journal of Development".


Version 1.0:
January 25, 1993
Draft final version

Changes:
Many changes for clarity or pedagogical reasons.
The examples in several sections have been significantly enlarged.
INHERIT (for dummy arguments) added to distribution chapter.
Pure procedures may now have dummy arguments with explicit
distributions, if those distributions are inherited from the caller.
Changed the names of the new reductions AND, OR, and EOR to IALL,
IANY, and IPARITY.
Clarified the status of the character array language to be not in the
subset, and as a result, removed the character array intrinsics.
Only very restricted forms of alignment subscript expressions (of the
form m*i + n, where m and n are integer expressions) are
part of the subset.
[Bibliography] Correctly spelled "Mehrotra" and "Gerndt".


----------------------------------------------------------------

REQUEST FOR PUBLIC COMMENT ON HIGH PERFORMANCE FORTRAN

To: The High Performance Computing Community

Invitation:

The High Performance Fortran Forum (HPFF), with participation from over 40 
organizations, has been meeting since January 1992 to discuss and 
define a set of extensions to Fortran called High Performance Fortran 
(HPF). Our goal is to address the problems of writing data parallel 
programs for architectures where the distribution of data impacts 
performance. While we hope that the HPF extensions become widely available, 
HPFF is not sanctioned or supported by any official standards organization. 
At this time, HPFF invites general public review comments on the initial 
version of the language draft. 

The HPF language specification, version 1.0 draft, is now available. This 
document contains all the technical features proposed for the language. 
We plan to make minor revisions to correct errors or clarify
ambiguities in March 1993, at which time we will issue a final draft;
however, we expect that there will be few (if any) major technical
changes from this draft.

HPFF invites comments on the technical content of HPF, as well as on the 
editorial presentation in the document.  To facilitate incorporation
of comments into the final document, we ask that comments be sent
before March 1, 1993.

How to Get the Documents:

Electronic copies of the HPF language specification are available from 
numerous sources. 

    Anonymous FTP sources:      Directory:
    titan.cs.rice.edu           public/HPFF/draft
    think.com                   public/HPFF
    ftp.gmd.de                  hpf-europe
    theory.tc.cornell.edu       pub
    minerva.npac.syr.edu        public

    Email sources:              First line of message:
    netlib@research.att.com     send hpf-v10.ps from hpff
    netlib@ornl.gov             send hpf-v10.ps from hpf
    softlib@cs.rice.edu         send hpf-v10.ps    

The following formats are available (for version 0.4, replace "10" in the
file names with "04"). Note that not all formats are available from all
sources.
    hpf-v10.dvi                 DVI file
    hpf-v10.ps                  Postscript
    hpf-v10.ps.Z                Compressed Postscript
    hpf-v10.tar                 Tar file of LaTeX version
    hpf-v10.tar.Z               Compressed tar file

For more detailed instructions, send email to hpff-info@cs.rice.edu. This 
will return a message with expanded detail about accessing the above 
document sources, as well as other information about HPFF. 

We strongly encourage reviewers to obtain an electronic copy of the 
document. However, if electronic access is impossible, the draft is also
available in hard copy form as CRPC Technical Report #92225. This report is 
available for $50 (copying/handling fee) from: 

    Theresa Chatman
    CITI/CRPC, Box 1892
    Rice University
    Houston, TX 77251

Make checks payable to Rice University. This document will be sent surface 
mail unless additional airmail postage is included in the payment. 


How to Submit Comments:

HPFF encourages reviewers to submit comments as soon as possible, with a 
deadline of February 15 for consideration. Please do not submit comments 
for any version of the draft earlier than the 0.4 version. 

Please send comments by email to hpff-comments@cs.rice.edu. To facilitate 
the processing of comments we request that separate comment messages be 
submitted for each chapter of the document and that the chapter be clearly 
identified in the "Subject:" line of the message. Comments about general 
overall impressions of the HPF document should be labeled as Chapter 1. All 
comments on the language specification become the property of Rice 
University. 

If email access is impossible for comment responses, hard copy may be sent 
to 

    HPF Comments
    c/o Theresa Chatman
    CITI/CRPC, Box 1892
    Rice University
    Houston, TX 77251

HPFF plans to process the feedback received at a meeting in March. Best 
efforts will be made to reply to comments submitted. 


Sincerely,


Charles Koelbel
Rice University
HPFF Executive Director