Hpff-Core

Following is a text version of the active CCI items. There will be a formatted version of the full text at the meeting this week. These are ONLY the active items, and they are sorted by group, so don't be surprised that they don't start with #1 and don't have all the numbers.

Two disclaimers:
#1 - I didn't get home with any record of action for a number of the CCIs discussed at the July meeting. These were the "easy" ones discussed in subgroups, but not in the full meeting. So unfortunately - They're Back! I hope someone remembers.
#2 - In the dump and edit process from my database, MSWORD managed to do a weird and wonderful mangling job on the file. I think I caught all the problems, but if some text seems strange - just wait for the "real" copy that will be available at the meeting.

And an offer: If any of you would like the postscript version of the text (will be at the meeting), or a copy of the FileMaker Pro database (Mac), send me a note and I'll ship it to you.

-----------------------------------
Active CCI - for Sept. meeting
-----------------------------------
********************************************************
GROUP C ACTIVE ITEMS
********************************************************
Item #18  Henry Zongaro  05/04/95
No action from the July meeting was recorded ... was this resolved or not?
Action: Chuck, Guy, Henry, Jerry
Title: Defined assignment in FORALL
Group C
Updated: 7/12/95
Status: in progress
-----------------
Hello,

I was wondering whether there's not a problem with allowing defined assignment to appear within a FORALL. Consider the following example.

      module mod
        integer :: a(3) = (/1,2,3/)
      contains
        pure subroutine def_assign(lhs, rhs)
          integer, intent(inout) :: lhs
          character, intent(in) :: rhs
          lhs = a(ichar(rhs)+1)
        end subroutine def_assign
      end module mod

      program p
        use mod
        interface assignment(=)
          module procedure def_assign
        end interface
        forall (i = 1:2) a(i) = char(i)   ! A sneaky way of passing "i" !
                                          ! to def_assign
      end program p

The rules of forall specify that the right-hand side and the indices of the left-hand side are evaluated, in any order, prior to assignment, which also takes place in any order. In the above example, we have

      a(1) = char(1)
      a(2) = char(2)

as the two defined assignments which take place. Inside of def_assign, there's a host-associated reference to a, so what ends up happening is the following:

      a(1) = a(2)
      a(2) = a(3)

The order in which these assignments occur affects the result. The value of a after the forall statement is executed could be (/2,3,3/) or (/3,3,3/). Basically, the problem is that in defined assignment, completely evaluating the right-hand side for all active combinations does not necessarily let the compiler precompute everything which might also appear on the left-hand side.

Thanks,
Henry
------------------
DISCUSSION AT MEETING: CCI #18 ...
Semantics of a forall are to evaluate all rhs and then store in all lhs, but if the assignment operator is user defined this is under user control, not compiler control. The user's definition might make the order of assignment important. Guy queried how an array assignment was handled in this case. Jerry will take this question to X3J3 about the status with respect to elemental functions. Guy pointed out that for WHERE F90 forbids the defined assignment. Chuck presented a proposal that sounded promising: that the evaluation is as if the rhs were assigned into a temp using the defined assignment operator and then the lhs is a direct copy of the already evaluated values. There was discussion of issues like the type of the temp (same as lhs or rhs?).
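Henry's order dependence, and the "assign into a temp, then copy" proposal from the discussion above, can be sketched in a few lines of Python. This is a simulation of the semantics only, not HPF; indices are 0-based here.

```python
# Simulation of CCI #18: the FORALL performs the two defined
# assignments a(1) = char(1) and a(2) = char(2), where def_assign
# does lhs = a(ichar(rhs)+1), i.e. a(i) = a(i+1) in 1-based terms.

def run(order):
    """Defined assignment applied directly, in the given order."""
    a = [1, 2, 3]
    for i in order:
        a[i] = a[i + 1]        # def_assign reads the live array a
    return a

def run_with_temp(order):
    """Chuck's proposal: the defined assignment evaluates into a temp,
    then the lhs elements are direct copies of the evaluated values."""
    a = [1, 2, 3]
    temp = {}
    for i in order:
        temp[i] = a[i + 1]     # phase 1: reads a, writes only the temp
    for i in order:
        a[i] = temp[i]         # phase 2: direct copy
    return a

print(run([0, 1]), run([1, 0]))                      # [2, 3, 3] [3, 3, 3]
print(run_with_temp([0, 1]), run_with_temp([1, 0]))  # [2, 3, 3] [2, 3, 3]
```

Under the proposal both orders give (/2,3,3/), because the defined assignments read a before any element of a is overwritten.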
Action: Chuck, Guy, Henry, Jerry to circulate proposed wording for this definition.
=====================================================
=====================================================
Item #25  Matt Rosing  6/27/95
Status: in progress
No action from July meeting - was this resolved?
Group C
Title: Use of pure and independent

Question:
I have some questions about HPF semantics and implementation. These are based on the HPF definition I have (dated 5/93). I'm trying to determine if HPF can be used to implement a code we have, and I have questions about spmd execution within HPF. The code has two phases: the first fits the data parallel model very well and the second does not. Within the second phase, independent operations are done on sections of a distributed array. These operations modify the array.

There are apparently two methods to describe non-data-parallel code in HPF: functions and subroutines declared pure, and loops declared independent. There are about five pages of constraints on the use of pure functions, and it's not clear that I can use them. The main reason I believe they won't work is that pure functions are not allowed to have side effects, and our code has each independent operation modifying a part of a distributed array. Is it true that pure functions can't be used for this?

Independent do loops, however, appear to have fewer constraints. The only constraint seems to be that multiple loop iterations cannot write to the same location (I'm ignoring IO). I have a few questions to see how far I can push this:

1) Can an iteration modify data belonging to another processor?
2) If so, how do implementations handle the one-sided nature of the communication generated by this? The reason I ask is that, if done naively by buffering remote writes until the loops are done, this could require as much space for buffers as there is space for the data structure being operated on.
3) Is there any limitation on the types of routines that can be called from within an independent loop?
4) How is the scheduling done? If there are just a bunch of subroutine calls within the body of the loop, how does the compiler figure out which processor does what?
5) Is it possible to dynamically schedule the loops on the processors for load balancing? Although none of the iterations would interfere, the scheduling mechanism would.

Some of these questions are probably outside the scope of the language definition, but I would still appreciate answers from any implementors, the reason being that the resulting performance might depend heavily on the implementation.

Thanks,
Matt (m_rosing@pnl.gov)
------------
Chuck replies 6/29/95:

There is a new draft of the HPF spec, available through http://www.erc.msstate.edu/hpff/home.html. The changes are minor (basically, corrections and clarifications) from the version you're looking at.

The entire purpose of PURE is to ensure that those functions do not have any side effects. So, at face value, this is correct. You could rewrite the functions to return arrays (this assumes each function modifies one array). You could then call the PURE functions, assigning the returned values into the separate array sections. For example, the final code calling the functions might look like

      FORALL ( I=1:NUM_BLOCKS )
        X(ILOW(I):IHI(I)) = FOO( A, B, C )
      END FORALL

This would probably mean a lot of modification for the code, and it might break the current generation of compilers. But it would be one way to use PURE in this context.

>1) Can an iteration modify data belonging to another processor?

Yes. Distribution of data has no effect on the semantics of the program. Note that HPF has no notion of the processor that an iteration executes on. (Of course, the underlying compiler should have some such idea...)

>2) If so, how do implementations handle the one-sided nature of ....
As you suggest below, this is something outside the scope of the language definition. Very good compilers will do something efficient (like strip-mining the loops) to ensure reasonable-sized buffers. Bad compilers will run out of memory and generate core dumps.

>3) Is there any limitation on the types of routines that can be called from within an independent loop?

Not in the language. The only limitation is in how called routines behave, i.e. that they don't violate the independence conditions.

>4) How is the scheduling done? If there are just a bunch of subroutine ....

As with 2 above, this is beyond the scope of the language definition. You are correct that this will be difficult to do well. Incidentally, discussions of specifying scheduling mechanisms for parallel loops are going on in the HPFF 2 meetings. If you have opinions on the subject (and I know you do, Matt :-), please feel free to contribute them. The final language will be better for it.

>5) Is it possible to dynamically schedule the loops on the processors ....

A dynamic scheduling mechanism would be a valid HPF implementation. (Of course, it is not the only one.) Independence is not a problem here because any interference is between system variables; INDEPENDENT only makes an assertion about the user code. Again, discussion in the HPFF 2 meetings may be relevant to this question in the future.

=====Rik Littlefield replies 6/29/95==========

Matt,

You write:
>2) If so, how do implementations handle the one-sided nature of ....

If you're talking about the electronic structure codes, say creating the Fock matrix, then the problem is even worse. There are only O(N^2) data elements but something like O(N^4) updates to those elements. (At best, I think it's O(N^3) with clever tricks.)
========
NO ACTION FROM JULY MEETING RECORDED - WAS THIS RESOLVED?
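Chuck's rewrite leans on FORALL semantics: every FOO result is (conceptually) evaluated before any section of X is stored, which is legal only because FOO is PURE and the target sections are disjoint. A small Python sketch of that two-phase evaluation follows; foo, ilow and ihi are illustrative stand-ins, not names from the original code.

```python
# Sketch of the FORALL-with-PURE-function pattern:
#   FORALL (I=1:NUM_BLOCKS) X(ILOW(I):IHI(I)) = FOO(A, B, C)
# Phase 1 evaluates all right-hand sides; phase 2 stores them.

def foo(a, section):
    """Stand-in for a PURE function: no side effects; the result
    depends only on its (read-only) arguments."""
    return [v + sum(a) for v in section]

def forall_assign(x, a, ilow, ihi):
    # Phase 1: evaluate every RHS before any store (FORALL semantics).
    # This is safe because foo has no side effects.
    rhs = [foo(a, x[ilow[i]:ihi[i] + 1]) for i in range(len(ilow))]
    # Phase 2: store into the disjoint sections, in any order.
    for i in range(len(ilow)):
        x[ilow[i]:ihi[i] + 1] = rhs[i]
    return x

# Two disjoint blocks of x; section bounds are 0-based and inclusive.
print(forall_assign([1, 2, 3, 4], [10], ilow=[0, 2], ihi=[1, 3]))
# -> [11, 12, 13, 14]
```

The design point is that the side effect ("modify my section of the array") is turned into a returned value, which is what makes the operation expressible with PURE.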
=====================================================
=====================================================
Item #29  Rob Schreiber  8/3/95
Status: new
Title: Calling hpf_local from independent loop
Group C & E

Hello,

Question I. Can an extrinsic(hpf_local) be invoked in an independent loop? In a Forall? Ex:

      Forall (i = 1:10) a(i) = f(i,a(i))

Note that part of the calling sequence, as specified in Ver 1.1, appendix A, is "The processors are synchronized. In other words, all actions that logically precede the call are completed." It seems clear that when this was written it was tacitly assumed that the call did not occur in an independent loop or forall.

Part 2: May any other kind of extrinsic be called in a forall or independent loop?
=====================================================
=====================================================
********************************************************
GROUP D ACTIVE ITEMS
********************************************************
Item #8  Yasuharu Hayashi  04/25/95
Status: in progress
NO ACTION FROM JULY MEETING RECORDED. WAS THIS RESOLVED OR NOT?
Action: study and discuss again in July
Title: Dummy assertion asterisk
Group D
Updated: 7/13/95

Question:
I have a question about the interpretation of the assertion asterisk when the template of a dummy argument is a natural template. According to the High Performance Fortran Language Specification, November 10, 1994, Version 1.1, p.51, l.31, "If the dummy argument has a natural template (no INHERIT attribute) then things are more complicated. In certain situations the programmer is justified in inferring a preexisting distribution for the natural template ......"

When an actual argument is a whole array, the text on p.51, l.35 states only "In all these situations, the actual argument must be a whole array or array section, and the template of the actual must be coextensive with the array along any axis having a distribution format other than "*".
If the actual argument is a whole array, then the pre-existing distribution of the natural template of the dummy is identical to that of the actual argument".

I think this description is ambiguous. For example:

      PROGRAM EX
      REAL A(10,10),B(5,5)
!HPF$ PROCESSORS P(5)
!HPF$ DISTRIBUTE A(BLOCK,*) ONTO P
!HPF$ ALIGN B(*,I) WITH A(2*I,*)
      CALL SUB(B)
      :
      END

      SUBROUTINE SUB(BB)
      REAL BB(5,5)
!HPF$ PROCESSORS P(5)
!HPF$ DISTRIBUTE *(*,BLOCK(1)) ONTO *P :: BB
      :
      END

Are the assertion asterisks for BB in SUB HPF-conforming? Isn't it necessary to add a list such as the following, which shows what assertion asterisks for a natural template are legal when an actual argument is a whole array?

"1. If the n th axis of an actual argument which is a whole array corresponds to the T(n) th axis of its template and j > i, T(j) must be larger than T(i).
2. If the situation is not described below, no assertion about the distribution of the natural template of a dummy is HPF-conforming.
  (a) If the alignment of the actual array axis with its template is collapsed, then * should appear in the distribution for the corresponding axis of the natural template of the dummy.
  (b) If the actual array is aligned with the axis of its template by replication (or "replication-triplet") and that template axis is distributed *, then no entry should appear in the distribution for the natural template of the dummy.
  (c) If the actual array is aligned with the axis of its template by int-expr and that template axis is distributed *, then no entry should appear in the distribution for the natural template of the dummy.
  (d) If the alignment of the actual array axis with the axis of its template is subscript triplet l:u:s and that axis of its template is distributed *, then * should appear in the distribution for the corresponding axis of the natural template of the dummy.
  (e) If the alignment of the actual array axis with the axis of its template is subscript triplet l:u:s and that axis of its template is distributed BLOCK(n) and LB is the lower bound for that axis of the template, then BLOCK(n/s) should appear in the distribution for the natural template of the dummy, provided that s divides n evenly and that l - LB < s.

Question continued:
  (f) If the alignment of the actual array axis with the axis of its template is subscript triplet l:u:s and that axis of its template is distributed CYCLIC(n) and LB is the lower bound for that axis of the template, then CYCLIC(n/s) should appear in the distribution for the natural template of the dummy, provided that s divides n evenly and that l - LB < s."
  (g) If the alignment of the actual array axis with the axis of its template is subscript triplet l:u:s, s must be positive.

Or it might be better to forbid the use of any assertion asterisks in a DISTRIBUTE directive in case a dummy argument doesn't have the INHERIT attribute and the corresponding actual argument isn't ultimately aligned with itself, since it seems that this solution makes things far simpler and causes little actual inconvenience (the same effect can also be achieved by the ALIGN directive).

Discussion:
Rob replies ...

This example is nonconforming because axis 2 of B is NOT coextensive with (one-to-one and onto mapping to) axis 1 of the template to which B is aligned.

>... example from original message

Now let the example be this:

>      PROGRAM EX
>      REAL A(10,10),B(5,10)
>!HPF$ PROCESSORS P(5)
>!HPF$ DISTRIBUTE A(BLOCK,*) ONTO P
>!HPF$ ALIGN B(*,I) WITH A(I,*)
>      CALL SUB(B)
>      :
>      END
>      SUBROUTINE SUB(BB)
>      REAL BB(5,10)
>!HPF$ PROCESSORS P(5)
>!HPF$ DISTRIBUTE *(*,BLOCK(2)) ONTO *P :: BB
>      :
>      END

This example is correct. The replication over the second axis of the template of the actual is not a problem because that is an axis whose distribution format is *.
B is not coextensive with that axis because it has a one-to-many association with it, but since the template axis has a * distribution, coextension is not a requirement.

Is this reasonable?
Rob Schreiber
======================
NO ACTION FROM JULY MEETING RECORDED. WAS THIS RESOLVED OR NOT?
=====================================================
=====================================================
Item #10  Kenth Engo  01/11/95
Status: in progress
NO ACTION FROM JULY MEETING RECORDED. WAS THIS RESOLVED OR NOT? Who has the action item on this?
Title: Permutations in HPF
Group D
Updated: 07/12/95

Question:
I have a general question about the way HPF deals with permutations of data on the different parallel architectures. In many applications in MIMD and SIMD computations today, one often encounters the need to perform a permutation of the data distributed on the parallel computer, i.e. a one-to-one mapping of the data set onto itself. It is then very important that the routing of the data is done in such a fashion that traffic contentions are eliminated in the interconnecting network.
One does not want a situation where 10 processors are communicating data to the very same processor. This would make all but one processor idle, since no more than one processor is allowed to communicate with the destination processor at the same time. What I have in mind is exemplified by the following HPF code:

C     8 PROCESSORS AND AN ARRAY OF 32 ELEMENTS
C
!HPF$ PROCESSORS SEDECIM(8)
      REAL CENTURY(32)
C
C     THE ARRAY IS DISTRIBUTED BY BLOCK.
C
!HPF$ DISTRIBUTE CENTURY(BLOCK) ONTO SEDECIM
C
C     THE ELEMENTS ARE DISTRIBUTED WITH ELEMENTS 1,2,3,4 ON PROC
C     #1, ELEMENTS 5,6,7,8 ON PROC. 2 AND SO ON.

Suppose one wants to redistribute the elements during execution to the following arrangement:

C     DATA REDISTRIBUTED CYCLIC ON THE PROCESSORS
C
!HPF$ REDISTRIBUTE CENTURY(CYCLIC) ONTO SEDECIM
C
C     THE ELEMENTS ARE NOW REDISTRIBUTED WITH ELEMENTS 1,9,17,25 ON
C     PROC #1, ELEMENTS 2,10,18,26 ON PROC. 2 AND SO ON.

During this redistribution a permutation of the data set is performed. How is this permutation implemented, and what is actually done? Is there a strategy/theory for generally doing this permutation optimally, that is, without any traffic contentions in the network?

I will be grateful if someone could answer this email, and possibly send or give me references to literature or people where I can find out more about how HPF implements the permutations.

Best regards,
Kenth Engo

Notes from May meeting: CCI #10 - not a CCI but a request for implementation practice.
Nothing recorded for July.
=====================================================
=====================================================
Item #11  Henry Zongaro  02/16/95
Status: in progress
Action: needs more research
Title: pointer with sequence
Group D
Updated: 8/14/95

Question:
Hello,

Things have been quiet here lately, so I thought I'd send a few questions that I've been hoarding. All page and line references are relative to the 1.0 HPF Language Spec.
I'd like to hear other people's opinions on these, especially the 2nd and 3rd items.

1) The response to CCI item 6.3 indicated that variables with the POINTER attribute must be distributed or aligned. A related question - can they be given the SEQUENCE attribute? Or can a pointer be associated with both sequential and nonsequential targets?
------------
Meeting minutes: needs more research
========
From July meeting - subgroup proposal: conforming dimensions of the pointer object and target either must be (unmapped or both identically distributed) or both must be sequential.

The issue is ... does the pointer exist as an instance by itself, or is it just associated with its bound argument? There is a case of pointers to sections of arrays, so just knowing that a pointer points to a block-distributed thing, one can't talk about pointers to sections. Andy Meltzer recalls that this was related to the fact that allocatable objects don't have their distribution until after they are assigned.

A straw poll was taken with the special understanding that a substantial vote for "abstain" would mean reconsider. The vote was 7-3-10, so this CCI item is returned to committee for further clarification of the issue.
=================================================
=================================================
Item #17  A.C. Marshall  5/03/95
Status: in progress
NO ACTION FROM JULY MEETING RECORDED. WAS THIS RESOLVED OR NOT?
Action: needs more research
Title: Defaults for distribution
Group D
Updated: 07/11/95

Question:
Forgive me for being dim (and only having v1.0 of the draft standard) but... Looking at the syntax rules for DISTRIBUTE (p24 & 26), it would appear to me that:

!HPF$ DISTRIBUTE A(BLOCK)       !H303/5/8
!HPF$ DISTRIBUTE ONTO P :: A    !H301/2/6/10

are valid but that

!HPF$ DISTRIBUTE A
!HPF$ DISTRIBUTE :: A

are not.
Is this just me, or is this how things are supposed to be? And if so, why is it not possible to use the default distribution and processor grid in the same statement? After all,

!HPF$ PROCESSORS P(NUMBER_OF_PROCESSORS())
!HPF$ DISTRIBUTE ONTO P :: A

is valid and has the same effect.

Adam Marshall

Notes from May meeting: needs more research
----
Scott Baden and Chuck Koelbel reply 07/07/95:

This is the way things are "supposed to be". Consult the relevant text (page 30, lines 20-21, v. 1.1):

... To prevent syntactic ambiguity, the dist-format-clause must be present in the statement form [of a distribute spec]

Chuck Koelbel adds that the "syntactic ambiguity" referred to here is due to the problem of non-significant blanks in Fortran. Consider

>!HPF$ DISTRIBUTE PRONTO ONTO LOGY
>!HPF$ DISTRIBUTE PR ONTO ONTOLOGY

or the following example:

>!HPF$ ALIGN TWITHEADS WITH A
>!HPF$ ALIGN T WITH EADSWITHA
>Disallowed for the same reason.

Chuck also continues:
>It's not clear that his example has "the same effect". For example, consider
>!HPF$ PROCESSORS P(NUMBER_OF_PROCESSORS())
>!HPF$ PROCESSORS Q(4,NUMBER_OF_PROCESSORS()/4)
>!HPF$ DISTRIBUTE :: A
>Does A have a 1-dimensional or 2-dimensional distribution? (Yeah, this assumes that NUMBER_OF_PROCESSORS() is divisible by 4...)
>The reason for requiring at least one of the clauses is, "What information are you giving if you leave both out?" In effect,
>!HPF$ DISTRIBUTE :: A
>would be a no-op, and we didn't think there was a need for that.

Scott Baden
Chuck Koelbel
======================
NO ACTION FROM JULY MEETING RECORDED. WAS THIS RESOLVED OR NOT?
=====================================================
=====================================================
Item #19  Henry Zongaro  05/04/95
Status: in progress
NO ACTION FROM JULY MEETING RECORDED. WAS THIS RESOLVED OR NOT?
Action: needs more research
Title: Mapping function results
Group D
Updated: 7/13/95

I have a couple of questions related to specification of mappings for function results.

1) Consider the following program fragment:

      program prog
        interface
          function f()
            integer f(100)
!hpf$       processors p(number_of_processors())
!hpf$       distribute f(block) onto p
          end function f
        end interface
        call sub(f())
      end program prog

      subroutine sub(i)
        integer :: i(100)
!hpf$   processors p(number_of_processors())
!hpf$   distribute i *(block) onto *p
      end subroutine sub

Is the above HPF conforming? Does distribution of a function result variable affect the distribution of the expression returned? The text on page 53, lines 27-28 indicates that the alignment of an expression is, in general, unpredictable, except in the case of arrays and array sections, so I believe the answer to my question is "No".

However, this is actually spurred by another question relating to the SEQUENCE directive. According to page 151, line 47 of the 1.1 HPF Spec., an can be a . When I first read this, I thought the explicit reference to was there to include result variables. Now a co-worker has suggested an alternate interpretation, and we were wondering which is correct. Her suggestion was that this is trying to allow something like the following:

      program p
        integer, external :: f
!hpf$   sequence :: f
        i = f()
      end program p

Is this correct? Will this make the result of the function sequential? If so, that brings up another question:

      program p
        interface
          function f()
            integer :: f(10)
!hpf$       sequence :: f
          end function f
        end interface
        call sub(f())
      end program p

      subroutine sub(a)
        integer a(2, 5)
!hpf$   sequence a
      end subroutine sub

According to page 155, lines 33-35, an array-valued expression cannot be specified to be sequential, but if specifying f to be a sequential function makes its result value sequential, this would be a contradiction.
----------------
May meeting minutes: needs more research
======
Nothing recorded from July meeting.
=====================================================
=====================================================
Item #21  Michael Hennecker  05/11/95
Status: in progress
NO ACTION FROM JULY MEETING RECORDED. WAS THIS RESOLVED OR NOT?
Title: Question about derived type mappings and documentation
Group D

Question:
Hello,

I have some questions regarding data mapping of objects of derived type:

(1) Is it possible to DISTRIBUTE / ALIGN objects of derived type, or are the data mapping attributes restricted to intrinsic types?

(2) If mapping of objects of derived type is not possible, shouldn't the v1.0 and v1.1 specs for HPF_ALIGNMENT, HPF_DISTRIBUTION and HPF_TEMPLATE

"ALIGNEE may be of any type." (5.7.15, 5.7.16)
"DISTRIBUTEE may be of any type." (5.7.17)

read "may be of any intrinsic type."?

Best regards,
Michael

Reply by CHK 5/15/95 ...

It is possible to map objects of derived type. (It is currently not possible to map components of derived type objects; this is being discussed in the HPFF 95 meetings.) The second question is moot, given the first answer.

Thanks for asking.
Chuck Koelbel
-------
Nothing recorded from July meeting.
=====================================================
=====================================================
Item #22  Henry Zongaro  6/15/95
Status: in progress
NO ACTION FROM JULY MEETING RECORDED. WAS THIS RESOLVED OR NOT?
Title: Implementor note about processor distributions?
Group D

Question:
Hello,

We came across something that didn't seem immediately obvious here, and might not be immediately obvious to others, so we were wondering whether a note to users and/or implementers might be justified. On page 30 of the 1.1 Language Spec., it's stated that if the ONTO clause of a DISTRIBUTE directive is omitted, an arbitrary processor arrangement is chosen for each distributee.
In some cases, there may be no suitable arrangement; I assume such a program would not be HPF-conforming. For example,

      program p
        integer :: a(10, 10)
!hpf$   distribute a(block(5), block(5))
      end program p

Here, the processor arrangement created would have to have an extent of at least two in each dimension (which, by the way, constrains how arbitrary the selection of a processor arrangement can be), so this program could not run in an environment in which the number of processors was fewer than four.

Does a note seem worthwhile here, or do others feel such a case is immediately obvious?

Thanks,
Henry
--------
Nothing recorded from July meeting
=====================================================
=====================================================
Item #26  Fabien COELHO  07/01/95
Status: in progress
NO ACTION FROM JULY MEETING RECORDED. WAS THIS RESOLVED OR NOT?
Title: Conditional realignment
Group D

Question:
Hi out there,

Is this kind of thing allowed in HPF?

      ! align A with T
      if (some runtime condition)
         ! realign A with T'
      endif
      ! redistribute T

... after the redistribution, array A's mapping is not known. It depends on the runtime condition. I cannot remember anything that may forbid this. I guess it is not very nice for the compiler... should/could it be forbidden? Or am I wrong?

Fabien.

Chuck Koelbel replies 7/3/95:

Yes, this is allowed. This was the intent of HPF REALIGN and REDISTRIBUTE - to allow the user to make run-time decisions about data mapping. Allowing run-time remapping will indeed require substantial support in the run-time system. We discussed this tradeoff, and the consensus was that users had valid reasons for wanting this capability, therefore it should be in the language. The difficulties with implementation were one reason that REALIGN and REDISTRIBUTE were not put in Subset HPF.

In short, you are right that this is legal and hard to implement. You are wrong that it is/should be forbidden.
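Fabien's scenario can be made concrete with a small simulation (Python here, with mappings modeled as plain data; all names are illustrative): after the conditional REALIGN, the same REDISTRIBUTE statement leaves A with a mapping that depends on the runtime condition, so the compiler cannot know A's mapping statically.

```python
# Sketch of CCI #26: A's final mapping depends on a runtime condition.
# A template's distribution and an array's alignment are modeled as
# dictionary entries; realign/redistribute are plain updates.

def run(condition):
    distribution = {"T": "BLOCK", "Tprime": "CYCLIC"}
    align = {"A": "T"}                  # align A with T
    if condition:                       # some runtime condition
        align["A"] = "Tprime"           # realign A with T'
    distribution["T"] = "CYCLIC(4)"     # redistribute T
    # A's mapping is the current distribution of its current template.
    return (align["A"], distribution[align["A"]])

print(run(False))  # ('T', 'CYCLIC(4)')  - A followed T's redistribution
print(run(True))   # ('Tprime', 'CYCLIC') - A escaped it via the realign
```

This is exactly why run-time remapping needs run-time system support: the mapping of A after the redistribute is a run-time value, not a compile-time property.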
Chuck
--------
Nothing recorded from July meeting
=====================================================
=====================================================
Item #28  Larry Meadows  7/25/95
Status: in progress
Title: combined directive ordering
Group D
Updated: 8/14/95

Question: LOST ORIGINAL MESSAGE !!! Ack! Here's what was addressed at the July HPFF meeting.

The syntax of combined directives allows:

      ALIGN WITH A(*,:) :: B(:)

Is this a shape-spec-list for array-decl or an array-spec for alignment? It is confusing to users.

Comment from Pres:
But (!!!) I surely would _not_ want the discussion of the topic that Larry introduces to slip further into more demands for "orderings" of directives and declarations as per the discussion of CCI #12.
========
Notes from July meeting:

Proposal: Add syntax rules to allow a shape-list in combined directives only with template names and processor names:

      ALIGN (:) WITH A(*,:) :: B

The full group recommended that the subgroup come back with a more specific proposal.
========
Proposed text:

On page 24, make the following changes:

Line 6, change "entity-decl-list" to "hpf-entity-decl-list"

After line 14, add:
"H303 hpf-entity-decl is hpf-entity [(explicit-shape-spec-list)]
 H304 hpf-entity is object-name
                 or template-name
                 or processors-name"

After line 20, add:
"Constraint: If an explicit-shape-spec-list appears, hpf-entity must be a template-name or processors-name."

On lines 34-35, change two occurrences of "object-name" to "hpf-entity"

On line 35, before "If both" insert "If an explicit-shape-spec-list appears, hpf-entity has the dimension attribute."
=====================================================
=====================================================
Item #30  Rob Schreiber  8/3/95
Status: new
Title: collapsing dimensions
Group D

Question:
Question II. This is really a question for implementors, as much as a question for language lawyers.
Consider this program:

      real a(100,200), b(100)
!hpf$ distribute a(block, *)
!hpf$ distribute b(*)
      forall (row = 1:100) a(row,:) = f(a(row,:))
      ...

      pure function f(x)
        real x(:)
        real f(size(x))
!hpf$   distribute *(*) :: x
!hpf$   align f(:) with x(:)

I cannot find any rule against this. (The issue is whether one may distribute an r-dimensional object with fewer than r instances of BLOCK or CYCLIC(k) in its dist-format list.) What is the mapping of b? And should one be allowed to describe the mapping of x in this manner, or must one use the more cumbersome and specific:

      pure function f(x, row)
        integer row
        real x(:)
        real f(size(x))
!hpf$   template t(100,200)
!hpf$   align f(:) with x(:)
!hpf$   align x(:) with t(row, :)
!hpf$   distribute t(block, *)
-----------
Hello, (from Rob)

Pres asked me to amplify my previous CCI request concerning the directive

      distribute x(*)

Here is some additional commentary:

The key issue is to let the compiler know what's going on when a subroutine is passed an "on one processor only" section of a distributed array; the call site is probably in a forall or independent loop. The obvious syntax is to say, prescriptively:

!hpf$ distribute dummy_arg(*)

or descriptively:

!hpf$ distribute dummy_arg *(*)

I was surprised that this is allowed by the HPF syntax: if this distribution is specified by the program for an array that is not a dummy arg, I don't know what to make of it. Would it mean to replicate the array? To store it on one processor of the compiler's choice? To store it on the "front-end"? In shared memory?

I think a reasonable proposal would be as follows:
--------------------------------------------------------------------------------------
In a (re)distribute directive, the number of non-* (i.e. block and cyclic[(k)]) entries in the dist-format-list must ordinarily be at least one, and must be the same as the rank of the processors arrangement in the ONTO clause, if present.
If, however, the distributee is a dummy argument, then, if the
distribute directive is descriptive, the requirement of at least one
non-* entry in the dist-format-list is waived. Thus

      real dummy(:,:)
!hpf$ distribute dummy *(*,*)

is valid for a dummy argument; it asserts that the actual argument will
be distributed on a single processor.

(continued clarification)
--------------------------------------------------------------------------------------
(Advice to language designers:) It's quite likely that a section of a
processors arrangement will be allowed in the ONTO clause of
(re)distribute. In that case, one could also use the following:

      subroutine act_on_local_info(dummy, iproc)
      real dummy(:,:)
!hpf$ processors all_procs(number_of_processors())
!hpf$ distribute dummy *(*,*) onto all_procs(iproc)

This would be appropriate in the following contexts:

      program main
      real actual_2d(8,16), actual_wide_2d(8,32), actual_3d(8, 16, 10)
!hpf$ processors procs(8)
!hpf$ distribute (*, block) onto procs :: actual_2d, actual_wide_2d
!hpf$ distribute (*, block, *) onto procs :: actual_3d
!hpf$ independent
      do j = 1, 16
         call act_on_local_info( actual_2d(:, j:j), (j+1)/2 )            ! dummy shape is (8,1)
         call act_on_local_info( actual_wide_2d(:, 2*j-1:2*j), (j+1)/2 ) ! dummy shape is (8,2)
         call act_on_local_info( actual_3d(:, j, :), (j+1)/2 )           ! dummy shape is (8,10)
      enddo

--------------------------------------------------------------------------------------
The alternative to this, as far as I can tell, is to make the
programmer align the dummy to a template, as follows:

      subroutine act_on_located_info(dummy, iproc)
      real dummy(:,:)
!hpf$ processors all_procs(number_of_processors())
!hpf$ template, distribute onto all_procs :: all_temp(number_of_processors())
!hpf$ align *(*,*) with all_temp(iproc) :: dummy

      program main
      real actual_2d(8,16), actual_wide_2d(8,32), actual_3d(8, 16, 10)
!hpf$ processors procs(8)
!hpf$ distribute (*, block) onto procs :: actual_2d, actual_wide_2d
!hpf$ distribute (*, block, *) onto procs :: actual_3d
!hpf$ independent
      do j = 1, 16
         call act_on_located_info( actual_2d(:, j:j), (j+1)/2 )            ! dummy shape is (8,1)
         call act_on_located_info( actual_wide_2d(:, 2*j-1:2*j), (j+1)/2 ) ! dummy shape is (8,2)
         call act_on_located_info( actual_3d(:, j, :), (j+1)/2 )           ! dummy shape is (8,10)
      enddo

-- Rob
=====================================================
=====================================================
Item # 32   Henry Zongaro   8/31/95   Status: new
Title: Changing distribution of SAVE array   Group D

Question:
Hello,

Page 41, lines 21-34 of the HPF 1.1 document specify that an array or
template must not be distributed on a processor arrangement at the time
the arrangement becomes undefined, unless the array or template also
becomes undefined or the processor arrangement always has identical
bounds. Presumably this was done so that objects with the SAVE
attribute would not change mappings from one call to the next. Is this
rule sufficiently strict? Consider the following:

      program p
      call sub(5)
      call sub(10)
      end program p

      subroutine sub(n)
      integer, save :: a(10)
!hpf$ processors proc(2)
!hpf$ distribute a(block(n)) onto proc
      end subroutine sub

In the first call to sub, a is distributed block(5); in the second
call, it is distributed block(10).
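To see concretely why the two calls imply a remapping, here is a small Python sketch (illustrative only, not HPF) of which of the two processors owns each element of a(1:10) under block(5) versus block(10):

```python
# Model of the HPF rule that under DISTRIBUTE A(BLOCK(M)) element i
# (1-based) of a one-dimensional array lives on processor (i-1)//M + 1
# (also 1-based).  Names here are mine, purely for illustration.

def block_owner(i, m):
    """Processor owning element i under a BLOCK(m) distribution."""
    return (i - 1) // m + 1

# a(10) on proc(2): first call uses block(5), second call block(10)
owners_block5 = [block_owner(i, 5) for i in range(1, 11)]
owners_block10 = [block_owner(i, 10) for i in range(1, 11)]

print(owners_block5)   # [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
print(owners_block10)  # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```

The SAVEd array thus silently moves from a split mapping to a single-processor mapping between the two calls, even though the processor arrangement's bounds never change.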
This is currently permitted because the bounds of the processor
arrangement have not changed. More complicated examples could be drawn
involving ALIGN. Similar text appears on page 44, line 43 - page 45,
line 3 for templates.

Comments from Rob 9/1/95:

We should tighten the language to prevent this. It's a way to remap,
which should be done only if the object remapped has the dynamic
attribute, and only via executable directives. I believe this applies
to objects in modules, for example:

      module mod
      real a(10)
      end module mod

      subroutine sub(n)
      use mod
!hpf$ processors procs(2)
!hpf$ distribute a(block(n)) onto procs
      end subroutine sub

This should be proscribed; if redistribute is used and if A is dynamic,
however, I think it's legal. Right?
=====================================================
=====================================================
Item # 34   Adriaan Joubert   9/06/95   Status: new
Title: Alignment of a single dimension   Group D

Question:
Hello,

I am trying to align one dimension of two 2-dimensional arrays, and
cannot find a way of expressing this in HPF. The problem is the
following:

      PROGRAM MAIN
      REAL, ALLOCATABLE :: A(:,:), Res(:,:)
!HPF$ DISTRIBUTE A(BLOCK,*)
!HPF$ DISTRIBUTE Res(BLOCK,*)
      ...
      ALLOCATE(A(N,M))
      ALLOCATE(Res(N,M*2))
      Res = SUB (A)
      ...
      CONTAINS
      FUNCTION SUB (A) RESULT(B)
      REAL, INTENT(in) :: A(:,:)
!HPF$ DISTRIBUTE *(BLOCK,*) :: A
      REAL :: B(SIZE(A,1),SIZE(A,2)*2)
!HPF$ DISTRIBUTE (BLOCK,*) :: B
      ...
      END FUNCTION SUB
      END PROGRAM MAIN

So the 1st dimension of B and A in the subroutine will be distributed
in the same way. It seems, however, that compilers can generate faster
code if they know how arrays are aligned with one another. Among other
things the compiler would have to know that B is exactly aligned with
Res.
In other words I would like to add to the main program something like

!HPF$ TEMPLATE :: MyTemp(N,M*2)
!HPF$ DISTRIBUTE MyTemp(BLOCK,*)
!HPF$ ALIGN WITH MyTemp(:,I) :: A(:,I)
!HPF$ ALIGN WITH MyTemp :: Res

and in the subroutine

!HPF$ TEMPLATE :: MyTemp(SIZE(A,1),SIZE(A,2)*2)
!HPF$ DISTRIBUTE MyTemp(BLOCK,*)
!HPF$ ALIGN WITH *MyTemp(:,I) :: A(:,I)
!HPF$ ALIGN WITH MyTemp :: B

and then the compiler should be able to figure out that everything is
nicely aligned. But I cannot do this, as I do not know N and M at
compile time. In this case I could probably get away with the
definitions

!HPF$ ALIGN WITH A(:,*) :: Res(:,*)

in the main program and

!HPF$ ALIGN WITH A(:,*) :: B(:,*)

in the subroutine. The replication of the second dimension would not
matter, as it is all on the same processor. But suppose I have a
(BLOCK,BLOCK) distribution for both arrays, and I still want to ensure
that all elements in the first section of every column are on the same
row of processors, i.e.

      REAL A(20,20), B(20,40)

      P1: A(1:10,1:10)      P2: A(1:10,11:20)
          B(1:10,1:20)          B(1:10,21:40)
      P3: A(11:20,1:10)     P4: A(11:20,11:20)
          B(11:20,1:20)         B(11:20,21:40)

There seems to be no way of telling the compiler about this with a
descriptive statement.

(question continued) Well, what about

!HPF$ ALIGN WITH A(:,I) :: B(:,I*2-1)
!HPF$ ALIGN WITH A(:,I) :: B(:,I*2)

But is this legal? And this can be harder to do if the second dimension
is (SIZE(A,2)-1)*SIZE(A,2)/2, as in my case. I'd appreciate any help on
this one. Understanding the distribution directives seems to get harder
the more you know, instead of easier ;-(

Adriaan

=========
Rob comments 9/6/95:

I see you want allocatable templates, a hole in HPF that we know about.
But in your case, no template is needed. There is no reason to specify
replication in your alignment. You can use

!HPF$ ALIGN A(I,J) WITH RES(I,J)

and drop the distribute directive for A in the main program.
In the function, you can use a distribute on B and a descriptive
alignment of A to B.

....."But is this legal? And this can be harder to do if the second
dimension is (SIZE(A,2)-1)*SIZE(A,2)/2, as in my case."...

No, it's not legal. Align is not a symmetric relation, and the
alignment map cannot be many-to-one unless it collapses a dimension; at
least that's my understanding. In case this is not clear, your
statement is equivalent to

!HPF$ ALIGN B(:,2*I-1) WITH A(:,I)

which only explicitly aligns the odd-numbered columns of B! But why
align the bigger of the two arrays (B) with the smaller (A)? My version
of SUB would be:

      FUNCTION SUB (A) RESULT(B)
      REAL, INTENT(in) :: A(:,:)
      REAL :: B(SIZE(A,1),SIZE(A,2)*2)
!HPF$ ALIGN *(I,J) WITH B(I,J) :: A
!HPF$ DISTRIBUTE B (BLOCK,*)

...."I'd appreciate any help on this one. Understanding the
distribution directives seems to get harder the more you know, instead
of easier ;-( "...

Yup.

-- Rob

=======
From Adam M. 9/5/95

As I understand it, Adriaan Joubert wants to tell HPF to set the data
up as follows:

      REAL A(20,20), B(20,40)

      P1: A(1:10,1:10)      P2: A(1:10,11:20)
          B(1:10,1:20)          B(1:10,21:40)
      P3: A(11:20,1:10)     P4: A(11:20,11:20)
          B(11:20,1:20)         B(11:20,21:40)

I would say (like Rob Schreiber) that you include, in MAIN, the line

!HPF$ ALIGN A(I,J) WITH B(I,J*2-1)

or equivalently

!HPF$ ALIGN A(:,J) WITH B(:,J*2-1)

PLUS (distribute them together)

!HPF$ DISTRIBUTE B (BLOCK,BLOCK)

And I would claim that the procedure should be coded as follows:

      FUNCTION SUB (A) RESULT(B)
      REAL, INTENT(in) :: A(:,:)
      REAL :: B(SIZE(A,1),SIZE(A,2)*2)
!HPF$ ALIGN *(I,J) WITH B(I,J*2-1) :: A
!HPF$ DISTRIBUTE B *(BLOCK,BLOCK)

If I am wrong - why?
- Adam
=====================================================
=====================================================
*********************************************************
GROUP E ACTIVE ITEMS
and see also Item #29 in Group C
********************************************************
Item # 31   Rob Schreiber   8/4/95   Status: new
Title: number of processors   Group E

Question:
We had a recent discussion of the question of what
number_of_processors() should return when the machine consists of a
cluster of SMPs (symmetric multiprocessors). Let's say the machine is
2 SMPs, each with 4 processors and a common memory. In my view, the
"right" answer in this case is probably 8. Not 2. Not (/2, 4/). That's
because I want the following code

      real x(1000, number_of_processors())
!hpf$ independent
      do i = 1, number_of_processors()
         call embarrassingly_parallel( x(:, i) )
      enddo

to run in parallel, with one iteration per processor, on all the
available processors.
=====================================================
=====================================================
Item # 33   Henry Zongaro   8/31/95   Status: new
Title: Query of function result distribution   Group E

A second question I have relates to CCI item #20. The HPF_LOCAL_LIBRARY
procedures all specify that the ARRAY argument must be a local dummy
argument associated with a global HPF actual argument. It would seem to
me to be desirable to be able to call these routines with a function
result as well. For example,

      program p
      interface
      extrinsic(hpf_local) function f()
      integer :: f(100)
!hpf$ processors proc(number_of_processors())
!hpf$ distribute f(block) onto proc
      end function f
      end interface
      end program p

      extrinsic(hpf_local) function f()
      integer :: f(100/number_of_processors()), g_index
!     Would be nice to be able to call local_to_global to determine
!     which elements of the local result correspond to which elements
!     of the global result
      call local_to_global(f, (/1/), g_index)
      end function f

Any opinions?

Thanks, Henry

Comment from Rob 9/1/95: Sure.
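For intuition, here is a hypothetical Python model of the local-to-global index mapping Henry wants to query: a size-100 result, BLOCK-distributed, in the easy case where the processor count divides 100 evenly. The function name and signature are my own shorthand, not the HPF_LOCAL_LIBRARY definition.

```python
# Hypothetical model (not the HPF_LOCAL_LIBRARY routine): the global
# index corresponding to a local element of a size-n array distributed
# BLOCK over nprocs processors, assuming nprocs divides n evenly so
# every processor holds exactly n // nprocs elements.

def local_to_global(local_index, my_proc, nprocs, n=100):
    """Global index of local element local_index on processor my_proc
    (all indices 1-based) under an even BLOCK distribution."""
    block = n // nprocs
    return (my_proc - 1) * block + local_index

print(local_to_global(1, 1, 4))    # 1   (proc 1 holds globals 1..25)
print(local_to_global(1, 3, 4))    # 51  (proc 3 holds globals 51..75)
print(local_to_global(25, 4, 4))   # 100 (last element on the last proc)
```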
=====================================================
=====================================================
PREVIOUSLY ANSWERED ITEM REFERRED TO IN CCI 33

Item # 20   Henry Zongaro   05/04/95   Status: waiting for text
Action: needs more text in document
Title: Declaring array-valued functions   Group E   Updated: 7/13/95

Question:
2) This second question is one that Rich Shapiro was asking towards the
end of the October '94 HPFF meeting. Consider the following program.

      program prog
      interface
      extrinsic(hpf_local) function f()
      integer f(10)
!hpf$ processors p(number_of_processors())
!hpf$ distribute f(block) onto p
      end function f
      end interface
      print *, f()
      end program prog

      extrinsic(hpf_local) function f()
      integer f(?)
      end function f

How should f be declared in the local subroutine? If this program runs
on a number of processors which divides 10, the answer is simple. But
what if the program is run on four processors? Three processors have
three elements each, but the fourth has one element. There are ways in
which you could specify this. However, there's no way for the user to
know *which* processor is the one which will have one element of the
result variable. Dummy array arguments of HPF_LOCAL procedures don't
have this problem, because they must be assumed shape. Perhaps function
results have to be distributed in such a way that the same number of
elements of the result will be mapped onto all processors.

A related problem: how can an array-valued function be specified in an
HPF_LOCAL module?

      extrinsic(hpf_local) module mod
      contains
      function f()
      integer f(?)
!hpf$ processors p(number_of_processors())
!hpf$ distribute f(block) onto p
      end function f
      end module mod

The declaration of f here has to do double duty. It has to provide the
shape of the result of f for any reference to the function, and it has
to describe the result variable mapped onto a single particular
processor. Can array-valued functions be permitted to appear in
HPF_LOCAL modules?
Thanks, Henry

Subgroup report - the only thing Group E could come up with was that:
 - a "local" function can only return a scalar
 - the corresponding extrinsic function return value must be either a
   scalar or a rank-one array of size number_of_processors()
 - anything more general would not be CCI ... and what to do is not
   obvious.

====
CCI #20 - this poses a non-trivial problem about how the size of a
function result is known and/or declared. Fortran 90 doesn't have any
way of saying that the result is "assumed shape/size". After extended
discussion, the subgroup decided to propose that a local function can
only return a scalar, and that the corresponding extrinsic function
return value must be either a scalar or an array that is made up from
the scalars returned by the locals. In full group discussion, Guy
pointed out that it is the DISTRIBUTE directive that is the problem -
it lies. There was a lot of discussion, with a basic trend that there
should be no change, but rather explanatory text in the document. To
bring the issue to a close there was a vote on the proposal for no
change, but with extra text (choosing whatever reason among several
that the individual liked best for no change). This passed 14 - 1 - 6.
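The uneven case Henry describes can be made concrete with a short Python sketch (illustrative only; the helper is mine). Under the usual default-BLOCK rule the block size for f(10) on 4 processors is ceiling(10/4) = 3, so the local extents come out uneven:

```python
# Illustrative model of why f(10) distributed BLOCK over 4 processors is
# awkward to declare locally: with block size ceiling(n/nprocs), the
# last processor holds fewer elements than the others.
import math

def local_extent(p, n, nprocs):
    """Number of elements processor p (1-based) holds under a default
    BLOCK distribution of an n-element array over nprocs processors."""
    b = math.ceil(n / nprocs)
    return max(0, min(n - (p - 1) * b, b))

print([local_extent(p, 10, 4) for p in range(1, 5)])  # [3, 3, 3, 1]
print([local_extent(p, 10, 2) for p in range(1, 3)])  # [5, 5] - the easy case
```

This is exactly why the subgroup's proposal restricts local functions to scalar results: a single local declaration cannot honestly describe a result whose extent differs from processor to processor.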