Active CCI list - as of 10/23/95
-----------------
4 new or pending items for Group C.
*lots* of pending items for Group D, because I don't have record of how things
have been formally resolved, except for those few items reported in full
meeting.
1 new item for Group E, clear at the end.
-mary-
===========================
CCI #18  Defined assignment in FORALL              Group C
Submitted by: Henry Zongaro
Submitted: 05/04/95   Updated: 10/23/95
Status: in progress

Maybe resolved. CHK will provide clarification in the HPF 2 document. It will
be a change to the "sequentialization" of the FORALL to account for defined
assignments. BUT Henry suggests an alternative based on new X3J3 action.

Original question:

I was wondering whether there's not a problem with allowing defined assignment
to appear within a FORALL. Consider the following example.

      module mod
        integer :: a(3) = (/1,2,3/)
      contains
        pure subroutine def_assign(lhs, rhs)
          integer, intent(inout) :: lhs
          character, intent(in) :: rhs
          lhs = a(ichar(rhs)+1)
        end subroutine def_assign
      end module mod

      program p
        use mod
        interface assignment(=)
          module procedure def_assign
        end interface
        forall (i = 1:2) a(i) = char(i)   ! Sneaky way of passing
                                          ! "i" to def_assign
      end program p

The rules of FORALL specify that the right-hand side and the indices of the
left-hand side are evaluated, in any order, prior to assignment, which also
takes place in any order. In the above example, we have

      a(1) = char(1)
      a(2) = char(2)

as the two defined assignments which take place. Inside of def_assign, there's
a host-associated reference to a, so what ends up happening is the following:

      a(1) = a(2)
      a(2) = a(3)

The order in which these assignments occur affects the result. The value of a
after the FORALL statement is executed could be (/2,3,3/) or (/3,3,3/).
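The nondeterminism in this example can be checked with a small simulation. This
is Python, not HPF; forall_result is a made-up helper that stands in for one
sequentialization of the FORALL, with the array held 0-based internally:

```python
def forall_result(order):
    """Simulate FORALL (i = 1:2) a(i) = char(i) with the defined assignment
    lhs = a(ichar(rhs)+1), i.e. a(i) = a(i+1), executed in the given order."""
    a = [1, 2, 3]              # module variable a = (/1,2,3/); a[0] is a(1)
    for i in order:
        a[i - 1] = a[i]        # assignment body reads the host-associated a
    return a

assert forall_result([1, 2]) == [2, 3, 3]   # a(1)=a(2) first, then a(2)=a(3)
assert forall_result([2, 1]) == [3, 3, 3]   # a(2)=a(3) first, then a(1)=a(2)
```

Both orders are permitted by the FORALL rules, which is exactly the problem.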
Basically, the problem is that in defined assignment, completely evaluating the
right-hand side for all active combinations does not necessarily let the
compiler precompute everything which might also appear on the left-hand side.

Thanks, Henry

DISCUSSION AT MEETING: Henry and Jerry to circulate proposed wording for this
definition. No action from the July meeting was recorded ... was this resolved
or not?
-----
Rob's recollection: Resolved. CHK will provide clarification in the HPF 2
document. It will be a change to the "sequentialization" of the FORALL to
account for defined assignments.
-----
New note from Henry: X3J3 is taking a different approach on CCI 18. They're
trying to prohibit references, in the procedure that defines the assignment, to
the variable that appears on the left-hand side of the defined assignment. I
believe WG5 will decide on this in their November meeting. We might want to
pick up whatever they decide. One advantage of the restriction is that no
extra compiler mechanisms are required for this somewhat obscure case.
=====================================
=====================================
CCI #36  Independent and remapping                 Group C
Larry Meadows
Submitted: 10/7/95
Status: new

Question:

In Section 4.4 of the standard, one of the conditions on INDEPENDENT is that
realignment and redistribution cannot occur, since they may change the
processor storing a particular array element. I would argue that the same
reasoning applies to remapping of arguments in subroutines called inside of
INDEPENDENT DO loops. For example:

!hpf$ distribute a(block)
!hpf$ independent
      do i = 1,n
        call sub(a)
      enddo

      subroutine sub(a)
      real a(:)
!hpf$ distribute a(cyclic)
      ...
      return
      end

From an implementation point of view, remapping of arguments is a collective
operation, just as is realignment or redistribution, so it is difficult to
implement inside INDEPENDENT DO loops.
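To see why remapping inside the loop amounts to a collective operation,
consider how many elements change owners when a BLOCK array is remapped
CYCLIC. This is a Python sketch, not HPF; block_owner and cyclic_owner are
hypothetical helpers implementing the simplest even layouts:

```python
def block_owner(i, n, p):
    """Proc owning element i (1-based) of an n-element BLOCK array on p procs."""
    return (i - 1) // (-(-n // p))     # block size = ceil(n/p)

def cyclic_owner(i, n, p):
    """Proc owning element i under the corresponding CYCLIC distribution."""
    return (i - 1) % p

# Remapping 16 elements from BLOCK to CYCLIC on 4 processors moves 12 of them;
# every processor must both send and receive, i.e. collective data motion.
moved = [i for i in range(1, 17) if block_owner(i, 16, 4) != cyclic_owner(i, 16, 4)]
assert len(moved) == 12
```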
Couple of other points:
- Would be nice to have some examples in 4.4.1 where subroutines were called,
  legally and illegally.
(see CCI #37 for other point)
Thanks. lfm
=====================================
=====================================
CCI #37  Status of locals in procedures called inside INDEPENDENT DO   Group C
Larry Meadows
Submitted: 10/7/95
Status: new

Question:
- I seem to recall that we decided that local variables of subroutines called
inside INDEPENDENT DO loops were automatically NEW (and I assume that this
doesn't apply to SAVE or COMMON variables). Is this documented anywhere?
Thanks. lfm
=====================================
=====================================
CCI #29  Calling hpf_local from independent loop   Group C
Rob Schreiber
Submitted: 8/3/95   Updated: 10/23/95
Status: in progress

QUESTION:

I have two CCI questions.

Question I. Can an extrinsic(hpf_local) be invoked in an independent loop? In
a FORALL? Ex:

      forall (i = 1:10) a(i) = f(i,a(i))

Note that part of the calling sequence, as specified in Ver 1.1, Appendix A,
is "The processors are synchronized. In other words, all actions that
logically precede the call are completed." It seems clear that when this was
written it was tacitly assumed that the call did not occur in an independent
loop or FORALL.

Part 2: May any other kind of extrinsic be called in a FORALL or independent
loop?
================
Discussion begins here:

Summary provided by Rob S.
Status: Under discussion.

Summary of the discussion in September:

1. Only an HPF-type routine can be PURE. Thus, only an HPF_LOCAL routine could
be local, pure, and hence called in a FORALL.

2. There is no semantic problem with this, or with invocation of any extrinsic
routine in an independent loop.

3.
An example:

      real x(N, 100000)
!hpf$ distribute x(*, block)
!hpf$ independent
      do i = 1, N
        call extrinsic_fn(x(i,:))
      enddo

The independent loop is a mechanism for spawning N threads per processor, each
independent of the others; the load per thread may be variable. It would
possibly be useful here to NOT synchronize after each call to the extrinsic
routine! Is there any semantic reason to force the synchronization?

4. There are some very important issues in the implementation, with possible
language impacts. Let us assume an MPI-based implementation of HPF calling an
extrinsic local routine that uses MPI for communication. Because of the unity
of purpose and of hotel between HPF and MPI, it is arguably necessary for HPFF
to make this work cleanly and efficiently.

Issue MPI-1: Does HPF handle MPI_Init and MPI_Finalize automatically?

Issue MPI-2: Are any of the MPI routines PURE?

Issue MPI-3: Thread safety. In a naive implementation, HPF does a barrier
before and after the call to the extrinsic; but there is no guarantee that
there are no outstanding, nonreceived messages in the messaging system. Thus,
to be safe, any extrinsic routine should use its own communicator. To prevent
interference between separate calls to the routine, a new communicator should
be created for every call. An obvious way to do this is to call
MPI_Comm_dup(MPI_COMM_WORLD, New_comm) at the beginning of every such
extrinsic. However, an extrinsic that consumes all its messages would be
justified in doing this once, on its first invocation, and saving the
communicator for reuse on later invocations. But consider what happens if the
extrinsic is called in an independent DO loop as in the example above, and
there is no barrier used. Now we really need a separate communicator per
thread. On the other hand, a call to MPI_Comm_dup is a collective call, which
synchronizes the processes.
Perhaps this should be done by the calling HPF routine, so that the
MPI_COMM_WORLD communicator is different on every call.

Issue MPI-4: If called from the range of an ON_HOME directive, what set of
processors does MPI_COMM_WORLD correspond to? If it corresponds to the subset
executing the ON block, then how can the called routine access nonresident
data? Should there be a way to access a communicator that corresponds to these
executing processors, while MPI_COMM_WORLD always corresponds to all of the
processors?

Issue MPI-5: If called from separate ON_HOME blocks in the scope of a TASK
directive, with disjoint processor groups, so that the two ON blocks may be
executed concurrently, what communicators correspond to the two processor
groups? (If, in issue 4 above, the answer is that MPI_COMM_WORLD corresponds
to the executing subset of the processors, then the answer here is
MPI_COMM_WORLD.)

=========== comments from Jim Cownie =================

> Issue MPI-1: Does HPF handle MPI_Init and MPI_Finalize automatically?

I would say that the HPF run-time should have called MPI_Init before any user
code has run; therefore user extrinsic functions which need MPI can just use
it. This is actually not a big deal, since the user routine can always use
MPI_Initialized() to guard her call to MPI_Init. (Though if this is done, then
the HPF run-time needs to do the same, since MPI_Init should only be called
once.) That's why it's simpler to say that MPI_Init has already been called
before any user HPF code has run.

> Issue MPI-2: Are any of the MPI routines PURE?

Probably. For instance, one could cast reductions as functions which return
the result and only read the arguments (though why you'd want to use an MPI
reduction extrinsically rather than an HPF one is beyond me).

... after all 5 MPI issues ...

I would suggest that in all of these cases the HPF run-time should provide a
"current communicator" which includes the set of processes running the current
construct.
In some cases this will be MPI_COMM_WORLD (or a Comm_dup of MPI_COMM_WORLD);
in others (ON_HOME, task parallelism, processor subsets) it will represent a
subset of the available processes. In MPI, MPI_COMM_WORLD is always available
as the set of all processes (until MPI-2 introduces a dynamic process model,
though that shouldn't worry HPF implementations). Therefore I think:

1) MPI_COMM_WORLD is *always* the set of all processes. (This is the current
   MPI view.)
2) If you need subsets, then you should create new communicators and provide a
   way for the user code to access them.

MPI_COMM_WORLD should mean the same thing in a routine called from an HPF
extrinsic as it did in a "raw" MPI program. The HPF extrinsic MPI environment
should contain additions to the raw MPI environment (new communicators, maybe
pre-defined datatypes giving array distributions, etc.), but should not change
the meaning of things in the raw MPI world. In other words, you may need to
learn more to work in the HPF extrinsic environment, but you shouldn't have to
unlearn things you already knew about MPI.
-- Jim
=====================================
GROUP D ACTIVE ITEMS
=====================================
CCI #11  Pointer with sequence                     Group D
Henry Zongaro
Submitted: 02/16/95   Updated: 10/03/95
Status: in progress

Question:

Hello,

Things have been quiet here lately, so I thought I'd send a few questions that
I've been hoarding. All page and line references are relative to the 1.0 HPF
Language Spec. I'd like to hear other people's opinions on these, especially
the 2nd and 3rd items.

1) The response to CCI item 6.3 indicated that variables with the POINTER
attribute must be distributed or aligned. A related question - can they be
given the SEQUENCE attribute? Or can a pointer be associated with both
sequential and nonsequential targets?
Discussion:

Meeting minutes: needs more research
========
From July meeting - subgroup proposal: conforming dimensions of the pointer
object and target either must be (unmapped or both identically distributed) or
both must be sequential. The issue is ... does the pointer exist as an
instance by itself, or is it just associated with its bound arg? There is a
case of pointers to sections of arrays, so just knowing that a pointer points
to a block-distributed thing, one can't talk about pointers to sections. Andy
Meltzer recalls that was related to the fact that allocatable objects don't
have their distribution until after they are assigned. A straw poll was taken
with the special understanding that a substantial vote for "abstain" would
mean reconsider. The vote was 7-3-10, so this CCI item is returned to
committee for further clarification of the issue.
=========
From September meeting minutes:
CCI 11: Pointers cannot be mapped. Can they be associated with both sequential
and nonsequential targets? Subgroup recommends that pointers cannot point to
sequential variables. Full group didn't think this was right and asked the
subgroup to try again.
=====================================
=====================================
CCI #32  Changing distribution of SAVE array       Group D
Henry Zongaro
Submitted: 8/31/95   Updated: 9/18/95
Status: in progress

QUESTION:

Hello,

Page 41, lines 21-34 of the HPF 1.1 document specifies that an array or
template must not be distributed on a processor arrangement at the time the
arrangement becomes undefined, unless the array or template also becomes
undefined or the processor arrangement always has identical bounds.
Presumably this was done so that objects with the SAVE attribute would not
change mappings from one call to the next. Is this rule sufficiently strict?
Consider the following:

      program p
        call sub(5)
        call sub(10)
      end program p

      subroutine sub(n)
        integer, save :: a(10)
!hpf$   processors proc(2)
!hpf$   distribute a(block(n)) onto proc
      end subroutine sub

In the first call to sub, a is distributed block(5); in the second call, it is
distributed block(10). This is currently permitted because the bounds of the
processor arrangement have not changed. More complicated examples could be
drawn involving ALIGN. Similar text appears on page 44, line 43 - page 45,
line 3 for templates.

DISCUSSION

Comments from Rob 9/1/95:

We should tighten the language to prevent this. It's a way to remap, which
should be done only if the object remapped has the DYNAMIC attribute, and
only via executable directives. I believe this applies to objects in modules,
for example:

      module mod
        real a(10)
      end module mod

      subroutine sub(n)
        use mod
!hpf$   processors procs(2)
!hpf$   distribute a(block(n)) onto procs
      end subroutine sub

This should be proscribed; if REDISTRIBUTE is used and if A is DYNAMIC,
however, I think it's legal. Right?

No action reported from Sept. meeting.
=====================================
=====================================
CCI #34  Alignment of a single dimension           Group D
Adriaan Joubert
Submitted: 9/06/95   Updated: 9/18/95
Status: in progress

Question:

Hello,

I am trying to align one dimension of two 2-dimensional arrays, and cannot
find a way of expressing this in HPF. The problem is the following:

      PROGRAM MAIN
        REAL, ALLOCATABLE :: A(:,:), Res(:,:)
!HPF$   DISTRIBUTE A(BLOCK,*)
!HPF$   DISTRIBUTE Res(BLOCK,*)
        ...
        ALLOCATE(A(N,M))
        ALLOCATE(Res(N,M*2))
        Res = SUB (A)
        ...
      CONTAINS
        FUNCTION SUB (A) RESULT(B)
          REAL, INTENT(in) :: A(:,:)
!HPF$     DISTRIBUTE *(BLOCK,*) :: A
          REAL :: B(SIZE(A,1),SIZE(A,2)*2)
!HPF$     DISTRIBUTE (BLOCK,*) :: B
          ...
        END FUNCTION SUB
      END PROGRAM MAIN

So the 1st dimension of B and A in the subroutine will be distributed in the
same way.
It seems however that compilers can generate faster code if they know how
arrays are aligned with one another. Among other things, the compiler would
have to know that B is exactly aligned with Res. In other words, I would like
to add to the main program something like

!HPF$ TEMPLATE :: MyTemp(N,M*2)
!HPF$ DISTRIBUTE MyTemp(BLOCK,*)
!HPF$ ALIGN WITH MyTemp(:,I) :: A(:,I)
!HPF$ ALIGN WITH MyTemp :: Res

and in the subroutine

!HPF$ TEMPLATE :: MyTemp(SIZE(A,1),SIZE(A,2)*2)
!HPF$ DISTRIBUTE MyTemp(BLOCK,*)
!HPF$ ALIGN WITH *MyTemp(:,I) :: A(:,I)
!HPF$ ALIGN WITH MyTemp :: B

and then the compiler should be able to figure out that everything is nicely
aligned. But I cannot do this, as I do not know N and M at compile time. In
this case I could probably get away with the definitions

!HPF$ ALIGN WITH A(:,*) :: Res(:,*)

in the main program and

!HPF$ ALIGN WITH A(:,*) :: B(:,*)

in the subroutine. The replication of the second dimension would not matter,
as it is all on the same processor. But if I have a (BLOCK,BLOCK) distribution
for both arrays, and I still want to ensure that all elements in the first
section of every column are on the same row of processors, i.e.

      REAL A(20,20), B(20,40)
      P1: A(1:10,1:10)     P2: A(1:10,11:20)
          B(1:10,1:20)         B(1:10,21:40)
      P3: A(11:20,1:10)    P4: A(11:20,11:20)
          B(11:20,1:20)        B(11:20,21:40)

there seems to be no way of telling the compiler about this with a descriptive
statement. Well, what about

!HPF$ ALIGN WITH A(:,I) :: B(:,I*2-1)
!HPF$ ALIGN WITH A(:,I) :: B(:,I*2)

But is this legal? And this can be harder to do if the second dimension is
(SIZE(A,2)-1)*SIZE(A,2)/2, as in my case. I'd appreciate any help on this one.
Understanding the distribution directives seems to get harder the more you
know, instead of easier ;-(

Adriaan

========= Discussion =========

Rob comments 9/6/95:

I see you want allocatable templates, a hole in HPF that we know about. But in
your case, no template is needed. There is no reason to specify replication in
your alignment.
You can use

!HPF$ ALIGN A(I,J) WITH RES(I,J)

and drop the DISTRIBUTE directive for A in the main program. In the function,
you can use a DISTRIBUTE on B and a descriptive alignment of A to B.
.....
..."But is this legal? And this can be harder to do if the second dimension is
(SIZE(A,2)-1)*SIZE(A,2)/2, as in my case." ...

No, it's not legal. ALIGN is not a symmetric relation, and the alignment map
cannot be many-to-one unless it collapses a dimension; at least that's my
understanding. In case this is not clear, your statement is equivalent to

!HPF$ ALIGN B(:,2*I-1) WITH A(:,I)

which only explicitly aligns the odd-numbered columns of B! But why align the
bigger of the two arrays (B) with the smaller (A)? My version of SUB would be:

      FUNCTION SUB (A) RESULT(B)
        REAL, INTENT(in) :: A(:,:)
        REAL :: B(SIZE(A,1),SIZE(A,2)*2)
!HPF$   ALIGN *(I,J) WITH B(I,J) :: A
!HPF$   DISTRIBUTE B (BLOCK,*)

...."I'd appreciate any help on this one. Understanding the distribution
directives seems to get harder the more you know, instead of easier ;-( "...

Yup. -- Rob
=======
From Adam M. 9/5/95:

As I understand it, Adriaan Joubert wants to tell HPF to set the data up as
follows:

      REAL A(20,20), B(20,40)
      P1: A(1:10,1:10)     P2: A(1:10,11:20)
          B(1:10,1:20)         B(1:10,21:40)
      P3: A(11:20,1:10)    P4: A(11:20,11:20)
          B(11:20,1:20)        B(11:20,21:40)

I would say (like Rob Schreiber) that you include, in MAIN, the line

!HPF$ ALIGN A(I,J) WITH B(I,J*2-1)

or equivalently

!HPF$ ALIGN A(:,J) WITH B(:,J*2-1)

PLUS (distribute them together)

!HPF$ DISTRIBUTE B (BLOCK,BLOCK)

And I would claim that the procedure should be coded as follows:

      FUNCTION SUB (A) RESULT(B)
        REAL, INTENT(in) :: A(:,:)
        REAL :: B(SIZE(A,1),SIZE(A,2)*2)
!HPF$   ALIGN *(I,J) WITH B(I,J*2-1) :: A
!HPF$   DISTRIBUTE B *(BLOCK,BLOCK)

If I am wrong - why? - Adam

No action reported from Sept. meeting.
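Rob's observation that the directive reaches only the odd-numbered columns of
B can be confirmed by enumerating the mapped columns. A small Python sketch
(the set arithmetic is the point here, not HPF semantics):

```python
# Columns of B(20,40) reached by ALIGN B(:,2*I-1) WITH A(:,I) as I runs 1..20.
mapped = {2 * i - 1 for i in range(1, 21)}

assert mapped == set(range(1, 40, 2))   # exactly the odd-numbered columns
assert len(mapped) == 20                # the 20 even-numbered columns of B
                                        # get no explicit alignment from it
```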
=====================================
=====================================
I believe that the following Group D items were resolved in subcommittee -
without bringing the issues to full committee. But I have no record of
official action ... so they are still here as open items. PLEASE someone help
close these out. Many of them have answers that were proposed via email ...
but these answers need to be OK'd by subgroup.
=====================================
=====================================
CCI #10  Permutations in HPF                       Group D
Kenth Engo
Submitted: 01/11/95   Updated: 07/12/95
Status: in progress

Question:

I have a general question about the way HPF deals with permutations of data on
the different parallel architectures. In many applications in MIMD and SIMD
computations today, one often encounters the need to just perform a
permutation of the data distributed on the parallel computer, i.e. a
one-to-one mapping of the data set onto itself. It is then very important that
the routing of the data is done in such a fashion that traffic contentions are
eliminated in the interconnecting network. One does not want a situation where
10 processors are communicating data to the very same processor. This would
make all but one processor idle, since no more than one processor is allowed
to communicate with the destination processor at the same time. What I have in
mind is exemplified by the following HPF code:

C     8 PROCESSORS AND AN ARRAY OF 32 ELEMENTS
C
!HPF$ PROCESSORS SEDECIM(8)
      REAL CENTURY(32)
C
C     THE ARRAY IS DISTRIBUTED BY BLOCK.
C
!HPF$ DISTRIBUTE CENTURY(BLOCK) ONTO SEDECIM
C
C     THE ELEMENTS ARE DISTRIBUTED WITH ELEMENTS 1,2,3,4 ON PROC
C     #1, ELEMENTS 5,6,7,8 ON PROC. 2 AND SO ON.

Suppose one wants to redistribute the elements during execution to the
following arrangement:

C     DATA REDISTRIBUTED CYCLIC ON THE PROCESSORS
C
!HPF$ REDISTRIBUTE CENTURY(CYCLIC) ONTO SEDECIM
C
C     THE ELEMENTS ARE NOW REDISTRIBUTED WITH ELEMENTS 1,9,17,25 ON
C     PROC #1, ELEMENTS 2,10,18,26 ON PROC.
C     2 AND SO ON.

During this redistribution a permutation of the data set is performed. How is
this permutation implemented, and what is actually done? Is there a
strategy/theory for generally doing this permutation optimally, that is,
without any traffic contentions in the network? I will be grateful if someone
could answer this email, and possibly send or give me references to literature
or people where I can find out more about how HPF implements the permutations.

Best regards, Kenth Engo

Discussion:

Notes from May meeting: CCI #10 - not a CCI but a request for implementation
practice.
======================
NO ACTION FROM JULY MEETING RECORDED. WAS THIS RESOLVED OR NOT?
10/23/95 - still no action report
=====================================
=====================================
CCI #17  Defaults for distribution                 Group D
A.C. Marshall
Submitted: 04/18/95   Updated: 07/11/95
Status: in progress

Questions:

Forgive me for being dim (and only having v1.0 of the draft standard) but...
Looking at the syntax rules for DISTRIBUTE (p24 & 26), it would appear to me
that:

!HPF$ DISTRIBUTE A(BLOCK)        !H303/5/8
!HPF$ DISTRIBUTE ONTO P :: A     !H301/2/6/10

are valid but that

!HPF$ DISTRIBUTE A
!HPF$ DISTRIBUTE :: A

are not. Is this just me, or is this how things are supposed to be? And if so,
why is it not possible to use default distribution and processor grid in the
same statement? After all,

!HPF$ PROCESSORS P(NUMBER_OF_PROCESSORS())
!HPF$ DISTRIBUTE ONTO P :: A

is valid and has the same effect.

Adam Marshall

Discussion:

Notes from May meeting: needs more research
----
Scott Baden and Chuck Koelbel reply 07/07/95:

This is the way things are "supposed to be". Consult the relevant text (page
30, lines 20-21, v. 1.1) ...
"To prevent syntactic ambiguity, the dist-format-clause must be present in the
statement form [of a distribute spec]."

Chuck Koelbel adds that the "syntactic ambiguity" referred to here is due to
the problem of non-significant blanks in Fortran. Consider

> !HPF$ DISTRIBUTE PRONTO ONTO LOGY
> !HPF$ DISTRIBUTE PR ONTO ONTOLOGY

or the following example:

> !HPF$ ALIGN TWITHEADS WITH A
> !HPF$ ALIGN T WITH EADSWITHA

Disallowed for the same reason. Chuck also continues:

> It's not clear that his example has "the same effect". For example, consider
>
> !HPF$ PROCESSORS P(NUMBER_OF_PROCESSORS())
> !HPF$ PROCESSORS Q(4,NUMBER_OF_PROCESSORS()/4)
> !HPF$ DISTRIBUTE :: A
>
> Does A have a 1-dimensional or 2-dimensional distribution? (Yeah, this
> assumes that NUMBER_OF_PROCESSORS() is divisible by 4...)
>
> The reason for requiring at least one of the clauses is, "What information
> are you giving if you leave both out?" In effect,
>
> !HPF$ DISTRIBUTE :: A
>
> would be a no-op, and we didn't think there was a need for that.

Scott Baden
Chuck Koelbel
======================
NO ACTION FROM JULY MEETING RECORDED. WAS THIS RESOLVED OR NOT?
10/23/95 still don't know.
=====================================
=====================================
CCI #8  Dummy assertion asterisk                   Group D
Yasuharu Hayashi
Submitted: 04/25/95   Updated: 7/13/95
Status: in progress

Question:

I have a question about the interpretation of the assertion asterisk when the
template of a dummy argument is a natural template. According to the High
Performance Fortran Language Specification, November 10, 1994, Version 1.1,
p.51, l.31: "If the dummy argument has a natural template (no INHERIT
attribute) then things are more complicated. In certain situations the
programmer is justified in inferring a preexisting distribution for the
natural template ......"
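Returning to CCI #17 for a moment: Chuck's blank-insensitivity examples can be
checked mechanically. Since blanks are not significant here, the two
directives in each pair present the parser with identical character streams (a
Python sketch, stripping blanks to mimic the tokenizer's view):

```python
# Two visually different directives collapse to the same character stream
# once non-significant blanks are removed, hence the ambiguity.
a = "!HPF$ DISTRIBUTE PRONTO ONTO LOGY".replace(" ", "")
b = "!HPF$ DISTRIBUTE PR ONTO ONTOLOGY".replace(" ", "")
assert a == b

c = "!HPF$ ALIGN TWITHEADS WITH A".replace(" ", "")
d = "!HPF$ ALIGN T WITH EADSWITHA".replace(" ", "")
assert c == d
```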
When an actual argument is a whole array, the text on p.51, l.35 states only:
"In all these situations, the actual argument must be a whole array or array
section, and the template of the actual must be coextensive with the array
along any axis having a distribution format other than "*". If the actual
argument is a whole array, then the pre-existing distribution of the natural
template of the dummy is identical to that of the actual argument."

I think this description is ambiguous. For example:

      PROGRAM EX
        REAL A(10,10), B(5,5)
!HPF$   PROCESSORS P(5)
!HPF$   DISTRIBUTE A(BLOCK,*) ONTO P
!HPF$   ALIGN B(*,I) WITH A(2*I,*)
        CALL SUB(B)
        :
      END

      SUBROUTINE SUB(BB)
        REAL BB(5,5)
!HPF$   PROCESSORS P(5)
!HPF$   DISTRIBUTE *(*,BLOCK(1)) ONTO *P :: BB
        :
      END

Are the assertion asterisks for BB in SUB HPF-conforming? Isn't it necessary
to add a list such as the following, which shows what assertion asterisks for
a natural template are legal when an actual argument is a whole array?

"1. If the nth axis of an actual argument which is a whole array corresponds
to the T(n)th axis of its template and j > i, T(j) must be larger than T(i).
2. If the situation is not described below, no assertion about the
distribution of the natural template of a dummy is HPF-conforming.
(a) If the alignment of the actual array axis with its template is collapsed,
then * should appear in the distribution for the corresponding axis of the
natural template of the dummy.
(b) If the actual array is aligned with the axis of its template by
replication (or "replication-triplet") and that template axis is distributed
*, then no entry should appear in the distribution for the natural template of
the dummy.
(c) If the actual array is aligned with the axis of its template by int-expr
and that template axis is distributed *, then no entry should appear in the
distribution for the natural template of the dummy.
(d) If the alignment of the actual array axis with the axis of its template is
a subscript triplet l:u:s and that axis of its template is distributed *, then
* should appear in the distribution for the corresponding axis of the natural
template of the dummy.
(e) If the alignment of the actual array axis with the axis of its template is
a subscript triplet l:u:s, that axis of its template is distributed BLOCK(n),
and LB is the lower bound for that axis of the template, then BLOCK(n/s)
should appear in the distribution for the natural template of the dummy,
provided that s divides n evenly and that l - LB < s.
(f) If the alignment of the actual array axis with the axis of its template is
a subscript triplet l:u:s, that axis of its template is distributed CYCLIC(n),
and LB is the lower bound for that axis of the template, then CYCLIC(n/s)
should appear in the distribution for the natural template of the dummy,
provided that s divides n evenly and that l - LB < s."
(g) If the alignment of the actual array axis with the axis of its template is
a subscript triplet l:u:s, s must be positive.

Or it might be better to forbid the use of any assertion asterisks in a
DISTRIBUTE directive in the case that a dummy argument doesn't have the
INHERIT attribute and the corresponding actual argument isn't ultimately
aligned with itself, since it seems that this solution makes things far
simpler and causes little actual inconvenience (the same effect can also be
achieved by the ALIGN directive).

Discussion:

Rob replies ...

This example is nonconforming because axis 2 of B is NOT coextensive with
(one-to-one and onto mapping to) axis 1 of the template to which B is aligned.

> ...
example from original message

Now let the example be this:

> PROGRAM EX
> REAL A(10,10), B(5,10)
> !HPF$ PROCESSORS P(5)
> !HPF$ DISTRIBUTE A(BLOCK,*) ONTO P
> !HPF$ ALIGN B(*,I) WITH A(I,*)
> CALL SUB(B)
> :
> END
>
> SUBROUTINE SUB(BB)
> REAL BB(5,10)
> !HPF$ PROCESSORS P(5)
> !HPF$ DISTRIBUTE *(*,BLOCK(2)) ONTO *P :: BB
> :
> END

This example is correct. The replication over the second axis of the template
of the actual is not a problem because that is an axis whose distribution
format is *. B is not coextensive with that axis because it has a one-to-many
association with it, but since the template axis has a * distribution,
coextension is not a requirement. Is this reasonable?

Rob Schreiber
=============
Meeting discussion: Henry and Jerry to circulate proposed wording for this
definition.
======================
NO ACTION FROM JULY MEETING RECORDED. WAS THIS RESOLVED OR NOT?
10/23/95 still don't know
=====================================
=====================================
CCI #19  Mapping function results                  Group D
Henry Zongaro
Submitted: 05/04/95   Updated: 7/13/95
Status: in progress

Question:

I have a couple of questions related to specification of mappings for function
results.

1) Consider the following program fragment:

      program prog
        interface
          function f()
            integer f(100)
!hpf$       processors p(number_of_processors())
!hpf$       distribute f(block) onto p
          end function f
        end interface
        call sub(f())
      end program prog

      subroutine sub(i)
        integer :: i(100)
!hpf$   processors p(number_of_processors())
!hpf$   distribute i *(block) onto *p
      end subroutine sub

Is the above HPF-conforming? Does distribution of a function result variable
affect the distribution of the expression returned? The text on page 53, lines
27-28 indicates that the alignment of an expression is, in general,
unpredictable, except in the case of arrays and array sections, so I believe
the answer to my question is "No". However, this is actually spurred by
another question relating to the SEQUENCE directive.
According to page 151, line 47 of the 1.1 HPF Spec., an can be a . When I
first read this, I thought the explicit reference to was there to include
result variables. Now a co-worker has suggested an alternate interpretation,
and we were wondering which is correct. Her suggestion was that this is trying
to allow something like the following:

      program p
        integer, external :: f
!hpf$   sequence :: f
        i = f()
      end program p

Is this correct? Will this make the result of the function sequential? If so,
that brings up another question:

      program p
        interface
          function f()
            integer :: f(10)
!hpf$       sequence :: f
          end function f
        end interface
        call sub(f())
      end program p

      subroutine sub(a)
        integer a(2, 5)
!hpf$   sequence a
      end subroutine sub

According to page 155, lines 33-35, an array-valued expression cannot be
specified to be sequential; but if specifying f to be a sequential function
makes its result value sequential, this would be a contradiction.

Discussion:

May meeting minutes: needs more research
======================
NO ACTION FROM JULY MEETING RECORDED. WAS THIS RESOLVED OR NOT?
10/23/95 Still don't know.
=====================================
=====================================
CCI #21  Derived type mappings and documentation   Group D
Michael Hennecker
Submitted: 05/11/95   Updated: 7/19/95
Status: in progress

Question:

Hello,

I have some questions regarding data mapping of objects of derived type:

(1) Is it possible to DISTRIBUTE / ALIGN objects of derived type, or are the
data mapping attributes restricted to intrinsic types?

(2) If mapping of objects of derived type is not possible, shouldn't the v1.0
and v1.1 specs for HPF_ALIGNMENT, HPF_DISTRIBUTION and HPF_TEMPLATE

"ALIGNEE may be of any type." (5.7.15, 5.7.16)
"DISTRIBUTEE may be of any type." (5.7.17)

read "may be of any intrinsic type."?

Best regards, Michael

Discussion:

Reply by CHK 5/15/95 ...

It is possible to map objects of derived type.
(It is currently not possible to map components of derived type objects; this
is being discussed in the HPFF 95 meetings.) The second question is moot,
given the first answer. Thanks for asking.

Chuck Koelbel
======================
NO ACTION FROM JULY MEETING RECORDED. WAS THIS RESOLVED OR NOT?
10/23/95 still don't know.
=====================================
=====================================
CCI #22  Implementor note about processor distributions?   Group D
Henry Zongaro
Submitted: 6/15/95   Updated: 7/19/95
Status: in progress

Question:

Hello,

We came across something that didn't seem immediately obvious here, and might
not be immediately obvious to others, so we were wondering whether a note to
users and/or implementers might be justified. On page 30 of the 1.1 Language
Spec., it's stated that if the ONTO clause of a DISTRIBUTE directive is
omitted, an arbitrary processor arrangement is chosen for each distributee. In
some cases, there may be no suitable arrangement; I assume such a program
would not be HPF-conforming. For example:

      program p
        integer :: a(10, 10)
!hpf$   distribute a(block(5), block(5))
      end program p

Here, the processor arrangement created would have to have an extent of at
least two in each dimension (which, by the way, constrains how arbitrary the
selection of a processor arrangement can be), so this program could not run in
an environment in which the number of processors was fewer than four. Does a
note seem worthwhile here, or do others feel such a case is immediately
obvious?

Thanks, Henry
======================
NO ACTION FROM JULY MEETING RECORDED. WAS THIS RESOLVED OR NOT?
10/23/95 still don't know
=====================================
=====================================
CCI #26  Conditional realignment                   Group D
Fabien Coelho
Submitted: 07/01/95   Updated: 07/19/95
Status: in progress

Question:

Hi out there,

Is this kind of thing allowed in HPF?

      ! align A with T
      if (some runtime condition)
      ! realign A with T'
      endif
      ! redistribute T
      ...
After the redistribution, array A's mapping is not known statically; it depends on the runtime condition. I cannot remember anything that may forbid this. I guess it is not very nice for the compiler... should/could it be forbidden? Or am I wrong? Fabien.

DISCUSSION:
Chuck Koelbel replies 7/3/95: Yes, this is allowed. This was the intent of HPF REALIGN and REDISTRIBUTE - to allow the user to make run-time decisions about data mapping. Allowing run-time remapping will indeed require substantial support in the run-time system. We discussed this tradeoff, and the consensus was that users had valid reasons for wanting this capability, therefore it should be in the language. The difficulties with implementation were one reason that REALIGN and REDISTRIBUTE were not put in Subset HPF. In short, you are right that this is legal and hard to implement. You are wrong that it is/should be forbidden. Chuck
======================
NO ACTION FROM JULY MEETING RECORDED. WAS THIS RESOLVED OR NOT?
10/23/95 still don't know
=====================================
=====================================
CCI #30 Collapsing dimensions  Group D
Updated: 9/18/95
Rob Schreiber
Submitted: 8/3/95
Status: in progress
Question II. This is really a question for implementors, as much as a question for language lawyers. Consider this program:

      real a(100,200), b(100)
!hpf$ distribute a(block, *)
!hpf$ distribute b(*)
      forall (row = 1:100) a(row,:) = f(a(row,:))
      ...
      pure function f(x)
      real x(:)
      real f(size(x))
!hpf$ distribute *(*) :: x
!hpf$ align f(:) with x(:)

I cannot find any rule against this. (The issue is whether one may distribute an r-dimensional object with fewer than r instances of BLOCK or CYCLIC(k) in its dist-format list.) What is the mapping of b?
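The "on one processor only" property behind this question can be made concrete. Under a (BLOCK, *) distribution, the owner of a(row, col) depends only on row, so each row section a(row,:) lives entirely on one processor. This Python sketch is a hypothetical model (the 1-d arrangement of nprocs and the use of 4 processors are assumptions, not from the original):

```python
import math

# Sketch: which processor owns a(row, col) under
#   !hpf$ distribute a(block, *)
# onto a 1-d arrangement of nprocs processors.
def block_owner(row, extent, nprocs):
    """0-based owner of 1-based index `row`; the default BLOCK size
    is ceil(extent / nprocs), per the HPF BLOCK definition."""
    block = math.ceil(extent / nprocs)
    return (row - 1) // block

# Every element of row 37 of a(100,200) has the same owner, so
# a(37,:) is an "on one processor only" section; the column index
# never enters the computation:
owners = {block_owner(37, 100, 4)}   # a single owner for the whole row
```

This is exactly the situation the *(*) dummy distribution is meant to describe: the callee's x is known to reside on one (unspecified) processor.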
And should one be allowed to describe the mapping of x in this manner, or must one use the more cumbersome and specific:

      pure function f(x, row)
      integer row
      real x(:)
      real f(size(x))
!hpf$ template t(100,200)
!hpf$ align f(:) with x(:)
!hpf$ align x(:) with t(row, :)
!hpf$ distribute t(block, *)

-----------
Hello, Pres asked me to amplify my previous CCI request concerning the directive

!hpf$ distribute x(*)

Here is some additional commentary: The key issue is to let the compiler know what's going on when a subroutine is passed an "on one processor only" section of a distributed array; the call site is probably in a forall or independent loop. The obvious syntax is to say, prescriptively:

!hpf$ distribute dummy_arg(*)

or descriptively:

!hpf$ distribute dummy_arg *(*)

I was surprised that this is allowed by the HPF syntax: if this distribution is specified by the program for an array that is not a dummy arg, I don't know what to make of it. Would it mean to replicate the array? To store it on one processor of the compiler's choice? To store it on the "front end"? In shared memory? I think a reasonable proposal would be as follows:
--------------------------------------------------------------------------------------
In a (re)distribute directive, the number of non-* (i.e., block and cyclic[(k)]) entries in the dist-format-list must ordinarily be at least one, and must be the same as the rank of the processors arrangement in the ONTO clause, if present. If, however, the distributee is a dummy argument, then, if the distribute directive is descriptive, the requirement of at least one non-* entry in the dist-format-list is waived. Thus

      real dummy(:,:)
!hpf$ distribute dummy *(*,*)

is valid for a dummy argument; it asserts that the actual argument will be distributed on a single processor.
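The proposed rule above can be sketched as a validity check. This Python sketch is one reading of the proposal, not official HPF; the function name and parameters are hypothetical:

```python
# Sketch of the proposed rule: a dist-format-list must ordinarily
# contain at least one non-* entry matching the ONTO rank; the
# "at least one" requirement is waived only for descriptive
# mappings of dummy arguments (e.g.  distribute dummy *(*,*) ).
def dist_format_valid(formats, onto_rank=None,
                      is_dummy=False, descriptive=False):
    non_star = sum(1 for f in formats if f != "*")
    if onto_rank is not None and non_star != onto_rank:
        return False                  # must match rank of ONTO arrangement
    if non_star == 0:                 # all-collapsed, e.g. *(*, *)
        return is_dummy and descriptive
    return True

# distribute dummy *(*,*)  -- valid only as a descriptive dummy mapping:
#   dist_format_valid(["*", "*"], is_dummy=True, descriptive=True) -> True
#   dist_format_valid(["*", "*"])                                  -> False
```

The design point is that the all-* form carries information only as an assertion about an incoming actual argument; for an ordinary array it would be meaningless, which is why the waiver is restricted to descriptive dummy mappings.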
--------------------------------------------------------------------------------------
(Advice to language designers:) It's quite likely that a section of a processors arrangement will be allowed in the ONTO clause of (re)distribute. In that case, one could also use the following:

      subroutine act_on_local_info(dummy, iproc)
      real dummy(:,:)
!hpf$ processors all_procs(number_of_processors())
!hpf$ distribute dummy *(*,*) onto all_procs(iproc)

This would be appropriate in the following contexts:

      program main
      real actual_2d(8,16), actual_wide_2d(8,32), actual_3d(8, 16, 10)
!hpf$ processors procs(8)
!hpf$ distribute (*, block) onto procs :: actual_2d, actual_wide_2d
!hpf$ distribute (*, block, *) onto procs :: actual_3d
!hpf$ independent
      do j = 1, 16
        call act_on_local_info( actual_2d(:, j:j), (j+1)/2 )            ! dummy shape is (8,1)
        call act_on_local_info( actual_wide_2d(:, 2*j-1:2*j), (j+1)/2 ) ! dummy shape is (8,2)
        call act_on_local_info( actual_3d(:, j, :), (j+1)/2 )           ! dummy shape is (8,10)
      enddo
--------------------------------------------------------------------------------------
The alternative to this, as far as I can tell, is to make the programmer align the dummy to a template, as follows:

      subroutine act_on_located_info(dummy, iproc)
      real dummy(:,:)
!hpf$ processors all_procs(number_of_processors())
!hpf$ template, distribute onto all_procs :: all_temp(number_of_processors())
!hpf$ align *(*,*) with all_temp(iproc) :: dummy

      program main
      real actual_2d(8,16), actual_wide_2d(8,32), actual_3d(8, 16, 10)
!hpf$ processors procs(8)
!hpf$ distribute (*, block) onto procs :: actual_2d, actual_wide_2d
!hpf$ distribute (*, block, *) onto procs :: actual_3d
!hpf$ independent
      do j = 1, 16
        call act_on_located_info( actual_2d(:, j:j), (j+1)/2 )            ! dummy shape is (8,1)
        call act_on_located_info( actual_wide_2d(:, 2*j-1:2*j), (j+1)/2 ) ! dummy shape is (8,2)
        call act_on_located_info( actual_3d(:, j, :), (j+1)/2 )           ! dummy shape is (8,10)
      enddo

-- Rob

No action reported from the Sept meeting.
=====================================
GROUP E
New items
=====================================
CCI #35 F95 and reduction functions.  Group E
Adam Marshall
Submitted: 10/5/95
Status: new
Question:
I may well have missed something here, but does the HPFF intend to `redraft' the HPF V1.1 spec to define the language binding in terms of Fortran 95, or is that going to be left to HPF 2? For example, I am thinking of the argument lists to the reduction functions MINVAL, PRODUCT etc. Also there are some very "sensible" extensions in allowing user-defined functions in specification expressions. I guess, as F95 is a superset of F90, vendors could provide the new F95 features as extensions to HPF V1.1. It would seem a sensible thing to do. Adam Marshall
Discussion:
Partial reply from Mary ...
Adam, we plan to address F95 in HPF2 ... timing of approval for F95 is a bit awkward, since we may be a bit ahead of F95 formal approval. But the spirit of the group is that full HPF is F95 + ... It is highly unlikely that we will make these changes retroactive to the definition of the HPF1.1 specification. As you suggest, that would be left to the vendors. We are not in the mode of making extensions to HPF1.1 at this time. -mary zosel-
And partial reply from Chuck ...
The short answer is, "We'll leave Fortran 95 up to HPF 2.0." I agree that F95 has some nice features, taken from HPF and from other places. (I can't speak for the whole HPFF working group, but I think this is a common view.) Plans are to restructure the HPF 2.0 specification to be based on F95, but intermediate versions (corrections, clarifications, interpretations) will continue to be based on F90. Think of this as bundling all the big changes into one package. As with (almost) any language, vendors can add their own extensions. Their customers can decide whether they want to pay for worthwhile features that may be nonportable to other platforms.
F95 extensions to HPF sound very feasible, and should be more portable than extensions like "brand X active objects". (Well, I'm *trying* not to slam any particular company...) Chuck Koelbel

NOTE - these replies don't address the entire question ....