Hpff-Core

Following is a text version of the active CCI items. There will be a formatted version of the full text at the meeting this week. These are ONLY the active items, and they are sorted by group, so don't be surprised that they don't start with #1 and don't have all the numbers.

Two disclaimers:
#1 - I didn't get home with any record of action for a number of the CCIs discussed at the July meeting. These were the "easy" ones discussed in subgroups, but not in the full meeting. So unfortunately - They're Back! I hope someone remembers.
#2 - In the dump and edit process from my database, MSWORD managed to do a weird and wonderful mangling job on the file. I think I caught all the problems, but if some text seems strange - just wait for the "real" copy that will be available at the meeting.

And an offer: If any of you would like the postscript version of the text (will be at the meeting), or a copy of the FileMaker Pro database (Mac), send me a note and I'll ship it to you.

-----------------------------------
Active CCI - for Sept. meeting
-----------------------------------
********************************************************
GROUP C ACTIVE ITEMS
********************************************************
Item #18  Henry Zongaro  05/04/95
No action from the July meeting was recorded ... was this resolved or not?
Action: Chuck, Guy, Henry, Jerry
Title: Defined assignment in FORALL
Group C
Updated: 7/12/95
Status: in progress
-----------------
Hello,

I was wondering whether there's not a problem with allowing defined assignment to appear within a FORALL. Consider the following example.

      module mod
        integer :: a(3) = (/1,2,3/)
      contains
        pure subroutine def_assign(lhs, rhs)
          integer, intent(inout) :: lhs
          character, intent(in) :: rhs
          lhs = a(ichar(rhs)+1)
        end subroutine def_assign
      end module mod

      program p
        use mod
        interface assignment(=)
          module procedure def_assign
        end interface
        forall (i = 1:2) a(i) = char(i)   ! A sneaky way of passing "i" !
                                          ! to def_assign
      end program p

The rules of forall specify that the right-hand side and the indices of the left-hand side are evaluated, in any order, prior to assignment, which also takes place in any order. In the above example, we have

      a(1) = char(1)
      a(2) = char(2)

as the two defined assignments which take place. Inside of def_assign, there's a host-associated reference to a, so what ends up happening is the following:

      a(1) = a(2)
      a(2) = a(3)

The order in which these assignments occur affects the result. The value of a after the forall statement is executed could be (/2,3,3/) or (/3,3,3/). Basically, the problem is that in defined assignment, completely evaluating the right-hand side for all active combinations does not necessarily let the compiler precompute everything which might also appear on the left-hand side.

Thanks,
Henry
------------------
DISCUSSION AT MEETING: CCI #18 ...
Semantics of a forall are to evaluate all rhs and then store in all lhs, but if the assignment operator is user defined this is under user control, not compiler control. The user's definition might make the order of assignment important. Guy queried how an array assignment was handled in this case. Jerry will take this question to X3J3 about the status with respect to elemental functions. Guy pointed out that for WHERE F90 forbids the defined assignment. Chuck presented a proposal that sounded promising: that the evaluation is as if the rhs were assigned into a temp using the defined assignment operator and then the lhs is a direct copy of the already evaluated values. There was discussion of issues like the type of the temp (same as lhs or rhs?).
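Henry's order dependence, and the "assign into a temp, then copy" proposal from the discussion above, can be sketched in a few lines of Python. This is a simulation of the semantics only, not HPF; indices are 0-based here.

```python
# Simulation of CCI #18: the FORALL performs the two defined
# assignments a(1) = char(1) and a(2) = char(2), where def_assign
# does lhs = a(ichar(rhs)+1), i.e. a(i) = a(i+1) in 1-based terms.

def run(order):
    """Defined assignment applied directly, in the given order."""
    a = [1, 2, 3]
    for i in order:
        a[i] = a[i + 1]        # def_assign reads the live array a
    return a

def run_with_temp(order):
    """Chuck's proposal: the defined assignment evaluates into a temp,
    then the lhs elements are direct copies of the evaluated values."""
    a = [1, 2, 3]
    temp = {}
    for i in order:
        temp[i] = a[i + 1]     # phase 1: reads a, writes only the temp
    for i in order:
        a[i] = temp[i]         # phase 2: direct copy
    return a

print(run([0, 1]), run([1, 0]))                      # [2, 3, 3] [3, 3, 3]
print(run_with_temp([0, 1]), run_with_temp([1, 0]))  # [2, 3, 3] [2, 3, 3]
```

Under the proposal both orders give (/2,3,3/), because the defined assignments read a before any element of a is overwritten.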
Action: Chuck, Guy, Henry, Jerry to circulate proposed wording for this definition.
=====================================================
=====================================================
Item #25  Matt Rosing  6/27/95
Status: in progress
No action from July meeting - was this resolved?
Group C
Title: Use of pure and independent

Question:
I have some questions about HPF semantics and implementation. These are based on the HPF definition I have (dated 5/93). I'm trying to determine if HPF can be used to implement a code we have, and I have questions about spmd execution within HPF. The code has two phases: the first fits the data parallel model very well and the second does not. Within the second phase, independent operations are done on sections of a distributed array. These operations modify the array.

There are apparently two methods to describe non-data-parallel code in HPF: functions and subroutines declared pure, and loops declared independent. There are about five pages of constraints on the use of pure functions, and it's not clear that I can use them. The main reason I believe they won't work is that pure functions are not allowed to have side effects, and our code has each independent operation modifying a part of a distributed array. Is it true that pure functions can't be used for this?

Independent do loops, however, appear to have fewer constraints. The only constraint seems to be that multiple loop iterations cannot write to the same location (I'm ignoring IO). I have a few questions to see how far I can push this:

1) Can an iteration modify data belonging to another processor?
2) If so, how do implementations handle the one-sided nature of the communication generated by this? The reason I ask is that, if done naively by buffering remote writes until the loops are done, this could require as much space for buffers as there is space for the data structure being operated on.
3) Is there any limitation on the types of routines that can be called from within an independent loop?
4) How is the scheduling done? If there are just a bunch of subroutine calls within the body of the loop, how does the compiler figure out which processor does what?
5) Is it possible to dynamically schedule the loops on the processors for load balancing? Although none of the iterations would interfere, the scheduling mechanism would.

Some of these questions are probably outside the scope of the language definition, but I would still appreciate answers from any implementors, the reason being that the resulting performance might depend heavily on the implementation.

Thanks,
Matt (m_rosing@pnl.gov)
------------
Chuck replies 6/29/95:

There is a new draft of the HPF spec, available through http://www.erc.msstate.edu/hpff/home.html. The changes are minor (basically, corrections and clarifications) from the version you're looking at.

The entire purpose of PURE is to ensure that those functions do not have any side effects. So, at face value, this is correct. You could rewrite the functions to return arrays (this assumes each function modifies one array). You could then call the PURE functions, assigning the returned values into the separate array sections. For example, the final code calling the functions might look like

      FORALL ( I=1:NUM_BLOCKS )
        X(ILOW(I):IHI(I)) = FOO( A, B, C )
      END FORALL

This would probably mean a lot of modification for the code, and it might break the current generation of compilers. But it would be one way to use PURE in this context.

>1) Can an iteration modify data belonging to another processor?

Yes. Distribution of data has no effect on the semantics of the program. Note that HPF has no notion of the processor that an iteration executes on. (Of course, the underlying compiler should have some such idea...)

>2) If so, how do implementations handle the one-sided nature of ....
As you suggest below, this is something outside the scope of the language definition. Very good compilers will do something efficient (like strip-mining the loops) to ensure reasonable-sized buffers. Bad compilers will run out of memory and generate core dumps.

>3) Is there any limitation on the types of routines that can be called from within an independent loop?

Not in the language. The only limitation is in how called routines behave, i.e. that they don't violate the independence conditions.

>4) How is the scheduling done? If there are just a bunch of subroutine ....

As with 2 above, this is beyond the scope of the language definition. You are correct that this will be difficult to do well. Incidentally, discussions of specifying scheduling mechanisms for parallel loops are going on in the HPFF 2 meetings. If you have opinions on the subject (and I know you do, Matt :-), please feel free to contribute them. The final language will be better for it.

>5) Is it possible to dynamically schedule the loops on the processors ....

A dynamic scheduling mechanism would be a valid HPF implementation. (Of course, it is not the only one.) Independence is not a problem here because any interference is between system variables; INDEPENDENT only makes an assertion about the user code. Again, discussion in the HPFF 2 meetings may be relevant to this question in the future.

=====Rik Littlefield replies 6/29/95==========

Matt,

You write:
>2) If so, how do implementations handle the one-sided nature of ....

If you're talking about the electronic structure codes, say creating the Fock matrix, then the problem is even worse. There are only O(N^2) data elements but something like O(N^4) updates to those elements. (At best, I think it's O(N^3) with clever tricks.)
========
NO ACTION FROM JULY MEETING RECORDED - WAS THIS RESOLVED?
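Chuck's rewrite leans on FORALL semantics: every FOO result is (conceptually) evaluated before any section of X is stored, which is legal only because FOO is PURE and the target sections are disjoint. A small Python sketch of that two-phase evaluation follows; foo, ilow and ihi are illustrative stand-ins, not names from the original code.

```python
# Sketch of the FORALL-with-PURE-function pattern:
#   FORALL (I=1:NUM_BLOCKS) X(ILOW(I):IHI(I)) = FOO(A, B, C)
# Phase 1 evaluates all right-hand sides; phase 2 stores them.

def foo(a, section):
    """Stand-in for a PURE function: no side effects; the result
    depends only on its (read-only) arguments."""
    return [v + sum(a) for v in section]

def forall_assign(x, a, ilow, ihi):
    # Phase 1: evaluate every RHS before any store (FORALL semantics).
    # This is safe because foo has no side effects.
    rhs = [foo(a, x[ilow[i]:ihi[i] + 1]) for i in range(len(ilow))]
    # Phase 2: store into the disjoint sections, in any order.
    for i in range(len(ilow)):
        x[ilow[i]:ihi[i] + 1] = rhs[i]
    return x

# Two disjoint blocks of x; section bounds are 0-based and inclusive.
print(forall_assign([1, 2, 3, 4], [10], ilow=[0, 2], ihi=[1, 3]))
# -> [11, 12, 13, 14]
```

The design point is that the side effect ("modify my section of the array") is turned into a returned value, which is what makes the operation expressible with PURE.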
=====================================================
=====================================================
Item #29  Rob Schreiber  8/3/95
Status: new
Title: Calling hpf_local from independent loop
Group C & E

Hello,

Question I. Can an extrinsic(hpf_local) be invoked in an independent loop? In a Forall? Ex:

      Forall (i = 1:10) a(i) = f(i,a(i))

Note that part of the calling sequence, as specified in Ver 1.1, appendix A, is "The processors are synchronized. In other words, all actions that logically precede the call are completed." It seems clear that when this was written it was tacitly assumed that the call did not occur in an independent loop or forall.

Part 2: May any other kind of extrinsic be called in a forall or independent loop?
=====================================================
=====================================================
********************************************************
GROUP D ACTIVE ITEMS
********************************************************
Item #8  Yasuharu Hayashi  04/25/95
Status: in progress
NO ACTION FROM JULY MEETING RECORDED. WAS THIS RESOLVED OR NOT?
Action: study and discuss again in July
Title: Dummy assertion asterisk
Group D
Updated: 7/13/95

Question:
I have a question about the interpretation of the assertion asterisk when the template of a dummy argument is a natural template. According to the High Performance Fortran Language Specification, November 10, 1994, Version 1.1, p.51, l.31, "If the dummy argument has a natural template (no INHERIT attribute) then things are more complicated. In certain situations the programmer is justified in inferring a preexisting distribution for the natural template ......"

When an actual argument is a whole array, the text on p.51, l.35 states only "In all these situations, the actual argument must be a whole array or array section, and the template of the actual must be coextensive with the array along any axis having a distribution format other than "*".
If the actual argument is a whole array, then the pre-existing distribution of the natural template of the dummy is identical to that of the actual argument".

I think this description is ambiguous. For example:

      PROGRAM EX
      REAL A(10,10),B(5,5)
!HPF$ PROCESSORS P(5)
!HPF$ DISTRIBUTE A(BLOCK,*) ONTO P
!HPF$ ALIGN B(*,I) WITH A(2*I,*)
      CALL SUB(B)
      :
      END

      SUBROUTINE SUB(BB)
      REAL BB(5,5)
!HPF$ PROCESSORS P(5)
!HPF$ DISTRIBUTE *(*,BLOCK(1)) ONTO *P :: BB
      :
      END

Are the assertion asterisks for BB in SUB HPF-conforming? Isn't it necessary to add a list such as the following, which shows what assertion asterisks for a natural template are legal when an actual argument is a whole array?

"1. If the n th axis of an actual argument which is a whole array corresponds to the T(n) th axis of its template and j > i, T(j) must be larger than T(i).
2. If the situation is not described below, no assertion about the distribution of the natural template of a dummy is HPF-conforming.
  (a) If the alignment of the actual array axis with its template is collapsed, then * should appear in the distribution for the corresponding axis of the natural template of the dummy.
  (b) If the actual array is aligned with the axis of its template by replication (or "replication-triplet") and that template axis is distributed *, then no entry should appear in the distribution for the natural template of the dummy.
  (c) If the actual array is aligned with the axis of its template by int-expr and that template axis is distributed *, then no entry should appear in the distribution for the natural template of the dummy.
  (d) If the alignment of the actual array axis with the axis of its template is subscript triplet l:u:s and that axis of its template is distributed *, then * should appear in the distribution for the corresponding axis of the natural template of the dummy.
  (e) If the alignment of the actual array axis with the axis of its template is subscript triplet l:u:s and that axis of its template is distributed BLOCK(n) and LB is the lower bound for that axis of the template, then BLOCK(n/s) should appear in the distribution for the natural template of the dummy, provided that s divides n evenly and that l - LB < s.

Question continued:
  (f) If the alignment of the actual array axis with the axis of its template is subscript triplet l:u:s and that axis of its template is distributed CYCLIC(n) and LB is the lower bound for that axis of the template, then CYCLIC(n/s) should appear in the distribution for the natural template of the dummy, provided that s divides n evenly and that l - LB < s."
  (g) If the alignment of the actual array axis with the axis of its template is subscript triplet l:u:s, s must be positive.

Or it might be better to forbid the use of any assertion asterisks in a DISTRIBUTE directive in case a dummy argument doesn't have the INHERIT attribute and the corresponding actual argument isn't ultimately aligned with itself, since it seems that this solution makes things far simpler and causes little actual inconvenience (the same effect can also be achieved by the ALIGN directive).

Discussion:
Rob replies ...

This example is nonconforming because axis 2 of B is NOT coextensive with (one-to-one and onto mapping to) axis 1 of the template to which B is aligned.

>... example from original message

Now let the example be this:

>      PROGRAM EX
>      REAL A(10,10),B(5,10)
>!HPF$ PROCESSORS P(5)
>!HPF$ DISTRIBUTE A(BLOCK,*) ONTO P
>!HPF$ ALIGN B(*,I) WITH A(I,*)
>      CALL SUB(B)
>      :
>      END
>      SUBROUTINE SUB(BB)
>      REAL BB(5,10)
>!HPF$ PROCESSORS P(5)
>!HPF$ DISTRIBUTE *(*,BLOCK(2)) ONTO *P :: BB
>      :
>      END

This example is correct. The replication over the second axis of the template of the actual is not a problem because that is an axis whose distribution format is *.
B is not coextensive with that axis because it has a one-to-many association with it, but since the template axis has a * distribution, coextension is not a requirement.

Is this reasonable?
Rob Schreiber
======================
NO ACTION FROM JULY MEETING RECORDED. WAS THIS RESOLVED OR NOT?
=====================================================
=====================================================
Item #10  Kenth Engo  01/11/95
Status: in progress
NO ACTION FROM JULY MEETING RECORDED. WAS THIS RESOLVED OR NOT? Who has the action item on this?
Title: Permutations in HPF
Group D
Updated: 07/12/95

Question:
I have a general question about the way HPF deals with permutations of data on the different parallel architectures. In many applications in MIMD and SIMD computations today, one often encounters the need to perform a permutation of the data distributed on the parallel computer, i.e. a one-to-one mapping of the data set onto itself. It is then very important that the routing of the data is done in such a fashion that traffic contentions are eliminated in the interconnecting network.
One does not want a situation where 10 processors are communicating data to the very same processor. This would make all but one processor idle, since no more than one processor is allowed to communicate with the destination processor at the same time. What I have in mind is exemplified by the following HPF code:

C     8 PROCESSORS AND AN ARRAY OF 32 ELEMENTS
C
!HPF$ PROCESSORS SEDECIM(8)
      REAL CENTURY(32)
C
C     THE ARRAY IS DISTRIBUTED BY BLOCK.
C
!HPF$ DISTRIBUTE CENTURY(BLOCK) ONTO SEDECIM
C
C     THE ELEMENTS ARE DISTRIBUTED WITH ELEMENTS 1,2,3,4 ON PROC
C     #1, ELEMENTS 5,6,7,8 ON PROC. 2 AND SO ON.

Suppose one wants to redistribute the elements during execution to the following arrangement:

C     DATA REDISTRIBUTED CYCLIC ON THE PROCESSORS
C
!HPF$ REDISTRIBUTE CENTURY(CYCLIC) ONTO SEDECIM
C
C     THE ELEMENTS ARE NOW REDISTRIBUTED WITH ELEMENTS 1,9,17,25 ON
C     PROC #1, ELEMENTS 2,10,18,26 ON PROC. 2 AND SO ON.

During this redistribution a permutation of the data set is performed. How is this permutation implemented, and what is actually done? Is there a strategy/theory for generally doing this permutation optimally, that is, without any traffic contentions in the network?

I will be grateful if someone could answer this email, and possibly send or give me references to literature or people where I can find out more about how HPF implements the permutations.

Best regards,
Kenth Engo

Notes from May meeting: CCI #10 - not a CCI but a request for implementation practice.
Nothing recorded for July.
=====================================================
=====================================================
Item #11  Henry Zongaro  02/16/95
Status: in progress
Action: needs more research
Title: pointer with sequence
Group D
Updated: 8/14/95

Question:
Hello,

Things have been quiet here lately, so I thought I'd send a few questions that I've been hoarding. All page and line references are relative to the 1.0 HPF Language Spec.
I'd like to hear other people's opinions on these, especially the 2nd and 3rd items.

1) The response to CCI item 6.3 indicated that variables with the POINTER attribute must be distributed or aligned. A related question - can they be given the SEQUENCE attribute? Or can a pointer be associated with both sequential and nonsequential targets?
------------
Meeting minutes: needs more research
========
From July meeting - subgroup proposal: conforming dimensions of the pointer object and target either must be (unmapped or both identically distributed) or both must be sequential.

The issue is ... does the pointer exist as an instance by itself, or is it just associated with its bound argument? There is a case of pointers to sections of arrays, so just knowing that a pointer points to a block-distributed thing, one can't talk about pointers to sections. Andy Meltzer recalls that this was related to the fact that allocatable objects don't have their distribution until after they are assigned.

A straw poll was taken with the special understanding that a substantial vote for "abstain" would mean reconsider. The vote was 7-3-10, so this CCI item is returned to committee for further clarification of the issue.
=================================================
=================================================
Item #17  A.C. Marshall  5/03/95
Status: in progress
NO ACTION FROM JULY MEETING RECORDED. WAS THIS RESOLVED OR NOT?
Action: needs more research
Title: Defaults for distribution
Group D
Updated: 07/11/95

Question:
Forgive me for being dim (and only having v1.0 of the draft standard) but... Looking at the syntax rules for DISTRIBUTE (p24 & 26), it would appear to me that:

!HPF$ DISTRIBUTE A(BLOCK)       !H303/5/8
!HPF$ DISTRIBUTE ONTO P :: A    !H301/2/6/10

are valid but that

!HPF$ DISTRIBUTE A
!HPF$ DISTRIBUTE :: A

are not.
Is this just me, or is this how things are supposed to be? And if so, why is it not possible to use the default distribution and processor grid in the same statement? After all,

!HPF$ PROCESSORS P(NUMBER_OF_PROCESSORS())
!HPF$ DISTRIBUTE ONTO P :: A

is valid and has the same effect.

Adam Marshall

Notes from May meeting: needs more research
----
Scott Baden and Chuck Koelbel reply 07/07/95:

This is the way things are "supposed to be". Consult the relevant text (page 30, lines 20-21, v. 1.1):

... To prevent syntactic ambiguity, the dist-format-clause must be present in the statement form [of a distribute spec]

Chuck Koelbel adds that the "syntactic ambiguity" referred to here is due to the problem of non-significant blanks in Fortran. Consider

>!HPF$ DISTRIBUTE PRONTO ONTO LOGY
>!HPF$ DISTRIBUTE PR ONTO ONTOLOGY

or the following example:

>!HPF$ ALIGN TWITHEADS WITH A
>!HPF$ ALIGN T WITH EADSWITHA
>Disallowed for the same reason.

Chuck also continues:
>It's not clear that his example has "the same effect". For example, consider
>!HPF$ PROCESSORS P(NUMBER_OF_PROCESSORS())
>!HPF$ PROCESSORS Q(4,NUMBER_OF_PROCESSORS()/4)
>!HPF$ DISTRIBUTE :: A
>Does A have a 1-dimensional or 2-dimensional distribution? (Yeah, this assumes that NUMBER_OF_PROCESSORS() is divisible by 4...)
>The reason for requiring at least one of the clauses is, "What information are you giving if you leave both out?" In effect,
>!HPF$ DISTRIBUTE :: A
>would be a no-op, and we didn't think there was a need for that.

Scott Baden
Chuck Koelbel
======================
NO ACTION FROM JULY MEETING RECORDED. WAS THIS RESOLVED OR NOT?
=====================================================
=====================================================
Item #19  Henry Zongaro  05/04/95
Status: in progress
NO ACTION FROM JULY MEETING RECORDED. WAS THIS RESOLVED OR NOT?
Action: needs more research
Title: Mapping function results
Group D
Updated: 7/13/95

I have a couple of questions related to specification of mappings for function results.

1) Consider the following program fragment:

      program prog
        interface
          function f()
            integer f(100)
!hpf$       processors p(number_of_processors())
!hpf$       distribute f(block) onto p
          end function f
        end interface
        call sub(f())
      end program prog

      subroutine sub(i)
        integer :: i(100)
!hpf$   processors p(number_of_processors())
!hpf$   distribute i *(block) onto *p
      end subroutine sub

Is the above HPF conforming? Does distribution of a function result variable affect the distribution of the expression returned? The text on page 53, lines 27-28 indicates that the alignment of an expression is, in general, unpredictable, except in the case of arrays and array sections, so I believe the answer to my question is "No".

However, this is actually spurred by another question relating to the SEQUENCE directive. According to page 151, line 47 of the 1.1 HPF Spec., an can be a . When I first read this, I thought the explicit reference to was there to include result variables. Now a co-worker has suggested an alternate interpretation, and we were wondering which is correct. Her suggestion was that this is trying to allow something like the following:

      program p
        integer, external :: f
!hpf$   sequence :: f
        i = f()
      end program p

Is this correct? Will this make the result of the function sequential? If so, that brings up another question:

      program p
        interface
          function f()
            integer :: f(10)
!hpf$       sequence :: f
          end function f
        end interface
        call sub(f())
      end program p

      subroutine sub(a)
        integer a(2, 5)
!hpf$   sequence a
      end subroutine sub

According to page 155, lines 33-35, an array-valued expression cannot be specified to be sequential, but if specifying f to be a sequential function makes its result value sequential, this would be a contradiction.
----------------
May meeting minutes: needs more research
======
Nothing recorded from July meeting.
=====================================================
=====================================================
Item #21  Michael Hennecker  05/11/95
Status: in progress
NO ACTION FROM JULY MEETING RECORDED. WAS THIS RESOLVED OR NOT?
Title: Question about derived type mappings and documentation
Group D

Question:
Hello,

I have some questions regarding data mapping of objects of derived type:

(1) Is it possible to DISTRIBUTE / ALIGN objects of derived type, or are the data mapping attributes restricted to intrinsic types?

(2) If mapping of objects of derived type is not possible, shouldn't the v1.0 and v1.1 specs for HPF_ALIGNMENT, HPF_DISTRIBUTION and HPF_TEMPLATE

"ALIGNEE may be of any type." (5.7.15, 5.7.16)
"DISTRIBUTEE may be of any type." (5.7.17)

read "may be of any intrinsic type."?

Best regards,
Michael

Reply by CHK 5/15/95 ...

It is possible to map objects of derived type. (It is currently not possible to map components of derived type objects; this is being discussed in the HPFF 95 meetings.) The second question is moot, given the first answer.

Thanks for asking.
Chuck Koelbel
-------
Nothing recorded from July meeting.
=====================================================
=====================================================
Item #22  Henry Zongaro  6/15/95
Status: in progress
NO ACTION FROM JULY MEETING RECORDED. WAS THIS RESOLVED OR NOT?
Title: Implementor note about processor distributions?
Group D

Question:
Hello,

We came across something that didn't seem immediately obvious here, and might not be immediately obvious to others, so we were wondering whether a note to users and/or implementers might be justified. On page 30 of the 1.1 Language Spec., it's stated that if the ONTO clause of a DISTRIBUTE directive is omitted, an arbitrary processor arrangement is chosen for each distributee.
In some cases, there may be no suitable arrangement; I assume such a program would not be HPF-conforming. For example,

      program p
        integer :: a(10, 10)
!hpf$   distribute a(block(5), block(5))
      end program p

Here, the processor arrangement created would have to have an extent of at least two in each dimension (which, by the way, constrains how arbitrary the selection of a processor arrangement can be), so this program could not run in an environment in which the number of processors was fewer than four.

Does a note seem worthwhile here, or do others feel such a case is immediately obvious?

Thanks,
Henry
--------
Nothing recorded from July meeting
=====================================================
=====================================================
Item #26  Fabien COELHO  07/01/95
Status: in progress
NO ACTION FROM JULY MEETING RECORDED. WAS THIS RESOLVED OR NOT?
Title: Conditional realignment
Group D

Question:
Hi out there,

Is this kind of thing allowed in HPF?

      ! align A with T
      if (some runtime condition)
         ! realign A with T'
      endif
      ! redistribute T

... after the redistribution, array A's mapping is not known. It depends on the runtime condition. I cannot remember anything that may forbid this. I guess it is not very nice for the compiler... should/could it be forbidden? Or am I wrong?

Fabien.

Chuck Koelbel replies 7/3/95:

Yes, this is allowed. This was the intent of HPF REALIGN and REDISTRIBUTE - to allow the user to make run-time decisions about data mapping. Allowing run-time remapping will indeed require substantial support in the run-time system. We discussed this tradeoff, and the consensus was that users had valid reasons for wanting this capability, therefore it should be in the language. The difficulties with implementation were one reason that REALIGN and REDISTRIBUTE were not put in Subset HPF.

In short, you are right that this is legal and hard to implement. You are wrong that it is/should be forbidden.
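Fabien's scenario can be made concrete with a small simulation (Python here, with mappings modeled as plain data; all names are illustrative): after the conditional REALIGN, the same REDISTRIBUTE statement leaves A with a mapping that depends on the runtime condition, so the compiler cannot know A's mapping statically.

```python
# Sketch of CCI #26: A's final mapping depends on a runtime condition.
# A template's distribution and an array's alignment are modeled as
# dictionary entries; realign/redistribute are plain updates.

def run(condition):
    distribution = {"T": "BLOCK", "Tprime": "CYCLIC"}
    align = {"A": "T"}                  # align A with T
    if condition:                       # some runtime condition
        align["A"] = "Tprime"           # realign A with T'
    distribution["T"] = "CYCLIC(4)"     # redistribute T
    # A's mapping is the current distribution of its current template.
    return (align["A"], distribution[align["A"]])

print(run(False))  # ('T', 'CYCLIC(4)')  - A followed T's redistribution
print(run(True))   # ('Tprime', 'CYCLIC') - A escaped it via the realign
```

This is exactly why run-time remapping needs run-time system support: the mapping of A after the redistribute is a run-time value, not a compile-time property.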
Chuck
--------
Nothing recorded from July meeting
=====================================================
=====================================================
Item #28  Larry Meadows  7/25/95
Status: in progress
Title: combined directive ordering
Group D
Updated: 8/14/95

Question: LOST ORIGINAL MESSAGE !!! Ack! Here's what was addressed at the July HPFF meeting.

The syntax of combined directives allows:

      ALIGN WITH A(*,:) :: B(:)

Is this a shape-spec-list for array-decl or an array-spec for alignment? It is confusing to users.

Comment from Pres:
But (!!!) I surely would _not_ want the discussion of the topic that Larry introduces to slip further into more demands for "orderings" of directives and declarations as per the discussion of CCI #12.
========
Notes from July meeting:

Proposal: Add syntax rules to allow a shape-list in combined directives only with template names and processor names:

      ALIGN (:) WITH A(*,:) :: B

The full group recommended that the subgroup come back with a more specific proposal.
========
Proposed text:

On page 24, make the following changes:

Line 6, change "entity-decl-list" to "hpf-entity-decl-list"

After line 14, add:
"H303 hpf-entity-decl is hpf-entity [(explicit-shape-spec-list)]
 H304 hpf-entity is object-name
                 or template-name
                 or processors-name"

After line 20, add:
"Constraint: If an explicit-shape-spec-list appears, hpf-entity must be a template-name or processors-name."

On lines 34-35, change two occurrences of "object-name" to "hpf-entity"

On line 35, before "If both" insert "If an explicit-shape-spec-list appears, hpf-entity has the dimension attribute."
=====================================================
=====================================================
Item #30  Rob Schreiber  8/3/95
Status: new
Title: collapsing dimensions
Group D

Question:
Question II. This is really a question for implementors, as much as a question for language lawyers.
Consider this program:

      real a(100,200), b(100)
!hpf$ distribute a(block, *)
!hpf$ distribute b(*)
      forall (row = 1:100) a(row,:) = f(a(row,:))
      ...

      pure function f(x)
        real x(:)
        real f(size(x))
!hpf$   distribute *(*) :: x
!hpf$   align f(:) with x(:)

I cannot find any rule against this. (The issue is whether one may distribute an r-dimensional object with fewer than r instances of BLOCK or CYCLIC(k) in its dist-format list.) What is the mapping of b? And should one be allowed to describe the mapping of x in this manner, or must one use the more cumbersome and specific:

      pure function f(x, row)
        integer row
        real x(:)
        real f(size(x))
!hpf$   template t(100,200)
!hpf$   align f(:) with x(:)
!hpf$   align x(:) with t(row, :)
!hpf$   distribute t(block, *)
-----------
Hello, (from Rob)

Pres asked me to amplify my previous CCI request concerning the directive

      distribute x(*)

Here is some additional commentary:

The key issue is to let the compiler know what's going on when a subroutine is passed an "on one processor only" section of a distributed array; the call site is probably in a forall or independent loop. The obvious syntax is to say, prescriptively:

!hpf$ distribute dummy_arg(*)

or descriptively:

!hpf$ distribute dummy_arg *(*)

I was surprised that this is allowed by the HPF syntax: if this distribution is specified by the program for an array that is not a dummy arg, I don't know what to make of it. Would it mean to replicate the array? To store it on one processor of the compiler's choice? To store it on the "front-end"? In shared memory?

I think a reasonable proposal would be as follows:
--------------------------------------------------------------------------------------
In a (re)distribute directive, the number of non-* (i.e. block and cyclic[(k)]) entries in the dist-format-list must ordinarily be at least one, and must be the same as the rank of the processors arrangement in the ONTO clause, if present.
If, however, the distributee is a dummy argument, then, if the
distribute directive is descriptive, the requirement of at least one
non-* entry in the dist-format-list is waived. Thus

      real dummy(:,:)
!hpf$ distribute dummy *(*,*)

is valid for a dummy argument; it asserts that the actual argument will
be distributed on a single processor.

(continued clarification)
--------------------------------------------------------------------------------------
(Advice to language designers:) It's quite likely that a section of a
processors arrangement will be allowed in the ONTO clause of
(re)distribute. In that case, one could also use the following:

      subroutine act_on_local_info(dummy, iproc)
      real dummy(:,:)
!hpf$ processors all_procs(number_of_processors())
!hpf$ distribute dummy *(*,*) onto all_procs(iproc)

This would be appropriate in the following contexts:

      program main
      real actual_2d(8,16), actual_wide_2d(8,32), actual_3d(8, 16, 10)
!hpf$ processors procs(8)
!hpf$ distribute (*, block) onto procs :: actual_2d, actual_wide_2d
!hpf$ distribute (*, block, *) onto procs :: actual_3d
!hpf$ independent
      do j = 1, 16
         call act_on_local_info( actual_2d(:, j:j), (j+1)/2 )            ! dummy shape is (8,1)
         call act_on_local_info( actual_wide_2d(:, 2*j-1:2*j), (j+1)/2 ) ! dummy shape is (8,2)
         call act_on_local_info( actual_3d(:, j, :), (j+1)/2 )           ! dummy shape is (8,10)
      enddo

--------------------------------------------------------------------------------------
The alternative to this, as far as I can tell, is to make the
programmer align the dummy to a template, as follows:

      subroutine act_on_located_info(dummy, iproc)
      real dummy(:,:)
!hpf$ processors all_procs(number_of_processors())
!hpf$ template, distribute onto all_procs :: all_temp(number_of_processors())
!hpf$ align *(*,*) with all_temp(iproc) :: dummy

      program main
      real actual_2d(8,16), actual_wide_2d(8,32), actual_3d(8, 16, 10)
!hpf$ processors procs(8)
!hpf$ distribute (*, block) onto procs :: actual_2d, actual_wide_2d
!hpf$ distribute (*, block, *) onto procs :: actual_3d
!hpf$ independent
      do j = 1, 16
         call act_on_located_info( actual_2d(:, j:j), (j+1)/2 )            ! dummy shape is (8,1)
         call act_on_located_info( actual_wide_2d(:, 2*j-1:2*j), (j+1)/2 ) ! dummy shape is (8,2)
         call act_on_located_info( actual_3d(:, j, :), (j+1)/2 )           ! dummy shape is (8,10)
      enddo

-- Rob
=====================================================
=====================================================
Item # 32   Henry Zongaro   8/31/95   Status: new
Title: Changing distribution of SAVE array   Group D

Question:
Hello,

Page 41, lines 21-34 of the HPF 1.1 document specify that an array or
template must not be distributed on a processor arrangement at the time
the arrangement becomes undefined, unless the array or template also
becomes undefined or the processor arrangement always has identical
bounds. Presumably this was done so that objects with the SAVE
attribute would not change mappings from one call to the next. Is this
rule sufficiently strict? Consider the following:

      program p
      call sub(5)
      call sub(10)
      end program p

      subroutine sub(n)
      integer, save :: a(10)
!hpf$ processors proc(2)
!hpf$ distribute a(block(n)) onto proc
      end subroutine sub

In the first call to sub, a is distributed block(5); in the second
call, it is distributed block(10).
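To see concretely why the two calls imply a remapping, here is a small Python sketch (illustrative only, not HPF) of which of the two processors owns each element of a(1:10) under block(5) versus block(10):

```python
# Model of the HPF rule that under DISTRIBUTE A(BLOCK(M)) element i
# (1-based) of a one-dimensional array lives on processor (i-1)//M + 1
# (also 1-based).  Names here are mine, purely for illustration.

def block_owner(i, m):
    """Processor owning element i under a BLOCK(m) distribution."""
    return (i - 1) // m + 1

# a(10) on proc(2): first call uses block(5), second call block(10)
owners_block5 = [block_owner(i, 5) for i in range(1, 11)]
owners_block10 = [block_owner(i, 10) for i in range(1, 11)]

print(owners_block5)   # [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
print(owners_block10)  # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```

The SAVEd array thus silently moves from a split mapping to a single-processor mapping between the two calls, even though the processor arrangement's bounds never change.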
This is currently permitted because the bounds of the processor
arrangement have not changed. More complicated examples could be drawn
involving ALIGN. Similar text appears on page 44, line 43 - page 45,
line 3 for templates.

Comments from Rob 9/1/95:

We should tighten the language to prevent this. It's a way to remap,
which should be done only if the object remapped has the dynamic
attribute, and only via executable directives. I believe this applies
to objects in modules, for example:

      module mod
      real a(10)
      end module mod

      subroutine sub(n)
      use mod
!hpf$ processors procs(2)
!hpf$ distribute a(block(n)) onto procs
      end subroutine sub

This should be proscribed; if redistribute is used and if A is dynamic,
however, I think it's legal. Right?
=====================================================
=====================================================
Item # 34   Adriaan Joubert   9/06/95   Status: new
Title: Alignment of a single dimension   Group D

Question:
Hello,

I am trying to align one dimension of two 2-dimensional arrays, and
cannot find a way of expressing this in HPF. The problem is the
following:

      PROGRAM MAIN
      REAL, ALLOCATABLE :: A(:,:), Res(:,:)
!HPF$ DISTRIBUTE A(BLOCK,*)
!HPF$ DISTRIBUTE Res(BLOCK,*)
      ...
      ALLOCATE(A(N,M))
      ALLOCATE(Res(N,M*2))
      Res = SUB (A)
      ...
      CONTAINS
      FUNCTION SUB (A) RESULT(B)
      REAL, INTENT(in) :: A(:,:)
!HPF$ DISTRIBUTE *(BLOCK,*) :: A
      REAL :: B(SIZE(A,1),SIZE(A,2)*2)
!HPF$ DISTRIBUTE (BLOCK,*) :: B
      ...
      END FUNCTION SUB
      END PROGRAM MAIN

So the 1st dimension of B and A in the subroutine will be distributed
in the same way. It seems, however, that compilers can generate faster
code if they know how arrays are aligned with one another. Among other
things the compiler would have to know that B is exactly aligned with
Res.
In other words I would like to add to the main program something like

!HPF$ TEMPLATE :: MyTemp(N,M*2)
!HPF$ DISTRIBUTE MyTemp(BLOCK,*)
!HPF$ ALIGN WITH MyTemp(:,I) :: A(:,I)
!HPF$ ALIGN WITH MyTemp :: Res

and in the subroutine

!HPF$ TEMPLATE :: MyTemp(SIZE(A,1),SIZE(A,2)*2)
!HPF$ DISTRIBUTE MyTemp(BLOCK,*)
!HPF$ ALIGN WITH *MyTemp(:,I) :: A(:,I)
!HPF$ ALIGN WITH MyTemp :: B

and then the compiler should be able to figure out that everything is
nicely aligned. But I cannot do this, as I do not know N and M at
compile time. In this case I could probably get away with the
definitions

!HPF$ ALIGN WITH A(:,*) :: Res(:,*)

in the main program and

!HPF$ ALIGN WITH A(:,*) :: B(:,*)

in the subroutine. The replication of the second dimension would not
matter, as it is all on the same processor. But suppose I have a
(BLOCK,BLOCK) distribution for both arrays, and I still want to ensure
that all elements in the first section of every column are on the same
row of processors, i.e.

      REAL A(20,20), B(20,40)

      P1: A(1:10,1:10)      P2: A(1:10,11:20)
          B(1:10,1:20)          B(1:10,21:40)
      P3: A(11:20,1:10)     P4: A(11:20,11:20)
          B(11:20,1:20)         B(11:20,21:40)

There seems to be no way of telling the compiler about this with a
descriptive statement.

(question continued) Well, what about

!HPF$ ALIGN WITH A(:,I) :: B(:,I*2-1)
!HPF$ ALIGN WITH A(:,I) :: B(:,I*2)

But is this legal? And this can be harder to do if the second dimension
is (SIZE(A,2)-1)*SIZE(A,2)/2, as in my case. I'd appreciate any help on
this one. Understanding the distribution directives seems to get harder
the more you know, instead of easier ;-(

Adriaan

=========
Rob comments 9/6/95:

I see you want allocatable templates, a hole in HPF that we know about.
But in your case, no template is needed. There is no reason to specify
replication in your alignment. You can use

!HPF$ ALIGN A(I,J) WITH RES(I,J)

and drop the distribute directive for A in the main program.
In the function, you can use a distribute on B and a descriptive
alignment of A to B.

....."But is this legal? And this can be harder to do if the second
dimension is (SIZE(A,2)-1)*SIZE(A,2)/2, as in my case."...

No, it's not legal. Align is not a symmetric relation, and the
alignment map cannot be many-to-one unless it collapses a dimension; at
least that's my understanding. In case this is not clear, your
statement is equivalent to

!HPF$ ALIGN B(:,2*I-1) WITH A(:,I)

which only explicitly aligns the odd-numbered columns of B! But why
align the bigger of the two arrays (B) with the smaller (A)? My version
of SUB would be:

      FUNCTION SUB (A) RESULT(B)
      REAL, INTENT(in) :: A(:,:)
      REAL :: B(SIZE(A,1),SIZE(A,2)*2)
!HPF$ ALIGN *(I,J) WITH B(I,J) :: A
!HPF$ DISTRIBUTE B (BLOCK,*)

...."I'd appreciate any help on this one. Understanding the
distribution directives seems to get harder the more you know, instead
of easier ;-( "...

Yup.

-- Rob

=======
From Adam M. 9/5/95

As I understand it, Adriaan Joubert wants to tell HPF to set the data
up as follows:

      REAL A(20,20), B(20,40)

      P1: A(1:10,1:10)      P2: A(1:10,11:20)
          B(1:10,1:20)          B(1:10,21:40)
      P3: A(11:20,1:10)     P4: A(11:20,11:20)
          B(11:20,1:20)         B(11:20,21:40)

I would say (like Rob Schreiber) that you include, in MAIN, the line

!HPF$ ALIGN A(I,J) WITH B(I,J*2-1)

or equivalently

!HPF$ ALIGN A(:,J) WITH B(:,J*2-1)

PLUS (distribute them together)

!HPF$ DISTRIBUTE B (BLOCK,BLOCK)

And I would claim that the procedure should be coded as follows:

      FUNCTION SUB (A) RESULT(B)
      REAL, INTENT(in) :: A(:,:)
      REAL :: B(SIZE(A,1),SIZE(A,2)*2)
!HPF$ ALIGN *(I,J) WITH B(I,J*2-1) :: A
!HPF$ DISTRIBUTE B *(BLOCK,BLOCK)

If I am wrong - why?
- Adam
=====================================================
=====================================================
*********************************************************
GROUP E ACTIVE ITEMS
and see also Item #29 in Group C
********************************************************
Item # 31   Rob Schreiber   8/4/95   Status: new
Title: number of processors   Group E

Question:
We had a recent discussion of the question of what
number_of_processors() should return when the machine consists of a
cluster of SMPs (symmetric multiprocessors). Let's say the machine is
2 SMPs, each with 4 processors and a common memory. In my view, the
"right" answer in this case is probably 8. Not 2. Not (/2, 4/). That's
because I want the following code

      real x(1000, number_of_processors())
!hpf$ independent
      do i = 1, number_of_processors()
         call embarrassingly_parallel( x(:, i) )
      enddo

to run in parallel, with one iteration per processor, on all the
available processors.
=====================================================
=====================================================
Item # 33   Henry Zongaro   8/31/95   Status: new
Title: Query of function result distribution   Group E

A second question I have relates to CCI item #20. The HPF_LOCAL_LIBRARY
procedures all specify that the ARRAY argument must be a local dummy
argument associated with a global HPF actual argument. It would seem to
me to be desirable to be able to call these routines with a function
result as well. For example,

      program p
      interface
      extrinsic(hpf_local) function f()
      integer :: f(100)
!hpf$ processors proc(number_of_processors())
!hpf$ distribute f(block) onto proc
      end function f
      end interface
      end program p

      extrinsic(hpf_local) function f()
      integer :: f(100/number_of_processors()), g_index
!     Would be nice to be able to call local_to_global to determine
!     which elements of the local result correspond to which elements
!     of the global result
      call local_to_global(f, (/1/), g_index)
      end function f

Any opinions?

Thanks, Henry

Comment from Rob 9/1/95: Sure.
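For intuition, here is a hypothetical Python model of the local-to-global index mapping Henry wants to query: a size-100 result, BLOCK-distributed, in the easy case where the processor count divides 100 evenly. The function name and signature are my own shorthand, not the HPF_LOCAL_LIBRARY definition.

```python
# Hypothetical model (not the HPF_LOCAL_LIBRARY routine): the global
# index corresponding to a local element of a size-n array distributed
# BLOCK over nprocs processors, assuming nprocs divides n evenly so
# every processor holds exactly n // nprocs elements.

def local_to_global(local_index, my_proc, nprocs, n=100):
    """Global index of local element local_index on processor my_proc
    (all indices 1-based) under an even BLOCK distribution."""
    block = n // nprocs
    return (my_proc - 1) * block + local_index

print(local_to_global(1, 1, 4))    # 1   (proc 1 holds globals 1..25)
print(local_to_global(1, 3, 4))    # 51  (proc 3 holds globals 51..75)
print(local_to_global(25, 4, 4))   # 100 (last element on the last proc)
```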
=====================================================
=====================================================
PREVIOUSLY ANSWERED ITEM REFERRED TO IN CCI 33

Item # 20   Henry Zongaro   05/04/95   Status: waiting for text
Action: needs more text in document
Title: Declaring array-valued functions   Group E   Updated: 7/13/95

Question:
2) This second question is one that Rich Shapiro was asking towards the
end of the October '94 HPFF meeting. Consider the following program.

      program prog
      interface
      extrinsic(hpf_local) function f()
      integer f(10)
!hpf$ processors p(number_of_processors())
!hpf$ distribute f(block) onto p
      end function f
      end interface
      print *, f()
      end program prog

      extrinsic(hpf_local) function f()
      integer f(?)
      end function f

How should f be declared in the local subroutine? If this program runs
on a number of processors which divides 10, the answer is simple. But
what if the program is run on four processors? Three processors have
three elements each, but the fourth has one element. There are ways in
which you could specify this. However, there's no way for the user to
know *which* processor is the one which will have one element of the
result variable. Dummy array arguments of HPF_LOCAL procedures don't
have this problem, because they must be assumed shape. Perhaps function
results have to be distributed in such a way that the same number of
elements of the result will be mapped onto all processors.

A related problem: how can an array-valued function be specified in an
HPF_LOCAL module?

      extrinsic(hpf_local) module mod
      contains
      function f()
      integer f(?)
!hpf$ processors p(number_of_processors())
!hpf$ distribute f(block) onto p
      end function f
      end module mod

The declaration of f here has to do double duty. It has to provide the
shape of the result of f for any reference to the function, and it has
to describe the result variable mapped onto a single particular
processor. Can array-valued functions be permitted to appear in
HPF_LOCAL modules?
Thanks, Henry

Subgroup report - the only thing Group E could come up with was that:
 - a "local" function can only return a scalar
 - the corresponding extrinsic function return value must be either a
   scalar or a rank-one array of size number_of_processors()
 - anything more general would not be CCI ... and what to do is not
   obvious.

====
CCI #20 - this poses a non-trivial problem about how the size of a
function result is known and/or declared. Fortran 90 doesn't have any
way of saying that the result is "assumed shape/size". After extended
discussion, the subgroup decided to propose that a local function can
only return a scalar, and that the corresponding extrinsic function
return value must be either a scalar or an array that is made up from
the scalars returned by the locals. In full group discussion, Guy
pointed out that it is the DISTRIBUTE directive that is the problem -
it lies. There was a lot of discussion, with a basic trend that there
should be no change, but rather explanatory text in the document. To
bring the issue to a close there was a vote on the proposal for no
change, but with extra text (choosing whatever reason among several
that the individual liked best for no change). This passed 14 - 1 - 6.
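The uneven case Henry describes can be made concrete with a short Python sketch (illustrative only; the helper is mine). Under the usual default-BLOCK rule the block size for f(10) on 4 processors is ceiling(10/4) = 3, so the local extents come out uneven:

```python
# Illustrative model of why f(10) distributed BLOCK over 4 processors is
# awkward to declare locally: with block size ceiling(n/nprocs), the
# last processor holds fewer elements than the others.
import math

def local_extent(p, n, nprocs):
    """Number of elements processor p (1-based) holds under a default
    BLOCK distribution of an n-element array over nprocs processors."""
    b = math.ceil(n / nprocs)
    return max(0, min(n - (p - 1) * b, b))

print([local_extent(p, 10, 4) for p in range(1, 5)])  # [3, 3, 3, 1]
print([local_extent(p, 10, 2) for p in range(1, 3)])  # [5, 5] - the easy case
```

This is exactly why the subgroup's proposal restricts local functions to scalar results: a single local declaration cannot honestly describe a result whose extent differs from processor to processor.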