Preliminary Proposal to provide support for OOC arrays in HPF

Alok Choudhary, Chuck Koelbel, Ken Kennedy

The following proposal presents a way to support out-of-core (OOC) arrays in HPF. The main objectives are consistency with the "normal" HPF data mapping directives, simplicity, and minimal extensions.

The following illustrates an example of declaring out-of-core arrays. The only addition here is the directive OUT-OF-CORE. (Other directives and information will follow later in this writeup.)

!HPF$ TEMPLATE TEMP(100,100)
!HPF$ DISTRIBUTE TEMP(CYCLIC(B),CYCLIC(B))
!HPF$ ALIGN WITH TEMP :: A,B,C
!HPF$ OUT-OF-CORE: TEMP

This directive simply says that if nodes had infinite memory, then the elements of the array would be in the memory of the processor described by the distribution directives. In other words, the directives describe which processor's memory an element will be brought into when in-core. This is a logical and simple extension of the HPF data mapping directives. As usual, directives can be applied directly to arrays (rather than through a template).

Restrictions:

1) One cannot align an in-core array with an out-of-core array. E.g., if A(10,10) is an OOC array and B(10) is an in-core array, it is not permitted to align B with A (or vice versa) in any form, because not all the elements of A may be in-core, and it is therefore difficult to enforce the meaning of ALIGN in such cases.

2) Others?

Association of File(s) with Arrays

If the data for an out-of-core array comes from an input file, then the file name must be specified. This differs from the traditional open and read/write of a file because the user does not explicitly access the file. For this we propose a directive

!HPF$ ASSOCIATE (, [,other parameters]) :: OOC array name

e.g.,

!HPF$ ASSOCIATE (filex) :: A

By default, the storage order in the file is the Fortran storage order, and element (1,1) corresponds to the first element of the file.

[,Other parameters] can be used to specify:
1) Storage order
2) Starting element of the dataset within the file

1) is needed if the storage order is anything other than the default column major. There are two possibilities: use keywords such as row major, column major, or chunks with parameters (such as the chunk size in each dimension); OR use an explicit order, e.g., (2,1,3), which says that the outermost dimension is 2, followed by 1, followed by 3. This may not allow chunking.

2) The starting element of the dataset is necessary if the file contains some metadata at its beginning (e.g., a description of the storage order from which the information in 1 will be derived, the record size, etc.). This would require an open of the file, a read of the metadata, and a close of the file before the corresponding ASSOCIATE may be executed.

Note that it is not allowed to have a file open for regular access as well as for out-of-core computation, because OOC computation provides implicit access to a file and any explicit access presents consistency problems. However, the following is allowed.

OPEN the file
READ/WRITE/INQUIRE to your heart's content
CLOSE the file
(somewhere down the call chain)
declare the OOC array
use the array
RETURN (i.e. exit the scope where the array is declared)
now you can access the file again

The CLOSE acts as a sync point. Similarly, the array being deallocated acts as a sync point. Note that this is necessary if metadata is contained within the file.

If an OOC array is used purely for scratch purposes, then it is not necessary to associate a file with it. The compiler can choose a name and create files in whatever way necessary. HOWEVER, any array with which no file name is associated must not be read before anything is written into it (for obvious reasons). In fact, it should be a runtime error if a part of the OOC array that has not been assigned any value is read.
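The storage-order and starting-element parameters described above amount to a rule for locating an array element in the file. A minimal Python sketch of that rule follows, as an illustration only and not as part of the proposal. The function name `element_offset`, the parameters `header_bytes` and `elem_bytes`, and the reading of an explicit order such as (2,1,3) as "outermost (slowest varying) dimension first" are assumptions made for this example; chunked layouts are not covered.

```python
def element_offset(index, shape, order=None, header_bytes=0, elem_bytes=8):
    """Byte offset of a 1-based array element inside a flat file.

    index, shape : tuples with one entry per dimension (1-based indices).
    order        : dimensions listed from outermost (slowest varying) to
                   innermost, numbered from 1. This reading of the explicit
                   order is an assumption of the sketch.
    header_bytes : size of any metadata preceding the dataset (hypothetical).
    elem_bytes   : size of one array element (hypothetical default of 8).
    """
    ndim = len(shape)
    if order is None:
        # Fortran (column-major) default: the first dimension varies fastest,
        # so the last dimension is the outermost one.
        order = tuple(range(ndim, 0, -1))
    if sorted(order) != list(range(1, ndim + 1)):
        raise ValueError("order must be a permutation of 1..ndim")

    linear = 0
    for dim in order:  # walk from the outermost to the innermost dimension
        linear = linear * shape[dim - 1] + (index[dim - 1] - 1)
    return header_bytes + linear * elem_bytes


# Element (2,1) of a 100x100 array: second element in column-major order,
# but 100 elements into the file in row-major order.
col_major = element_offset((2, 1), (100, 100))
row_major = element_offset((2, 1), (100, 100), order=(1, 2))
with_header = element_offset((1, 1), (100, 100), header_bytes=64)
```

With no header and the default order, element (1,1) maps to offset 0, which matches the default stated in the proposal.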
The ASSOCIATE directive is also executable (like REDISTRIBUTE), because a user may want to process several files using the same array (at different points in time, e.g., pipelined computations). At any point, however, only one association exists. E.g.,

!HPF$ TEMPLATE TEMP(100,100)
!HPF$ DISTRIBUTE TEMP(CYCLIC(B),CYCLIC(B))
!HPF$ ALIGN WITH TEMP :: A,B,C
!HPF$ OUT-OF-CORE: TEMP

DO i=1,number_of_frames
  ASSOCIATE (frame.i) :: A ! Syntax to be decided of course!
  EXTRACT_NICE_FEATURES(A)
END DO

Note that one can associate multiple OOC arrays with the same file, but the consistency (if multiple assignments are done) is the user's responsibility.

TILING

Even when an OOC distribution is provided, it may require significant compiler analysis to figure out in what order to bring data in to be processed. The user, based on her knowledge of the computation (philosophy of HPF), can provide this hint. For this purpose, a directive IO_DISTRIBUTE can be used, which essentially provides a hint as to how data should be fetched into memory. For example, say each processor "owns" a chunk of OOC data after a (block,block) distribution. How data is to be fetched from within this chunk (a bunch of rows, a bunch of columns, two-dimensional blocks) can be described by IO_DISTRIBUTE. Note that IO_DISTRIBUTE will also help in reorganizing data into the local files of each processor with respect to the storage order within each file, if the implementation chooses to do such a reorganization. However, note that the block size here would depend on the amount of memory (tile size) available, and not on the number of processors (NOP). The syntax for this size specification within IO_DISTRIBUTE needs to be resolved. One way is to do it explicitly; e.g., IO_DISTRIBUTE A(5,10), meaning blocks of 5x10 elements... But I believe this is a detail.

----------------------------------------------------------------

The above discussion is based on the following questions. Some of them are not answered (in terms of a concrete proposal).

1.
Type of array: is the given array in-core or out-of-core? What is the distribution of the array?
2. If out-of-core, is there a corresponding file?
3. If there exists a file, is the file persistent (input/output) or temporary?
4. Information about array mapping
   1. Multiple arrays mapped to the same file
   2. Multiple files mapped to the same array
5. Information about File Mapping
   1. File Ordering
   2. File Distribution Information
6. Hints about tiling
   1. Available Memory
   2. Tiling Parameters

* Execution Model

The compiler has to decide the underlying execution model. Two possible models are
1. Local Placement Model (OOC data in local space)
2. Global Placement Model (OOC data in Shared Space)

--------------------------------------------------------------------------------

[A related message from Rob Schreiber]

We would like to start a discussion on this.

thanks
Alok, Chuck, Ken

Okay. Here goes.

The following proposal presents a way to support out-of-core (OOC) arrays in HPF. The main objectives are consistency with the "normal" HPF data mapping directives, simplicity, and minimal extensions.

The following illustrates an example of declaring out-of-core arrays. The only addition here is the directive OUT-OF-CORE. (Other directives and information will follow later in this writeup.)

!HPF$ TEMPLATE TEMP(100,100)
!HPF$ DISTRIBUTE TEMP(CYCLIC(B),CYCLIC(B))
!HPF$ ALIGN WITH TEMP :: A,B,C
!HPF$ OUT-OF-CORE: TEMP

(and OOC could be an attribute in a combined directive:

!HPF$ TEMPLATE, DISTRIBUTE(BLOCK,*), OUT_OF_CORE :: T )

My prediction is that for at least N years, all HPF compilers will ignore the OOC attribute. For machines with virtual memory and demand paging, N will not fit in a 32 bit integer.

This directive simply says that if nodes had infinite memory, then the elements of the array would be in the memory of the processor described by the distribution directives. In other words, the directives describe which processor's memory an element will be brought into when in-core.
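To make the ownership rule concrete, here is a minimal Python sketch of which processor an element of TEMP would be brought into under a (CYCLIC(B),CYCLIC(B)) distribution. It uses the usual block-cyclic convention, in which 1-based index i along a dimension with P processors goes to processor ((i-1) div B) mod P. The block size of 4 and the 2x2 processor grid are assumptions for the example, since the proposal leaves B and the processor arrangement unspecified; the function names are hypothetical.

```python
def cyclic_owner(i, block, nprocs):
    """0-based processor coordinate owning 1-based index i under CYCLIC(block)."""
    return ((i - 1) // block) % nprocs


def owner_2d(i, j, block, grid):
    """Owner of element (i, j) under (CYCLIC(block), CYCLIC(block)).

    grid is the assumed processor arrangement, e.g. (2, 2).
    """
    return (cyclic_owner(i, block, grid[0]), cyclic_owner(j, block, grid[1]))


# Assumed values: B = 4 on a 2x2 processor grid.
first = owner_2d(1, 1, 4, (2, 2))      # first block in each dimension
wrapped = owner_2d(5, 9, 4, (2, 2))    # second block of rows, third block of columns
last = owner_2d(100, 100, 4, (2, 2))   # last element of TEMP(100,100)
```

The mapping is the same whether or not the element is currently resident; for an OOC array it only names the processor whose memory will receive the element when it is fetched.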
This is a logical and simple extension of the HPF data mapping directives. As usual, directives can be applied directly to arrays (rather than through a template).

Restrictions:

1) One cannot align an in-core array with an out-of-core array. E.g., if A(10,10) is an OOC array and B(10) is an in-core array, it is not permitted to align B with A (or vice versa) in any form, because not all the elements of A may be in-core, and it is therefore difficult to enforce the meaning of ALIGN in such cases.

Since there is no IN_CORE directive, there is no need for a restriction. Alignment to an OOC template makes an object OOC. (Principle of the excluded middle: P .or. .not. P).

Association of File(s) with Arrays

If the data for an out-of-core array comes from an input file, then the file name must be specified. This differs from the traditional open and read/write of a file because the user does not explicitly access the file. For this we propose a directive

!HPF$ ASSOCIATE (, [,other parameters]) :: OOC array name

e.g.,

!HPF$ ASSOCIATE (filex) :: A

By default, the storage order in the file is the Fortran storage order, and element (1,1) corresponds to the first element of the file.

This seems to me to be a completely absurd idea. The whole point of OOC should be to leave the file format, etc., hidden from the user and up to the implementation; this freedom is the value of it. If there is initial data on a file, then open the file and read it into the array, as an idiom for the translation of file formats from the world-visible Fortran file holding the data to the completely internal file holding the array.

Will you also need to OPEN the file? If not, how do you get the information ordinarily provided by the user in the OPEN statement? How do you handle errors?

[,Other parameters] can be used to specify:

1) Storage order
2) Starting element of the dataset within the file

1) is needed if the storage order is anything other than the default column major.
There are two possibilities: use keywords such as row major, column major, or chunks with parameters (such as the chunk size in each dimension); OR use an explicit order, e.g., (2,1,3), which says that the outermost dimension is 2, followed by 1, followed by 3. This may not allow chunking.

2) The starting element of the dataset is necessary if the file contains some metadata at its beginning (e.g., a description of the storage order from which the information in 1 will be derived, the record size, etc.). This would require an open of the file, a read of the metadata, and a close of the file before the corresponding ASSOCIATE may be executed.

Note that it is not allowed to have a file open for regular access as well as for out-of-core computation, because OOC computation provides implicit access to a file and any explicit access presents consistency problems.

What do you mean "have a file open for OOC access"? Does ASSOCIATE implicitly do this to the file? If so, you are implying that the file is used to hold the array, and is possibly modified if the array is modified? So it contains more than the initial data? If so, this is really screwy.

For efficiency, the file(s) that hold an OOC array should be
a) local to a processor or an SMP multiprocessor;
b) unformatted;
c) direct access;
d) with medium-sized records, whose size is implementation dependent.

So these notions of row-major, etc., are silly. They're at the wrong conceptual level, like putting Orthogonal Recursive Bisection into the language (as a distribution function). Just let the user read his initial data into the array using Fortran I/O!

Do you know of an OS that asks the user how it should implement demand paging? What page size, page replacement strategy, ... ? The whole point should be to make this work transparently.

However, the following is allowed.

OPEN the file
READ/WRITE/INQUIRE to your heart's content
CLOSE the file
(somewhere down the call chain)
declare the OOC array
use the array
RETURN (i.e.
exit the scope where the array is declared)
now you can access the file again

The CLOSE acts as a sync point. Similarly, the array being deallocated acts as a sync point. Note that this is necessary if metadata is contained within the file.

So it seems that OOC arrays can be persistent files. In that case, they must be actual FORTRAN files. So you lose many possible advantages; the system cannot do anything that is "not Fortran".

If an OOC array is used purely for scratch purposes, then it is not necessary to associate a file with it. The compiler can choose a name and create files in whatever way necessary. HOWEVER, any array with which no file name is associated must not be read before anything is written into it (for obvious reasons). In fact, it should be a runtime error if a part of the OOC array that has not been assigned any value is read.

Why isn't it just uninitialized data?

The ASSOCIATE directive is also executable (like REDISTRIBUTE), because a user may want to process several files using the same array (at different points in time, e.g., pipelined computations). At any point, however, only one association exists. E.g.,

!HPF$ TEMPLATE TEMP(100,100)
!HPF$ DISTRIBUTE TEMP(CYCLIC(B),CYCLIC(B))
!HPF$ ALIGN WITH TEMP :: A,B,C
!HPF$ OUT-OF-CORE: TEMP

DO i=1,number_of_frames
  ASSOCIATE (frame.i) :: A ! Syntax to be decided of course!
  EXTRACT_NICE_FEATURES(A)
END DO

Note that one can associate multiple OOC arrays with the same file, but the consistency (if multiple assignments are done) is the user's responsibility.

At the same time? Hey, I see it now. ASSOCIATE is just EQUIVALENCE for files! But it's executable! Cool! Hey, why not just make HPF an interpreted language?

TILING

Even when an OOC distribution is provided, it may require significant compiler analysis to figure out in what order to bring data in to be processed. The user, based on her knowledge of the computation (philosophy of HPF), can provide this hint.
For this purpose, a directive IO_DISTRIBUTE can be used, which essentially provides a hint as to how data should be fetched into memory. For example, say each processor "owns" a chunk of OOC data after a (block,block) distribution. How data is to be fetched from within this chunk (a bunch of rows, a bunch of columns, two-dimensional blocks) can be described by IO_DISTRIBUTE. Note that IO_DISTRIBUTE will also help in reorganizing data into the local files of each processor with respect to the storage order within each file, if the implementation chooses to do such a reorganization. However, note that the block size here would depend on the amount of memory (tile size) available, and not on the number of processors (NOP). The syntax for this size specification within IO_DISTRIBUTE needs to be resolved. One way is to do it explicitly; e.g., IO_DISTRIBUTE A(5,10), meaning blocks of 5x10 elements... But I believe this is a detail.

----------------------------------------------------------------

The above discussion is based on the following questions. Some of them are not answered (in terms of a concrete proposal).

(My opinions):

1. Type of array: is the given array in-core or out-of-core? What is the distribution of the array?

   OOC is an attribute of templates and the arrays aligned to them.

2. If out-of-core, is there a corresponding file?

   Not visible.

3. If there exists a file, is the file persistent (input/output) or temporary?

   Temporary.

4. Information about array mapping
   1. Multiple arrays mapped to the same file
   2. Multiple files mapped to the same array

   Neither.

5. Information about File Mapping
   1. File Ordering

      Invisible to the user.

   2. File Distribution Information

      The OOC template can have an HPF DISTRIBUTE attribute.

6. Hints about tiling
   1. Available Memory
   2. Tiling Parameters

   Leave it to the compiler for now. We have no idea how to give useful advice.

* Execution Model

The compiler has to decide the underlying execution model. Two possible models are
1. Local Placement Model (OOC data in local space)
2.
Global Placement Model (OOC data in Shared Space)

Implementation dependent.

-- Rob

PS -- I have a Fortran 77 out-of-core linear systems solver (using plain old Fortran I/O) sitting around. Want to compare implementations? I could make a version using an OOC array. Of course, if there were no paging and VM, this would be worthwhile in many cases, even if slow. Except maybe for scientific supercomputing! I think Cray never needed to add VM and paging because users didn't want or believe in it. So why does HPF?