The following proposal presents a way to support out-of-core (OOC) arrays in HPF. The main objectives are consistency with the "normal" HPF data mapping directives, simplicity, and minimal extensions. Note that these are objectives!

The following example declares out-of-core arrays. The only addition here is the directive OUT-OF-CORE. (Other directives and information follow later in this writeup.)

!HPF$ TEMPLATE TEMP(100,100)
!HPF$ DISTRIBUTE TEMP(CYCLIC(B),CYCLIC(B))
!HPF$ ALIGN WITH TEMP :: A,B,C
!HPF$ OUT-OF-CORE: A (filename)
!HPF$ OUT_OF_CORE: B

This directive simply says that if the nodes had infinite memory, the elements of the array would reside in the memory of the processor given by the distribution directives. In other words, the directives describe which processor's memory an element is brought into when it is in-core. This is a logical and simple extension of the HPF data mapping directives.

Note: Only arrays are declared out-of-core, not templates, since templates only represent an abstract index space.

There is a file name associated with array A in the OOC directive above. That is the user's way of specifying a file name to be associated with an array. The properties of the file are declared through the OPEN statement (described later). For example,

!HPF$ OUT-OF-CORE: A ("HUGE3D.dat")

means that the user will open a file with the above name before any access to A is permitted. On the other hand, the compiler is free to choose any file name for array B, and the compiler is responsible for opening that file (or files). Also, array B CANNOT BE PERSISTENT. In other words, the file associated with B is a scratch file.

SPECIFICATION of Properties of an OOC array.

This is proposed to be done using the OPEN statement. The STATUS field of the OPEN statement can be used to specify the persistence property of the associated array. A file associated with an OOC array is a special type of file, with which certain properties are associated (described later).
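As an aside on the mapping rule in the directive example above (the directives say which processor's memory an element is brought into when in-core), here is a minimal Python sketch of the usual HPF block-cyclic rule for CYCLIC(B). The block size of 10 and the 2x2 processor grid are assumptions made only for illustration, since the proposal fixes neither, and the function names are hypothetical.

```python
def cyclic_owner(i, block, nprocs):
    """Return the 1-based processor index that owns 1-based global index i
    under a CYCLIC(block) distribution over nprocs processors."""
    if i < 1 or block < 1 or nprocs < 1:
        raise ValueError("index, block size and processor count must be >= 1")
    # Blocks of `block` consecutive indices are dealt out round-robin.
    return ((i - 1) // block) % nprocs + 1


def owner_2d(i, j, block, grid):
    """Processor coordinates that would hold element (i, j) when it is in-core,
    for a (CYCLIC(block), CYCLIC(block)) distribution over a 2-D grid."""
    return (cyclic_owner(i, block, grid[0]), cyclic_owner(j, block, grid[1]))


if __name__ == "__main__":
    # Assumed values: block size 10 and a 2x2 grid (not given in the proposal).
    for i, j in [(1, 1), (11, 1), (21, 35), (100, 100)]:
        print((i, j), "->", owner_2d(i, j, 10, (2, 2)))
```

The OOC directive leaves this computation unchanged; it only means that the element may currently live in a file rather than in that processor's memory.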
The following new values for the STATUS field are added.

OLDPERSISTENT = File already exists.
NEWPERSISTENT = File needs to be created and does not already exist.
OLDSCRATCH = File exists, but can be deleted after the program finishes.
NEWSCRATCH = File needs to be created for this program, but may be deleted after the program finishes.

The word "TEMPORARY" may be used instead of "SCRATCH". This can be decided after discussion.

RESTRICTIONS:

1) Data files associated with OOC arrays are unformatted only, for performance reasons.

2) Only ACTION = READWRITE is permitted (because an array can be read or written). This raises an interesting possibility: if the user knows that an array is only read, or only written, throughout the program, the other two options, ACTION = READ or ACTION = WRITE, could be permitted. But my initial proposal only allows READWRITE.

3) If the file is OLDPERSISTENT or OLDSCRATCH, there must be another file called filename.metadata. This file contains information about the data distribution, the number of data files, the logical processor arrangement with which the data files were created, etc. Essentially, all the information required to describe and manage distributed arrays (and some more) is required in this metadata file. Hence, files containing persistent arrays are NOT standard Fortran files. Associated with the metadata are a number of inquiry functions that allow a program or compiler to inquire about the contents and organization of the data within the data file(s).

THE TYPE OF INFORMATION REQUIRED IN THE METADATA FILE IS DESCRIBED BELOW.

Assumptions and background for metadata information and number of files.

There could be a) one file containing an OOC array, or b) a number of files containing an OOC array.

a) In this case, there are two possibilities.

i) The data is organized in global (undistributed) form, in canonical form or in an easily describable form.
That is, one can describe the order of dimensions in which the data is stored in the global name space, just like in-memory orders, e.g., row-major, column-major, etc.

ii) There is one data file, but the data is stored separately (appended one after another) for each processor of the creating (logical) processor array over which the "contained" persistent array was distributed. For example, if the array was created using a 2x2 processor grid, then the single data file will have four distinct sections, one for each processor (like a map of the local memory of each processor for the array), appended one after the other. The metadata file will contain a description of the bounds of each section, its size, etc.

b) This is like case a) ii) above, except that there is one separate file for each processor that participated in the creation of the array (that is, the corresponding persistent files). The proposed convention for the file names is as follows (note that the processor grid description comes from the metadata file). E.g., for a two-dimensional processor array (say 2x2) that created the persistent array, there will be five files:

filename.metadata
filename.1.1
filename.1.2
filename.2.1
filename.2.2

So, if the original file name specified by the user in the OOC directive was HUGEDATA, and the status is OLDPERSISTENT, then the system will expect

HUGEDATA.metadata
HUGEDATA.1.1
HUGEDATA.1.2
HUGEDATA.2.1
HUGEDATA.2.2

Note that each individual file could carry its metadata at the beginning, but in my opinion that may hinder optimizations: if one has the flexibility to stripe, distribute, or otherwise organize the data files, embedded metadata may get in the way because of different datatypes within the same file, etc. Having the metadata separate makes much more sense. Also, since it will be a small file, it can be replicated (cached) on all the nodes.

The metadata should contain the following information.
1) Size of metadata file : int
2) Single file in global space, single file in individual proc space, or multiple files : int
3) Creating processor arrangement : int[7]
4) Distribution information for each dimension
5) Local bounds for each dimension, global bounds for each dimension, for each processor
6) Data type (this could be a record description of each element of the array)
7) Order of storage (column-major, row-major, ...)
8) .....
9) ....

Using this, one should be able to inquire about per-processor information as well as overall information about a persistent array. The particular names of the inquiry functions need to be developed.

******************

Questions of particular interest, and my opinion on those (thanks, Rob):

** Q. Is OOC a type parameter?

OOC is just a directive, describing a potentially very large array. Things should work in exactly the same fashion whether the data fits in memory or not.

** Q. Are OOC arrays arbitrarily mappable?

The OOC directive does not change the meaning of any mapping directive. So I guess, yes. However, it makes little sense to replicate, etc. One may want to put restrictions on those types of things. However, if a user uses OOC, he/she probably knows the tradeoffs.

** Q. Must all the intrinsics and library routines accept OOC arguments?

Hmmmmmmm... They should in principle, but it is a big burden!! I don't know. But I should be able to compute minimum, maximum, etc. for 3D CFD calculations. Maybe one can allow it for intrinsics which are simple (e.g., sum, min, max, ...).

** Q. May an OOC array have the POINTER or the TARGET attribute?

BIG Hmmmmmmmmmmmm........ We are getting into muddy waters here! This can be very complex. Would anyone like to use such a facility with OOC arrays? Pointers on disk (files) and pointers in memory are different things. Something called POINTER SWIZZLING is needed to implement this (people do this for persistent objects in C++), but performance is not the primary concern there!

Q. May an OOC array be allocatable?
Yes, I believe so. There should not be any problem with that, as long as all the other things, such as the OPEN, are taken care of before allocating.

Q. Dynamic attribute for an OOC array?

Needs discussion. Is it necessary?

** Q. If you tie a file to the OOC array, can it be a sequential file?

Any file tied to an OOC array should be a file of the kind described above. It is not a regular Fortran file. It is required to be UNFORMATTED.
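To illustrate the file naming convention proposed in case b) above, here is a minimal Python sketch that lists the files the system would expect for a persistent array, given the user's file name and the creating processor grid. The function name is hypothetical, and extending the pattern to grids of other ranks is an assumption, since the proposal only shows the two-dimensional case.

```python
from itertools import product


def expected_files(base, grid):
    """List the files expected for a multi-file persistent OOC array:
    one metadata file plus one data file per processor of the creating grid.

    base -- file name given in the OOC directive, e.g. "HUGEDATA"
    grid -- shape of the creating processor arrangement, e.g. (2, 2)
    """
    names = [f"{base}.metadata"]
    # Processor coordinates are 1-based, with the last coordinate varying fastest.
    for coords in product(*(range(1, n + 1) for n in grid)):
        names.append(base + "".join(f".{c}" for c in coords))
    return names


if __name__ == "__main__":
    for name in expected_files("HUGEDATA", (2, 2)):
        print(name)
```

For the 2x2 example this yields the same five names listed in the text, so a runtime could use such a list to check that every data file is present before the array is accessed.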