Preliminary Proposal to provide support for OOC arrays in HPF 

Alok Choudhary, Chuck Koelbel, Ken Kennedy


The following proposal presents a way to support out-of-core arrays in HPF. 
The main objectives are consistency with the "normal" HPF data mapping 
directives, simplicity, and minimal extensions. 

The following illustrates an example of declaring out-of-core arrays. The 
only addition here is the directive OUT-OF-CORE. (Other directives and 
information will follow later in this writeup) 

!HPF$ TEMPLATE TEMP(100,100)
!HPF$ DISTRIBUTE TEMP(CYCLIC(B),CYCLIC(B))
!HPF$ ALIGN WITH TEMP :: A,B,C
!HPF$ OUT-OF-CORE: TEMP

This directive simply says that if nodes had infinite memory, the elements 
of the array would reside in the memory of the processor given by the 
distribution directives. In other words, the directives describe which 
processor's memory an element is brought into when it is in-core. This is a 
logical and simple extension of the HPF data mapping directives. 
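
As a rough illustration of what the distribution directives determine, the following Python sketch computes which processor would hold a template element under CYCLIC(B) in each dimension. The block size B = 10 and the 2 x 2 processor grid are assumptions chosen for the example; the proposal does not fix either value.

```python
def cyclic_owner(index, block, nprocs):
    """0-based processor coordinate owning a 1-based index under CYCLIC(block)."""
    return ((index - 1) // block) % nprocs


def owner_2d(i, j, block, grid):
    """Owner of template element (i, j) on a grid = (rows, cols) of processors."""
    return (cyclic_owner(i, block, grid[0]), cyclic_owner(j, block, grid[1]))


# Assumed values for illustration only: B = 10, 2 x 2 processor grid.
print(owner_2d(1, 1, block=10, grid=(2, 2)))
print(owner_2d(11, 25, block=10, grid=(2, 2)))
```

Whether or not the element is currently in-core, this mapping is what names the memory it is brought into.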

As usual, directives can be directly applied to arrays (rather than 
through a template).

Restrictions:
1) One cannot align an in-core array with an out-of-core array. E.g., 
if A(10,10) is an OOC array and B(10) is an in-core array, it is not 
permitted to align B with A (or vice-versa) in any form, because not all 
the elements of A may be in-core, and therefore, it is difficult to 
enforce the meaning of align in such cases.
2) Others?


Association of File(s) with Arrays.

If the data for an out-of-core array comes from an input file then the 
file name must be specified. This is different from traditional open and 
read/write of a file because the user does not explicitly access the file.

For this we propose a directive

!HPF$ ASSOCIATE (<fn> [,other parameters]) :: OOC array name 

e.g.,

!HPF$ ASSOCIATE (filex) :: A

By default, the storage order in the file is the Fortran storage order and 
element (1,1) corresponds to the first element of the file.
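
To make the default concrete, here is a small Python sketch of the element position implied by Fortran (column-major) storage order. The function name and the 100 x 100 shape are illustrative assumptions, and offsets are counted in elements, not bytes.

```python
def column_major_offset(index, shape):
    """0-based element offset of a 1-based multi-index in column-major order."""
    offset, stride = 0, 1
    for idx, extent in zip(index, shape):
        offset += (idx - 1) * stride   # first dimension varies fastest
        stride *= extent
    return offset


shape = (100, 100)
print(column_major_offset((1, 1), shape))   # first element of the file
print(column_major_offset((2, 1), shape))   # next element down the first column
print(column_major_offset((1, 2), shape))   # start of the second column
```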

[,Other parameters] can be used to specify:
1) Storage order
2) Starting element of the dataset within the file. 

1) is needed if the storage order is anything other than the default 
column major.

There are two possibilities. Use keywords such as row major, column 
major, or chunks with parameters (such as chunk size in each dimension); 

OR use an explicit order, e.g., (2,1,3) which says that the outermost 
dimension is 2 followed by 1 followed by 3. This may not allow chunking. 
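
One possible reading of an explicit order can be sketched in Python as follows. It assumes that the order lists dimensions from outermost (slowest varying in the file) to innermost (fastest varying); the proposal does not pin this down, and the 4 x 5 x 6 shape is made up for the example.

```python
def ordered_offset(index, shape, order):
    """0-based element offset of a 1-based index for an explicit dimension order.

    Assumption: `order` lists dimensions from outermost (slowest varying)
    to innermost (fastest varying), using 1-based dimension numbers.
    """
    offset, stride = 0, 1
    for dim in reversed(order):            # walk from the innermost dimension out
        offset += (index[dim - 1] - 1) * stride
        stride *= shape[dim - 1]
    return offset


shape = (4, 5, 6)
print(ordered_offset((1, 1, 2), shape, order=(2, 1, 3)))  # step along dimension 3
print(ordered_offset((2, 1, 1), shape, order=(2, 1, 3)))  # step along dimension 1
print(ordered_offset((1, 2, 1), shape, order=(2, 1, 3)))  # step along dimension 2
```

Under this reading, the default column-major layout of a three-dimensional array would be written as the order (3,2,1).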

2) The starting element of the dataset is needed if the file begins with 
metadata (e.g., a description of the storage order from which the 
information in 1 is derived, the record size, etc.). 

This would require an open of a file, read of meta data and close of a 
file before the corresponding ASSOCIATE may be executed. 
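
The role of the starting element can be sketched in Python as below. The header layout (two 32-bit integer extents) and the 64-bit element type are hypothetical, invented only to show how a data start position shifts every element access; the proposal does not define a file format.

```python
import os
import struct
import tempfile

HEADER = struct.Struct("<ii")   # hypothetical metadata: two int32 extents
ELEM = struct.Struct("<d")      # one float64 per array element


def read_element(path, i, j):
    """Read element (i, j), 1-based, from column-major data that follows the header."""
    with open(path, "rb") as f:
        rows, cols = HEADER.unpack(f.read(HEADER.size))
        start = HEADER.size     # byte position where the dataset begins
        f.seek(start + ((j - 1) * rows + (i - 1)) * ELEM.size)
        return ELEM.unpack(f.read(ELEM.size))[0]


# Build a small 3 x 2 file in which element (i, j) holds the value 10*i + j.
rows, cols = 3, 2
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(HEADER.pack(rows, cols))
    for j in range(1, cols + 1):
        for i in range(1, rows + 1):
            tmp.write(ELEM.pack(10.0 * i + j))
    path = tmp.name

value = read_element(path, 2, 2)
os.remove(path)
print(value)
```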

Note that it is not allowed to have a file open for regular access 
as well as for out-of-core computation because OOC computation provides 
implicit access to a file and any explicit access presents consistency 
problems.

However, the following is allowed.

OPEN the file
READ/WRITE/INQUIRE to your heart's content
CLOSE the file
(somewhere down the call chain) declare the OOC array
use the array
RETURN (i.e. exit the scope where the array is declared)
now you can access the file again

The CLOSE acts as a sync point. Similarly, the array being deallocated 
acts as a sync point.

Note that this is necessary if metadata is contained within the file. 


If an OOC array is used purely for scratch purposes, then it is not 
necessary to associate a file with it. The compiler can choose a name and 
create files in whatever way necessary. HOWEVER, any array with which no 
file name is associated must not be read before anything is written into 
it (for obvious reasons). In fact, it should be a runtime error if any part 
of the OOC array that has not been assigned a value is read. 
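
A toy Python sketch of the runtime check suggested above follows. The class name and the per-element bookkeeping are assumptions made for illustration; a real implementation would more plausibly track whole tiles or records.

```python
class ScratchOOCArray:
    """Toy stand-in for a scratch OOC array that records which elements were written."""

    def __init__(self, n):
        self._data = [0.0] * n
        self._written = [False] * n

    def __setitem__(self, k, value):
        self._data[k] = value
        self._written[k] = True

    def __getitem__(self, k):
        # Reading an element that was never assigned is reported as an error.
        if not self._written[k]:
            raise RuntimeError(f"element {k} read before it was written")
        return self._data[k]


a = ScratchOOCArray(4)
a[0] = 1.5
print(a[0])
try:
    a[1]
except RuntimeError as err:
    print("runtime error:", err)
```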

The ASSOCIATE directive is also executable (like REDISTRIBUTE) because a 
user may want to process several files using the same array (at different 
points in time, e.g., pipelined computations). At any point, however, only 
one association exists. 


E.G.,

!HPF$ TEMPLATE TEMP(100,100)
!HPF$ DISTRIBUTE TEMP(CYCLIC(B),CYCLIC(B))
!HPF$ ALIGN WITH TEMP :: A,B,C
!HPF$ OUT-OF-CORE: TEMP

DO i=1,number_of_frames

ASSOCIATE (frame.i) :: A ! Syntax to be decided of course! 
EXTRACT_NICE_FEATURES(A)
END DO

Note that one can associate multiple OOC arrays with the same file, but 
consistency (if multiple assignments are done) is the user's responsibility.

TILING


Even when an OOC distribution is provided, it may require significant 
compiler analysis to figure out in what order to bring data in to be 
processed. The user, based on her knowledge of the computation (the 
philosophy of HPF), can provide this hint. 

For this purpose, a directive IO_DISTRIBUTE can be used, which essentially 
provides a hint as to how data should be fetched into the memory. 

For example, say each processor "owns" a chunk of OOC data after a 
(block,block) distribution. How data is to be fetched from within this 
chunk (bunch of rows, bunch of columns, two-dim blocks) can be described 
by IO_DISTRIBUTE.

Note that IO_DISTRIBUTE will also help in reorganizing data into local 
files of each processor with respect to storage order within each file, if the 
implementation chooses to do such a reorganization. 

However, note that block size here would depend on the amount of memory 
(tile size) available (and not on the number of processors). The syntax for this size 
specification within IO_DISTRIBUTE needs to be resolved. 

One way is to do it explicitly; e.g., IO_DISTRIBUTE A(5,10), meaning 
blocks of 5X10 elements...

But I believe this is a detail.
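
To illustrate the kind of hint an explicit tile size would give, here is a Python sketch that enumerates 5x10 tiles over one processor's local chunk. The 50 x 50 local chunk (a 100 x 100 array on an assumed 2 x 2 processor grid) and the row-by-row fetch order are assumptions; the proposal leaves both open.

```python
def tiles(local_shape, tile_shape):
    """Yield 0-based half-open (row_range, col_range) tiles covering a local chunk."""
    rows, cols = local_shape
    tile_rows, tile_cols = tile_shape
    for r0 in range(0, rows, tile_rows):
        for c0 in range(0, cols, tile_cols):
            # Tiles on the boundary are clipped to the chunk extents.
            yield (r0, min(r0 + tile_rows, rows)), (c0, min(c0 + tile_cols, cols))


# Assumed local chunk of 50 x 50 elements, fetched in 5 x 10 tiles.
all_tiles = list(tiles((50, 50), (5, 10)))
print(len(all_tiles))
print(all_tiles[0])
```

Each tile is the unit that would be brought into memory at once, so its size is bounded by available memory, not by the processor count.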
---------------------------------------------------------------- 

The above discussion is based on the following questions. Some of them are 
not answered (in terms of a concrete proposal) 

1. Type of array: is the given array in-core or out-of-core? What is the 
   distribution of the array?
2. If out-of-core, is there a corresponding file?
3. If there exists a file, is the file persistent (input/output) or 
   temporary?
4. Information about array mapping
   1. Multiple arrays mapped to the same file
   2. Multiple files mapped to the same array
5. Information about File Mapping
   1. File Ordering
   2. File Distribution Information
6. Hints about tiling
   1. Available Memory
   2. Tiling Parameters

* Execution Model
The compiler has to decide the underlying execution model. Two possible 
models are
1. Local Placement Model (OOC data in local space)
2. Global Placement Model (OOC data in Shared Space) 


--------------------------------------------------------------------------------
[A related message from Rob Schreiber]


        We would like to start a discussion on this.

        thanks
        Alok, Chuck, Ken

Okay.   Here goes.


        The following proposal presents a way to support out-of-core arrays in HPF.
        The main objectives are consistency with the "normal" HPF data mapping
        directives, simplicity, and minimal extensions.

        The following illustrates an example of declaring out-of-core arrays.
        The only addition here is the directive OUT-OF-CORE. (Other directives
        and information will follow later in this writeup)

        !HPF$ TEMPLATE TEMP(100,100)
        !HPF$ DISTRIBUTE TEMP(CYCLIC(B),CYCLIC(B))
        !HPF$ ALIGN WITH TEMP :: A,B,C
        !HPF$ OUT-OF-CORE: TEMP

(and OOC could be an attribute in a combined directive:

!HPF$ TEMPLATE, DISTRIBUTE(BLOCK,*), OUT_OF_CORE :: T

)
My prediction is that for at least N years, all HPF compilers will ignore 
the OOC attribute.  For machines with virtual memory and demand paging, N 
will not fit in a 32 bit integer.

        This directive simply says that if nodes had infinite memory, the
        elements of the array would reside in the memory of the processor given
        by the distribution directives. In other words, the directives describe
        which processor's memory an element is brought into when it is in-core.
        This is a logical and simple extension of the HPF data mapping directives.

        As usual, directives can be directly applied to arrays (rather than through
        a template).

        Restrictions: 
        1) One cannot align an in-core array with an out-of-core array. E.g.,
           if A(10,10) is an OOC array and B(10) is an in-core array, it is
           not permitted to align B with A (or vice-versa) in any form, because
           not all the elements of A may be in-core, and therefore, it is
           difficult to enforce the meaning of align in such cases.

Since there is no IN_CORE directive, there is no need for a restriction.
Alignment to an OOC template makes an object OOC.
(Principle of the excluded middle:  P .or. .not. P).

        Association of File(s) with Arrays.

        If the data for an out-of-core array comes from an input file then 
        the file name must be specified. This is different from traditional
        open and read/write of a file because the user does not explicitly access the
        file.

        For this we propose a directive

        !HPF$ ASSOCIATE (<fn> [,other parameters]) :: OOC array name

        e.g., 

        !HPF$ ASSOCIATE (filex) :: A

        By default, the storage order in the file is the Fortran storage order
        and element (1,1) corresponds to the first element of the
        file.

This seems to me to be a completely absurd idea.  The whole point of
OOC should be to leave the file format, etc., hidden from the user and
up to the implementation; this freedom is the value of it.  If there is
initial data on a file, then open the file and read it into the array,
as an idiom for the translation of file formats from the world-visible
Fortran file holding the data to the completely internal file holding
the array.

Will you also need to OPEN the file?   If not, how do you get the
information ordinarily provided by the user in the OPEN statement?
How do you handle errors?


        [,Other parameters] can be used to specify.
        1) Storage order
        2) starting element of the dataset within the file.

        1) is needed if the storage order is anything other than the default
           column major.

           There are two possibilities. Use key words such as row major, column
        major, or chunks with parameters (such as chunk size in each dimension);

         OR use an explicit order, e.g., (2,1,3) which says that the outermost
        dimension is 2 followed by 1 followed by 3. This may not allow chunking.

        2) The starting element of the dataset is needed if the file begins
           with metadata (e.g., a description of the storage order from which
           the information in 1 is derived, the record size, etc.).

           This would require an open of a file, read of meta data and close
           of a file before the corresponding ASSOCIATE may be executed.

        Note that it is not allowed to have a file open for regular access
         as well as for out-of-core computation because OOC computation
         provides implicit access to a file and any explicit access
        presents consistency problems.

What do you mean "have a file open for OOC access"?   Does ASSOCIATE
implicitly do this to the file?  If so, you are implying that the file
is used to hold the array, and is possibly modified if the array is
modified?   So it contains more than the initial data?

If so, this is really screwy.   For efficiency, the file(s) that hold
an OOC array should be
a) local to a processor or an SMP multiprocessor;
b) unformatted
c) direct access
d) with medium-sized records, whose size is implementation dependent.
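
For readers unfamiliar with this style of file, the following Python sketch mimics unformatted direct access with fixed-size records. The record length of four 64-bit elements is an arbitrary assumption standing in for the implementation-dependent size mentioned above.

```python
import struct
import tempfile

RECORD_ELEMS = 4                          # assumed record length, in elements
REC = struct.Struct(f"<{RECORD_ELEMS}d")  # raw binary record, no formatting


def write_record(f, recno, values):
    """Write one whole record at its 0-based record number."""
    f.seek(recno * REC.size)
    f.write(REC.pack(*values))


def read_record(f, recno):
    """Read one whole record by its 0-based record number."""
    f.seek(recno * REC.size)
    return list(REC.unpack(f.read(REC.size)))


with tempfile.TemporaryFile() as f:
    write_record(f, 2, [1.0, 2.0, 3.0, 4.0])   # records can be written in any order
    write_record(f, 0, [9.0, 8.0, 7.0, 6.0])
    first, third = read_record(f, 0), read_record(f, 2)

print(first, third)
```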

So these notions of row-major, etc, are silly.   They're at the wrong
conceptual level, like putting Orthogonal Recursive Bisection into the
language (as a distribution function).  Just let the user read his
initial data into the array using Fortran I/O!

Do you know of an OS that asks the user how it should implement
demand paging?   What page size, page replacement strategy, ... ?
The whole point should be to make this work transparently.


        However, the following is allowed.

                OPEN the file
                READ/WRITE/INQUIRE to your heart's content
                CLOSE the file
                (somewhere down the call chain) declare the OOC array
                use the array
                RETURN (i.e. exit the scope where the array is declared)
                now you can access the file again
        The CLOSE acts as a sync point.  Similarly, the array being
        deallocated acts as a sync point.

        Note that this is necessary if metadata is contained within the file.

So it seems that OOC arrays can be persistent files.   In that case, they must be
actual FORTRAN files.   So you lose many possible advantages; the system cannot do
anything that is "not Fortran".

        If an OOC array is used purely for scratch purposes, then it is not
        necessary to associate a file with it. The compiler can choose
        a name and create files in whatever way necessary. HOWEVER,
        any array with which no file name is associated must not be
        read before anything is written into it (for obvious reasons).
        In fact, it should be a runtime error if any part of the
        OOC array that has not been assigned a value is read.

Why isn't it just uninitialized data?

        The ASSOCIATE directive is also executable (like REDISTRIBUTE) because
        a user may want to process several files using the same array
        (at different points in time, e.g., pipelined computations).
        At any point, however, only one association exists. 


        E.G., 

        !HPF$ TEMPLATE TEMP(100,100)
        !HPF$ DISTRIBUTE TEMP(CYCLIC(B),CYCLIC(B))
        !HPF$ ALIGN WITH TEMP :: A,B,C
        !HPF$ OUT-OF-CORE: TEMP

         DO i=1,number_of_frames

            ASSOCIATE (frame.i) :: A ! Syntax to be decided of course!
            EXTRACT_NICE_FEATURES(A)
         END DO

        Note one can associate multiple OOC arrays with the same file but
        the consistency (if multiple assignments are done) is user's
        responsibility.

At the same time?  Hey, I see it now.   ASSOCIATE is just EQUIVALENCE
for files!

But it's executable!   Cool!   Hey, why not just make HPF an
interpreted language?


                                TILING


        Even when OOC distribution is provided, it may require significant
        compiler analysis to figure out in what order to bring data in
        to be processed. The user, based on her knowledge of the computation
        (the philosophy of HPF), can provide this hint.

        For this purpose, a directive IO_DISTRIBUTE can be used, which essentially
        provides a hint as to how data should be fetched into the memory.

        For example, say each processor "owns" a chunk of OOC data after
        a (block,block) distribution. How data is to be fetched from within this
        chunk (bunch of rows, bunch of columns, two-dim blocks) can be
        described by IO_DISTRIBUTE.

        Note that IO_DISTRIBUTE will also help in reorganizing data into local
        files of each processor with respect to storage order within each file, if the
        implementation chooses to do such a reorganization.

        However, note that block size here would depend on the amount of memory
        (tile size) available (and not on the number of processors). The syntax for this
        size specification within IO_DISTRIBUTE needs to be resolved.

        One way is to do it explicitly; e.g., IO_DISTRIBUTE A(5,10), meaning
        blocks of 5X10 elements...

        But I believe this is a detail.
        ----------------------------------------------------------------

        The above discussion is based on the following questions. Some of them
        are not answered (in terms of a concrete proposal)

(My opinions):

         1. Type of array: is the given array in-core or out-of-core? What is
        the distribution of the array?
OOC is an attribute of templates and the arrays aligned to them.
         2. If out-of-core, is there a corresponding file?
Not visible.
         3. If there exists a file, is the file persistent (input/output) or
        temporary?
temporary.
         4. Information about array mapping
            1. Multiple arrays mapped to the same file
            2. Multiple files mapped to the same array
Neither.
         5. Information about File Mapping
            1. File Ordering
Invisible to the user.
            2. File Distribution Information
The OOC template can have an HPF DISTRIBUTE attribute.
         6. Hints about tiling
            1. Available Memory
            2. Tiling Parameters
Leave it to the compiler for now.  We have no idea how to give useful advice.

        * Execution Model
         Compiler has to decide underlying execution model. Two possible
        models are
          1. Local Placement Model (OOC data in local space)
          2. Global Placement Model (OOC data in Shared Space)
Implementation dependent.


--   Rob

PS -- I have a Fortran 77 Out Of Core linear systems solver (using
plain old Fortran I/O) sitting around.  Want to compare
implementations?   I could make a version using an OOC array.

Of course, if there was no paging and VM, this would be worthwhile
in many cases, even if slow.   Except maybe for scientific supercomputing!
I think Cray never needed to add VM and paging because users didn't want
or believe in it.   So why does HPF?