From chk@erato.cs.rice.edu  Tue May  5 15:51:51 1992
Received: from erato.cs.rice.edu by cs.rice.edu (AA09069); Tue, 5 May 92 15:51:51 CDT
Received: from localhost.cs.rice.edu by erato.cs.rice.edu (AA08328); Tue, 5 May 92 15:51:49 CDT
Message-Id: <9205052051.AA08328@erato.cs.rice.edu>
To: hpff-io@erato.cs.rice.edu
Cc: chk@erato.cs.rice.edu
Word-Of-The-Day: subaltern : (n) a person holding a subordinate position
Subject: Welcome to the HPFF I/O mailing list
Date: Tue, 05 May 92 15:51:49 -0500
From: chk@erato.cs.rice.edu


Just a note to tell you that hpff-io@rice.edu is now on the air.  This
is the HPFF I/O and Miscellaneous features subgroup.

						Chuck

From knighten@ssd.intel.com  Tue May  5 16:19:07 1992
Received: from ssd.intel.com by cs.rice.edu (AA10881); Tue, 5 May 92 16:19:07 CDT
Received: from tualatin.SSD.intel.com by ssd.intel.com (4.1/SMI-4.0)
	id AA06567; Tue, 5 May 92 14:17:02 PDT
Date: Tue, 5 May 92 14:17:02 PDT
From: Bob Knighten <knighten@ssd.intel.com>
Message-Id: <9205052117.AA06567@ssd.intel.com>
Received: by tualatin.SSD.intel.com (4.1/SMI-4.0)
	id AA08811; Tue, 5 May 92 14:17:00 PDT
To: hpff-io@cs.rice.edu
Subject: Is there anything to say about I/O in HPF?
Reply-To: knighten@ssd.intel.com (Bob Knighten)

Now that hpff-io is in operation, I would like to start with what seems to be
the fundamental question:  Is there anything to say about I/O in HPF?

At the last meeting there were at least a couple of people who argued that the
only thing to be said about I/O in HPF is that it has the semantics of I/O as
specified in Fortran 90, i.e. there is really nothing to say.  There are
others, and I am one, who feel that to get useful I/O performance it will be
necessary to give hints to the compiler just as we are doing for the
distribution of data.

So . . . what say you?

-- Bob

Robert L. Knighten	             | knighten@ssd.intel.com
Intel Supercomputer Systems Division | 
15201 N.W. Greenbrier Pkwy.	     | (503) 629-4315
Beaverton, Oregon  97006	     | (503) 629-9147 [FAX]

From highnam@slcs.slb.com  Wed May  6 10:07:42 1992
Received: from SLCS.SLB.COM by cs.rice.edu (AA00704); Wed, 6 May 92 10:07:42 CDT
From: highnam@slcs.slb.com
Received: from speedy.SLCS.SLB.COM
	by SLCS.SLB.COM (4.1/SLCS Mailhost 3.13)
	id AA03689; Wed, 6 May 92 10:07:24 CDT
Received: by speedy.SLCS.SLB.COM (4.1/SLCS Subsidiary 1.10)
	id AA01582; Wed, 6 May 92 10:07:24 CDT
Date: Wed, 6 May 92 10:07:24 CDT
Message-Id: <9205061507.AA01582.highnam@speedy.SLCS.SLB.COM>
To: hpff-io@cs.rice.edu
Subject: re: additional I/O functionality 


  Fortran already has a fairly complete set of I/O operations. 
  We shouldn't mess with those operations unless we have a *really* good reason.
  The files that are written should be no different in form from files written
  by a conventional sequential Fortran program.

  Vendors are, of course, free to provide additional I/O libraries as they
  see fit.   (With whatever weird, contorted, fast, scheme they choose.)

  In the minutes of the March HPFF meeting Chuck recorded that I ``opined
  that parallel IO is not needed, users can roll-their-own like they do now''.
  What I actually said is that I don't think that we should add new semantics
  to Fortran for parallel I/O, in part because the issue seems to be highly
  vendor-dependent, and in any case, if extra performance is available, we
  will see I/O libraries.  I could be wrong !


In the language of Arrays, Templates, Distributions, and Processors,
what can be said to assist I/O ?

In terms of device access:

  I see two classes of systems: those in which the I/O capability is
  symmetrically available to all processors (small p), and those in which
  it is not.  Here ``symmetric'' is used in terms of performance, not
  functionality.  A symmetric example: a CM2 system in which the DV is equally
  visible to all processors ``at the other end'' of an I/O bus.  An asymmetric
  example: an iPSC/860 system in which not all processors necessarily have
  disks.

  Can we provide information to the HPF compiler about the I/O characteristics
  of the system that the binary it creates will run on ?  In current HPF
  terms this would be expressed as specific Processor(s)-to-processor(s)
  bindings.  If the compile-time hints turn out to be wrong, the program
  shouldn't fail.

  Should we have a class of predefined distribution models that will optimize
  I/O performance on whatever system they are used on ?  (As with default
  distributions, the exact mapping would be left to the implementation.)


In terms of extending the semantics of external storage:

  Local vs global files ?  (See (a)symmetric note above.)

  Files with an implicit distribution ? => not (necessarily) readable
                                           by a non-HPF program

Peter





From pm@icase.edu  Fri May 29 10:08:13 1992
Received: from bonito.icase.edu by cs.rice.edu (AA01075); Fri, 29 May 92 10:08:13 CDT
Received: by bonito.icase.edu (5.65.1/lanleaf2.4.9)
	id AA01097; Fri, 29 May 92 11:08:13 -0400
Message-Id: <9205291508.AA01097@bonito.icase.edu>
Date: Fri, 29 May 92 11:08:13 -0400
From: Piyush Mehrotra <pm@icase.edu>
To: hpff-io@cs.rice.edu
Subject: add



From pm@icase.edu  Wed Jun 24 15:23:58 1992
Received: from seahorse.icase.edu by cs.rice.edu (AA07188); Wed, 24 Jun 92 15:23:58 CDT
Received: by seahorse.icase.edu (5.65.1/lanleaf2.4.9)
	id AA08275; Wed, 24 Jun 92 16:23:53 -0400
Message-Id: <9206242023.AA08275@seahorse.icase.edu>
Date: Wed, 24 Jun 92 16:23:53 -0400
From: Piyush Mehrotra <pm@icase.edu>
To: hpff-io@cs.rice.edu
Subject: I/O - what else


During the discussions in the plenary meetings I have heard the sentiment
that Fortran I/O is good enough - we don't need anything else.
Am I correct in presuming that the semantics of writing out a distributed
array to a file is as follows:
	write the file out in Fortran column-major serial order
	such that even a sequential Fortran program can read it back in
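
Concretely, a minimal sketch of that semantics (the mapping directives are
the draft HPF ones; the file name and array size here are arbitrary):

	!HPF$ PROCESSORS P(4)
	      REAL A(100,100)
	!HPF$ DISTRIBUTE A(BLOCK,*) ONTO P
	      OPEN(10, FILE = 'a.dat', FORM = 'UNFORMATTED')
	      WRITE(10) A     ! one record: A(1,1), A(2,1), ... in column-major order
	      CLOSE(10)

A plain sequential Fortran program with the same declaration of A (and no
directives) can then READ(10) A and recover exactly the same values.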


This semantics does not allow the distributed array to be output
with the distribution of the array kept in mind.  For example,
if an array needs to be written out temporarily before being read
back into another array with the same distribution, then I would prefer
that each processor write out its local portion of the array
(possibly after serialising it) independently.  This would
most likely be able to exploit multiple I/O controllers and
would also avoid the expense of redistributing the array
before writing it out and after reading it in.
The same situation may arise if I am passing data between two
HPF programs without ever reading the data into a sequential program.

Is this something we should look at and provide for? Actually
I do know of at least one user here on the iPSC who does exactly
what I have described above.

Comments?


	- Piyush



From choudhar@cat.syr.edu  Wed Jun 24 15:29:34 1992
Received: from cat.syr.edu (peach.ece.syr.EDU) by cs.rice.edu (AA07680); Wed, 24 Jun 92 15:29:34 CDT
Date: Wed, 24 Jun 92 16:29:26 EDT
From: choudhar@cat.syr.edu (Alok Choudhary)
Received: by cat.syr.edu (4.1/1.0-6/5/90)
	id AA11407; Wed, 24 Jun 92 16:29:26 EDT
Message-Id: <9206242029.AA11407@cat.syr.edu>
To: pm@icase.edu
Cc: hpff-io@cs.rice.edu
In-Reply-To: Piyush Mehrotra's message of Wed, 24 Jun 92 16:23:53 -0400 <9206242023.AA08275@seahorse.icase.edu>
Subject: I/O - what else




I agree with Piyush. I have given it some thought, but not enough yet
to concretely propose something. But I am in favor of providing
directives to help improve I/O.

Alok


 Alok Choudhary
 Assistant Professor
 ECE Dept., 121 Link Hall
 Syracuse University
 Syracuse, NY 13244
 (315)-443-4280
 Fax: (315)-443-2583
 choudhar@cat.syr.edu

From highnam@slcs.slb.com  Thu Jun 25 00:11:19 1992
Received: from SLCS.SLB.COM by cs.rice.edu (AA16750); Thu, 25 Jun 92 00:11:19 CDT
From: highnam@slcs.slb.com
Received: from speedy.SLCS.SLB.COM
	by SLCS.SLB.COM (4.1/SLCS Mailhost 3.13)
	id AA14681; Thu, 25 Jun 92 00:10:56 CDT
Received: by speedy.SLCS.SLB.COM (4.1/SLCS Subsidiary 1.10)
	id AA25411; Thu, 25 Jun 92 00:10:58 CDT
Date: Thu, 25 Jun 92 00:10:58 CDT
Message-Id: <9206250510.AA25411.highnam@speedy.SLCS.SLB.COM>
To: hpff-io@cs.rice.edu
Subject: re: I/O - what else [new construct?]


 There aren't many things more frustrating than to
 have 18 GBytes of data on a disk subsystem in a format
 known only to God and his/her vendor.  This is *only*
 acceptable if all access to that data makes its layout
 appear, transparently, to be that of an ordinary Fortran 
 file.

 Users, strange beasts that they are, like the functionality
 of NFS (if not its performance).  They can access any file
 from any system and (punting binary/ascii and byte ordering
 issues) use what they find, immediately.

 Now, vendors (e.g., Intel with CFS, TMC with its DV, or any
 of the RAID vendors) can, and do, supply fileservers that 
 hide (or can hide) weird physical distributions of bits.  
 And that's part of the vendor's job, not that of a language
 design group.

 F90 has plenty of I/O mechanisms [Marc Snir insinuated "too
 many" in the June mtg].  As far as possible we should stick
 with those mechanisms.  Please note that this view says 
 *nothing* about additional data structures that the file
 system might associate with a given file. I would almost
 expect the HPF runtime system to peek at the mapping of 
 its array [section] argument, and the file system's hidden
 info on a file, to see if it can be smart about a particular
 data transfer.  Perhaps vendors can comment here ?

 Individual vendors are going to provide all manner of slick
 ways to "dump/undump" data to their disk (and other) subsystems.
 These (as I mentioned in an earlier msg) will necessarily be
 vendor-dependent.  I will make use of these mechanisms for
 scratch files.  What we don't want to do in HPF is to give
 vendors a way to get "off the hook" with special-purpose I/O 
 mechanisms.  They have to do the Right Thing; they'll do the
 Other Stuff anyway.

 Peter




From dfk@wildcat.dartmouth.edu  Mon Jun 29 10:54:44 1992
Received: from wildcat.dartmouth.edu by cs.rice.edu (AA27512); Mon, 29 Jun 92 10:54:44 CDT
Received: by wildcat.dartmouth.edu (5.65D1/4.1)
	id AA24694; Mon, 29 Jun 92 11:54:43 -0400
Date: Mon, 29 Jun 92 11:54:43 -0400
From: dfk@wildcat.dartmouth.edu (David Kotz)
Message-Id: <9206291554.AA24694@wildcat.dartmouth.edu>
To: hpff-io@cs.rice.edu
Subject: hello

I just joined the HPFF IO group. I am interested in multiprocessor
file systems, and in particular the operating systems issues
underlying such file systems. But I am also interested in the file
system interface (see my Usenix file systems workshop paper); hence my
interest in this group. Up front, I must say that I am more interested
in the general question of multiprocessor file systems and their
interfaces than in Fortran or HPF directly.

Is there a place where I can quickly read up on the Fortran 90 I/O
interface semantics, which seems to be where we're starting?

Is there any form of I/O proposal for HPF yet?

It appears that people are reluctant to include any new I/O
constructs.  I personally think that the OS can do a lot for parallel
I/O, but I think that trying to write a parallel program to use
parallel I/O using only old-fashioned sequential I/O constructs is
silly. This is why HPF and Fortran 90 are/were created -- to allow
people to express parallelism directly.

If you're interested in what some vendors are doing with OS support
for parallel I/O, including some of the issues already discussed in
this group, see the papers about the new nCUBE file system (Del
Rosario, DeBenedictus, and others). Neat stuff. 

The question about the format of files is an interesting one. A
parallel computer that stores its files in some funny format may have
high performance locally, but not be able to make the file system or
files remotely accessible via, say, NFS. Look at the CM-2 for example.
In this increasingly networked world, this is bad.

I'll have to reread the archives; there are more comments I'd
like to make.

David Kotz
Asst Prof
Math & CS
Dartmouth College
dfk@cs.dartmouth.edu

From @cunyvm.cuny.edu:SNIR@YKTVMV  Tue Jul  7 09:42:44 1992
Received: from CUNYVM.CUNY.EDU by cs.rice.edu (AA23152); Tue, 7 Jul 92 09:42:44 CDT
Message-Id: <9207071442.AA23152@cs.rice.edu>
Received: from YKTVMV by CUNYVM.CUNY.EDU (IBM VM SMTP V2R2) with BSMTP id 2449;
   Tue, 07 Jul 92 10:42:00 EDT
Date: Tue, 7 Jul 92 10:42:08 EDT
From: "Marc Snir" <SNIR%YKTVMV.bitnet@cunyvm.cuny.edu>
To: hpff-io@cs.rice.edu

%IO.tex
%Snir
\documentstyle[11pt]{article}
\pagestyle{plain}
\pagenumbering{arabic}
\marginparwidth 0pt
\oddsidemargin  .25in
\evensidemargin  .25in
\marginparsep 0pt
\topmargin   -.5in
\textwidth 6in
\textheight  9.0in


\title{Proposal for IO}
\author{Marc Snir}
%\date{   }


\begin{document}

\maketitle

\section{Summary}

This document proposes to expand the mapping notation of HPF to include
files.  We try to achieve two purposes:
\begin{enumerate}
\item
Allow user control of file layout (file striping) across multiple IO devices.
\item
Allow one unformatted data transfer to access data that is stored across
multiple IO devices.
\end{enumerate}

We present in this document two alternative proposals to achieve these goals.
Both proposals allow files to be mapped onto multiple (abstract) storage nodes.

The first proposal uses HPF align and distribute
mapping directives to allow the mapping of
consecutive file records onto nodes -- records are not split.  Regular
data transfers are now used to access these records, with one
extension: an unformatted data transfer may access more than one record.

The second proposal associates files with templates.  The template
specifies how each file record is distributed across storage nodes.
Align directives
can be further used to realign data items when they are transferred in
READ or WRITE statements.

Both proposals preserve the linear structure of files as sequences of
records.   The first proposal also preserves the order of data items on
sequential files, but may affect the way these items are split into records.
If unformatted data transfer operations are extended to allow for multiple
record accesses, then the other I/O extensions can be ignored without
affecting the semantics of programs.  The second proposal may affect the
order in which data items are stored within records.

With both proposals, layout-dependent internal file representations are likely
to be used, in order to enhance I/O performance.
It is expected that implementers
will provide tools to convert files, either online or offline, so that a file
created with one mapping (e.g. in a sequential program) can be
accessed with any other mapping (e.g., in a parallel program).

The mapping directives for files
make it possible to achieve a better layout of files across multiple I/O
devices (file striping) and/or a better distribution of file caches.
Improved I/O rates via parallel I/O are obtained when the mapping of a
file matches the mapping of data in memory.

The two proposals are not entirely contradictory; it is possible to combine some
elements from both proposals.

Both proposals require the following restriction on Fortran 90 I/O:
\begin{quote}
All values needed to determine which entity is specified by an item in an
input or output item list need to be
available before the data transfer operation occurs.
\end{quote}
Currently, they only need to be available before the item is processed.
E.g.,
\begin{verbatim}

READ (15) N, X(N)

\end{verbatim}
is legal Fortran 90 but illegal HPF (or would result in the old value of {\tt
N} being used to index into {\tt X}).

\begin{verbatim}

READ (15) N
READ (15) X(N)
\end{verbatim}
is legal in both Fortran 90 and HPF.

\section{First Proposal}

A Fortran file is a sequence of records.  We treat such a file as a 1-D array of
records with LB=1 and infinite UB.  This array can be mapped to a (storage) node
arrangement in a manner analogous to the mapping of an array to a
(processor) node
arrangement.  Files are mapped using the same notation as for array
mapping.  The mapping defines a partition of the file, and each part is
associated with one abstract node.

The mapping of a file to a node arrangement can be
interpreted in two ways:
\begin{enumerate}
\item
The nodes may represent
(abstract) independent storage units, each storing a fixed part of the
file.
\item
The nodes may represent (abstract) independent file caches, with a fixed
association of each cache with a part of the file.
\end{enumerate}

In both cases the file is mapped onto
physical I/O devices so as to allow maximal concurrency for accesses
directed to distinct parts of the file.
If the second interpretation is used, then it is meaningful to align arrays
and files onto the same templates.

We introduce a new filemap object.  Filemaps are, essentially, named files.
They appear where an array name would
appear in an array mapping expression.   An actual file is associated with a
FILEMAP in an OPEN statement.  Filemaps are introduced because files are
not first-class objects in Fortran (files are not declared).   Also, filemaps
can have rank $>$ 1, giving more flexibility in the types of mappings that can
be specified.

The following diagram illustrates the mapping

\begin{verbatim}

                                   Node           Physical
File      Filemap     Template   arrangement    storage units  (or caches)
 _           _           _           _               _
|_|-------->|_|-------->|_|-------->|_|------------>|_|

     OPEN        ALIGN    DISTRIBUTE   Implementation
                                         Dependent
\end{verbatim}

\subsection{Node Directive}

We suggest replacing the keyword {\tt PROCESSOR} with the keyword {\tt
NODE}, which is more neutral.  Node arrangements (ex processor arrangements) can
be targets both for file mappings and for array mappings.  Some implementations
may disallow the use of the same node arrangement name as a target both for
array mappings and for file mappings.  In such a case an {\tt AFFINITY}
directive, which specifies affinity between I/O nodes and processor nodes,
would be useful.  (Such a directive would also be useful to specify affinity
between nodes of different arrangements, e.g., nodes in arrangements of
different rank.)

The set of allowable
node arrangements that can be used to map files is implementation dependent --
however, a node arrangement with {\tt NUMBER-OF-IONODES} nodes is always
legal.

The mapping of nodes to
physical storage units is implementation dependent.

For example:

\begin{verbatim}

!HPF$ NODE :: D1(2,4), D2(2,2)
      PARAMETER(NOD=NUMBER_OF_IONODES())
!HPF$ NODE, DIMENSION(NOD) :: D3,D4

\end{verbatim}

\subsection{FILEMAP Directive}

A Fortran file is an infinite  one-dimensional array of records, with LB=1.
A filemap can be thought of as an assumed-size array of records.  This array
is associated with (one-dimensional) files,
using storage association rules.  The filemap name is
used to specify a mapping for files.  The association between a filemap name
and an actual file is effected by the OPEN statement.

A FILEMAP directive declares filemap names.  The syntax is

\begin{verbatim}

filemap-directive   is   FILEMAP [::] filemap-name ( assumed-size-spec )
                                      [, filemap-name (assumed-size-spec ) ] ...
               or  FILEMAP, DIMENSION ( assumed-size-spec ) :: filemap-name-list


\end{verbatim}

An {\it assumed-size-spec} is a specification of the form used for assumed-size
arrays:  All dimensions are specified, with the exception of the last, which is
assumed.  In our case, the last dimension is infinite.  Only
initialization expressions may occur  in this specification (including
expressions that depend on {\tt NUMBER\_OF\_IONODES}).

For example:

\begin{verbatim}

!HPF$ FILEMAP :: F1(2,4,*)
!HPF$ FILEMAP, DIMENSION(2,2,1:*) :: F2,F3

\end{verbatim}

A FILEMAP directive does not allocate space, either in memory or on disk.

\subsection{File mapping}


ALIGN and DISTRIBUTE statements are used to map FILEMAPs onto nodes.
The syntax is identical to the syntax for processor mappings, with one
restriction:
Block distributions cannot be used for the last (infinite) dimension of
the filemap.

For example:

\begin{verbatim}

!HPF$ DISTRIBUTE (CYCLIC,CYCLIC,*) ONTO D2 :: F2,F3
!HPF$ DISTRIBUTE F1(*,BLOCK,CYCLIC(2)) ONTO D1
\end{verbatim}

Assume that {\tt F1, F2} are the filemaps and {\tt D1, D2} are the node
arrangements from the previous examples.

The first distribute statement
specifies the following mapping for successive records of a file associated with
{\tt F2} or {\tt F3}.



\begin{verbatim}

D2(1,1)         D2(1,2)

1 (1,1,1)       3 (1,2,1)
5 (1,1,2)       7 (1,2,2)
.               .
.               .
.               .



D2(2,1)          D2(2,2)

2 (2,1,1)        4 (2,2,1)
6 (2,1,2)        8 (2,2,2)
.                .
.                .

\end{verbatim}



The second distribute statement
specifies the following mapping for successive records of a file associated with
{\tt F1}.





\begin{verbatim}

D1(1,1)           D1(1,2)          D1(1,3)            D1(1,4)

 1 (1,1,1)       17                33                 49
 2 (2,1,1)       .                 .                  .
 3 (1,2,1)       .                 .                  .
 4 (2,2,1)       20                36                 52
 9 (1,1,2)       25                41                 57
10 (2,1,2)       .                 .                  .
11 (1,2,2)                         .                  .
12 (2,2,2)       28                44                 60
65               81                97                113
.                .                 .                  .
.                .                 .                  .



D1(2,1)            D1(2,2)         D1(2,3)            D1(2,4)

 5 (1,3,1)         21              37                 53
 6 (2,3,1)         .               .                  .
 7 (1,4,1)         .               .                  .
 8 (2,4,1)         24              40                 56
13 (1,3,2)         29              45                 61
14 (2,3,2)         .               .                  .
15 (1,4,2)         .               .                  .
16 (2,4,2)         32              48                 64
69                 85             101                117
.                  .               .                  .
.                  .               .                  .

\end{verbatim}

\subsection{OPEN statement}

A new connection specifier of the form {\tt FILEMAP = filemap-name}
associates a mapping
with the opened file.   If the file exists then the mapping must be one of the
mappings allowed for the file.  The set of allowed file mappings for an existing
file is implementation dependent, but always includes the mapping under which
the file was created.  More generally, it will include any mapping where the
file is mapped onto the same storage node arrangement, and with the same
allocation of file records to storage nodes
(different mappings may
result in the same allocation of records to storage nodes).
One choice is to allow any mapping, with possibly degraded performance for
ill-matched mappings; another choice is to remap an existing
file when it is opened with a new mapping, either offline or online.   Vendors
are expected to provide implementation dependent mechanisms to exercise such
choices.

The default mapping is implementation dependent.

Only external files can be mapped.
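
For example, a sketch that connects a file to the mapping of {\tt F1} from the
previous examples (the unit number and file name here are arbitrary):

\begin{verbatim}

      OPEN(UNIT = 17, FILE = 'RESULTS', FORM = 'UNFORMATTED', FILEMAP = F1)

\end{verbatim}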

\subsection{Data Transfer Statements}

The optional parameter {\tt NUMREC = scalar-int-expr} is added to the {\tt
io-control-spec}.  This argument is valid only for unformatted data transfers,
and specifies the number of consecutive records accessed by the data transfer,
starting from the current record, for sequential access, or the record indicated
by the REC parameter, for direct access.  The default value for NUMREC
is 1.  (Alternative choices are 1 or the number of storage nodes for
variable-length records, and ``as many
as needed to match the length of the item list'' for fixed-length records.)
Data transfers are executed following the
usual semantics of Fortran I/O, with successive values in the file matched to
successive elements in the input or output item list.  The number of records
specified by NUMREC is accessed.  The rules for padding, for end-of-record
exceptions and for file positioning are extended accordingly.
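
For example, a sketch of a single unformatted transfer that reads four
consecutive records into one array (the unit number and the array are
arbitrary):

\begin{verbatim}

      REAL A(1000)
      READ (15, NUMREC = 4) A

\end{verbatim}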

\subsection{Extensions}

We may want a {\tt REMAP} statement, to be used instead of the sequence
{\tt CLOSE ... OPEN}, in order to associate a new mapping to an existing
file.

We may want to extend the {\tt INQUIRE} statement to return  file mapping
information (alternatively, we may use the same query intrinsics used to query
array partitions).

A new intrinsic of the form {\tt INDEX(filemap-name, list-of-indices)} would be
handy, as it would allow random-access files to be addressed as multidimensional
arrays.  E.g.

\begin{verbatim}

READ (7, REC = INDEX(F1,3,5) ) A
\end{verbatim}

Each data transfer operation specifies
an association between parts of the file and the abstract processor nodes
from which (or to which) the data in the record is transferred.
We may want to add additional directives to the OPEN statement to indicate that
this association fulfils certain restrictions for as long as the file is open.
\begin{itemize}
\item
Accesses to a file are {\em independent} if, in all data transfers,
each file part is associated with the same processor node.  An
{\tt INDEPENDENT} argument in the OPEN statement may be used to specify this
condition (which simplifies file caching); a sketch follows this list.
\item
A data transfer is {\em aligned} if each file part is associated with a
unique processor node (is not split between two processor nodes).  We may
use an {\tt ALIGNED} argument in the OPEN statement to specify that all
data transfers are aligned.
(INDEPENDENT implies ALIGNED, but not vice versa).
\end{itemize}
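
A sketch of how such an OPEN argument might look (the exact keyword form is
left open; the unit, file name, and filemap are arbitrary):

\begin{verbatim}

      OPEN(UNIT = 17, FILE = 'RESULTS', FILEMAP = F1, INDEPENDENT = .TRUE.)

\end{verbatim}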

\section{Second Proposal}

A Fortran file is considered to be a sequence of records, each partitioned
into {\em fields}.  While a file is open, the partition of records into
fields is specified using a template (one field per template element).  This
template is distributed onto a storage node arrangement.
The set of fields associated with the template elements that map onto a node
forms a {\em subrecord}.  This subrecord is mapped onto the corresponding
storage node.   Thus, the record is broken into subrecords, one per storage
node, and the subrecords are broken into fields.  The fields within a
subrecord (numbered in column-major order) are not necessarily consecutive;
different subrecords may have different numbers of fields.

Storage nodes have the same role as in the first proposal.

When a data transfer occurs, items on the
input or output list are associated with template elements and, thus,
record fields, using align statements.   An unformatted data transfer accesses
only one record.  Each subrecord of this record will contain parts of each
item in the input or output list.

The following diagram illustrates the mapping

\begin{verbatim}

                              storage         Physical
I/O item       Template        node      storage units (or caches)
    _              _             _               _
   |_|----------->|_|---------->|_|------------>|_|

         ALIGN       DISTRIBUTE   Implementation
       (per item)    (per file)     Dependent


\end{verbatim}

\subsection{Node Directive}

Same as in the previous proposal.

\subsection{IO Templates}

IO templates are used to specify a partition of file records into
fields and a distribution of these fields onto storage nodes.
These templates are declared using the keyword {\tt IOTEMPLATE}.  They are
distributed onto storage node arrangements as regular templates would be.

\subsection{OPEN statement}

A new connection specifier of the form {\tt IOTEMPLATE =
io-template-name} associates a template
with the opened file.  Only external files may be so associated.  For an
existing file, the template distribution and the underlying storage node
arrangement must be included in the (implementation
dependent) set of allowable distributions and arrangements for
this file. This set includes
any template distributed onto the same storage node
arrangement.

Suppose that a file is created with template T1 and next opened
with template T2, where both templates are distributed onto the same node
arrangement.  The file presents in both cases the same sequence of records,
with the same partition of records into subrecords.  However, each access may
specify a different partition of subrecords into fields and a different
numbering of the fields.

A template cannot be redistributed while it is associated with an open file.

\subsection{Unformatted Data Transfer Statements}

Arrays can be mapped onto io templates using ALIGN
directives.  The alignment of an array onto an
io template does not affect its storage; it only affects
the way
this array is transferred from or to a file that is associated with this io
template.  An array can be simultaneously aligned with
several distinct io templates; each alignment controls data
transfers to/from files that are associated with the corresponding io
template.

The execution of a WRITE statement creates one partitioned record,
as described below:
\begin{enumerate}
\item
A copy of each item in the output item list is created, realigned and
redistributed according to the mapping specification for this item at the data
transfer statement.
\item
This copy is stored on the file, with each item stored in the
corresponding fields.  Successive items in the output item
list are stored in succession.
\end{enumerate}

The execution of a READ statement transfers the content of one partitioned
record
into the variables in the input item list, reversing the previous process.

Example

\begin{verbatim}

!HPF$ NODE :: D(2)
!HPF$ IOTEMPLATE, DISTRIBUTE(*, CYCLIC) ONTO D :: T1(2,4)
!HPF$ IOTEMPLATE, DISTRIBUTE(BLOCK, *) ONTO D :: T2(4,2)
       REAL A(2,4), B(2,4)
!HPF$  ALIGN WITH T1 :: A, B
!HPF$  ALIGN WITH T2(I,J) :: A(J,I), B(J,I)
       OPEN(UNIT = 15, FILE = 'IN', IOTEMPLATE = T1)
       OPEN(UNIT = 16, FILE = 'OUT', IOTEMPLATE = T2)
       READ (15) A, B
       WRITE (16) A, B

\end{verbatim}

First record of file IN:

\begin{verbatim}

subrecord 1: 11 21 31 41 51 61 71 81

subrecord 2: 12 22 32 42 52 62 72 82

\end{verbatim}

Values assigned to arrays A and B by READ operation

\begin{verbatim}

A(1,1) = 11    A(1,2) = 12   A(1,3) = 31    A(1,4) = 32
A(2,1) = 21    A(2,2) = 22   A(2,3) = 41    A(2,4) = 42

B(1,1) = 51    B(1,2) = 52   B(1,3) = 71    B(1,4) = 72
B(2,1) = 61    B(2,2) = 62   B(2,3) = 81    B(2,4) = 82

\end{verbatim}

First record of file OUT after program execution:

\begin{verbatim}

subrecord 1: 11 12 21 22 51 52 61 62

subrecord 2: 31 32 41 42 71 72 81 82

\end{verbatim}

It might be reasonable (but inelegant) to restrict the syntax
of HPF mapping directives that apply to ionode arrangements (e.g., prohibit the
use of explicit indices in ALIGN directives).

Example:

\begin{verbatim}

!HPF$ NODE, DIMENSION(2) :: D
!HPF$ IOTEMPLATE, DISTRIBUTE ONTO D :: T1(2)
!HPF$ IOTEMPLATE, DISTRIBUTE ONTO D :: T2(2)
       REAL A(2)
!HPF$  ALIGN A WITH T1
!HPF$  ALIGN A(*) WITH T2
       OPEN(UNIT = 15, FILE = 'IN', IOTEMPLATE = T1)
       OPEN(UNIT = 16, FILE = 'OUT', IOTEMPLATE = T2)
       READ (15) A
       WRITE (16) A

\end{verbatim}

First record of file IN before execution:

\begin{verbatim}

subrecord 1: 1

subrecord 2: 2

\end{verbatim}

Values assigned to array A

\begin{verbatim}

A(1) = 1  A(2) = 2

\end{verbatim}

First record of file OUT after execution:

\begin{verbatim}

subrecord 1: 1 2

subrecord 2: 1 2

\end{verbatim}

Example:

\begin{verbatim}

!HPF$ NODE, DIMENSION(2,2) :: D
!HPF$ IOTEMPLATE, DISTRIBUTE ONTO D :: T(2,2)
       REAL A(1,1), B(2,1), C(2,2)
       DATA A, B, C / 11, 21, 22, 31, 32, 33, 34 /
!HPF$  ALIGN WITH T :: A, B, C
       OPEN(UNIT = 16, FILE = 'OUT', IOTEMPLATE = T)
       WRITE (16) A, B, C

\end{verbatim}

First record of file OUT after execution:

\begin{verbatim}

subrecord 1 :   11 21  31

subrecord 2 :      22  32

subrecord 3 :          33

subrecord 4 :          34

\end{verbatim}

Note that subrecords need not have the same length.

\subsubsection{Extensions}

\paragraph{}

As the previous example shows, it is convenient to allow the alignment of an
array of lower rank onto a template of higher rank, assuming additional
implicit dimensions, each with extent one.  Then, the following program would
have the same outcome as the previous one.

\begin{verbatim}

!HPF$ NODE, DIMENSION(2,2) :: D
!HPF$ IOTEMPLATE, DISTRIBUTE ONTO D :: T(2,2)
       REAL A, B(2), C(2,2)
       DATA A, B, C / 11, 21, 22, 31, 32, 33, 34 /
!HPF$  ALIGN WITH T :: A, B, C
       OPEN(UNIT = 16, FILE = 'OUT', IOTEMPLATE = T)
       WRITE (16) A, B, C

\end{verbatim}

We adopt this extension for alignment statements.
\paragraph{}

An implied alignment of the form obtained by a statement
{\tt ALIGN item WITH template}
is assumed for all
items that have not been explicitly aligned.  This requires that items have rank
equal to or lower than the rank of the template and, for each dimension, extent
equal to or lower than the extent of the template in that dimension.
Thus, in the last example, the ALIGN directive can be dropped with no change in
the outcome.
\paragraph{}

An {\em io-implied-do-object} is treated as an array subsection.
Thus
\begin{verbatim}

(B(I,3), I=1,4,2)

\end{verbatim}
is handled as an array of rank one (or higher);

\begin{verbatim}

((B(I,J), C(I), I=1,10), J=1,10)

\end{verbatim}
is handled as an array of rank two (or higher); each entry X(I,J) of this array
consists of a pair of values (B(I,J), C(I)).

Example:

\begin{verbatim}

!HPF$ NODE, DIMENSION(2,2) :: D
!HPF$ IOTEMPLATE, DISTRIBUTE ONTO D :: T(2,2)
       REAL A, B(2), C(2,2)
       DATA A, B, C / 11, 21, 22, 31, 32, 33, 34 /
       OPEN(UNIT = 16, FILE = 'OUT', IOTEMPLATE = T)
       WRITE (16) A, A+3, ((B(I), 99, C(I,J), I = 1, 2), J = 1, 2)
       WRITE (16) A, ((B(I), 99, C(I,J), J = 1, 2), I = 1, 2)

\end{verbatim}


First record of file OUT after execution:

\begin{verbatim}

subrecord 1 :   11 14 21 99 31

subrecord 2 :         22 99 32

subrecord 3 :         21 99 33

subrecord 4 :         22 99 34

\end{verbatim}

Second record of file OUT after execution:

\begin{verbatim}

subrecord 1 :   11 21 99 31

subrecord 2 :      21 99 33

subrecord 3 :      22 99 32

subrecord 4 :      22 99 34

\end{verbatim}

\paragraph{}

It is convenient to allow alignment directives for items in the input or output
item list to appear in data transfer statements.   The notation will be
elaborated in the next release (if any) of this document.  (Since the alignment
is to the template associated with the file, there is no need to explicitly
name the template.  On the other hand, for the sake of sanity, it is desirable
to avoid align dummies.)

\section{Acknowledgement}

First proposal based on ideas of P. Corbett and S. Baylor.

Second proposal based on ideas of E. Ekanadham and Y. Baransky.


\end{document}

From highnam@slcs.slb.com  Mon Aug 17 09:23:02 1992
Received: from SLCS.SLB.COM by cs.rice.edu (AA07724); Mon, 17 Aug 92 09:23:02 CDT
From: highnam@slcs.slb.com
Received: from speedy.SLCS.SLB.COM
	by SLCS.SLB.COM (4.1/SLCS Mailhost 3.13)
	id AA06384; Mon, 17 Aug 92 09:22:40 CDT
Received: by speedy.SLCS.SLB.COM (4.1/SLCS Subsidiary 1.10)
	id AA00486; Mon, 17 Aug 92 09:22:38 CDT
Date: Mon, 17 Aug 92 09:22:38 CDT
Message-Id: <9208171422.AA00486.highnam@speedy.SLCS.SLB.COM>
To: hpff-io@cs.rice.edu
Subject: Stephen Whitley (GECO-Prakla): comments on HPFF v0.1 document


From: whitley@slcs.slb.com (Stephen Mark Whitley)
Subject: HPF I/O
Date: Fri, 14 Aug 92 11:21:13 CDT

Issues for

       High Performance Fortran
        Language Specification

Chapter 7  Input/Output


General...

    Having worked with a few machine-specific formats before, I would
    recommend strongly against the inclusion of such a concept in a
    "portable language".  It is not enough to simply have the language
    portable across machines; the external data representation (at least
    its layout) should also be portable.

    The obvious exception would be a situation where the cost of
    doing I/O is prohibitive.  In such a situation, I would feel better
    with extensions (possibly vendor-supplied) to the language to
    optimize the I/O at the cost of portability than with introducing the
    concept of non-portable external data representation within the scope
    of HPF.


Specifics...

7.1  

    >> the second proposal may affect the order the data items are
    >> stored within records.

     This sounds painful.  I want to share data between machines and
     between programs on the same machine that might operate with
     different alignments. 

    >> Both proposals require the following restriction on ...
    >> ... is legal Fortran HPF


     I understand the necessity of separating these operations;
     however, to maintain the greatest portability between F90 and HPF,
     would it not be better to let the HPF compiler resolve the
     dependency internally?


7.2.4

    >> A new connection specifier of the form .....

     Is there any way a directive could be used to maintain the F90
     syntax?

7.2.6 

    >> new intrinsic of the form ...

     Not portable to F90.  (However, I must admit to liking this
     option.)


7.3
 
   >> ...

    Sounds useful; however, it might be asking too much to force
    the use of external reformatters to produce a copy of the data
    that is usable by other applications.


-----------------------------------------------------------------
Stephen Whitley                             Houston, Tx 77077
GECO-PRAKLA                                 Office:  713-596-1511
whitley@ghds01.sinet.slb.com                whitley@slcs.slb.com
-----------------------------------------------------------------



From jb@vnet.ibm.com  Wed Aug 19 07:43:50 1992
Received: from vnet.ibm.com by cs.rice.edu (AA13013); Wed, 19 Aug 92 07:43:50 CDT
Message-Id: <9208191243.AA13013@cs.rice.edu>
Received: from KGNVMA by vnet.ibm.com (IBM VM SMTP V2R2) with BSMTP id 1476;
   Wed, 19 Aug 92 08:46:36 EDT
Date: Wed, 19 Aug 92 08:40:08 EDT
From: "Jason Behm" <jb@vnet.ibm.com>
To: hpff-io@cs.rice.edu
Subject: this is only a test
Reply-To: jb@vnet.ibm.com

Organization: IBM Technical Computing - Kingston, NY USA
News-Software: UReply 3.0
X-X-From: Jason Behm

Either activity is low or I'm not subscribed.

From @cunyvm.cuny.edu:SNIR@YKTVMV  Mon Aug 31 12:42:49 1992
Received: from CUNYVM.CUNY.EDU by cs.rice.edu (AA09556); Mon, 31 Aug 92 12:42:49 CDT
Message-Id: <9208311742.AA09556@cs.rice.edu>
Received: from YKTVMV by CUNYVM.CUNY.EDU (IBM VM SMTP V2R2) with BSMTP id 7955;
   Mon, 31 Aug 92 13:42:29 EDT
Date: Mon, 31 Aug 92 13:42:27 EDT
From: "Marc Snir" <SNIR%YKTVMV.bitnet@cunyvm.cuny.edu>
To: hpff-io@cs.rice.edu

%IO.tex
%Snir
\documentstyle[11pt]{article}
\pagestyle{plain}
\pagenumbering{arabic}
\marginparwidth 0pt
\oddsidemargin  .25in
\evensidemargin  .25in
\marginparsep 0pt
\topmargin   -.5in
\textwidth 6in
\textheight  9.0in


\title{Proposal for IO}
\author{Marc Snir}
%\date{   }


\begin{document}

\maketitle

\section{Summary}

This document proposes to expand the mapping notation of HPF to include
files.  We try to achieve two purposes:
\begin{enumerate}
\item
Allow user control of file layout (file striping) across multiple I/O devices.
\item
Allow efficient transfer of distributed arrays to/from
striped files, with no sequential bottlenecks.
\end{enumerate}

To achieve the first goal,
we propose to allow HPF
mapping directives to control the association of
consecutive file records with I/O nodes.
The mapping directives for files
make it possible to achieve a better layout of files across multiple I/O
devices (file striping) and/or a better distribution of file caches.
These directives control the physical layout of sequential files, but do not
alter the logical organization of sequential files.
Regular
data transfers can be used to access these records.  The semantics of
Fortran 90 I/O operations is left unchanged, and the representation of data on
the file does not depend on its distribution, or the distribution of the
arrays that were written onto it.

To achieve the second goal,
we propose to add
parallel I/O operations that make it possible to optimize the transfer of
distributed arrays to and from striped files.  The representation of data
written using a parallel
I/O operation may be different from the data representation resulting from
a sequential write, and may depend on
the file mapping and the mapping of the arrays that were written to it.
In particular, an array
transferred in one parallel write operation may be split into multiple
records, stored on distinct I/O nodes.
Data written with a parallel write operation can be read back with a
parallel read operation, onto a similarly distributed array.  However, data
written with a parallel write cannot be read back with a sequential read,
and data written with a sequential write cannot be read with a parallel
read.

The two proposals are independent:  parallel read/writes can be directed to
any file, and files distributed onto multiple I/O nodes can be accessed via
sequential reads/writes.  High performance I/O will be achieved when
parallel reads and writes are used to access distributed files, and the
distribution of the file matches the distribution of the I/O items in
memory (or when the compiler can translate a sequential I/O operation into
an equivalent parallel I/O operation).

The second part of our proposal borrows some
features from the parallel I/O proposal of Vienna Fortran \cite{vienaio}.
Unlike that proposal, we do not propose to distinguish parallel files from
sequential files.

\section{File mapping}

A Fortran file is a sequence of records.  We treat such a file as a 1-D array of
records with LB=1 and infinite UB.  This array can be mapped to a (storage) node
arrangement in a manner analogous to the mapping of an array to a
(processor) node
arrangement.  Files are mapped using the same notation as for array
mapping.  The mapping defines a partition of the file, and each part is
associated with one abstract node.

The mapping of a file to a node arrangement can be
interpreted in two ways:
\begin{enumerate}
\item
The nodes may represent
(abstract) independent storage units, each storing a fixed part of the
file.
\item
The nodes may represent (abstract) independent file caches, with a fixed
association of each cache with a part of the file.
\end{enumerate}

In both cases the file is mapped onto
physical I/O devices so as to allow maximal concurrency for accesses
directed to distinct parts of the file.
If the second interpretation is used, then it is meaningful to align arrays
and files onto the same templates.

We introduce a new filemap object.  Filemaps are, essentially, named files.
They appear where an array name would
appear in an array mapping expression.   An actual file is associated with a
FILEMAP in an OPEN statement.  Filemaps are introduced because files are
not first-class objects in Fortran (files are not declared).   Also, filemaps
can have rank $>$ 1, giving more flexibility in the types of mappings that can
be specified.

The following diagram illustrates the mapping

\begin{verbatim}

                                   Node           Physical
File      Filemap     Template   arrangement    storage units  (or caches)
 _           _           _           _               _
|_|-------->|_|-------->|_|-------->|_|------------>|_|

     OPEN        ALIGN    DISTRIBUTE   Implementation
                                         Dependent
\end{verbatim}

\subsection{Node Directive}

We suggest replacing the keyword {\tt PROCESSOR} with the keyword {\tt
NODE}, which is more neutral.  Node arrangements (ex processor arrangements) can
be targets both for file mappings and for array mappings.  Some implementations
may disallow the use of the same node arrangement name as a target both for
array mappings and for file mappings.  In such a case an {\tt AFFINITY}
directive, which specifies affinity between I/O nodes and processor nodes,
would be useful.  (Such a directive would also be useful to specify affinity
between nodes of different arrangements, e.g., nodes in arrangements of
different rank.)

The set of allowable
node arrangements that can be used to map files is implementation dependent --
however, a node arrangement with {\tt NUMBER\_OF\_IONODES} nodes is always
legal.

The mapping of nodes to
physical storage units is implementation dependent.

For example:

\begin{verbatim}

!HPF$ NODE :: D1(2,4), D2(2,2)
      PARAMETER(NOD=NUMBER_OF_IONODES())
!HPF$ NODE, DIMENSION(NOD) :: D3,D4

\end{verbatim}

\subsection{FILEMAP Directive}

A Fortran file is an infinite  one-dimensional array of records, with LB=1.
A filemap can be thought of as an assumed-size array of records.  This array
is associated with (one-dimensional) files,
using storage association rules.  The filemap name is
used to specify a mapping for files.  The association between a filemap name
and an actual file is effected by the OPEN statement.

A FILEMAP directive declares filemap names.  The syntax is

\begin{verbatim}

filemap-directive   is   FILEMAP [::] filemap-name ( assumed-size-spec )
                                      [, filemap-name (assumed-size-spec ) ] ...
               or  FILEMAP, DIMENSION ( assumed-size-spec ) :: filemap-name-list


\end{verbatim}

An {\it assumed-size-spec} is a specification of the form used for assumed-size
arrays:  All dimensions are specified, with the exception of the last, which is
assumed.  In our case, the last dimension is infinite.  Only
initialization expressions may occur  in this specification (including
expressions that depend on {\tt NUMBER\_OF\_IONODES}).

For example:

\begin{verbatim}

!HPF$ FILEMAP :: F1(2,4,*)
!HPF$ FILEMAP, DIMENSION(2,2,1:*) :: F2,F3

\end{verbatim}

A FILEMAP directive does not allocate space, either in memory or on disk.

\subsection{File mapping}


ALIGN and DISTRIBUTE statements are used to map FILEMAPs onto nodes.
The syntax is identical to the syntax for processor mappings, with one
restriction:
Block distributions cannot be used for the last (infinite) dimension of
the filemap.

For example:

\begin{verbatim}

!HPF$ DISTRIBUTE (CYCLIC,CYCLIC,*) ONTO D2 :: F2,F3
!HPF$ DISTRIBUTE F1(*,BLOCK,CYCLIC(2)) ONTO D1
\end{verbatim}

Assume that {\tt F1, F2} are the filemaps and {\tt D1, D2} are the node
arrangements from the previous examples.

The first distribute statement
specifies the following mapping for successive records of a file associated with
{\tt F2} or {\tt F3}.



\begin{verbatim}

D2(1,1)         D2(1,2)

1 (1,1,1)       3 (1,2,1)
5 (1,1,2)       7 (1,2,2)
.               .
.               .
.               .



D2(2,1)          D2(2,2)

2 (2,1,1)        4 (2,2,1)
6 (2,1,2)        8 (2,2,2)
.                .
.                .

\end{verbatim}



The second distribute statement
specifies the following mapping for successive records of a file associated with
{\tt F1}.





\begin{verbatim}

D1(1,1)           D1(1,2)          D1(1,3)            D1(1,4)

 1 (1,1,1)       17                33                 49
 2 (2,1,1)       .                 .                  .
 3 (1,2,1)       .                 .                  .
 4 (2,2,1)       20                36                 52
 9 (1,1,2)       25                41                 57
10 (2,1,2)       .                 .                  .
11 (1,2,2)                         .                  .
12 (2,2,2)       28                44                 60
65               81                97                113
.                .                 .                  .
.                .                 .                  .



D1(2,1)            D1(2,2)         D1(2,3)            D1(2,4)

 5 (1,3,1)         21              37                 53
 6 (2,3,1)         .               .                  .
 7 (1,4,1)         .               .                  .
 8 (2,4,1)         24              40                 56
13 (1,3,2)         29              45                 61
14 (2,3,2)         .               .                  .
15 (1,4,2)         .               .                  .
16 (2,4,2)         32              48                 64
69                 85             101                117
.                  .               .                  .
.                  .               .                  .

\end{verbatim}

\subsection{OPEN statement}

A new connection specifier of the form {\tt FILEMAP = filemap-name}
associates a mapping
with the opened file.   If the file exists then the mapping must be one of the
mappings allowed for the file.  The set of allowed file mappings for an existing
file is implementation dependent, but always includes the mapping under which
the file was created.  More generally, it will include any mapping where the
file is mapped onto the same storage node arrangement, and with the same
allocation of file records to storage nodes
(different mappings may
result in the same allocation of records to storage nodes).
One choice is to allow any mapping, with possibly degraded performance for
ill-matched mappings; another choice is to remap an existing
file when it is opened with a new mapping, either offline or online.   Vendors
are expected to provide implementation dependent mechanisms to exercise such
choices.

The default mapping is implementation dependent.

Only external files can be mapped.

Implementations may restrict the use of the FILEMAP connection specifier to
files that are open for direct access (i.e., fixed size record files).
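
For example, a sketch of connecting a direct-access file to the mapping of
{\tt F1} from the previous examples (the unit, file name, and record length
are arbitrary):

\begin{verbatim}

      OPEN(UNIT = 17, FILE = 'DAT', ACCESS = 'DIRECT', RECL = 1024, FILEMAP = F1)

\end{verbatim}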

\section{Parallel Data Transfer}

The READ, WRITE, CLOSE, INQUIRE, BACKSPACE, ENDFILE, REWIND statements can be
used to access distributed files; there are no changes in the syntax or
semantics of these statements.

PREAD and PWRITE statements are
added to allow efficient input or output of distributed arrays.
The PREAD and PWRITE statements have the same syntax as unformatted
READ and WRITE statements, respectively; they are semantically
different.  The data representation created on a file by a PWRITE statement
may be different from the data representation that would obtain if PWRITE were
replaced by WRITE.  In particular,
whereas an unformatted WRITE statement will create
a single record (stored on one I/O node), a PWRITE
statement may create multiple records, possibly on multiple I/O nodes.
Whereas an unformatted READ statement
accesses a unique record, a PREAD statement may access multiple records.

If a PWRITE statement was used to write a list of output items to a file, then
a PREAD that starts at the same point in the file, and has a compatible list of
input items, will return the values that were written.   Two lists of items are
compatible if the corresponding items in each list occupy the same number of
storage units and have compatible mappings (informally, if the distribution of
entries onto abstract processors is the same).

Examples

The program below exchanges the values of arrays {\tt A} and {\tt B}.  The
exchange is legal because the arrays are compatible.

\begin{verbatim}

REAL, DIMENSION(1000,1000) :: A, B
!HPF$ ALIGN A WITH B
...
OPEN(UNIT = 15, ACTION = READWRITE)
PWRITE (UNIT = 15) A, B
REWIND (UNIT = 15)
PREAD (UNIT = 15) B, A
\end{verbatim}


The behavior of the program below is undefined.  More than one record
could have been created by the PWRITE statement, so that the BACKSPACE
statement does not necessarily return the file position to where it was before
PWRITE executed.


\begin{verbatim}

REAL, DIMENSION(1000,1000) :: A, B
!HPF$ ALIGN A WITH B
...
OPEN(UNIT = 15, ACTION = READWRITE)
PWRITE (UNIT = 15) A, B
BACKSPACE (UNIT = 15)
PREAD (UNIT = 15) B, A
\end{verbatim}

The behavior of the program below is undefined, since the two arrays {\tt A} and
{\tt B} don't have compatible distributions.

\begin{verbatim}

REAL, DIMENSION(1000,1000) :: A, B
!HPF$ DISTRIBUTE A(BLOCK,BLOCK)
!HPF$ DISTRIBUTE B(CYCLIC, CYCLIC)
...
OPEN(UNIT = 15, ACTION = READWRITE)
PWRITE (UNIT = 15) A, B
REWIND (UNIT = 15)
PREAD (UNIT = 15) B, A
\end{verbatim}

Data written by a WRITE statement cannot be read with PREAD, and data
written with PWRITE cannot be read with READ, or by a PREAD that does not
start at exactly the same point in the file  (otherwise the program outcome
is undefined).

PREAD and PWRITE can be used both for sequential access and for direct
access.  In the latter case, the REC specifier indicates the position in the
file from which the transfer starts.  It is still the case that a transfer
may involve several records.
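
For example, a sketch of a direct-access parallel read (the unit, record
number, array, and its distribution are arbitrary):

\begin{verbatim}

      REAL, DIMENSION(1000,1000) :: A
!HPF$ DISTRIBUTE A(BLOCK,BLOCK)
      PREAD (UNIT = 15, REC = 101) A

\end{verbatim}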

\subsection{Restrictions}

The following restrictions allow for a simpler, more efficient
implementation of parallel I/O.  We may either put them in the language, or
list them as recommended programming style.

\begin{enumerate}
\item
Items in the item list of a PREAD or PWRITE statements
are restricted to be variables (no io-implied-do).
[Compilers may want to relax this rule, by considering an io-implied-do as
being an operation that defines a new variable, akin
to an array section, with a
distribution induced by the distribution of the variables appearing in the
implied-do-loop.]
\item
All values needed to determine which entities are specified by a parallel
I/O item list must be available before the I/O statement executes.  That is, we
prohibit a statement of the form {\tt PREAD (...) N, A(1:N)}.
\end{enumerate}

\subsection{Extensions}
\begin{itemize}
\item
We may want to
write an array with a layout that is suited to the mapping of the array
that will appear in the input item list, rather than suited to the mapping of
the array in the output list.  To achieve this, we need to add align/distribute
information as part of the PWRITE statement.
\item
We may want a {\tt REMAP} statement, to be used
instead of the sequence
{\tt CLOSE ... OPEN}, in order to associate a new mapping to an existing
file.
\item
We may want to extend the {\tt INQUIRE} statement to return  file mapping
information (alternatively, we may use the same query intrinsics used to query
array partitions).
\item
A new intrinsic of the form {\tt INDEX(filemap-name, list-of-indices)} would be
handy, as it would allow random-access files to be addressed as multidimensional
arrays.  E.g.

\begin{verbatim}

READ (7, REC = INDEX(F1,3,5) ) A
\end{verbatim}
\item
Each data transfer operation specifies
an association between parts of the file and the abstract processor nodes
from which (or to which) the data in the record is transferred.
We may want to add additional directives to the OPEN statement to indicate that
this association fulfils certain restrictions for as long as the file is open.
\begin{itemize}
\item
Accesses to a file are {\em independent} if, in all data transfers,
each file part is associated with the same processor node.  An
{\tt INDEPENDENT} argument in the OPEN statement may be used to specify this
condition (which simplifies file caching).
\item
A data transfer is {\em aligned} if each file part is associated with a
unique processor node (is not split between two processor nodes).  We may
use an {\tt ALIGNED} argument in the OPEN statement to specify that all
data transfers are aligned.
(INDEPENDENT implies ALIGNED, but not vice versa).
\end{itemize}
\end{itemize}
\begin{thebibliography}{xxx}
\bibitem{vienaio}
P.\ Brezany, M.\ Gerndt, P.\ Mehrotra and H.\ Zima, ``Concurrent File Operations
in a High Performance Fortran.''
\end{thebibliography}
\end{document}


From jim@meiko.co.uk  Wed Sep  2 07:33:06 1992
Received: from marge.meiko.com by cs.rice.edu (AA16869); Wed, 2 Sep 92 07:33:06 CDT
Received: from hub.meiko.co.uk by marge.meiko.com (4.1/SMI-4.1)
	id AA12975; Wed, 2 Sep 92 08:33:04 EDT
Received: from spica.co.uk ([192.131.108.50]) by hub.meiko.co.uk (4.1/SMI-4.1)
	id AA29400; Wed, 2 Sep 92 13:31:37 BST
Date: Wed, 2 Sep 92 13:31:37 BST
From: jim@meiko.co.uk (James Cownie)
Message-Id: <9209021231.AA29400@hub.meiko.co.uk>
Received: by spica.co.uk (4.1/SMI-4.1)
	id AA10870; Wed, 2 Sep 92 13:30:05 BST
To: hpff-io@cs.rice.edu
Subject: IO Anti-proposal

IO Anti-proposal		
================

James Cownie 2 Sept 1992

When the HPFF established its working groups one of them was dedicated
to I/O extensions for parallel machines. The objective here was to 
define an expected set of extensions to standard Fortran (90) I/O
which could be expected on an MPP machine running HPF. 

However this group has been noticeably silent (a total of 12 messages
in 3 months, two of which were proposals, and one a note on the
silence of the group !), and quite a few of the messages actually
question whether we should do anything in this area. 

Marc Snir has now made a proposal on which NO comments appear to have
been received.  This proposal adds significant extensions to Fortran
I/O, and additional (non-standard) I/O functions.

So that we do not fall into the trap of voting this proposal in solely
on the grounds that it exists, I would like to make an "anti-proposal",
which is
      
       "HPF should contain NO I/O extensions"

Arguments for the Anti-proposal
-------------------------------

1) I/O systems on parallel machines are too architecturally different for
   there to be a useful abstraction on which the language model can build.

   Consider the difference between a machine with discs on each node,
   compared with one which has a high bandwidth disc system connected
   to the comms network, and thus globally accessible.

2) Fortran I/O is already highly expressive, (some people would say
   too expressive).

3) The HPF compiler must already know when it is performing I/O on distributed
   entities, and can therefore optimise their I/O to distributed files
   without any extensions to the source language.

4) The management of distributed files (and their implementation) is a
   matter for the operating system, not the language.

What this Anti-proposal does NOT say
------------------------------------

1) This proposal does NOT disallow any extensions which particular
   vendors may wish to make (indeed this would be impossible !), it
   simply says that there are no special I/O mechanisms mandated by HPF.

2) This proposal does NOT forbid the HPF run-time system from using
   whatever facilities the operating system provides for accessing
   "high performance" files, it merely says that the HPF language
   contains no I/O extensions. 

   [So the HPF system is entirely free to place a status='SCRATCH'
   file in the highest performance file system it likes, and
   distribute it as is appropriate for the machine, it is not up to
   the user to say all this however]


-- Jim
James Cownie 
Meiko Limited
650 Aztec West
Bristol BS12 4SD
England

Phone : +44 454 616171
FAX   : +44 454 618188
E-Mail: jim@meiko.co.uk or jim@meiko.com


From highnam@slcs.slb.com  Fri Sep  4 01:02:34 1992
Received: from SLCS.SLB.COM by cs.rice.edu (AA04660); Fri, 4 Sep 92 01:02:34 CDT
From: highnam@slcs.slb.com
Received: from speedy.SLCS.SLB.COM
	by SLCS.SLB.COM (4.1/SLCS Mailhost 3.13)
	id AA09933; Fri, 4 Sep 92 01:02:04 CDT
Received: by speedy.SLCS.SLB.COM (4.1/SLCS Subsidiary 1.10)
	id AA11632; Fri, 4 Sep 92 01:02:02 CDT
Date: Fri, 4 Sep 92 01:02:02 CDT
Message-Id: <9209040602.AA11632.highnam@speedy.SLCS.SLB.COM>
To: hpff-io@cs.rice.edu
Subject: re: IO Anti-proposal


Seconded.

Furthermore, are there any F90 I/O features that should
not be required in the initial HPF ?  (A question that
Marc Snir introduced 3 months ago..)

Peter

From highnam@slcs.slb.com  Tue Sep  8 01:22:57 1992
Received: from SLCS.SLB.COM by cs.rice.edu (AA22012); Tue, 8 Sep 92 01:22:57 CDT
From: highnam@slcs.slb.com
Received: from speedy.SLCS.SLB.COM
	by SLCS.SLB.COM (4.1/SLCS Mailhost 3.13)
	id AA26572; Tue, 8 Sep 92 01:22:21 CDT
Received: by speedy.SLCS.SLB.COM (4.1/SLCS Subsidiary 1.10)
	id AA11812; Tue, 8 Sep 92 01:22:19 CDT
Date: Tue, 8 Sep 92 01:22:19 CDT
Message-Id: <9209080622.AA11812.highnam@speedy.SLCS.SLB.COM>
To: knighten@ssd.intel.com
Subject: Absence of anti-proposal from HPF draft 0.2 !!
Cc: hpff-io@cs.rice.edu


Bob,
     please explain!  The inclusion of the IBM proposal
gives the appearance (regardless of what you might say
in the meeting) of subgroup consensus or approval when
this is certainly NOT the case.

Peter

From knighten@ssd.intel.com  Tue Sep  8 10:27:51 1992
Received: from SSD.intel.com by cs.rice.edu (AA27994); Tue, 8 Sep 92 10:27:51 CDT
Received: from tualatin.SSD.intel.com by SSD.intel.com (4.1/SMI-4.1)
	id AA27407; Tue, 8 Sep 92 08:27:39 PDT
Date: Tue, 8 Sep 92 08:27:39 PDT
Message-Id: <9209081527.AA27407@SSD.intel.com>
Received: by tualatin.SSD.intel.com (4.1/SMI-4.0)
	id AA13719; Tue, 8 Sep 92 08:27:37 PDT
From: Bob Knighten <knighten@ssd.intel.com>
Sender: knighten@ssd.intel.com
To: highnam@slcs.slb.com
Cc: hpff-io@cs.rice.edu, hpff-core@cs.rice.edu
Subject: Re: Absence of anti-proposal from HPF draft 0.2 !!
In-Reply-To: <9209080622.AA11812.highnam@speedy.SLCS.SLB.COM>
References: <9209080622.AA11812.highnam@speedy.SLCS.SLB.COM>
Reply-To: knighten@ssd.intel.com (Bob Knighten)

highnam@SLCS.SLB.COM writes:
  > 
  > Bob,
  >      please explain!  The inclusion of the IBM proposal
  > gives the appearance (regardless of what you might say
  > in the meeting) of subgroup consensus or approval when
  > this is certainly NOT the case.
  > 
  > Peter

I did not and do not understand that inclusion of material in one of these
preliminary drafts indicated approval of the material, even by the subgroup.
Clearly the subgroup needs to meet to decide if the proposal that is in the
draft will even be put forward to the HPFF as a whole.

-- Bob

From loveman@mpsg.enet.dec.com  Tue Sep  8 11:29:23 1992
Received: from enet-gw.pa.dec.com by cs.rice.edu (AA00201); Tue, 8 Sep 92 11:29:23 CDT
Received: by enet-gw.pa.dec.com; id AA16191; Tue, 8 Sep 92 09:29:22 -0700
Message-Id: <9209081629.AA16191@enet-gw.pa.dec.com>
Received: from mpsg.enet; by decwrl.enet; Tue, 8 Sep 92 09:29:23 PDT
Date: Tue, 8 Sep 92 09:29:23 PDT
From: David Loveman <loveman@mpsg.enet.dec.com>
To: hpff-io@cs.rice.edu, hpff-core@cs.rice.edu
Apparently-To: hpff-io@cs.rice.edu
Subject: Re: Absence of anti-proposal from HPF draft 0.2 !!


At this point, inclusion of text in the draft document indicates two
things, and two things only:

1.  I have a copy of the text, and

2.  It LaTeXes, in the context of the document, correctly.

The document does contain the following disclaimer, which may not be strong enough:

     This is a preliminary draft of what will become the Final Report of 
     the High Performance Fortran Forum.  The language features presented 
     here are still under active discussion and {\bf have not yet been 
     finally approved.}

-David

% ====== Internet headers and postmarks (see DECWRL::GATEWAY.DOC) ======
% Received: by mpsg.mps.mlo.dec.com; id AA20571; Tue, 8 Sep 92 12:29:17 -0400
% Received: by ftn90; id AA04477; Tue, 8 Sep 92 12:29:07 -0400
% Message-Id: <9209081629.AA04477@ftn90>
% To: decwrl::"hpff-io@cs.rice.edu", decwrl::"hpff-core@cs.rice.edu"
% Subject: Re: Absence of anti-proposal from HPF draft 0.2 !!
% Date: Tue, 08 Sep 92 12:29:04 -0400
% From: loveman
% X-Mts: smtp

From pm@icase.icase.edu  Tue Sep  8 15:39:39 1992
Received: from elc16 (elc16.icase.edu) by cs.rice.edu (AA06395); Tue, 8 Sep 92 15:39:39 CDT
Received: by elc16 (5.65.1/lanleaf2.4.9)
	id AA04978; Tue, 8 Sep 92 16:39:32 -0400
Message-Id: <9209082039.AA04978@elc16>
Date: Tue, 8 Sep 92 16:39:32 -0400
From: Piyush Mehrotra <pm@icase.icase.edu>
To: hpff-io@cs.rice.edu
Subject: An anti-anti-proposal for I/O
In-Reply-To: Mail from 'David Loveman <loveman@mpsg.enet.dec.com>'
      dated: Tue, 8 Sep 92 09:29:20 PDT


An Anti-Anti-I/O-Proposal:


There is some sentiment that nothing needs to be added to FORTRAN 90 I/O.
However, we perceive this as a problem, since FORTRAN 
enforces a sequential ordering (the column-major ordering)
on the data elements that are written out.  The point is
that relaxing this ordering may allow the compiler
and/or runtime system to optimize the input/output
operations.

For example, consider the situation where
a distributed array is written out and then read into
a target array with the same distribution.
Here, both the writing and the reading of the elements
are going to be faster if each process can just write and
read the portion of the array that it owns. However, this may not
be possible if the FORTRAN ordering is to be preserved.

Secondary storage is used for two purposes:
a) to temporarily store scratch data which is later
reused in a program, and
b) to communicate data to other programs (which includes 
display to user).

In the first case, it is possible that the compiler can analyze
the program and figure out that the ordering can be relaxed.
However, in the second case it is only the user that can 
provide any hints about the subsequent usage of the data.

We propose to add a new directive to HPF which would allow
the user to provide some hints to the compiler/runtime system
regarding the possible distributions of the target arrays.
Note that if the FORTRAN sequence is relaxed, the file management
system would have to be modified so that it keeps track of the layout 
of the data in the different files.


    io-distribute-directive	is 	IO_DISTRIBUTE	dist-stuff

where  dist-stuff is as currently defined for distribution of
objects to processor arrangements.


An IO_DISTRIBUTE directive associates an io-distribution
with the arrays specified in the distributee-list.
An io-distribution for an array is a hint
to the compiler/runtime system that the data in the array will 
be read into an array of the specified distribution.  
Note that the io-distribution provides information about
the subsequent usage of data on files and does not imply
anything about the layout of the data in the I/O subsystem.
The system can  use this information to choose the layout 
so as to optimize the output and/or the input operations.

The IO_DISTRIBUTE directive may occur in the specification-part 
of a program unit or in the execution-part or in both. In the
execution-part, the IO_DISTRIBUTE directives must immediately follow a WRITE
statement as attached declarations.  In this
situation they are treated more as executable statements than as
declarations.  (There is no syntactic ambiguity about whether they
are in the declaration part or execution-part of a program unit
because in this situation they follow an executable statement and
therefore are necessarily in the execution-part.)  Such attached
declarations may declare only arrays being output by the WRITE
statement.  Expressions in such directives may be any integer
expressions; they are not restricted in this situation to be
specification expressions.

If an array identifier appears in an IO_DISTRIBUTE directive in a
specification-part, then the associated io-distribution is propagated
to each WRITE statement in the program unit. If there
is a conflict between the specification in the specification-part and
a specification in the execution-part, then the latter prevails.



Omitting the dist-attribute-stuff (see syntax rules page 24 of the
draft) or using a `*' in its place signifies that the
io-distribution is the same as the actual distribution of the array.

If the user does not explicitly associate an io-distribution
with an array, the user is  signifying that there is not enough 
information available about the subsequent usage of the data 
and hence the compiler is free to do what it wants (as always).

Example:

    REAL A(100), B(100), C(100)
    ALIGN  WITH A:: B, C
    DISTRIBUTE (CYCLIC) :: A
    IO_DISTRIBUTE  * ::  A

    ....

    WRITE(10) A, B, C
    IO_DISTRIBUTE  * :: B

    ....

    WRITE(15) A, B
    IO_DISTRIBUTE (BLOCK) :: A




Here, in the case of the first WRITE statement,
arrays A and B have their actual distributions
(CYCLIC) as their io-distributions; A because of
the io-distribute directive in the specification part
and B because of the attached directive.
Array C does not have an associated io-distribution,
signifying that the compiler should take the default action.
The specification, however, indicates that the data elements of 
both A and  B  as written out to unit 10 will be read into 
arrays which are distributed cyclically.

The IO_DISTRIBUTE directive  attached to the second
WRITE statement overrides the initial io-distribution of A
declaring that even though A is actually distributed cyclically,
the data written out to unit 15 is subsequently going to be read 
into a block distributed array. The array B now has no associated 
io-distribution for the purposes of this WRITE statement.



Possible extension:

Allow keyword SERIAL as another option to enforce the FORTRAN 
order.  Note that using the distribution of (*,*,...,*,BLOCK) 
with n-1 `*'s for a rank n array essentially signifies a similar 
arrangement.
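
For instance, for a rank-2 array the proposed keyword would be used as
follows (a sketch only; the exact form of SERIAL was not specified, and
the unit number is illustrative):

    REAL D(100,100)
    ....
    WRITE(20) D
    IO_DISTRIBUTE SERIAL :: D

which for n = 2 would say the same thing as

    WRITE(20) D
    IO_DISTRIBUTE (*,BLOCK) :: D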



Barbara Chapman
Piyush Mehrotra 


From highnam@slcs.slb.com  Tue Sep  8 18:23:33 1992
Received: from SLCS.SLB.COM by cs.rice.edu (AA10439); Tue, 8 Sep 92 18:23:33 CDT
From: highnam@slcs.slb.com
Received: from speedy.SLCS.SLB.COM
	by SLCS.SLB.COM (4.1/SLCS Mailhost 3.13)
	id AA10720; Tue, 8 Sep 92 18:23:05 CDT
Received: by speedy.SLCS.SLB.COM (4.1/SLCS Subsidiary 1.10)
	id AA02931; Tue, 8 Sep 92 18:23:06 CDT
Date: Tue, 8 Sep 92 18:23:06 CDT
Message-Id: <9209082323.AA02931.highnam@speedy.SLCS.SLB.COM>
To: hpff-io@cs.rice.edu
Subject: (anti**3)-proposal for I/O


> From: Piyush Mehrotra <pm@icase.icase.edu>
> Subject: An anti-anti-proposal for I/O
> Date: Tue, 8 Sep 92 16:39:32 -0400
>
> An Anti-Anti-I/O-Proposal:
>
> 
> There is some sentiment that nothing needs to be added to FORTRAN 90 I/O.
> However, we perceive this as a problem, since FORTRAN 
> enforces a sequential ordering (the column-major ordering)
> on the data elements that are written out.  The point is
> that relaxing this ordering may allow the compiler
> and/or runtime system to optimize the input/output
> operations.
>
> ...
>

But, I don't think that the physical layout of the bits on the disk(s)
has to be in FORTRAN order.  Rather, if a ``sequential style'' access
is made to the file, then the bits are returned in FORTRAN order.

I *expect* the compiler, the run-time system, and the file system to
collude in order to exploit distributions in order to provide high
performance I/O.  I *expect* the subsystems concerned with I/O to
keep ``hidden'' data on the in-processor data distribution.  The
distribution information for an HPF object is passed whenever a
routine call is made, so it's not a secret...  The system knows
where its I/O resources are, so I don't understand why the programmer
should have to spell out any more information.  It's not a language
issue.


> Secondary storage is used for two purposes:
> a) to temporarily store scratch data which is later
> reused in a program, and
> b) to communicate data to other programs (which includes 
> display to user).
>
> In the first case, it is possible that the compiler can analyze
> the program and figure out that the ordering can be relaxed.
> However, in the second case it is only the user that can 
> provide any hints about the subsequent usage of the data.


``Scratch'' files that have no lifetime beyond the execution of
a program can be arbitrarily crazy inside.  But files that persist
should be available for normal access by regular F77 or F90 or HPF
programs. 

> ...
>
> Barbara Chapman
> Piyush Mehrotra 

Peter


From jb@vnet.ibm.com  Tue Sep 22 08:10:06 1992
Received: from vnet.ibm.com by cs.rice.edu (AA25444); Tue, 22 Sep 92 08:10:06 CDT
Message-Id: <9209221310.AA25444@cs.rice.edu>
Received: from KGNVMA by vnet.ibm.com (IBM VM SMTP V2R2) with BSMTP id 6204;
   Tue, 22 Sep 92 09:12:36 EDT
Date: Tue, 22 Sep 92 09:03:30 EDT
From: "Jason Behm" <jb@vnet.ibm.com>
To: hpff-io@cs.rice.edu
Subject: I think that I'm subscribed now
Reply-To: jb@vnet.ibm.com

Organization: IBM Technical Computing - Kingston, NY USA
News-Software: UReply 3.0
X-X-From: Jason Behm
References:

I think that I'm subscribed.  (Now if only there were some discussion...)

Jason Behm
-------------------------------------------------------------------------
IBM Technical Computing           Internet: jb@vnet.ibm.com
MM5A/278 Neighborhood Road        Phone:(914) 385-1853, internal 695-1853
Kingston, NY  12401               FAX:  (914) 385-4372, internal 695-4372
-------------------------------------------------------------------------

From knighten@ssd.intel.com  Tue Sep 22 10:51:22 1992
Received: from SSD.intel.com by cs.rice.edu (AA29432); Tue, 22 Sep 92 10:51:22 CDT
Received: from tualatin.SSD.intel.com by SSD.intel.com (4.1/SMI-4.1)
	id AA10910; Tue, 22 Sep 92 08:51:15 PDT
Date: Tue, 22 Sep 92 08:51:15 PDT
Message-Id: <9209221551.AA10910@SSD.intel.com>
Received: by tualatin.SSD.intel.com (4.1/SMI-4.0)
	id AA17089; Tue, 22 Sep 92 08:51:15 PDT
From: Bob Knighten <knighten@ssd.intel.com>
Sender: knighten@ssd.intel.com
To: jb@vnet.ibm.com
Cc: hpff-io@cs.rice.edu
Subject: Re: I think that I'm subscribed now
In-Reply-To: <9209221310.AA25444@cs.rice.edu>
References: <9209221310.AA25444@cs.rice.edu>
Reply-To: knighten@ssd.intel.com (Bob Knighten)

At the last HPFF meeting, there seemed to be agreement that in this initial
version of HPF there will be *NO* language support for parallel I/O.  Below is
an excerpt/paraphrase from Chuck Koelbel's minutes on this topic:


Marc Snir offered a number of solutions (to the problem of supporting parallel
I/O) for consideration in HPF:

  Solution 0: Define no language extensions. (The compiler does it all) 
    Metadata can be stored with the file to specify its layout.
    Does this assume too much of the compiler and file system?  A
    (small) majority of the I/O subgroup thinks this is insufficient,
    since it puts lots of burden on the compiler/operating system.

  Solution 1: Define hints (annotations) that do not change file
    semantics, in the spirit of data distribution. (This gives some
    information to the compiler.)

  Solution 1.1 (Piyush Mehrotra): On write, give a hint about how the
    data will be read.

  Solution 1.2: Give hints about the physical layout (number of spindles,
    record length, striping function, etc.) of the file when it is
    opened.

    This uses the HPF array mapping mechanisms. (A file is a
    1-dimensional array of records.) The syntax needs a "name" for the
    file "template": we suggest FILEMAP. The programmer can
    align/distribute FILEMAP (on I/O nodes), associate FILEMAP with a
    file on OPEN, etc. There are again no changes in semantics or file
    system.

  Solution 2: Introduce parallel read/write operations that are not
    necessarily compatible with sequential ones.

  PWRITE a
  PREAD a


There were straw polls to guide the I/O group.  A series of such polls was
expected, but the first poll found a count of 16 yes, 10 no, and 8 abstain in
favor of solution 0 (doing nothing, as opposed to any active solution).  Ken
Kennedy recommended that the subgroup come back in another round and provide
more functionality then. A recommendation was made and seconded that a
rationale for not handling I/O be added to the draft; some of the e-mail
discussions appeared suitable for this.

Robert L. Knighten	             | knighten@ssd.intel.com
Intel Supercomputer Systems Division | 
15201 N.W. Greenbrier Pkwy.	     | (503) 629-4315
Beaverton, Oregon  97006	     | (503) 629-9147 [FAX]

From knighten@ssd.intel.com  Mon Oct 19 02:43:36 1992
Received: from SSD.intel.com by titan.cs.rice.edu (AA28759); Mon, 19 Oct 92 02:43:36 CDT
Received: from chaos.SSD.intel.com by SSD.intel.com (4.1/SMI-4.1)
	id AA03807; Mon, 19 Oct 92 00:41:31 PDT
Date: Mon, 19 Oct 92 00:41:31 PDT
Message-Id: <9210190741.AA03807@SSD.intel.com>
Received: by chaos.SSD.intel.com (4.1/SMI-4.0)
	id AA01791; Mon, 19 Oct 92 00:41:30 PDT
From: Bob Knighten <knighten@ssd.intel.com>
Sender: knighten@ssd.intel.com
To: loveman@mpsg.enet.dec.com
Cc: hpff-io@cs.rice.edu
Subject: I/O chapter
Reply-To: knighten@ssd.intel.com (Bob Knighten)

%io.tex

%Version of October 16, 1992 - Robert L. Knighten, Intel Corporation

\chapter{Input/Output}
\label{io}

\footnote{Version of October 16, 1992 - Robert L. Knighten, Intel
  Corporation}

High Performance Fortran has exactly the same Input/Output facilities
that are available in Fortran 90.

One of the High Performance Fortran Forum working groups was dedicated
to I/O extensions for parallel machines. The objective here was to
define a set of extensions to standard Fortran 90 I/O which would
provide high I/O performance on a massively parallel computer running
HPF.

Three proposals in this spirit were offered by Marc Snir of IBM and
Piyush Mehrotra of ICASE.  The basic ideas of these proposals are outlined
below, and the proposals themselves are in the Journal of Development
appendix.  

From the beginning there was also a strong feeling that no extensions
to Fortran I/O should be added to High Performance Fortran, and this
is the position ultimately taken by the High Performance Fortran
Forum.

Among the arguments for this position are:

\begin{itemize}
\item I/O systems on parallel computers from different vendors are too
  architecturally different for there to be a useful abstraction on
  which the language model can build.  For example, consider the
  difference between a machine with disks on each node and one with a
  high-bandwidth disk system connected to the communication network,
  and thus globally accessible.


\item Fortran I/O is already highly expressive (some would say
  too expressive).

\item The HPF compiler must already know when it is performing I/O on
  distributed arrays, and can optimize the I/O to distributed files
  without any extensions to the source language.

\item The management of distributed files (and their implementation)
  is a matter for the operating system, not the language.

\end{itemize}

Moreover the current lack of extensions does {\bf not} limit features
that may be added by system vendors.  In particular:

\begin{itemize}
\item Vendors are allowed to implement any I/O extensions to the
  language they may wish (indeed, it would be impossible to prevent
  them!).  There are simply no special I/O mechanisms mandated by HPF.

\item The HPF run-time system may use whatever facilities the
  operating system provides for accessing "high performance" files,
  though the HPF language contains no I/O extensions that specifically
  describe such access.  For example, the HPF system is entirely free
  to place a status='SCRATCH' file in the highest performance file
  system it likes, and distribute it as is appropriate for the
  machine, but it is not up to the user to say all this.

\end{itemize}


The proposals made to the IO subgroup were based on the following
observations:  
\begin{itemize}
\item A massively parallel machine needs massively parallel I/O

\item Efficient programs must avoid sequential bottlenecks from
  processors to file systems
  
\item Fortran specifies that a file appears in element storage order; this
  conflicts with striped files (for example, an array distributed by
  rows may be written to a file striped by columns).

\end{itemize}

The proposals were that HPF should provide explicit control to obtain
high performance I/O.  In essence the three proposals were:
\begin{enumerate}

\item On a write, give a hint about how the data will be read.

\CODE
          !HPF$ DISTRIBUTE (CYCLIC) :: a
          !HPF$ IO_DISTRIBUTE * :: a
          WRITE(10) a, b, c
          !HPF$ IO_DISTRIBUTE * :: b
\EDOC

When an array is written, it can be easily read back in the given
distribution. The annotation can be associated with either the
declaration or the write itself; in the first case it applies to all
writes of the array, while in the second it only applies to the one
statement.  The intent is that metadata is kept in the file system to
record the "right" data layout. The advantages of this proposal
include notational convenience and efficiency.


\item Give hints about the physical layout (number of spindles, record
  length, striping function, etc.) of the file when it is opened.

  This uses the HPF array mapping mechanisms. (A file is a
  1-dimensional array of records.) The syntax needs a "name" for the
  file "template"; the proposal is to use FILEMAP. The programmer can
  align/distribute FILEMAP (on I/O nodes), associate FILEMAP with a
  file on OPEN, etc. There are no changes in semantics or file system.

\item Introduce parallel read/write operations that are not
  necessarily compatible with sequential ones.

\CODE
  PWRITE a
  PREAD a
\EDOC

Data can be read back only into arrays of the same shape and mapping.
Data written by PWRITE must be read by PREAD. This solution needs
neither metadata nor changes in the file system, but it is
incompatible with the standard READ and WRITE.

\end{enumerate}


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\end{document}

%Revision history:

%July 7, 1992 - Original version by Marc Snir, International
%  Business Machines 
%October 16, 1992 - Revised by Robert L. Knighten, Intel Corporation,
%  to explain the HPFF position.  Used comments from James Cownie, Meiko
%  Limited.  Proposals from Marc Snir and Piyush Mehrotra moved to Journal of
%  Development.

From knighten@ssd.intel.com  Mon Oct 19 03:56:56 1992
Received: from SSD.intel.com by titan.cs.rice.edu (AA29150); Mon, 19 Oct 92 03:56:56 CDT
Received: from chaos.SSD.intel.com by SSD.intel.com (4.1/SMI-4.1)
	id AA04145; Mon, 19 Oct 92 01:55:31 PDT
Date: Mon, 19 Oct 92 01:55:31 PDT
Message-Id: <9210190855.AA04145@SSD.intel.com>
Received: by chaos.SSD.intel.com (4.1/SMI-4.0)
	id AA01972; Mon, 19 Oct 92 01:55:29 PDT
From: Bob Knighten <knighten@ssd.intel.com>
Sender: knighten@ssd.intel.com
To: loveman@mpsg.enet.dec.com
Cc: hpff-io@cs.rice.edu
Subject: Journal of Development section on IO
Reply-To: knighten@ssd.intel.com (Bob Knighten)

Dave --

I took the Appendix (which was empty in the version I had) and added a section
on I/O to the Journal of Development chapter.  I am working blind at the
moment (i.e. I have not actually seen what the formatted text looks like), so
I would appreciate having you take a look to see if there are any format
errors.

-- Bob
==============================================================================
%appendixes.tex

%Version of October 16, 1992

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%journal_of_development.tex

\chapter{Journal of Development}
\label{journal}

%Version of October 16, 1992 - Robert L. Knighten, Intel Corporation;
%Piyush Mehrotra, ICASE; Marc Snir, International Business Machines Corporation


\section{Summary}

\footnote{Version of October 16, 1992 - Robert L. Knighten, Intel
  Corporation, Piyush Mehrotra, ICASE, Marc Snir, IBM}

High Performance Fortran is primarily designed to obtain high
performance on massively parallel computers.  Such massively parallel
machines also need massively parallel I/O.

There are difficulties in getting high performance I/O:
\begin{itemize}

\item Efficient programs must avoid sequential bottlenecks from
  processors to file systems

\item Fortran specifies that a file appears in element storage order;
  this conflicts with striped files (for example, an array distributed
  by rows may be written to a file striped by columns).

\end{itemize}
  
In particular Fortran file organization has limits:
\begin{itemize}

\item Files have a sequential organization. (Even direct access files
  have records in sequential order, though they can be accessed out of
  order)
  
\item Fortran files are record oriented

\item Storage and sequence association are in force (when writing and
  then reading a file, for instance)
  
\item No specification of the physical organization is possible


\item No compatibility with other languages/machines is guaranteed

\end{itemize}
  

With these in mind there are two major approaches that have been
suggested:  
\begin{enumerate}

\item Define hints (annotations) that do not change file semantics, in
  the spirit of data distribution. (This gives some information to the
  compiler.)

\item Introduce parallel read/write operations that are not
  necessarily compatible with sequential ones.

\end{enumerate}


\subsection{Hints}

Two proposals have been advanced that give hints to the
compiler without changing the Fortran file semantics.

The first is based on the observation that although the distribution
of an array when it is written may be available to the compiler or
runtime system, the distribution into which that array will be read
cannot generally be known, even though the programmer may have this
knowledge.  So the proposal is to provide on a write a hint about how
the data will be read.  

\CODE
          !HPF$ DISTRIBUTE (CYCLIC) :: a
          !HPF$ IO_DISTRIBUTE * :: a
          WRITE(10) a, b, c
          !HPF$ IO_DISTRIBUTE * :: b
\EDOC

When an array is written, it can be easily read back in the given
distribution. The annotation can be associated with either the
declaration or the write itself; in the first case it applies to all
writes of the array, while in the second it only applies to the one
statement.  The intent is that meta-data is kept in the file system to
record the "right" data layout. The advantages of this proposal
include notational convenience and efficiency.

The second proposal is to give hints about the physical layout (number
of spindles, record length, striping function, etc.) of the file when it
is opened.

This uses the HPF array mapping mechanisms. (A file is a 1-dimensional
array of records.) The syntax needs a "name" for the file "template":
we suggest FILEMAP. The programmer can align/distribute FILEMAP (on
I/O nodes), associate FILEMAP with a file on OPEN, etc. There are
again no changes in semantics or file system.


\subsubsection{Mapping Files}

A Fortran file is a sequence of records.  We treat such a file as a 1-D
array of records with LB=1 and infinite UB.  This array can be mapped
to a (storage) node arrangement in a manner analogous to the mapping
of an array to a (processor) node arrangement.  Files are mapped using
the same notation as for array mapping.  The mapping defines a
partition of the file, and each part is associated with one abstract
node.

The mapping of a file to a node arrangement can be interpreted in two
ways:

\begin{enumerate}

\item The nodes may represent (abstract) independent storage units,
  each storing a fixed part of the file.

\item The nodes may represent (abstract) independent file caches, with
  a fixed association of each cache with a part of the file.

\end{enumerate}

In both cases the file is mapped onto physical I/O devices so as to
allow maximal concurrency for accesses directed to distinct parts of
the file.  If the second interpretation is used, then it is meaningful
to align arrays and files onto the same templates.

We introduce a new filemap object.  Filemaps are, essentially, named
files.  They appear where an array name would appear in an array
mapping expression.  An actual file is associated with a FILEMAP in an
OPEN statement.  Filemaps are introduced because files are not first
class objects in FORTRAN (files are not declared).  Also, filemaps can
have rank $>$ 1, giving more flexibility in the types of mappings that
can be specified.

The following diagram illustrates the mapping

\begin{verbatim}

                                   Node           Physical
File      Filemap     Template   arrangement    storage units  (or caches)
 _           _           _           _               _
|_|-------->|_|-------->|_|-------->|_|------------>|_|

     OPEN        ALIGN    DISTRIBUTE   Implementation
                                         Dependent
\end{verbatim}

\subsubsection{Node Directive}

We suggest replacing the keyword {\tt PROCESSOR} with the keyword
{\tt NODE}, which is more neutral.  Node arrangements (formerly processor
arrangements) can be targets both for file mappings and for array
mappings.  Some implementations may disallow the use of the same node
arrangement name as a target both for array mappings and for file
mappings.  In such a case an {\tt AFFINITY} directive, which specifies
affinity between I/O nodes and processor nodes, would be useful.  (Such a
directive would also be useful to specify affinity between nodes of
different arrangements, e.g. nodes in arrangements of different rank.)

The set of allowable node arrangements that can be used to map files
is implementation dependent -- however, a node arrangement with {\tt
  NUMBER\_OF\_IONODES} nodes is always legal.

The mapping of nodes to physical storage units is implementation
dependent.

For example:

\CODE
!HPF$ NODE :: D1(2,4), D2(2,2)
      PARAMETER(NOD=NUMBER_OF_IONODES())
!HPF$ NODE, DIMENSION(NOD) :: D3,D4

\EDOC

\subsubsection{FILEMAP Directive}

A Fortran file is an infinite one-dimensional array of records, with
LB=1.  A filemap can be thought of as an assumed-size array of
records.  This array is associated with (one-dimensional) files, using
storage association rules.  The filemap name is used to specify
mappings for files.  The association between a filemap name and an
actual file is effected by the OPEN statement.

A FILEMAP directive declares filemap names.  The syntax is

\BNF
filemap-directive   is   FILEMAP [::] filemap-name ( assumed-size-spec )
                                      [, filemap-name (assumed-size-spec ) ] ...
               or  FILEMAP, DIMENSION ( assumed-size-spec ) :: filemap-name-list
\FNB

An {\it assumed-size-spec} is a specification of the form used for
assumed-size arrays: all dimensions are specified, with the exception
of the last, which is assumed.  In our case, the last dimension is
infinite.  Only initialization expressions may occur in this
specification (including expressions that depend on {\tt
  NUMBER\_OF\_IONODES}).

For example:

\CODE
!HPF$ FILEMAP :: F1(2,4,*)
!HPF$ FILEMAP, DIMENSION(2,2,1:*) :: F2,F3
\EDOC

A FILEMAP directive does not allocate space, either in memory or on
disk.

\subsubsection{File mapping}

ALIGN and DISTRIBUTE statements are used to map FILEMAPs onto nodes.
The syntax is identical to the syntax for processor mappings, with one
restriction: Block distributions cannot be used for the last
(infinite) dimension of the filemap.

For example:

\CODE
!HPF$ DISTRIBUTE (CYCLIC,CYCLIC,*) ONTO D2 :: F2,F3
!HPF$ DISTRIBUTE F1(*,BLOCK,CYCLIC(2)) ONTO D1
\EDOC

Assume that {\tt F1, F2, F3} are the filemaps and {\tt D1, D2} are the
node arrangements from the previous examples.

The first distribute statement specifies the following mapping for
successive records of a file associated with {\tt F2} or {\tt F3}.

\begin{verbatim}

D2(1,1)         D2(1,2)

1 (1,1,1)       3 (1,2,1)
5 (1,1,2)       7 (1,2,2)
.               .
.               .
.               .



D2(2,1)          D2(2,2)

2 (2,1,1)        4 (2,2,1)
6 (2,1,2)        8 (2,2,2)
.                .
.                .

\end{verbatim}

The second distribute statement specifies the following mapping for
successive records of a file associated with {\tt F1}.

\begin{verbatim}

D1(1,1)           D1(1,2)          D1(1,3)            D1(1,4)

 1 (1,1,1)       17                33                 49
 2 (2,1,1)       .                 .                  .
 3 (1,2,1)       .                 .                  .
 4 (2,2,1)       20                36                 52
 9 (1,1,2)       25                41                 57
10 (2,1,2)       .                 .                  .
11 (1,2,2)                         .                  .
12 (2,2,2)       28                44                 60
65               81                97                113
.                .                 .                  .
.                .                 .                  .



D1(2,1)            D1(2,2)         D1(2,3)            D1(2,4)

 5 (1,3,1)         21              37                 53
 6 (2,3,1)         .               .                  .
 7 (1,4,1)         .               .                  .
 8 (2,4,1)         24              40                 56
13 (1,3,2)         29              45                 61
14 (2,3,2)         .               .                  .
15 (1,4,2)         .               .                  .
16 (2,4,2)         32              48                 64
69                 85             101                117
.                  .               .                  .
.                  .               .                  .

\end{verbatim}

\subsection{OPEN statement}

A new connection specifier of the form {\tt FILEMAP = filemap-name}
associates a mapping with the opened file.  If the file exists then
the mapping must be one of the mappings allowed for the file.  The set
of allowed file mappings for an existing file is implementation
dependent, but always include the mapping under which the file was
created.  More generally, it will include any mapping where the file
is mapped onto the same storage node arrangement, and with the same
allocations of file records to storage nodes (different mappings may
result in the same allocation of records to storage nodes).  One
choice is to allow any mapping, with possibly degraded performance for
ill-matched mappings; another choice is to remap an existing file when
it is opened with a new mapping, either offline or online.  Vendors
are expected to provide implementation dependent mechanisms to
exercise such choices.

The default mapping is implementation dependent.

Only external files can be mapped.

Implementations may restrict the use of the FILEMAP connection
specifier to files that are open for direct access (i.e., fixed size
record files).
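
For example, the following connects a direct-access file to unit 12 and
associates it with the filemap {\tt F1} declared earlier (a sketch only;
the unit number, file name, and record length are illustrative):

\CODE
OPEN(UNIT = 12, FILE = 'results.dat', ACCESS = 'DIRECT', &
     RECL = 8000, FILEMAP = F1)
\EDOC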

\subsection{Parallel Data Transfer}

The READ, WRITE, CLOSE, INQUIRE, BACKSPACE, ENDFILE, REWIND statements
can be used to access distributed files; there are no changes in the
syntax or semantics of these statements.

PREAD and PWRITE statements are added to allow efficient input or
output of distributed arrays.  The PREAD and PWRITE statements have
the same syntax as unformatted I/O statements with READ or WRITE,
respectively; they are semantically different.  The data
representation created on a file by a PWRITE statement may be
different from the data representation that obtains if PWRITE is
replaced by WRITE.  In particular, whereas an unformatted WRITE
statement will create a single record (stored on one I/O node), a
PWRITE statement may create multiple records, possibly on multiple I/O
nodes.  Whereas an unformatted READ statement accesses a unique
record, a PREAD statement may access multiple records.

If a PWRITE statement was used to write a list of output items on a
file, then a PREAD that starts at the same point in the file, and has
a compatible list of input items, will return the values that were
written.  Two lists of items are compatible if the corresponding items
in each list occupy the same number of storage units and have
compatible mappings (informally, if the distribution of entries onto
abstract processors is the same).

Examples

The program below exchanges the values of arrays {\tt A} and {\tt B}.
The exchange is legal because the arrays are compatible.

\CODE
REAL, DIMENSION(1000,1000) :: A, B
!HPF$ ALIGN A WITH B
...
OPEN(UNIT = 15, ACTION = 'READWRITE')
PWRITE (UNIT = 15) A, B
REWIND (UNIT = 15)
PREAD (UNIT = 15) B, A
\EDOC


The behavior of the program below is undefined.  More than one record
could have been created by the PWRITE statement, so that the BACKSPACE
statement does not necessarily return the file position to where it
was before PWRITE executed.


\CODE
REAL, DIMENSION(1000,1000) :: A, B
!HPF$ ALIGN A WITH B
...
OPEN(UNIT = 15, ACTION = 'READWRITE')
PWRITE (UNIT = 15) A, B
BACKSPACE (UNIT = 15)
PREAD (UNIT = 15) B, A
\EDOC

The behavior of the program below is undefined, since the two arrays
{\tt A} and {\tt B} don't have compatible distributions.

\CODE
REAL, DIMENSION(1000,1000) :: A, B
!HPF$ DISTRIBUTE A(BLOCK,BLOCK)
!HPF$ DISTRIBUTE B(CYCLIC, CYCLIC)
...
OPEN(UNIT = 15, ACTION = 'READWRITE')
PWRITE (UNIT = 15) A, B
REWIND (UNIT = 15)
PREAD (UNIT = 15) B, A
\EDOC

Data written by a WRITE statement cannot be read with PREAD, and data
written with PWRITE cannot be read with READ, or by a PREAD that does
not start at exactly the same point in the file (otherwise the program
outcome is undefined).

PREAD and PWRITE can be used both for sequential access and for direct
access.  In the latter case, the REC specifier indicates the position
in the file from which the transfer starts.  It is still the case that
a transfer may involve several records.
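
For example (a sketch only; the unit and record number are illustrative):

\CODE
PREAD (UNIT = 15, REC = 101) A
\EDOC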

\subsection{Restrictions}

The following restrictions allow for a simpler, more efficient
implementation of parallel I/O.  We may either put them in the
language, or list them as recommended programming style.

\begin{enumerate}

\item Items in the item list of a PREAD or PWRITE statement are
  restricted to be variables (no io-implied-do).  [Compilers may want
  to relax this rule, by considering an io-implied-do as being an
  operation that defines a new variable, akin to an array section,
  with a distribution induced by the distribution of the variables
  appearing in the implied-do-loop.]

\item All values needed to determine which entities are specified by a
  parallel I/O item list must be specified before the I/O statement.
  That is, we prohibit a statement of the form {\tt PREAD (...) N,
    A(1:N)}.

\end{enumerate}

\subsection{Extensions}

\begin{itemize}

\item We may want to write an array with a layout that is suited to
  the mapping of the array that will appear in the input item list,
  rather than suited to the mapping of the array in the output list.
  To achieve this, we need to add align/distribute information as part
  of the PWRITE statement.

\item We may want a {\tt REMAP} statement, to be used instead of the
  sequence {\tt CLOSE ... OPEN}, in order to associate a new mapping
  to an existing file.

\item We may want to extend the {\tt INQUIRE} statement to return file
  mapping information (alternatively, we may use the same query
  intrinsics used to query array partitions).

\item A new intrinsic of the form {\tt INDEX(filemap-name,
    list-of-indices)} would be handy, as it would allow addressing
  random-access files as multi-dimensional arrays.  E.g.

\CODE
READ (7, REC = INDEX(F1,3,5) ) A
\EDOC

\item Each data transfer operation specifies an association between
  parts of the file and the abstract processor nodes from which (or to
  which) the data in the record is transferred.  We may want to add
  additional directives to the OPEN statement to indicate that this
  association fulfills certain restrictions for as long as the file is
  open.

\begin{itemize}

\item Accesses to a file are {\em independent} if, in all data
  transfers, each file part is associated with the same processor
  node.  An {\tt INDEPENDENT} argument in the OPEN statement may be
  used to specify this condition (which simplifies file caching).

\item A data transfer is {\em aligned} if each file part is associated
  with a unique processor node (is not split among two processor
  nodes).  We may use an {\tt ALIGNED} argument in the OPEN statement
  to specify that all data transfers are aligned.  (INDEPENDENT
  implies ALIGNED, but not vice versa).
\end{itemize}
\end{itemize}

\begin{thebibliography}{xxx}
  \bibitem{vienaio} P.\ Brezany, M.\ Gerndt, P.\ Mehrotra and H.\ Zima,
  ``Concurrent File Operations in a High Performance Fortran.''
\end{thebibliography}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\end{document}

%Revision history:
%August 2, 1992 - Original version of David Loveman, Digital Equipment
%	Corporation and Charles Koelbel, Rice University
%October 14, 1992 - RLK - Added a section on I/O to the Journal of
%Development chapter based on the proposal from Marc Snir, IBM, and Piyush
%Mehrotra, ICASE.

From chk@erato.cs.rice.edu  Tue Jan 26 22:41:40 1993
Received: from erato.cs.rice.edu by titan.cs.rice.edu (AA01848); Tue, 26 Jan 93 22:41:40 CST
Received: from localhost.cs.rice.edu by erato.cs.rice.edu (AA08481); Tue, 26 Jan 93 22:41:23 CST
Message-Id: <9301270441.AA08481@erato.cs.rice.edu>
To: hpff@erato.cs.rice.edu
Cc: hpff-core@erato.cs.rice.edu, hpff-distribute@erato.cs.rice.edu,
        hpff-forall@erato.cs.rice.edu, hpff-io@erato.cs.rice.edu,
        hpff-f90@erato.cs.rice.edu, hpff-intrinsics@erato.cs.rice.edu
Word-Of-The-Day: salariat : (n) the class of salaried workers
Subject: HPF Language Specification, version 1.0
Date: Tue, 26 Jan 93 22:41:22 -0600
From: chk@erato.cs.rice.edu


It's available!  (For sure from titan.cs.rice.edu; availability from
other sites will depend on how fast e-mail travels and how dedicated
administrators at other sites are.)  Below are the "standard"
announcement and call for comments.

Many thanks to everyone involved in producing this document, including
(but not limited to!):
	The HPFF working group.
	People who commented on version 0.4 of the spec.
	People who attended (and asked questions at) many
		presentations, including the Supercomputing '92 workshop.
	Our friendly funding agencies: DARPA, NSF, ESPRIT, and the
		employers who bankrolled most of the HPFF committee
		members.
Special thanks to David Loveman, who edited the document.

						Chuck Koelbel
						Executive Director, NSF

----------------------------------------------------------------

The most recent draft of the High Performance Fortran Language
Specification is version 1.0 Draft, dated January 25, 1993.  See
"Version History" below for a description of the changes.

How to Get the High Performance Fortran Language Specification
==============================================================

There are three ways to get a copy of the draft:

	1. Anonymous FTP: The most recent draft is available on 
	   titan.cs.rice.edu in the directory public/HPFF/draft.
	   Several files are kept there, including compressed
	   Postscript files of previous versions of the draft.  The
	   most current version of this draft is 1.0, which can be
	   retrieved as a tar file containing LaTeX source
	   (hpf-v10.tar) or in Postscript format (hpf-v10.ps);
	   both of these are also available as compressed files.
	   Several other sites also have the draft available in one or
           more formats, including think.com, ftp.gmd.de,
	   theory.tc.cornell.edu, and minerva.npac.syr.edu.

	2. Electronic mail: The most recent draft is available from
	   the Softlib server at Rice University.  This can be
	   accessed in several ways:
	     A. Send electronic mail to softlib@cs.rice.edu with "send 
		hpf-v10.ps" in the message body. The report is sent as a 
		Postscript file.
	     B. Send electronic mail to softlib@cs.rice.edu with "send 
		hpf-v10.tar.Z" in the message body. The report is
		sent as a uuencoded compressed tar file containing
		LaTeX source.
             C. Send electronic mail to netlib@ornl.gov with "send
                hpf-v10.ps from hpf" in the message body.  The report
                is sent as a Postscript file.  This site also has the
                LaTeX source of the draft; use "send index from hpf"
                to see the file names.
             D. Send electronic mail to netlib@research.att.com with
	        "send hpf-v10.ps from hpff" in the message body.  The
		report is sent as a Postscript file.
	   (In all cases, the reply is sent as several messages to
	   avoid mailer restrictions; edit the message bodies together
	   to obtain the whole file.)  The same files can be obtained
	   from David Loveman (loveman@mpsg.enet.dec.com) and Chuck
	   Koelbel (chk@cs.rice.edu), but replies will take longer
	   because real people have to answer the mail.

	3. Hardcopy: The most recent draft is available as technical report 
	   CRPC-TR 92225 from the Center for Research on Parallel
	   Computation at Rice University.  Send requests to
		Theresa Chatman
		CITI/CRPC, Box 1892
		Rice University
		Houston, TX 77251
	   There is a charge of $50.00 for this report to cover copying and 
	   mailing costs.

Disclaimers
===========

A few caveats about the HPF draft:

	A. The current version contains some material that is still
	   under active discussion.  Changes will be fairly frequent
	   until at least December 1992.  New versions will be
           announced on the HPFF mailing list and in the newsgroups
	   comp.parallel, comp.lang.misc, and comp.lang.fortran.

	B. The HPF Language Specification does not necessarily
	   represent the official view of any individual, company,
	   university, government, or other agency.

	C. Please address any questions, comments, or possible
	   inconsistencies in the draft to hpff-comments@cs.rice.edu.
	   Include the chapter number you are commenting on in the
	   "Subject:" line of the message.


Version History
===============

Version 0.1:
August 14, 1992
EXTREMELY preliminary version.  

First collection of the proposals active in the High Performance Fortran 
Forum.  Established much of the outline for later documents, and 
represented most decisions made through the July HPFF meeting.


Version 0.2:
September 9, 1992
Version discussed at the September 10-11 HPFF meeting

Changes:
General cleaning up of version 0.1.
Inclusion of most new proposals at that time.


Version 0.3:
October 12, 1992
Version discussed at the October 22-23 HPFF meeting

Changes:
Numerous minor and major changes due to discussions at the September meeting.
Added a section on "Model of Computation".
Presented alternate chapters for data distribution with and without
templates.
Added two proposals for ON clauses specifying where computation is to
be executed.
Added distribution inquiry intrinsics.
Total rewrite of I/O material, sending most previous material to the
Journal of Development.


Version 0.4:
November 6, 1992
Version to be presented at Supercomputing '92

Changes:
Numerous minor and major changes due to discussions at the October
meeting.
"Acknowledgements" section now much more accurate.
"The HPF Model" (replacing "Model of Computation") substantially
simplified and improved.
"Distribution without Templates" chapter removed.
Many proposals not adopted moved to "Journal of Development".


Version 1.0:
January 25, 1993
Draft final version

Changes:
Many changes for clarity or pedagogical reasons.
The examples in several sections have been significantly enlarged.
INHERIT (for dummy arguments) added to distribution chapter.
Pure procedures may now have dummy arguments with explicit
distributions, if those distributions are inherited from the caller.
Changed the names of the new reductions AND, OR, and EOR to IALL,
IANY, and IPARITY.
Clarified the status of the character array language to be not in the
subset, and as a result, removed the character array intrinsics.
Only very restricted forms of alignment subscript expressions (of the
form \(m*i + n\) where \(m\) and \(n\) are integer expressions) are
part of the subset.
[Bibliography] Correctly spelled ``Mehrotra'' and ``Gerndt''.



----------------------------------------------------------------

REQUEST FOR PUBLIC COMMENT ON HIGH PERFORMANCE FORTRAN

To: The High Performance Computing Community

Invitation:

The High Performance Fortran Forum (HPFF), with participation from over 40 
organizations, has been meeting since January 1992 to discuss and 
define a set of extensions to Fortran called High Performance Fortran 
(HPF). Our goal is to address the problems of writing data parallel 
programs for architectures where the distribution of data impacts 
performance. While we hope that the HPF extensions become widely available, 
HPFF is not sanctioned or supported by any official standards organization. 
At this time, HPFF invites general public review comments on the initial 
version of the language draft. 

The HPF language specification, version 1.0 draft, is now available. This 
document contains all the technical features proposed for the language. 
We plan to make minor revisions to correct errors or clarify
ambiguities in March 1993, at which time we will issue a final draft;
however, we expect that there will be few (if any) major technical
changes from this draft.

HPFF invites comments on the technical content of HPF, as well as on the 
editorial presentation in the document.  To facilitate incorporation
of comments into the final document, we ask that comments be sent
before March 1, 1993.


How to Get the Documents:

Electronic copies of the HPF language specification are available from 
numerous sources. 

    Anonymous FTP sources:      Directory:
    titan.cs.rice.edu           public/HPFF/draft
    think.com                   public/HPFF
    ftp.gmd.de                  hpf-europe
    theory.tc.cornell.edu       pub
    minerva.npac.syr.edu        public

    Email sources:              First line of message:
    netlib@research.att.com     send hpf-v10.ps from hpff
    netlib@ornl.gov             send hpf-v10.ps from hpf
    softlib@cs.rice.edu         send hpf-v10.ps    

The following formats are available for version 1.0 (replace 10 with 04
for the previous version). Note that not all formats are available from
all sources. 
    hpf-v10.dvi                 DVI file
    hpf-v10.ps                  Postscript
    hpf-v10.ps.Z                Compressed Postscript
    hpf-v10.tar                 Tar file of LaTeX version
    hpf-v10.tar.Z               Compressed tar file

For more detailed instructions, send email to hpff-info@cs.rice.edu. This 
will return a message with expanded detail about accessing the above 
document sources, as well as other information about HPFF. 

We strongly encourage reviewers to obtain an electronic copy of the 
document. However, if electronic access is impossible the draft is also 
available in hard copy form as CRPC Technical Report #92225. This report is 
available for $50 (copying/handling fee) from: 

    Theresa Chatman
    CITI/CRPC, Box 1892
    Rice University
    Houston, TX 77251

Make checks payable to Rice University. This document will be sent surface 
mail unless additional airmail postage is included in the payment. 


How to Submit Comments:

HPFF encourages reviewers to submit comments as soon as possible, with a 
deadline of February 15 for consideration. Please do not submit comments 
for any version of the draft earlier than the 0.4 version. 

Please send comments by email to hpff-comments@cs.rice.edu. To facilitate 
the processing of comments we request that separate comment messages be 
submitted for each chapter of the document and that the chapter be clearly 
identified in the "Subject:" line of the message. Comments about general 
overall impressions of the HPF document should be labeled as Chapter 1. All 
comments on the language specification become the property of Rice 
University. 

If email access is impossible for comment responses, hard copy may be sent 
to 

    HPF Comments
    c/o Theresa Chatman
    CITI/CRPC, Box 1892
    Rice University
    Houston, TX 77251

HPFF plans to process the feedback received at a meeting in March. Best 
efforts will be made to reply to comments submitted. 


Sincerely,


Charles Koelbel
Rice University
HPFF Executive Director




From dfk@wildcat.dartmouth.edu  Sat May  8 09:25:46 1993
Received: from wildcat.dartmouth.edu by cs.rice.edu (AA19818); Sat, 8 May 93 09:25:46 CDT
Received: by wildcat.dartmouth.edu (5.65D1/4.1)
	id AA14244; Sat, 8 May 93 10:25:38 -0400
Date: Sat, 8 May 93 10:25:38 -0400
From: dfk@wildcat.dartmouth.edu (David Kotz)
Message-Id: <9305081425.AA14244@wildcat.dartmouth.edu>
To: hpff-io@cs.rice.edu, ciosig@ssd.intel.com
Subject: DAGS 93 registration info


I thought you might be interested in this. Apologies if you have
already received this through our DAGS mailing list. If you have NOT
already received this, and want to be on our DAGS mailing list (I am
sending this to two other I/O mailing lists), please send a message to
dags@cs.dartmouth.edu.  [Because I don't plan to send to these lists
again.]

Thanks,
dave

*********************  REMINDER ***********************************
     THE DEADLINE FOR EARLY REGISTRATION AND FUNDING IS MAY 15   
*******************************************************************         

	           CALL FOR PARTICIPATION
                    SYMPOSIUM and SCHOOL
 
                  DAGS/PC '93:  June 21-29
               Dartmouth College, Hanover, NH

              SECOND ANNUAL SUMMER INSTITUTE ON

    "ISSUES AND OBSTACLES IN THE PRACTICAL IMPLEMENTATION OF
     PARALLEL ALGORITHMS AND THE USE OF PARALLEL MACHINES"

        Symposium: PARALLEL I/O AND DATABASES (June 21-23)
        School:    PARALLEL PROGRAMMING	      (June 24-29)

                         SPONSORED BY

      The Dartmouth Institute for Advanced Graduate Studies
                in Parallel Computation (DAGS/PC)
         Department of Mathematics and Computer Science
                 The National Science Foundation
			    BellCore
	       Dartmouth Experimental Visualization Lab
		     Digital Equipment Corporation
			Kiewit Computer Center
			      NCD
		    NorthStar Computing Project


This summer, Dartmouth will hold its second DAGS institute to
promote the use of high-performance computing.  As in 1992, 
the institute will bring together researchers and
practitioners from academia and industry.  This year, the
institute will be composed of two parts: a symposium (June
21--23) of invited and contributed talks, followed by a hands-on
school (June 24--29) on parallel programming.  The informal and
beautiful setting of Dartmouth in the summer is a good working
environment which promotes discussions and new interactions among
all participants.  Housing will be available at economical rates
in college dormitories, near the conference sessions and centrally
located on the Dartmouth campus.




			SYMPOSIUM

The symposium will address issues and obstacles in the implementation
of parallel algorithms with a focus on the critical topics of parallel
I/O and databases.  The symposium will consist of prominent invited
speakers, a small number of contributed papers, and panels on
industrial concerns, large-scale applications, and one additional topic.
The list of invited speakers appears below:

INVITED SPEAKERS:

Alok Aggarwal (IBM T.J. Watson): 
	Communication  Latency  In  PRAM  Computations

Garth Gibson (Carnegie Mellon):
	Informed Prefetching: Converting High Throughput to Low Latency

David Scott (Intel Supercomputers):
	Parallel I/O for Dense Matrix Factorizations
	
Jeffrey Vitter (Duke):
	Paradigms for Optimal Sorting with Parallel Disks 
	and Memory Hierarchies.

David Waltz (Thinking Machines):
	Innovative Database Applications of Massively Parallel Processing

John Wilkes (Hewlett-Packard):
	Lessons from the DataMesh Parallel Storage System Project

PANEL DISCUSSIONS

	How is Industry Addressing Parallel I/O Issues?
Chaired by: Marina Chen (Yale University)

	Large Scale Applications 
Chaired by: George Cybenko (Dartmouth College)



		SYMPOSIUM SCHEDULE

Sunday 6/20
 evening: reception for early arrivals

Monday 6/21
11:00 am - 1:00 pm: REGISTRATION
 1:00 - 1:30 pm: OPENING: Fillia Makedon (Dartmouth College) 
 1:30 - 2:45 pm: INVITED: John Wilkes (Hewlett-Packard)
		  Lessons from the DataMesh Parallel Storage System Project
 2:45 - 3:15 pm: BREAK
 3:15 - 3:45 pm: Orran Krieger and Michael Stumm (University of Toronto)
		  HFS: A Flexible File System for Large-Scale Multiprocessors
 3:45 - 4:15 pm: J. A. Keane, T. N. Franklin, A. J. Grant, 
		  R. Sumner, and M. Q. Xu (University of Manchester)
		  Commercial Users' Requirements for Parallel Machines
 4:15 -  5:30 pm: PANEL: Obstacles in the Implementation of Parallel Algorithms
		  Moderator: Alok Aggarwal (IBM T.J. Watson Research Center)
		  Panelists: TBD

 6:30 pm: pizza party


Tuesday 6/22

 8:30 -  9:00 am: REGISTRATION and breakfast
 9:00 - 10:15 am: INVITED: Jeffrey Vitter (Duke University)
		   Paradigms for Optimal Sorting with Parallel Disks 
		   and Memory Hierarchies.
10:15 - 10:45 am: BREAK
10:45 - 12:00 pm: INVITED: David Waltz (Thinking Machines Corporation)
		   Innovative Database Applications of Massively
		   Parallel Processing

12:00 -  1:30 pm: Lunch

 1:30 -  2:45 pm: INVITED: David Scott (Intel Scientific
		   Supercomputing Division)
		   Parallel I/O for Dense Matrix Factorizations
 2:45 -  3:15 pm: BREAK
 3:15 -  3:45 pm: David Womble, David Greenberg, 
		   Stephen Wheat, and Rolf Riesen (Sandia Laboratories)
		   Beyond Core: Making Parallel Computer I/O Practical
 3:45 -  4:15 pm: Thomas Cormen and David Kotz (Dartmouth College) 
		   Integrating Theory and Practice in Parallel File Systems
 4:15 - 5:30 pm: INDUSTRIAL PANEL: 
		  Challenges of Very Large Data Storage, Retrieval,
		  and Manipulation in Parallel Computing
    	    	 Moderator: Marina Chen (Yale)
    	    	 Panelists:
    	    	   Peter Corbett (IBM T. J. Watson)
    	    	   Andrew Ogielski (Bellcore)
    	    	   ... TBD ...

 6:30 pm: cookout


Wednesday 6/23

 8:30 -  9:00 am: REGISTRATION and breakfast
 9:00 - 10:15 am: INVITED: Alok Aggarwal (IBM T.J. Watson Research Center)
		   Communication  Latency In PRAM Computations
10:15 - 10:45 am: BREAK
10:45 - 11:30 am: P. Gloor, D. Johnson, F. Makedon, J. Matthews, P. Metaxas
                  (Dartmouth College) 
                  DEMO: DAGS '92 Multimedia Proceedings Project 
11:30 - 12:00 pm: Andrew Chin (Texas A&M University)
		   Locality-Preserving Hash Functions for General
		   Purpose Parallel Computation

12:00 -  1:30 pm: Lunch

 1:30 -  2:00 pm: B. Dixon and A. K. Lenstra (Princeton University, Bellcore)
		   Fast Massively Parallel Modular Arithmetic
 2:00 -  2:30 pm: Lars S. Nyland, Jan F. Prins, and John H. Reif
		   (University of North Carolina, Duke University)
		   A Data-Parallel Implementation of the Adaptive Fast
		   Multipole Algorithm
 2:30 -  3:00 pm: BREAK
 3:00 -  4:15 pm: INVITED: Garth Gibson (Carnegie Mellon University)
		   Informed Prefetching: Converting High Throughput to
		   Low Latency 
 4:15 -  5:30 pm: PANEL: Large Scale Applications
		   Moderator: George Cybenko (Dartmouth College)
		   Panelists: TBD



DAGS '93 PROGRAM COMMITTEE: Guy Blelloch, Tom Cormen, George
Cybenko, Phil Hatcher, Donald Johnson, Dave Kotz, Fillia Makedon
(chair), Panagiotis Metaxas, Grammati Pantziou, Eric Schwabe,
Clifford Stein.

STEERING COMMITTEE: Ken Bogart, Scot Drysdale, Peter Gloor, Donald
Johnson (co-chair), David Kotz, Don Kreider, Bruce Maggs, Fillia Makedon
(co-chair), Panagiotis Metaxas.

LOCAL ARRANGEMENTS: Wayne Cripps, Linda Hathorn, Debra Minichiello
(chair), Janice Thompson, Patricia Wilson.

		SCHOOL ON PARALLEL PROGRAMMING

SCHOOL DESCRIPTION:

The school for parallel programming will be a hands-on course on
programming parallel algorithms using a parallel language called NESL
as the focal system, as well as C* and High Performance Fortran (HPF).
The course will introduce several parallel data structures and a
variety of parallel algorithms and then look at how they can be
programmed.  The students will complete programming assignments on a
parallel computer.  The only prerequisite is knowledge of C
programming; after completing the course, students will come away
with an understanding of why data-parallel programming on MIMD
computers is such a pervasive trend today.  Students will also have a
chance to compare NESL with C*, a language used in the real world.
Demonstrations of other parallel systems and of tools and issues
related to the uses of visualization in parallel programming are being
planned with various companies.

SCHOOL FACULTY:

	Professor Guy Blelloch, Carnegie Mellon University 

Prof. Blelloch received his Ph.D. from MIT in 1988 and has been on the
faculty of CMU since then.  He is an excellent lecturer, a well-known
researcher in the field of parallel computation, and the developer of
NESL.

	Professor Phil Hatcher, University of New Hampshire  

Prof. Hatcher received his Ph.D. from the Illinois Institute of
Technology and is currently an Associate Professor of CS at UNH.  He
is co-author of the book "Data-Parallel Programming on MIMD Computers".

	Professor Michael Quinn, Oregon  State University 

Prof. Quinn received his Ph.D. from Washington State Univ. and is
currently an Associate Professor of Computer Science at Oregon State.
He is co-author of the book "Data-Parallel Programming on MIMD
Computers" and the editor of IEEE Parallel and Distributed Technology.

	Dr. Gary W. Sabot, Thinking Machines Corporation

Dr. Sabot received his Ph.D. from Harvard University in 1988.  He
is the author of the book "The Paralation Model:
Architecture-Independent Parallel Programming" and editor of the
book "High Performance Computing".  His research interests
include parallel computer architecture, programming languages,
and performance analysis.

OTHER SUPPORTING SCHOOL FACULTY: 
Tom Cormen, George Cybenko, Donald Johnson, David Kotz, Fillia
Makedon, Panagiotis Metaxas, Grammati Pantziou, Clifford Stein.

			FUNDING
Funding is available, on a competitive basis, to help defray the
costs of attending both the school and the symposium.  A
description of the three programs appears below.

THE DAGS FELLOWS PROGRAM

Persons who have received their Ph.D. within the last four years can
qualify for a grant to defray the cost of participation.  To apply,
please submit a vita and a letter of support from an established
researcher and/or advisor as well as a 500 word statement indicating
your interests in parallel computation and how participation in DAGS
will help you in your work.  Deadline for the DAGS FELLOWS PROGRAM is
May 15 and awards will be given on a first-come first-served basis
according to merit.  DAGS Fellows are expected to cover their own
expenses in advance as necessary.  While grant awards will be
announced by June 1, the awards will not be given until the end of the
institute session.  Please address any questions about the DAGS
FELLOWS PROGRAM to Professor Cliff Stein 
(email: cliff@bondcliff.dartmouth.edu tel: 603-646-2760)

DAGS INTERNS PROGRAM: GRADUATE STUDENT TRAVEL GRANTS

A limited number of travel grants will be available to graduate
students who wish to participate.  Full time graduate students may
apply by sending a letter of request explaining how this institute
relates to their Ph.D. work in progress.  A letter supplying evidence
that the student is full-time and a letter of reference or support
from the chairman or academic advisor are both required.  Deadline for
the DAGS INTERNS PROGRAM is May 15.  The selections will be done by
the program committee.  DAGS INTERNS are expected to cover their own
expenses in advance as necessary.  While grant awards will be
announced by June 1, the awards will not be given until the end of the
institute session.  Please address any questions about the DAGS
INTERNS PROGRAM to Professor Tom Cormen
(email: thc@monroe.dartmouth.edu tel: 603-646-2417)

DAGS SCHOLARS PROGRAM: UNDERGRADUATE STUDENTS AWARDS

A limited number of undergraduate awards are available to students
who are interested in attending the 1993 DAGS Symposium and/or School
on Parallel Computation. The  awards are available on a competitive 
basis to full-time undergraduate students. Women and minorities are 
encouraged to apply. Six  of the awards will be given to undergraduate 
students from NECUSE institutions and will cover participation and 
living expenses. To apply, send (a) a letter describing your interest 
in parallel computation, (b) an official copy of your transcripts 
supplying evidence of your full-time student status and (c) a letter  of 
reference  from a faculty member of your department.
While the awards will be announced by June 1, the awards will not be
given until the end of the institute session.
Please address any questions about the DAGS SCHOLARS PROGRAM to DAGS
Scholars Program, c/o Prof. Panagiotis Metaxas, Computer Science
Department, Wellesley College, Wellesley, MA 02181, by May 15, 1993.

                     TRAVEL INFORMATION

Dartmouth is served by Delta Airlines, Northwest Airlines and USAir
via the Lebanon, NH, airport, which is about six miles from the
campus.  Dartmouth may also be reached via the Manchester, NH, and
Burlington, VT airports, both about ninety miles from the campus.  The
most convenient way to travel between either Manchester or Burlington
and Dartmouth is to rent a car.

Dartmouth is five miles from Interstate 89 and two miles from
Interstate 91, the preferred routes for those who travel by car; free
parking is available.

Dartmouth is also served by Vermont Transit bus from Boston and
Burlington at a cost of about $30 each way.  Some but not all buses
stop at the Dartmouth campus.  Amtrak serves White River Junction,
Vermont, which is about ten miles from Dartmouth.

                   DINING AND ACCOMMODATIONS
LODGING

A limited number of rooms have been  reserved  at  two  Dartmouth
dormitories  and at the Hanover Inn, all within two blocks of the
conference site.  Daily rates for the Hanover Inn  are  $129/room
(single  or double) plus 8% tax.  For reservations please contact
the Hanover Inn directly:  1-800-443-7024.  Mention DAGS'93.

All dormitory rooms have shared baths.  Rooms will  be  allocated
on a first-come first-served basis.  There is also a small number
of rooms for handicapped persons.

Daily room rates (meals not included) are as follows:

Andres (new dorm, air conditioned): $33.50/person, plus $19.00 for spouse and
$17.00/child.

Ripley (older dorm, not air conditioned): $25.25/person, plus $15.25 for spouse
and $14.00/child.

The dorms are not open on Sunday, June 20, so if you are arriving
early, you must make other arrangements for the first night.  Rooms
have been reserved at the Hanover Inn, and less expensive rooms are
available at area motels.

To guarantee availability at these rates, room and dining reservations
must be made no later than June 15.



DINING

There is an optional meal plan for the school, which includes breakfast,
lunch, and dinner.  The cost is $115 for the duration of the school.
These meals will be served in the Dartmouth Thayer Dining Hall, one
block from the meeting room and two blocks from the dormitories.
Meals may also be purchased in the dining hall on a meal-by-meal basis
or at restaurants in Hanover.

There is no meal plan for the symposium.  Meals not provided at the
symposium may be purchased in the dining hall on a meal-by-meal basis
or at restaurants in Hanover.
 

            DAGS/PC '93 REGISTRATION INFORMATION
			
			SYMPOSIUM

    EARLY REGISTRATION (by 5/15)     LATE REGISTRATION (after 5/15)

Regular participant    $195                            $250
Full time student       $75                             $90

The registration fee for all participants, student and non-student,
includes a reception on Sunday evening, a pizza party on Monday
evening, a cookout on Tuesday evening, breakfast on Tuesday and
Wednesday morning, coffee breaks, a set of informal proceedings for
this year's symposium and a multimedia-CDROM version of last year's
proceedings.  Individual cookout tickets are available for $20 and
must be reserved and paid for by June 20.

Please fill in the information needed for  registration  on  the
attached form.  Make your payment by check or international money
order in US dollars and payable through a US bank to "Dartmouth
DAGS'93".

			SCHOOL

    EARLY REGISTRATION (by 5/15)     LATE REGISTRATION (after 5/15)

All participants     $350                         $450

School registration includes materials, coffee breaks, DAGS'93
symposium proceedings, and multimedia-CD proceedings for DAGS'92.

			COMBINED REGISTRATION

People registering for both the school and the symposium may
subtract $70 from the combined cost, but will receive only one copy of
each set of proceedings.

    EARLY REGISTRATION (by 5/15)     LATE REGISTRATION (after 5/15)

Regular participant    $475                            $630
Full time student      $355                            $455


In case of questions, please contact Deb Minichiello, local
arrangements chair (email: deb.minichiello@Dartmouth.edu, tel:
603-643-1358, fax: 603-646-1312, msgs: 603-646-3048).  You can also
contact Professor Fillia Makedon, DAGS General Chair (email:
makedon@dartmouth.edu, tel: 603-646-3048).  Confirmations will be sent
by email.  If you don't receive a confirmation within three weeks of
payment, please contact Deb at the address above.  Updated versions of
this document can be obtained by sending email to
dags@cs.dartmouth.edu.


---------------------- REGISTRATION FORM ------------------------

Return this form together with payment by May 15 (to avoid late
fee) to:

     DAGS '93 
     Dartmouth Institute for Advanced Graduate Studies
     Department of Mathematics and Computer Science
     Dartmouth College
     6188 Bradley Hall
     Hanover NH 03755-3551


Please print legibly:
LAST NAME__________________ FIRST ________________ MIDDLE________
TITLE______________________ DEPT_________________________________
TELEPHONE____________(HOME) _____________(OFFICE) FAX____________
ELECTRONIC ADDRESS____________________________
FULL INSTITUTIONAL ADDRESS





NUMBER OF ACCOMPANYING PERSONS ______ (no registration fee required)

	
REGISTERING FOR SYMPOSIUM?  		         YES ____   NO ____    
REGISTERING FOR SCHOOL?	    		         YES ____   NO ____


DO YOU QUALIFY AND WISH TO APPLY TO 
THE DAGS INTERNS/FELLOWS/SCHOLARS  PROGRAM       YES ____   NO ____
(please include application materials as described above)

NOTE: Even if you are applying to the DAGS INTERNS/FELLOWS/SCHOLARS
program, you must send in full payment with your registration.  If you
are accepted into one of these programs, you will receive your funding
at the end of the symposium/school.


AMOUNT ENCLOSED FOR REGISTRATION $__________ 
Make check payable to "Dartmouth DAGS'93" 
(Payment for registration does not include meals and accommodations)

-----------------------------------------------------------------

-----------------ROOM AND MEAL REGISTRATION FORM-----------------
Please return this form with your remittance to:

     Conference Administration
     Attn:  DAGS
     Hallgarten Hall
     Dartmouth College
     Hanover, NH 03755

LAST  NAME_______________   FIRST____________   MIDDLE_____
TITLE______________   DEPT______________
TELEPHONE______________(h) ______________(o)     FAX____________
ELECTRONIC ADDRESS__________________
FULL INSTITUTIONAL ADDRESS





ACCOMPANYING MEMBERS  ______________
DAY AND TIME OF ARRIVAL_________ DAY AND TIME OF DEPARTURE_________

SPECIAL NEEDS: Handicapped Yes__ No__
Food  restrictions  (please describe):
Number and ages of accompanying children:


Please reserve accommodations for ___ persons @____ total per day
Facility  chosen  (circle one) Andres   Ripley
for (circle)   Mo 6/21, Tu 6/22, We 6/23, 
               Th 6/24, Fr 6/25, Sa 6/26, Su 6/27,
               Mo 6/28, Tu 6/29, We 6/30

                         Total for accommodations:        $_______


Optional  school meal plan at $115.00/person for ___ person
                         Total for meals:                $_______
Additional cookout tickets @ $20.00/person for ____ persons
                          Total for additional cookout: $________

(Make check payable to "Dartmouth College")
                                        CHECK  ENCLOSED  $_______
-----------------------------------------------------------------

From dfk@wildcat.dartmouth.edu  Sun Jul 18 18:08:28 1993
Received: from wildcat.dartmouth.edu by titan.cs.rice.edu (AA19491); Sun, 18 Jul 93 18:08:28 CDT
Received: by wildcat.dartmouth.edu (5.65D1/4.1)
	id AA20261; Sun, 18 Jul 93 19:08:20 -0400
Date: Sun, 18 Jul 93 19:08:20 -0400
From: dfk@wildcat.dartmouth.edu (David Kotz)
Message-Id: <9307182308.AA20261@wildcat.dartmouth.edu>
To: hpff-io@cs.rice.edu
Subject: updated parallel I/O bibliography


[apologies if you receive multiple copies due to replication on many lists]

BibTeX bibliography file: Parallel I/O

Fourth Edition
July 18, 1993

This supersedes my older bibliographies.

Nearly a year has passed since the third edition of this bibliography,
and a lot has happened in the field of parallel I/O. There was the
JPDC special issue, the IPPS workshop, and the Dartmouth DAGS
symposium, all with a focus on parallel I/O issues. I expect interest
in this topic to grow even more rapidly in the coming year.  It is
truly exciting to be involved in this field right now.

This bibliography covers parallel I/O, with a significant emphasis on
file systems rather than, say, network or graphics I/O. This includes
architecture, operating systems, some algorithms, some databases, and
some workload characterization. Because of the expanding nature of
this field, I cannot cover everything, and this bibliography is
admittedly spotty on topics like disk array reliability, parallel I/O
algorithms, parallel databases, and parallel networking.

The entries are alphabetized by cite key. The emphasis is on including
everything I have, rather than selecting a few key articles of
interest.  Thus, you probably don't want (or need) to read everything
here. There are many repeated entries, in the sense that a paper is
often published first as a TR, then in a conference, then in a
journal. There is a net gain of 82 entries since last year. 

Except where noted, all comments are mine, and any opinions expressed
there are mine only. In some cases I am simply restating the opinion
or result obtained by the paper's authors, and thus even I might
disagree with the statement. I keep most editorial comments to a
minimum.

Please let me know if you have any additions or corrections.  You may
use the bibliography (and copy it around) as you please except for
publishing it as a whole, since the compilation is mine.

Please leave this header on the collection; BibTeX won't mind. 

This bibliography (and many others) is archived in ftp.cse.ucsc.edu:pub/bib.

David Kotz
Assistant Professor
Mathematics and Computer Science
Dartmouth College
6188 Bradley Hall
Hanover NH 03755-3551
@string {email = "David.Kotz@Dartmouth.edu"} % have to hide this from bibtex
-----------------------------------------------------------------------------

@InProceedings{abali:ibm370,
  author = {B\"{u}lent Abali and Bruce D. Gavril and Richard W. Hadsell and
  Linh Lam and Brion Shimamoto},
  title = {{Many/370: A} Parallel Computer Prototype for {I/O} Intensive
  Applications},
  booktitle = {Sixth Annual Distributed-Memory Computer Conference},
  year = {1991},
  pages = {728--730},
  keyword = {parallel I/O, multiprocessor file system, pario bib},
  comment = {Describes a parallel IBM/370, where they attach several small 370s
  to a switch, and several disks to each 370. Not much in the way of striping.}
}

@Article{abu-safah:speedup,
  author = {Walid Abu-Safah and Harlan Husmann and David Kuck},
  title = {On {Input/Output} Speed-up in Tightly-coupled Multiprocessors},
  journal = {IEEE Transactions on Computers},
  year = {1986},
  pages = {520--530},
  keyword = {parallel I/O, I/O, pario bib},
  comment = {Derives formulas for the speedup with and without I/O considered
  and with parallel software and hardware format conversion. Considering I/O
  gives a more optimistic view of the speedup of a program {\em assuming} that
  the parallel version can use its I/O bandwidth as effectively as the serial
  processor. Concludes that, for a given number of processors, increasing the
  I/O bandwidth is the most effective way to speed up the program (over the
  format conversion improvements).}
}

@Article{aggarwal:sorting,
  author = {Alok Aggarwal and Jeffrey Scott Vitter},
  title = {The Input/Output Complexity of Sorting and Related Problems},
  journal = {Communications of the ACM},
  year = {1988},
  month = {September},
  volume = {31},
  number = {9},
  pages = {1116--1127},
  keyword = {parallel I/O, sorting, pario bib},
  comment = {Good comments on typical external sorts, and how big they are.
  Focuses on parallelism at the disk. They give tight theoretical bounds on the
  number of I/O's required to do external sorting and other problems (FFTs,
  matrix transpose, {\em etc.}). If $x$ is the number of blocks in the file and $y$
  is the number of blocks that fit in memory, then the number of I/Os needed
  grows as $\Theta (x \log x / \log y)$. If parallel transfers of $p$ blocks
  are allowed, speedup linear in $p$ is obtained.}
}
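
A rough illustration of the bound as quoted in the comment above (illustrative
arithmetic, not a figure from the paper): with $x = 10^6$ blocks in the file
and $y = 10^3$ blocks fitting in memory,
$x \log x / \log y = 10^6 \cdot (\log 10^6 / \log 10^3) = 2 \times 10^6$ I/Os,
i.e., roughly two sequential passes over the file; with parallel transfers of
$p$ blocks this drops by about a factor of $p$.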

@InProceedings{alverson:tera,
  author = {Robert Alverson and David Callahan and Daniel Cummings and Brian
  Koblenz and Allan Porterfield and Burton Smith},
  title = {The {Tera} Computer System},
  booktitle = {1990 International Conference on Supercomputing},
  year = {1990},
  pages = {1--6},
  keyword = {parallel architecture, MIMD, NUMA, pario bib},
  comment = {Interesting architecture. 3-d mesh of pipelined packet-switch
  nodes, e.g., 16x16x16 is 4096 nodes, with 256 procs, 512 memory units, 256 I/O
  cache units, and 256 I/O processors attached. 2816 remaining nodes are just
  switching nodes. Each processor is 64-bit custom chip with up to 128
  simultaneous threads in execution. It alternates between ready threads, with
  a deep pipeline. Inter-instruction dependencies explicitly encoded by the
  compiler, stalling those threads until the appropriate time. Each thread has
  a complete set of registers! Memory units have 4-bit tags on each word, for
  full/empty and trap bits. Shared memory across the network: NUMA.}
}

@TechReport{arendt:genome,
  author = {James W. Arendt},
  title = {Parallel Genome Sequence Comparison Using a Concurrent File System},
  year = {1991},
  number = {UIUCDCS-R-91-1674},
  institution = {University of Illinois at Urbana-Champaign},
  keyword = {parallel file system, parallel I/O, Intel iPSC/2, pario bib},
  comment = {Studies the performance of Intel CFS. Uses an application that
  reads in a huge file of records, each a genome sequence, and compares each
  sequence against a given sequence. Looks at cache performance, message
  latency, cost of prefetches and directory reads, and throughput. He tries
  one-disk, one-proc transfer rates. Because of contention with the directory
  server on one of the two I/O nodes, it was faster to put all of the file on
  the other I/O node. Striping is good for multiple readers. Best access
  pattern was interleaved, not segmented or separate files, because it avoided
  disk seeks. Perhaps the files are stored contiguously? Can get good speedup
  by reading the sequences in big integral record sizes, from CFS, using
  load balancing scheduled by the host. Contention for directory blocks --
  through single-node directory server.}
}

@InProceedings{asbury:fortranio,
  author = {Raymond K. Asbury and David S. Scott},
  title = {{FORTRAN} {I/O} on the {iPSC/2}: Is there read after write?},
  booktitle = {Fourth Conference on Hypercube Concurrent Computers and
  Applications},
  year = {1989},
  pages = {129--132},
  keyword = {parallel I/O, hypercube, Intel iPSC/2, file access pattern, pario
  bib}
}

@InProceedings{baird:disa,
  author = {R. Baird and S. KaraMooz and H. Vazire},
  title = {Distributed Information Storage Architecture},
  booktitle = {Proceedings of the Twelfth IEEE Symposium on Mass Storage
  Systems},
  year = {1993},
  pages = {145--155},
  keyword = {parallel I/O, distributed file system, mass storage, pario bib},
  comment = {Architecture for distributed information storage. Integrates file
  systems, databases, etc. Single system image, lots of support for
  administration. O-O model, with storage device objects, logical device
  objects, volume objects, and file objects. Methods for each type of object,
  including administrative methods.}
}

@InProceedings{baldwin:hyperfs,
  author = {C. H. Baldwin and W. C. Nestlerode},
  title = {A Large Scale File Processing Application on a Hypercube},
  booktitle = {Fifth Annual Distributed-Memory Computer Conference},
  year = {1990},
  pages = {1400--1404},
  keyword = {multiprocessor file system, file access pattern, parallel I/O,
  hypercube, pario bib},
  comment = {Census-data processing on an nCUBE/10 at USC. Their program uses
  an interleaved pattern, which is like my lfp or gw with multi-record records
  (i.e., the application does its own blocking). Shifted to asynchronous I/O to
  do OBL manually. Better results if they did more computation per I/O (of
  course).}
}

@TechReport{barak:hfs,
  author = {Amnon Barak and Bernard A. Galler and Yaron Farber},
  title = {A Holographic File System for a Multicomputer with Many Disk Nodes},
  year = {1988},
  month = {May},
  number = {88-6},
  institution = {Dept. of Computer Science, Hebrew University of Jerusalem},
  keyword = {parallel I/O, hashing, reliability, disk mirroring, pario bib},
  comment = {Describes a file system for a distributed system that scatters
  records of each file over many disks using hash functions. The hash function
  is known by all processors, so no one processor must be up to access the
  file. Any portion of the file whose disknode is available may be accessed.
  Shadow nodes are used to take over for nodes that go down, saving the info
  for later use by the proper node. Intended to easily parallelize read/write
  accesses and global file operations, and to increase file availability.}
}
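
A minimal sketch, not taken from the report, of the hashed record-to-disknode
idea described above; the node count and hash function below are placeholders
chosen only to make the idea concrete.

    /* Sketch only: every processor knows the same hash, so any of them can
     * locate a record's disk node without asking a central server, and the
     * records of one file scatter across many disk nodes. */
    #include <stdio.h>

    #define NUM_DISK_NODES 64            /* assumed configuration */

    static unsigned map_record(unsigned file_id, unsigned record_no)
    {
        unsigned h = file_id * 2654435761u + record_no * 40503u;
        return h % NUM_DISK_NODES;       /* placeholder hash, not HFS's own */
    }

    int main(void)
    {
        unsigned r;
        for (r = 0; r < 8; r++)          /* one file's records scatter widely */
            printf("file 17, record %u -> disk node %u\n",
                   r, map_record(17, r));
        return 0;
    }

If one disk node is down, only the records that hash to it are affected,
which is the availability argument the comment makes.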

@Article{batcher:staran,
  author = {K. E. Batcher},
  title = {{STARAN} Parallel Processor System Hardware},
  journal = {AFIPS Conference Proceedings},
  year = {1974},
  pages = {405--410},
  keyword = {parallel architecture, array processor, parallel I/O, SIMD, pario
  bib},
  comment = {This paper is reproduced in Kuhn and Padua's (1981, IEEE) survey
  ``Tutorial on Parallel Processing.'' The STARAN is an array processor that
  uses Multi-Dimensional-Access (MDA) memories and permutation networks to
  access data in bit slices in a variety of ways, with high-speed I/O
  capabilities. Its router (called the {\em flip} network) could permute data
  among the array processors, or between the array processors and external
  devices, including disks, video input, and displays.}
}

@Manual{bbn:admin,
  key = {BBN},
  author = {BBN Advanced {Computers Inc.}},
  title = {{TC2000} System Administration Guide},
  edition = {Revision 3.0},
  year = {1991},
  month = {April},
  keyword = {BBN, parallel I/O, pario bib},
  comment = {Administrative manual for the TC2000 I/O system. Can stripe over
  partitions in a user-specified set of disks. Large requests automatically
  split and done in parallel.}
}

@InProceedings{bell:physics,
  author = {Jean L. Bell},
  title = {A Specialized Data Management System for Parallel Execution of
  Particle Physics Codes},
  booktitle = {ACM SIGMOD Conference},
  year = {1988},
  pages = {277--285},
  keyword = {file access pattern, disk prefetch, file system, pario bib},
  comment = {A specialized database system for particle physics codes. Valuable
  for its description of access patterns and subsequent file access
  requirements. Particle-in-cell codes iterate over timesteps, updating the
  position of each particle, and then the characteristics of each cell in the
  grid. Particles may move from cell to cell. Particle update needs itself and
  nearby gridcell data. The whole dataset is too big for memory, and each
  timestep must be stored on disk for later analysis anyway. Regular file
  systems are inadequate: specialized DBMS is more appropriate. Characteristics
  needed by their application class: multidimensional access (by particle type
  or by location, i.e., multiple views of the data), coordination between grid
  and particle data, coordination between processors, coordinated access to
  meta-data, inverted files, horizontal clustering, large blocking of data,
  asynchronous I/O, array data, complicated joins, and prefetching according to
  user-prespecified order. Note that many of these things can be provided by a
  file system, but that most are hard to come by in typical file systems, if
  not impossible. Many of these features are generalizable to other
  applications.}
}

@InProceedings{benner:pargraphics,
  author = {Robert E. Benner},
  title = {Parallel Graphics Algorithms on a 1024-Processor Hypercube},
  booktitle = {Fourth Conference on Hypercube Concurrent Computers and
  Applications},
  year = {1989},
  pages = {133--140},
  keyword = {hypercube, graphics, parallel algorithm, parallel I/O, pario bib},
  comment = {About using the nCUBE/10's RT Graphics System. They were
  frustrated by an unusual mapping from the graphics memory to the display, a
  shortage of memory on the graphics nodes, and small message buffers on the
  graphics nodes. They wrote some algorithms for collecting the columns of
  pixels from the hypercube nodes, and routing them to the appropriate graphics
  node. They also would have liked a better interconnection network between the
  graphics nodes, at least for synchronization.}
}

@InProceedings{best:cmmdio,
  author = {Michael L. Best and Adam Greenberg and Craig Stanfill and Lewis W.
  Tucker},
  title = {{CMMD I/O}: A Parallel {Unix I/O}},
  booktitle = {Proceedings of the Seventh International Parallel Processing
  Symposium},
  year = {1993},
  pages = {489--495},
  keyword = {parallel I/O, multiprocessor file system, pario bib},
  comment = {Much like Intel CFS, with different I/O modes that determine when
  the compute nodes synchronize, and the semantics of I/Os written to the file.
  They found it hard to get good bandwidth for independent I/Os, as opposed to
  coordinated I/Os; part of this was due to their RAID~3 disk array, but it is
  more complicated than that. Some performance numbers were given in the talk.}
}

@InProceedings{bestavros:raid,
  author = {Azer Bestavros},
  title = {{IDA}-Based Redundant Arrays of Inexpensive Disks},
  booktitle = {Proceedings of the First International Conference on Parallel
  and Distributed Information Systems},
  year = {1991},
  pages = {2--9},
  keyword = {RAID, disk array, reliability, parallel I/O, pario bib},
  comment = {Uses the Information Dispersal Algorithm (IDA) to generate $n+m$
  blocks from $n$ blocks, to tolerate $m$ disk failures; all of the data from
  the $n$ blocks is hidden in the $n+m$ blocks. Not with the RAID project.}
}
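
For orientation, a sketch of only the degenerate $m = 1$ case: one extra block
of XOR parity lets any single lost block be rebuilt.  IDA itself is more
general -- any $n$ of the $n+m$ blocks suffice -- and the toy code below does
not implement it.

    /* Toy n+1 parity example: XOR of the n data blocks gives one check
     * block; any single missing data block is the XOR of the survivors
     * and the check block.  Plain parity, not IDA. */
    #include <stdio.h>
    #include <string.h>

    #define N   4                        /* data blocks */
    #define BLK 8                        /* bytes per block (tiny, for demo) */

    int main(void)
    {
        unsigned char data[N][BLK] = { "blk0", "blk1", "blk2", "blk3" };
        unsigned char parity[BLK], rebuilt[BLK];
        int i, j;

        memset(parity, 0, BLK);
        for (i = 0; i < N; i++)          /* compute the check block */
            for (j = 0; j < BLK; j++)
                parity[j] ^= data[i][j];

        memcpy(rebuilt, parity, BLK);    /* pretend block 2 was lost */
        for (i = 0; i < N; i++)
            if (i != 2)
                for (j = 0; j < BLK; j++)
                    rebuilt[j] ^= data[i][j];

        printf("rebuilt block 2: %s\n", (char *)rebuilt);  /* prints "blk2" */
        return 0;
    }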

@InProceedings{bitton:schedule,
  author = {Dina Bitton},
  title = {Arm Scheduling in Shadowed Disks},
  booktitle = {Proceedings of IEEE Compcon},
  year = {1989},
  month = {Spring},
  pages = {132--136},
  keyword = {parallel I/O, disk shadowing, reliability, disk mirroring, disk
  optimization, pario bib},
  comment = {Goes further than bitton:shadow. Uses simulation to verify results
  from that paper, which were expressions for the expected seek distance of
  shadowed disks, using shortest-seek-time arm scheduling. Problem is her
  assumption that arm positions stay independent, in the face of correlating
  effects like writes, which move all arms to the same place. Simulations match
  model only barely, and only in some cases. Anyway, shadowed disks can improve
  performance for workloads with more than 60 or 70\% reads.}
}

@InProceedings{bitton:shadow,
  author = {D. Bitton and J. Gray},
  title = {Disk Shadowing},
  booktitle = {14th International Conference on Very Large Data Bases},
  year = {1988},
  pages = {331--338},
  keyword = {parallel I/O, disk shadowing, reliability, disk mirroring, disk
  optimization, pario bib},
  comment = {Also TR UIC EECS 88-1 from Univ of Illinois at Chicago. Shadowed
  disks are mirroring with more than 2 disks. Writes to all disks, reads from
  one with shortest seek time. Acknowledges but ignores problem posed by
  lo:disks. Also considers that newer disk technology does not have linear seek
  time $(a+bx)$ but rather $(a+b\sqrt{x})$. Shows that with either seek
  distribution the average seek time for workloads with at least 60\% reads
  decreases in the number of disks. See also bitton:schedule.}
}
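
A toy Monte-Carlo sketch, not from either paper, of why read seeks shrink as
shadow copies are added: each read is served by whichever of the $k$ arms is
closest to the target cylinder.  It bakes in exactly the assumption of
independent arm positions that the comment on bitton:schedule criticizes.

    /* Estimate the mean read-seek distance (as a fraction of the surface)
     * on k shadowed disks, assuming arm positions stay independent and
     * uniform.  Illustrative only. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    int main(void)
    {
        const int trials = 100000;
        int k;
        for (k = 1; k <= 8; k *= 2) {
            double sum = 0.0;
            int t, a;
            for (t = 0; t < trials; t++) {
                double target = (double)rand() / RAND_MAX;
                double best = 1.0;
                for (a = 0; a < k; a++) {
                    double arm = (double)rand() / RAND_MAX;
                    double d = fabs(arm - target);
                    if (d < best)
                        best = d;
                }
                sum += best;
            }
            printf("%d arm(s): mean read-seek fraction %.3f\n", k, sum / trials);
        }
        return 0;
    }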

@Article{boral:bubba,
  author = {Haran Boral and William Alexander and Larry Clay and George
  Copeland and Scott Danforth and Michael Franklin and Brian Hart and Marc
  Smith and Patrick Valduriez},
  title = {Prototyping {Bubba}, a Highly Parallel Database System},
  journal = {IEEE Transactions on Knowledge and Data Engineering},
  year = {1990},
  month = {March},
  volume = {2},
  number = {1},
  keyword = {parallel I/O, database, disk caching, pario bib},
  comment = {More recent than copeland:bubba, and a little more general. This
  gives few details, and doesn't spend much time on the parallel I/O. Bubba
  does use parallel independent disks, with a significant effort to place data
  on the disks, and do the work local to the disks, to balance the load and
  minimize interprocessor communication. Also they use a single-level store
  (i.e., memory-mapped files) to improve performance of their I/O system,
  including page locking that is assisted by the MMU. The OS has hooks for the
  database manager to give memory-management policy hints.}
}

@InProceedings{boral:critique,
  author = {H. Boral and D. {DeWitt}},
  title = {Database machines: an idea whose time has passed?},
  booktitle = {Proceedings of the Fourth International Workshop on Database
  Machines},
  year = {1983},
  pages = {166--187},
  publisher = {Springer-Verlag},
  keyword = {file access pattern, parallel I/O, database machine, pario bib},
  comment = {Improvements in I/O bandwidth crucial for supporting database
  machines, otherwise highly parallel DB machines are useless (I/O bound). Two
  ways to do it: 1) synchronized interleaving by using custom controller and
  regular disks to read/write same track on all disks, which speeds individual
  accesses. 2) use very large cache (100-200M) to keep blocks to re-use and to
  do prefetching. But see dewitt:pardbs.}
}

@TechReport{bordawekar:delta-fs,
  author = {Rajesh Bordawekar and Alok Choudhary and Juan Miguel Del Rosario},
  title = {An Experimental Performance Evaluation of {Touchstone Delta
  Concurrent File System}},
  year = {1992},
  number = {SCCS-420},
  institution = {NPAC, Syracuse University},
  note = {To appear, 1993 International Conference on Supercomputing},
  keyword = {performance evaluation, multiprocessor file system,
  parallel I/O, pario bib},
  comment = {Evaluating the Caltech Touchstone Delta (512 nodes, 32 I/O nodes,
  64 disks, 8 MB cache per I/O node). Basic measurements of different access
  patterns and I/O modes. Location in network doesn't seem to matter.
  Throughput is often limited by the software; at least, the full hardware
  throughputs are rarely obtained. Sometimes they are compute-node-limited, and
  other times they may be limited by the cache management. There must be
  a way to push the bottleneck back to the disks.}
}

@InProceedings{bradley:ipsc2io,
  author = {David K. Bradley and Daniel A. Reed},
  title = {Performance of the {Intel iPSC/2} Input/Output System},
  booktitle = {Fourth Conference on Hypercube Concurrent Computers and
  Applications},
  year = {1989},
  pages = {141--144},
  keyword = {hypercube, parallel I/O, Intel, pario bib},
  comment = {Some measurements and simulations of early CFS performance. Looks
  terrible, but they disclaim that it is a beta version of the first CFS. They
  determined that the disks are the bottleneck. But this may just imply that
  they need more disks. Their parallel synthetic applications had each process
  read a separate file. Files were too short (16K??). CFS had ridiculous
  traffic overhead. Again, this was beta CFS.}
}

@InProceedings{brezany:hpf,
  author = {Peter Brezany and Michael Gerndt and Piyush Mehrotra and Hans Zima},
  title = {Concurrent File Operations in a {High Performance FORTRAN}},
  booktitle = {Proceedings of Supercomputing '92},
  year = {1992},
  pages = {230--237},
  keyword = {supercomputing, fortran, multiprocessor file system interface,
  pario bib},
  comment = {Describing their way of writing arrays to files so that they are
  written in a fast, parallel way, and so that (if read in same distribution)
  they can be read fast and parallel. Normal read and write forces standard
  ordering, but cread and cwrite uses a compiler and runtime selected ordering,
  which is stored in the file so it can be used when rereading. Good for temp
  files.}
}

@InProceedings{broom:acacia,
  author = {Bradley M. Broom},
  title = {A Synchronous File Server for Distributed File Systems},
  booktitle = {Proceedings of the 16th Australian Computer Science Conference},
  year = {1993},
  keyword = {distributed file system, pario bib},
  comment = {See broom:acacia-tr. See also broom:impl, lautenbach:pfs,
  mutisya:cache, and broom:cap.}
}

@TechReport{broom:acacia-tr,
  author = {Bradley M. Broom},
  title = {A Synchronous File Server for Distributed File Systems},
  year = {1992},
  month = {August},
  number = {TR--CS--92--12},
  institution = {Dept. of Computer Science, Australian National University},
  keyword = {distributed file system, pario bib},
  comment = {This paper is not specifically about parallel I/O, but the file
  system will be used in the AP-1000 multiprocessor. Acacia is a file server
  that is optimized for synchronous writes, like those used in stateless
  protocols (eg, NFS). It writes inodes in blocks in any free location that is
  close to the current head position, using indirect inode blocks to track
  those. Indirect blocks are in turn written anywhere convenient, and their
  positions are tracked by the superblock. There is one slot in each cylinder
  reserved for the superblock, which is timestamped. They get good performance
  but claim to need a better implementation, and a faster allocation algorithm.
  No indication of effect on read performance. Cite broom:acacia.}
}

@InProceedings{broom:cap,
  author = {Bradley M. Broom and Robert Cohen},
  title = {Acacia: A Distributed, Parallel File System for the {CAP-II}},
  booktitle = {Proceedings of the First Fujitsu-ANU CAP Workshop},
  year = {1990},
  month = {November},
  keyword = {distributed file system, multiprocessor file system, pario bib},
  comment = {See also broom:acacia, broom:impl, lautenbach:pfs, and
  mutisya:cache.}
}

@InProceedings{broom:impl,
  author = {Bradley M. Broom},
  title = {Implementation and Performance of the Acacia File System},
  booktitle = {Proceedings of the Second Fujitsu-ANU CAP Workshop},
  year = {1991},
  month = {November},
  keyword = {distributed file system, multiprocessor file system, pario bib},
  comment = {See also broom:acacia, lautenbach:pfs, mutisya:cache, and
  broom:cap.}
}

@InProceedings{browne:io-arch,
  author = {J. C. Browne and A. G. Dale and C. Leung and R. Jenevein},
  title = {A Parallel Multi-Stage {I/O} Architecture with Self-managing Disk
  Cache for Database Management Applications},
  booktitle = {Proceedings of the Fourth International Workshop on Database
  Machines},
  year = {1985},
  month = {March},
  publisher = {Springer-Verlag},
  keyword = {parallel I/O, disk caching, database, pario bib},
  comment = {A fancy interconnection from procs to I/O processors, intended
  mostly for DB applications, that uses cache at I/O end and a switch with
  smarts. Cache is associative. Switch helps out in sort and join operations.}
}

@Article{cabrera:pario,
  author = {Luis-Felipe Cabrera and Darrell D. E. Long},
  title = {Swift: {Using} Distributed Disk Striping to Provide High {I/O} Data
  Rates},
  journal = {Computing Systems},
  year = {1991},
  month = {Fall},
  volume = {4},
  number = {4},
  keyword = {parallel I/O, disk striping, distributed file system, pario bib},
  comment = {See cabrera:swift, cabrera:swift2. Describes the performance of a
  Swift prototype and simulation results. They stripe data over multiple disk
  servers (here SPARC SLC with local disk), and access it from a SPARC2 client.
  Their prototype gets nearly linear speedup for reads and asynchronous writes;
  synchronous writes are slower. They hit the limit of the Ethernet and/or the
  client processor with three disk servers. Adding another Ethernet allowed
  them to go higher. Simulation shows good scaling. Seems like a smarter
  implementation would help, as would special- purpose parity-computation
  hardware. Good arguments for use of PID instead of RAID, to avoid a
  centralized controller that is both a bottleneck and a single point of
  failure.}
}
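
A sketch of the offset arithmetic behind round-robin striping; Swift's actual
layout policy is richer and is chosen by its storage manager, and the server
count and striping unit below are assumed example values.

    /* Sketch: where does file offset x live when a file is striped
     * round-robin across SERVERS servers in UNIT-byte pieces? */
    #include <stdio.h>

    #define SERVERS 4
    #define UNIT    4096L                /* striping unit, in bytes */

    static void locate(long offset, int *server, long *local)
    {
        long stripe = offset / UNIT;             /* which striping unit */
        *server = (int)(stripe % SERVERS);       /* round-robin placement */
        *local  = (stripe / SERVERS) * UNIT + offset % UNIT;
    }

    int main(void)
    {
        long offsets[] = { 0L, 4096L, 8192L, 40960L, 41000L };
        int i;
        for (i = 0; i < 5; i++) {
            int s;
            long l;
            locate(offsets[i], &s, &l);
            printf("offset %6ld -> server %d, local offset %ld\n",
                   offsets[i], s, l);
        }
        return 0;
    }

A small striping unit spreads even a single request across all the servers; a
large one lets independent requests land on different servers.  That is the
trade-off quantified in chen:maxraid below.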

@TechReport{cabrera:pariotr,
  author = {Luis-Felipe Cabrera and Darrell D. E. Long},
  title = {Swift: {Using} Distributed Disk Striping to Provide High {I/O} Data
  Rates},
  year = {1991},
  number = {CRL-91-46},
  institution = {UC Santa Cruz},
  note = {Appeared in {\em Computing Systems}},
  keyword = {parallel I/O, disk striping, distributed file system, pario bib},
  comment = {Cite cabrera:pario. See that for notes.}
}

@TechReport{cabrera:stripe,
  author = {Luis-Felipe Cabrera and Darrell D. E. Long},
  title = {Using Data Striping in a Local Area Network},
  year = {1992},
  month = {March},
  number = {UCSC-CRL-92-09},
  institution = {Univ. California at Santa Cruz},
  keyword = {striping, parallel I/O, distributed system, pario bib},
  comment = {See cabrera:swift2, cabrera:swift, cabrera:pario. Not much new
  here. Simulates higher-performance architectures. Shows reasonable
  scalability. Counts 5 inst/byte for parity computation.}
}

@TechReport{cabrera:swift,
  author = {Luis-Felipe Cabrera and Darrell D. E. Long},
  title = {Swift: A Storage Architecture for Large Objects},
  year = {1990},
  number = {UCSC-CRL-89-04},
  institution = {U.C. Santa Cruz},
  keyword = {parallel I/O, disk striping, distributed file system, multimedia,
  pario bib},
  comment = {See cabrera:swift2. A brief outline of a design for a
  high-performance storage system, designed for storing and retrieving large
  objects like color video or visualization data at very high speed. They
  distribute data over several ``storage agents'', which are some form of disk
  or RAID. They are all connected by a high-speed network. A ``storage
  manager'' decides where to spread each file, what kind of reliability
  mechanism is used. User provides preallocation info such as size, reliability
  level, data rate requirements, and so forth.}
}

@InProceedings{cabrera:swift2,
  author = {Luis-Felipe Cabrera and Darrell D. E. Long},
  title = {Exploiting Multiple {I/O} Streams to Provide High Data-Rates},
  booktitle = {Proceedings of the 1991 Summer Usenix Conference},
  year = {1991},
  pages = {31--48},
  keyword = {parallel I/O, disk striping, distributed file system, multimedia,
  pario bib},
  comment = {See also cabrera:swift. More detail than the other paper.
  Experimental results from a prototype that stripes files across a distributed
  file system. Gets almost linear speedup in certain cases. Much better than
  NFS. Simulation to extend it to larger systems.}
}

@InProceedings{cao:tickertaip,
  author = {Pei Cao and Swee Boon Lim and Shivakumar Venkataraman and John
  Wilkes},
  title = {The {TickerTAIP} parallel {RAID} architecture},
  booktitle = {Proceedings of the 20th Annual International Symposium on
  Computer Architecture},
  year = {1993},
  pages = {52--63},
  keyword = {parallel I/O, RAID, pario bib},
  comment = {See cao:tickertaip-tr.}
}

@TechReport{cao:tickertaip-tr,
  author = {Pei Cao and Swee Boon Lim and Shivakumar Venkataraman and John
  Wilkes},
  title = {The {TickerTAIP} parallel {RAID} architecture},
  year = {1992},
  month = {December},
  number = {HPL-92-151},
  institution = {HP Labs},
  keyword = {parallel I/O, RAID, pario bib},
  comment = {A parallelized RAID architecture that distributes the RAID
  controller operations across several worker nodes. Multiple hosts can connect
  to different workers, allowing multiple paths into the array. The workers
  then communicate on their own fast interconnect to accomplish the requests,
  distributing parity computations across multiple workers. They get much
  better performance and reliability than plain RAID. They built a prototype
  and a performance simulator. Two-phase commit was needed for request
  atomicity, and a request sequencer was needed for serialization. Also found
  it was good to give the whole request info to all workers and to let them
  figure out what to do and when. Cite cao:tickertaip.}
}

@TechReport{carter:benchmark,
  author = {Russell Carter and Bob Ciotti and Sam Fineberg and Bill Nitzberg},
  title = {{NHT-1} {I/O} Benchmarks},
  year = {1992},
  month = {November},
  number = {RND-92-016},
  institution = {NAS Systems Division, NASA Ames},
  keyword = {parallel I/O, benchmark, pario bib},
  comment = {Specs for three scalable-I/O benchmarks to be used for evaluating
  I/O for multiprocessors. One measures application I/O by mixing I/O and
  computation, one measures max disk I/O by reading and writing 80\% of the
  total RAM memory, and the last one is for sending that data from the file
  system, through the network, and back. See fineberg:nht1.}
}

@TechReport{chao:datamesh,
  author = {Chia Chao and Robert English and David Jacobson and Bart Sears and
  Alexander Stepanov and John Wilkes},
  title = {{DataMesh} architecture 1.0},
  year = {1992},
  month = {December},
  number = {HPL-92-153},
  institution = {HP Labs},
  keyword = {parallel I/O, parallel file system, pario bib},
  comment = {A more detailed spec of the datamesh architecture, specifying
  components and operations. It is a block server where blocks are
  associatively addressed by tags. Some search operations are supported, as are
  atomic tag-changing operations. See also cao:tickertaip, wilkes:datamesh1,
  wilkes:datamesh, wilkes:houses, wilkes:lessons.}
}

@InProceedings{chen:eval,
  author = {Peter Chen and Garth Gibson and Randy Katz and David Patterson},
  title = {An Evaluation of Redundant Arrays of Disks using an {Amdahl 5890}},
  booktitle = {Proceedings of the 1990 ACM Sigmetrics Conference on Measurement
  and Modeling of Computer Systems},
  year = {1990},
  month = {May},
  pages = {74--85},
  keyword = {parallel I/O, RAID, disk array, pario bib},
  comment = {An experimental validation of the performance predictions of
  patterson:raid, plus some extensions. Confirms that RAID level 5 (rotated
  parity) is best for large read/writes, and RAID level 1 (mirroring) is best
  for small reads/writes.}
}

@InProceedings{chen:maxraid,
  author = {Peter M. Chen and David A. Patterson},
  title = {Maximizing Performance in a Striped Disk Array},
  booktitle = {Proceedings of the 17th Annual International Symposium on
  Computer Architecture},
  year = {1990},
  pages = {322--331},
  keyword = {parallel I/O, RAID, disk striping, pario bib},
  comment = {Choosing the optimal striping unit, i.e., size of contiguous data
  on each disk (bit, byte, block, {\em etc.}). A small striping unit is good for
  low-concurrency workloads since it increases the parallelism applied to each
  request, but a large striping unit can support high-concurrency workloads
  where each independent request depends on fewer disks. They do simulations to
  find throughput, and thus to pick the striping unit. They find equations for
  the best compromise striping unit based on the concurrency and the disk
  parameters, or on the disk parameters alone. Some key assumptions may limit
  applicability, but this is not addressed.}
}

@TechReport{chen:raid,
  author = {Peter Chen and Garth Gibson and Randy Katz and David Patterson and
  Martin Schulze},
  title = {Two papers on {RAIDs}},
  year = {1988},
  month = {December},
  number = {UCB/CSD 88/479},
  institution = {UC Berkeley},
  keyword = {parallel I/O, RAID, disk array, pario bib},
  comment = {Basically an updated version of patterson:raid and the
  prepublished version of gibson:failcorrect.}
}

@InProceedings{chen:raid2,
  author = {Peter M. Chen and Edward K. Lee and Ann L. Drapeau and Ken Lutz and
  Ethan L. Miller and Srinivasan Seshan and Ken Shirriff and David A. Patterson
  and Randy H. Katz},
  title = {Performance and Design Evaluation of the {RAID-II} Storage Server},
  booktitle = {IPPS~'93 Workshop on Input/Output in Parallel Computer Systems},
  year = {1993},
  pages = {110--120},
  keyword = {parallel I/O, multiprocessor file system, pario bib},
  comment = {A special back-end box for a Sun4 file server, that hooks a HIPPI
  network through a crossbar to fast memory, a parity engine, and a bunch of
  disks on SCSI. They pulled about 20~MB/s through it, basically disk-limited;
  with more disks they would hit 32--40~MB/s. Much improved over RAID-I, which
  was limited by the memory bandwidth of the Sun4 server.}
}

@InProceedings{chervenak:raid,
  author = {Ann L. Chervenak and Randy H. Katz},
  title = {Performance of a Disk Array Prototype},
  booktitle = {Proceedings of the 1991 ACM Sigmetrics Conference on Measurement
  and Modeling of Computer Systems},
  year = {1991},
  pages = {188--197},
  keyword = {parallel I/O, disk array, performance evaluation, RAID, pario
  bib},
  comment = {Measuring the performance of a RAID prototype with a Sun4/280, 28
  disks on 7 SCSI strings, using 4 HBA controllers on a VME bus from the Sun.
  They found that lots of bottlenecks really slowed them down. Under Sprite, the
  disks were the bottleneck for single disk I/O, single disk B/W, and string
  I/O. Sprite was a bottleneck for single disk I/O and String I/O. The host
  memory was a bottleneck for string B/W, HBA B/W, overall I/O, and overall
  B/W. With a simpler OS, that saved on data copying, they did better, but were
  still limited by the HBA, SCSI protocol, or the VME bus. Clearly they needed
  more parallelism in the busses and control system.}
}

@Manual{convex:stripe,
  title = {{CONVEX UNIX} Programmer's Manual, Part I},
  edition = {Eighth},
  year = {1988},
  month = {October},
  organization = {CONVEX Computer Corporation},
  address = {Richardson, Texas},
  keyword = {parallel I/O, parallel file system, striping, pario bib},
  comment = {Implementation of striped disks on the CONVEX. Uses partitions of
  normal device drivers. Kernel data structure knows about the interleaving
  granularity, the set of partitions, sizes, etc.}
}

@InProceedings{copeland:bubba,
  author = {George Copeland and William Alexander and Ellen Boughter and Tom
  Keller},
  title = {Data Placement in {Bubba}},
  booktitle = {ACM SIGMOD Conference},
  year = {1988},
  month = {June},
  pages = {99--108},
  keyword = {parallel I/O, database, disk caching, pario bib},
  comment = {A database machine. Experimental/analytical model of a placement
  algorithm that declusters relations across several parallel, independent
  disks. The declustering is done on a subset of the disks, and the choices
  involved are the number of disks to decluster onto, which relations to put
  where, and whether a relation should be cache-resident. Communications
  overhead limits the usefulness of declustering in some cases, depending on
  the workload. See boral:bubba.}
}

@InProceedings{corbett:vesta,
  author = {Peter F. Corbett and Sandra Johnson Baylor and Dror G. Feitelson},
  title = {Overview of the {Vesta} Parallel File System},
  booktitle = {IPPS~'93 Workshop on Input/Output in Parallel Computer Systems},
  year = {1993},
  pages = {1--16},
  keyword = {parallel I/O, multiprocessor file system, concurrent file
  checkpointing, multiprocessor file system interface, pario bib},
  comment = {Design of a file system for a message-passing MIMD multiprocessor
  to be used for scientific computing. Separate I/O nodes from compute nodes;
  I/O nodes and disks are viewed as a data-staging area. File system runs on
  I/O nodes only. Files declustered by record, among physical partitions, each
  residing on a separate disk, and each separately growable. Then the user maps
  logical partitions, one per process, on the file at open time. These are
  designed to be two-dimensional, so that mapping arrays of various strides and
  contiguities, with records as the basic unit, is easy. Various consistency
  and atomicity requirements. File checkpointing, really snapshotting, is built
  in. No client caching, no redundancy for reliability.}
}

@InProceedings{cormen:bmmc,
  author = {Thomas H. Cormen and Leonard F. Wisniewski},
  title = {Asymptotically Tight Bounds for Performing {BMMC} Permutations on
  Parallel Disk Systems},
  booktitle = {Proceedings of the 5th Annual ACM Symposium on Parallel
  Algorithms and Architectures},
  year = {1993},
  month = {June},
  pages = {130--139},
  keyword = {parallel I/O, algorithm, pario bib}
}

@InProceedings{cormen:integrate,
  author = {Thomas H. Cormen and David Kotz},
  title = {Integrating Theory and Practice in Parallel File Systems},
  booktitle = {Proceedings of the 1993 DAGS/PC Symposium},
  year = {1993},
  month = {June},
  pages = {64--74},
  organization = {Dartmouth Institute for Advanced Graduate Studies},
  address = {Hanover, NH},
  note = {Revised from Dartmouth PCS-TR93-188.},
  keyword = {parallel I/O, multiprocessor file systems, algorithm, file system
  interface, dfk, pario bib},
  abstract = {Several algorithms for parallel disk systems have appeared in the
  literature recently, and they are asymptotically optimal in terms of the
  number of disk accesses. Scalable systems with parallel disks must be able to
  run these algorithms. We present for the first time a list of capabilities
  that must be provided by the system to support these optimal algorithms:
  control over declustering, querying about the configuration, independent I/O,
  and turning off parity, file caching, and prefetching. We summarize recent
  theoretical and empirical work that justifies the need for these
  capabilities. In addition, we sketch an organization for a parallel file
  interface with low-level primitives and higher-level operations.},
  comment = {Describing the file system capabilities needed by parallel I/O
  algorithms to effectively use a parallel disk system.}
}

@TechReport{cormen:integrate-tr,
  author = {Thomas H. Cormen and David Kotz},
  title = {Integrating Theory and Practice in Parallel File Systems},
  year = {1993},
  month = {March},
  number = {PCS-TR93-188},
  institution = {Dartmouth College},
  note = {Appeared in Proceedings of the 1993 DAGS/PC Symposium},
  keyword = {parallel I/O, multiprocessor file systems, algorithm, file system
  interface, dfk, pario bib},
  abstract = {Several algorithms for parallel disk systems have appeared in the
  literature recently, and they are asymptotically optimal in terms of the
  number of disk accesses. Scalable systems with parallel disks must be able to
  run these algorithms. We present for the first time a list of capabilities
  that must be provided by the system to support these optimal algorithms:
  control over declustering, querying about the configuration, independent I/O,
  and turning off parity, file caching, and prefetching. We summarize recent
  theoretical and empirical work that justifies the need for these
  capabilities. In addition, we sketch an organization for a parallel file
  interface with low-level primitives and higher-level operations.},
  comment = {Describing the file system capabilities needed by parallel I/O
  algorithms to effectively use a parallel disk system. Cite cormen:integrate.}
}

@Article{cormen:permute,
  author = {Thomas H. Cormen},
  title = {Fast Permuting on Disk Arrays},
  journal = {Journal of Parallel and Distributed Computing},
  year = {1993},
  month = {January and February},
  volume = {17},
  number = {1--2},
  pages = {41--57},
  keyword = {parallel I/O algorithm, pario bib},
  comment = {See also cormen:thesis.}
}

@PhdThesis{cormen:thesis,
  author = {Thomas H. Cormen},
  title = {Virtual Memory for Data-Parallel Computing},
  year = {1992},
  school = {Department of Electrical Engineering and Computer Science,
  Massachusetts Institute of Technology},
  keyword = {parallel I/O, algorithm, pario bib},
  comment = {Lots of algorithms for out-of-core permutation problems. See also
  cormen:permute, cormen:integrate.}
}

@Misc{cray:pario2,
  key = {Cray90},
  author = {Cray Research},
  title = {{DS-41} Disk Subsystem},
  year = {1990},
  note = {Sales literature number MCFS-4-0790},
  keyword = {parallel I/O, disk architecture, disk array, pario bib},
  comment = {Glossy from Cray describing their new disk subsystem: up to four
  controllers and up to four ``drives'', each of which actually has four
  spindles. Thus, a full subsystem has 16 disks. Each drive or controller
  sustains 9.6 MBytes/sec, for a total of 38.4 MBytes/sec. Each drive has
  4.8 GBytes, for a total of 19.2 GBytes. Access time per drive is 2--46.6
  msec, average 24 msec. They don't say how the 4 spindles within a drive are
  controlled or arranged.}
}

@Unpublished{crockett:manual,
  author = {Thomas W. Crockett},
  title = {Specification of the Operating System Interface for Parallel File
  Organizations},
  year = {1988},
  note = {Publication status unknown (ICASE technical report)},
  keyword = {parallel I/O, parallel file system, pario bib},
  comment = {Man pages for his Flex version of file interface. See
  crockett:par-files.}
}

@InProceedings{crockett:par-files,
  author = {Thomas W. Crockett},
  title = {File Concepts for Parallel {I/O}},
  booktitle = {Proceedings of Supercomputing '89},
  year = {1989},
  pages = {574--579},
  keyword = {parallel I/O, file access pattern, parallel file system, pario
  bib},
  comment = {Two views of a file: global (for sequential programs) and internal
  (for parallel programs). Standardized forms for these views, for long-lived
  files. Temp files have specialized forms. The access types are sequential,
  partitioned, interleaved, and self-scheduled, plus global random and
  partitioned random. He relates these to their best storage patterns. No
  mention of prefetching. Buffer cache only needed for direct (random) access.
  The application must specify the access pattern desired.}
}

@Article{csa-io,
  author = {T. J. M.},
  title = {Now: Parallel storage to match parallel {CPU} power},
  journal = {Electronics},
  year = {1988},
  month = {December},
  volume = {61},
  number = {12},
  pages = {112},
  keyword = {parallel I/O, disk array, pario bib}
}

@Article{debenedictis:modular,
  author = {Erik P. DeBenedictis and Juan Miguel {del Rosario}},
  title = {Modular Scalable {I/O}},
  journal = {Journal of Parallel and Distributed Computing},
  year = {1993},
  month = {January and February},
  volume = {17},
  number = {1--2},
  pages = {122--128},
  keyword = {parallel I/O, MIMD, pario bib},
  comment = {Journalized version of debenedictis:pario, debenedictis:ncube, and
  delrosario:nCUBE.}
}

@InProceedings{debenedictis:ncube,
  author = {Erik DeBenedictis and Juan Miguel del Rosario},
  title = {{nCUBE} Parallel {I/O} Software},
  booktitle = {Eleventh Annual IEEE International Phoenix Conference on
  Computers and Communications (IPCCC)},
  year = {1992},
  month = {April},
  pages = {0117--0124},
  keyword = {parallel file system, parallel I/O, pario bib},
  comment = {Interesting paper. Describes their mechanism for mapping I/O so
  that the file system knows both the mapping of a data structure into memory
  and on the disks, so that it can do the permutation and send the right data
  to the right disk, and back again. Interesting Unix-compatible interface.
  Needs to be extended to handle complex formats.}
}

@InProceedings{debenedictis:pario,
  author = {Erik DeBenedictis and Peter Madams},
  title = {{nCUBE's} Parallel {I/O} with {Unix} Capability},
  booktitle = {Sixth Annual Distributed-Memory Computer Conference},
  year = {1991},
  pages = {270--277},
  keyword = {parallel I/O, multiprocessor file system, file system interface,
  pario bib},
  comment = {Looks like they give the byte-level mapping, then do normal reads
  and writes; the mapping routes the data to and from the correct place. But it
  does let you intermix computation with I/O. Elegant concept. Nice interface.
  Works best for cases where 1) the data layout is known in advance, 2) the data
  format is known, and 3) the mapping is regular enough for easy specification.
  I think that irregular or unknown mappings could still be done with a flat
  mapping.}
}
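
A hedged illustration of the mapping idea noted in the entry above. This is not
nCUBE's interface; the function and its parameters are invented purely to show
how a declared byte-level map can route ordinary reads and writes to the right
file offsets (Python is used only for brevity).

    # Hypothetical sketch: each process declares a map from its local buffer
    # to global file byte offsets; plain reads/writes are then routed through
    # that map.  Here rows of a matrix are dealt round-robin to processes,
    # while the file keeps the global row-major order.
    def local_to_file_offsets(rank, nprocs, nrows, ncols, itemsize):
        offsets = []
        for local_row, global_row in enumerate(range(rank, nrows, nprocs)):
            for col in range(ncols):
                local_index = local_row * ncols + col
                file_byte = (global_row * ncols + col) * itemsize
                offsets.append((local_index, file_byte))
        return offsets

    # e.g., what process 1 of 2 contributes to a 4x3 matrix of 8-byte items
    print(local_to_file_offsets(rank=1, nprocs=2, nrows=4, ncols=3, itemsize=8))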

@Article{debenedictis:scalable-unix,
  author = {Erik P. DeBenedictis and Stephen C. Johnson},
  title = {Extending {Unix} for Scalable Computing},
  journal = {IEEE Computer},
  year = {1993},
  note = {To appear},
  keyword = {parallel I/O, Unix, pario bib},
  comment = {A more polished version of his other papers with del Rosario. The
  mapping-based mechanism is released in nCUBE software 3.0. It does support
  shared file pointers for self-scheduled I/O, as well as support for
  variable-length records, and asynchronous I/O (although the primary mechanism
  is for synchronous, i.e., SPMD, I/O). The basic idea of scalable pipes
  (between programs, devices, {\em etc.}) with mappings that determine routings to
  units seems like a good idea.}
}

@Article{delrosario:ncube,
  author = {Juan Miguel del Rosario},
  title = {High Performance Parallel {I/O} on the {nCUBE}~2},
  journal = {Institute of Electronics, Information and Communications Engineers
  (Transactions)},
  year = {1992},
  month = {August},
  note = {To appear},
  keyword = {parallel I/O, parallel file system, pario bib},
  comment = {More detail on the mapping functions, and more flexible mapping
  functions (can be user specified, or some from a library). Striped disks,
  parallel pipes, graphics, and HIPPI supported.}
}

@InProceedings{delrosario:two-phase,
  author = {Juan Miguel {del Rosario} and Rajesh Bordawekar and Alok
  Choudhary},
  title = {Improved Parallel {I/O} via a Two-Phase Run-time Access Strategy},
  booktitle = {IPPS~'93 Workshop on Input/Output in Parallel Computer Systems},
  year = {1993},
  pages = {56--70},
  keyword = {parallel I/O, multiprocessor file system, pario bib},
  comment = {See comments for delrosario:two-phase-tr.}
}

@TechReport{delrosario:two-phase-tr,
  author = {Juan Miguel del Rosario and Rajesh Bordawekar and Alok Choudhary},
  title = {Improving Parallel {I/O} Performance using a Two-Phase Access
  Strategy},
  year = {1993},
  number = {SCCS--406},
  institution = {NPAC at Syracuse University},
  keyword = {parallel I/O, multiprocessor file system, pario bib},
  comment = {They show performance measurements of various data distributions
  on an nCUBE and the Touchstone Delta, for reading a matrix from a column-major
  file striped across disks into some distribution across processors.
  Distributions that don't match the I/O distribution are really terrible, due
  to having more, smaller requests, and sometimes mismatching the stripe size
  (getting seg-like contention) or block size (reading partial blocks). They
  find it is better to read the file using the `best' distribution and then
  reshuffle the data in memory. Big speedups.}
}
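
The two-phase strategy summarized in the entry above is easy to sketch. The
following fragment is only an illustration of the idea (a single-process
simulation with invented names, written in Python), not the authors' code:

    # Phase 1: every "process" reads a contiguous slab that matches the
    # column-major file layout.  Phase 2: the slabs are reshuffled in memory
    # into the distribution the computation wants (here, blocks of rows), so
    # the file sees a few large requests instead of many small ones.
    def two_phase_read(file_data, nprocs, nrows, ncols):
        cols_per_proc = ncols // nprocs          # assume it divides evenly
        slabs = [file_data[p * cols_per_proc * nrows:
                           (p + 1) * cols_per_proc * nrows]
                 for p in range(nprocs)]         # phase 1: conforming read
        rows_per_proc = nrows // nprocs
        wanted = [[0.0] * (rows_per_proc * ncols) for _ in range(nprocs)]
        for p, slab in enumerate(slabs):         # phase 2: redistribute
            for i, x in enumerate(slab):
                col = p * cols_per_proc + i // nrows
                row = i % nrows
                dest = row // rows_per_proc
                wanted[dest][(row % rows_per_proc) * ncols + col] = x
        return wanted

    # a 4x4 matrix stored column-major, redistributed to 2 "processes" by rows
    matrix = [float(r + 4 * c) for c in range(4) for r in range(4)]
    print(two_phase_read(matrix, nprocs=2, nrows=4, ncols=4))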

@TechReport{dewitt:gamma,
  author = {David J. {DeWitt} and Robert H. Gerber and Goetz Graefe and Michael
  L. Heytens and Krishna B. Kumar and M. Muralikrishna},
  title = {{GAMMA}: A High Performance Dataflow Database Machine},
  year = {1986},
  month = {March},
  number = {TR-635},
  institution = {Dept. of Computer Science, Univ. of Wisconsin-Madison},
  keyword = {parallel I/O, database, GAMMA, pario bib},
  comment = {Better to cite dewitt:gamma3. Multiprocessor (VAX) DBMS on a token
  ring with disk at each processor. They thought this was better than
  separating disks from processors by a network, since then the network must
  handle {\em all} I/O rather than just what needs to move. Conjecture that
  shared memory might be the best interconnection network. Relations are
  horizontally
  partitioned in some way, and each processor reads its own set and operates on
  them there.}
}

@InProceedings{dewitt:gamma-dbm,
  author = {David J. DeWitt and Shahram Ghandeharizadeh and Donovan Schneider},
  title = {A Performance Analysis of the {GAMMA} Database Machine},
  booktitle = {ACM SIGMOD Conference},
  year = {1988},
  month = {June},
  pages = {350--360},
  keyword = {parallel I/O, database, performance analysis, Teradata, GAMMA,
  pario bib},
  comment = {Compared Gamma with Teradata. Various operations on big relations.
  See fairly good linear speedup in many cases. They vary only one variable at
  a time. Their bottleneck was at the memory-network interface.}
}

@InProceedings{dewitt:gamma2,
  author = {David J. DeWitt and Robert H. Gerber and Goetz Graefe and Michael
  L. Heytens and Krishna B. Kumar and M. Muralikrishna},
  title = {{GAMMA} --- {A} High Performance Dataflow Database Machine},
  booktitle = {12th International Conference on Very Large Data Bases},
  year = {1986},
  pages = {228--237},
  keyword = {parallel I/O, database, GAMMA, pario bib},
  comment = {Almost identical to dewitt:gamma, with some updates. See that for
  comments, but cite this one. See also dewitt:gamma3 for a more recent paper.}
}

@Article{dewitt:gamma3,
  author = {David J. DeWitt and Shahram Ghandeharizadeh and Donovan A.
  Schneider and Allan Bricker and Hui-I Hsiao and Rick Rasmussen},
  title = {The {Gamma} Database Machine Project},
  journal = {IEEE Transactions on Knowledge and Data Engineering},
  year = {1990},
  month = {March},
  volume = {2},
  number = {1},
  pages = {44--62},
  keyword = {parallel I/O, database, GAMMA, pario bib},
  comment = {An updated version of dewitt:gamma2, with elements of
  dewitt:gamma-dbm. Really only need to cite this one. This is the same basic
  idea as dewitt:gamma2, but after they ported the system from the VAXen to an
  iPSC/2. Speedup results good. Question: how about comparing it to a
  single-processor, single-disk system with increasing disk bandwidth? That is,
  how much of their speedup comes from the increasing disk bandwidth, and how
  much from the actual use of parallelism?}
}

@Article{dewitt:pardbs,
  author = {David DeWitt and Jim Gray},
  title = {Parallel Database Systems: The Future of High-Performance Database
  Systems},
  journal = {Communications of the ACM},
  year = {1992},
  month = {June},
  volume = {35},
  number = {6},
  pages = {85--98},
  keyword = {database, parallel computing, parallel I/O, pario bib},
  comment = {They point out that the comments of boral:critique --- that
  database machines were doomed --- did not really come true. Their new thesis
  is that specialized hardware is not necessary and has not been successful,
  but that parallel database systems are clearly successful. In particular,
  they argue for shared-nothing layouts. They survey state-of-the-art parallel
  DB systems. Earlier version in Computer Architecture News 12/90.}
}

@InProceedings{dewitt:parsort,
  author = {David J. DeWitt and Jeffrey F. Naughton and Donovan A. Schneider},
  title = {Parallel Sorting on a Shared-Nothing Architecture using
  Probabilistic Splitting},
  booktitle = {Proceedings of the First International Conference on Parallel
  and Distributed Information Systems},
  year = {1991},
  pages = {280--291},
  keyword = {parallel I/O, parallel database, external sorting, pario bib},
  comment = {Comparing exact and probabilistic splitting for external sorting
  on a database. Model and experimental results from Gamma machine. Basically,
  the idea is to decide on a splitting vector, which defines $N$ buckets for an
  $N$-process program, and have each program read its initial segment of the
  data and send each element to the appropriate bucket (other process). All
  elements received are written to disks as small sorted runs. Then each
  process mergesorts its runs. Probabilistic split uses only a sample of the
  elements to define the vector.}
}
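
The splitting idea summarized in the entry above can be illustrated with a toy,
single-process sketch (hypothetical Python, not the Gamma implementation): a
sample of the data picks the splitters, every "process" scans its own segment
and routes each element to the bucket the splitters select, and each bucket is
then sorted locally, so the concatenation of the buckets is globally sorted.

    import random
    from bisect import bisect_right

    def probabilistic_splitters(data, nprocs, sample_size=64):
        # choose N-1 splitters from a small sorted sample of the data
        sample = sorted(random.sample(data, min(sample_size, len(data))))
        step = len(sample) // nprocs
        return [sample[(i + 1) * step - 1] for i in range(nprocs - 1)]

    def split_sort(data, nprocs):
        splitters = probabilistic_splitters(data, nprocs)
        segments = [data[i::nprocs] for i in range(nprocs)]   # initial layout
        buckets = [[] for _ in range(nprocs)]
        for seg in segments:                # each "process" scans its segment
            for x in seg:
                buckets[bisect_right(splitters, x)].append(x)
        return [sorted(b) for b in buckets] # local sorts of the received data

    runs = split_sort([random.randint(0, 999) for _ in range(1000)], nprocs=4)
    flat = [x for b in runs for x in b]
    assert flat == sorted(flat)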

@InProceedings{dibble:bridge,
  author = {Peter Dibble and Michael Scott and Carla Ellis},
  title = {Bridge: {A} High-Performance File System for Parallel Processors},
  booktitle = {Proceedings of the Eighth International Conference on
  Distributed Computer Systems},
  year = {1988},
  month = {June},
  pages = {154--161},
  keyword = {Carla, Bridge, multiprocessor file system, Butterfly, parallel
  I/O, pario bib},
  comment = {See ellis:interleaved, dibble:*}
}

@Article{dibble:sort,
  author = {Peter C. Dibble and Michael L. Scott},
  title = {External Sorting on a Parallel Interleaved File System},
  journal = {University of Rochester 1989--90 Computer Science and Engineering
  Research Review},
  year = {1989},
  keyword = {parallel I/O, sorting, merging, parallel file reference pattern,
  pario bib},
  comment = {Cite dibble:sort2. Based on Bridge file system (see
  dibble:bridge). Parallel external merge-sort tool. Sort file on each disk,
  then do a parallel merge. The merge is serialized by the token-passing
  mechanism, but the I/O time dominates. The key is to keep disks busy
  constantly. Uses some read-ahead, write-behind to control fluctuations in
  disk request timing. An analytical model of the algorithm lends insight and
  matches the measured timings well. Locality is a big win in Bridge tools.}
}
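
A minimal sketch of the tool structure described above (hypothetical Python,
not Bridge code): sort the records on each disk locally, then merge the
per-disk sorted runs into one output stream, the step the paper serializes by
token passing.

    import heapq, random

    ndisks = 4
    per_disk = [[random.random() for _ in range(100)] for _ in range(ndisks)]
    sorted_runs = [sorted(d) for d in per_disk]   # local sort, one run per disk
    merged = list(heapq.merge(*sorted_runs))      # the (serial) merge phase
    assert merged == sorted(x for d in per_disk for x in d)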

@Article{dibble:sort2,
  author = {Peter C. Dibble and Michael L. Scott},
  title = {Beyond Striping: The {Bridge} Multiprocessor File System},
  journal = {Computer Architecture News},
  year = {1989},
  month = {September},
  volume = {17},
  number = {5},
  keyword = {parallel I/O, external sorting, merging, parallel file reference
  pattern, pario bib},
  comment = {Subset of dibble:sort. Extra comments to distinguish from striping
  and RAID work. Good point that those projects are addressing a different
  bottleneck, and that they can provide essentially unlimited bandwidth to a
  single processor. Bridge could use those as individual file systems,
  parallelizing the overall file system, avoiding the software bottleneck.
  Using a very-reliable RAID at each node in Bridge could safeguard Bridge
  against failure for reasonable periods, removing reliability from Bridge
  level.}
}

@PhdThesis{dibble:thesis,
  author = {Peter C. Dibble},
  title = {A Parallel Interleaved File System},
  year = {1990},
  month = {March},
  school = {University of Rochester},
  keyword = {parallel I/O, external sorting, merging, parallel file system,
  pario bib},
  comment = {Also TR 334. Mostly covered by other papers, but includes good
  introduction, discussion of reliability and maintenance issues, and
  implementation. Short mention of prefetching implied that simple OBL was
  counter-productive, but later tool-specific buffering with read-ahead was
  often important. The three interfaces to the PIFS server are interesting. A
  fourth compromise might help make tools easier to write.}
}

@Article{dunigan:hypercubes,
  author = {T. H. Dunigan},
  title = {Performance of the {Intel iPSC/860} and {Ncube 6400} hypercubes},
  journal = {Parallel Computing},
  year = {1991},
  volume = {17},
  pages = {1285--1302},
  keyword = {intel, ncube, hypercube, multiprocessor architecture, performance,
  parallel I/O, pario bib},
  comment = {An excellent paper presenting lots of detailed performance
  measurements on the iPSC/1, iPSC/2, iPSC/860, nCUBE 3200, and nCUBE 6400:
  arithmetic, FLOPS, communication, I/O. Tables of numbers provide details
  needed for simulation. iPSC/860 definitely is fastest, but way out of balance
  with respect to communication vs. computation. The number of message hops is
  not so important
  in newer machines.}
}

@TechReport{edelson:pario,
  author = {Daniel Edelson and Darrell D. E. Long},
  title = {High Speed Disk {I/O} for Parallel Computers},
  year = {1990},
  month = {January},
  number = {UCSC-CRL-90-02},
  institution = {Baskin Center for Computer Engineering and Information
  Science},
  keyword = {parallel I/O, disk caching, parallel file system, log-structured
  file system, Intel iPSC/2, pario bib},
  comment = {Essentially a small literature survey. No new ideas here, but it
  is a reasonable overview of the situation. Mentions caching, striping, disk
  layout optimization, log-structured file systems, and Bridge and Intel CFS.
  Plugs their ``Swift'' architecture (see cabrera:pario).}
}

@TechReport{ellis:interleaved,
  author = {Carla Ellis and P. Dibble},
  title = {An Interleaved File System for the {Butterfly}},
  year = {1987},
  month = {January},
  number = {CS-1987-4},
  institution = {Dept. of Computer Science, Duke University},
  keyword = {Carla, multiprocessor file system, Bridge, Butterfly, parallel
  I/O, pario bib},
  comment = {See dibble:bridge}
}

@InProceedings{ellis:prefetch,
  author = {Carla Schlatter Ellis and David Kotz},
  title = {Prefetching in File Systems for {MIMD} Multiprocessors},
  booktitle = {Proceedings of the 1989 International Conference on Parallel
  Processing},
  year = {1989},
  month = {August},
  pages = {I:306--314},
  keyword = {dfk, parallel file system, prefetching, disk caching, MIMD,
  parallel I/O, pario bib},
  abstract = {The problem of providing file I/O to parallel programs has been
  largely neglected in the development of multiprocessor systems. There are two
  essential elements of any file system design intended for a highly parallel
  environment: parallel I/O and effective caching schemes. This paper
  concentrates on the second aspect of file system design and specifically, on
  the question of whether prefetching blocks of the file into the block cache
  can effectively reduce overall execution time of a parallel computation, even
  under favorable assumptions. Experiments have been conducted with an
  interleaved file system testbed on the Butterfly Plus multiprocessor. Results
  of these experiments suggest that 1) the hit ratio, the accepted measure in
  traditional caching studies, may not be an adequate measure of performance
  when the workload consists of parallel computations and parallel file access
  patterns, 2) caching with prefetching can significantly improve the hit ratio
  and the average time to perform an I/O operation, and 3) an improvement in
  overall execution time has been observed in most cases. In spite of these
  gains, prefetching sometimes results in increased execution times (a negative
  result, given the optimistic nature of the study). We explore why it is not
  trivial to translate savings on individual I/O requests into consistently
  better overall performance and identify the key problems that need to be
  addressed in order to improve the potential of prefetching techniques in this
  environment.},
  comment = {Superseded by kotz:prefetch.}
}

@InProceedings{fineberg:nht1,
  author = {Samuel A. Fineberg},
  title = {Implementing the {NHT-1} application {I/O} benchmark},
  booktitle = {IPPS~'93 Workshop on Input/Output in Parallel Computer Systems},
  year = {1993},
  pages = {37--55},
  keyword = {parallel I/O, multiprocessor file system, benchmark, pario bib},
  comment = {See also carter:benchmark. Some preliminary results from one of
  their benchmarks. Note: ``I was only using a single Cray disk with a maximum
  transfer rate of 9.6MBytes/sec.'' --- Fineberg.}
}

@InProceedings{flynn:hyper-fs,
  author = {Robert J. Flynn and Haldun Hadimioglu},
  title = {A Distributed Hypercube File System},
  booktitle = {Third Conference on Hypercube Concurrent Computers and
  Applications},
  year = {1988},
  pages = {1375--1381},
  keyword = {parallel I/O, hypercube, parallel file system, pario bib},
  comment = {For hypercube-like architectures. Interleaved files, though
  flexible. Separate network for I/O, maybe not hypercube. I/O is blocked and
  buffered -- no coherency or prefetching issues discussed. Buffered close to
  point of use. Parallel access is ok. Broadcast supported? I/O nodes
  distinguished from compute nodes. I/O is hooked to the front end too. See
  hadimioglu:fs and hadimioglu:hyperfs}
}

@TechReport{foster:climate,
  author = {Ian Foster and Mark Henderson and Rick Stevens},
  title = {Data Systems for Parallel Climate Models},
  year = {1991},
  month = {July},
  number = {ANL/MCS-TM-169},
  institution = {Argonne National Laboratory},
  note = {Copies of slides from a workshop by this title, with these
  organizers.},
  keyword = {parallel I/O, parallel database, multiprocessor file system,
  climate model, grand challenge, tertiary storage, archival storage, RAID,
  tape robot, pario bib},
  comment = {Includes the slides from many presenters covering climate
  modeling, data requirements for climate models, archival storage systems,
  multiprocessor file systems, and so forth. NCAR data storage growth rates
  (p.~54), 500 bytes per MFlop, or about 8~TB/year with Y/MP-8. Average file
  length 26.2~MB. Migration across both storage hierarchy and generations of
  media. LLNL researcher: typical 50-year, 3-dimensional model with 5-degree
  resolution will produce 75~GB of output. Attendee list included.}
}

@Book{fox:cubes,
  author = {G. Fox and M. Johnson and G. Lyzenga and S. Otto and J. Salmon and
  D. Walker},
  title = {Solving Problems on Concurrent Processors},
  year = {1988},
  volume = {1},
  publisher = {Prentice Hall},
  address = {Englewood Cliffs, NJ},
  keyword = {hypercube, pario bib},
  comment = {See fox:cubix for parallel I/O.}
}

@InBook{fox:cubix,
  author = {G. Fox and M. Johnson and G. Lyzenga and S. Otto and J. Salmon and
  D. Walker},
  title = {Solving Problems on Concurrent Processors},
  chapter = {6 and 15},
  year = {1988},
  volume = {1},
  publisher = {Prentice Hall},
  address = {Englewood Cliffs, NJ},
  keyword = {parallel file system, hypercube, pario bib},
  comment = {Parallel I/O control, called CUBIX. Interesting method. Depends a
  lot on ``loose synchronization'', which is sort of SIMD-like.}
}

@InProceedings{french:balance,
  author = {James C. French},
  title = {Characterizing the Balance of Parallel {I/O} Systems},
  booktitle = {Sixth Annual Distributed-Memory Computer Conference},
  year = {1991},
  pages = {724--727},
  keyword = {parallel I/O, multiprocessor file system, pario bib},
  comment = {Proposes the min\_SAR, max\_SAR, and ratio phi as measures of
  aggregate file system bandwidth. Has to do with load balance issues; how well
  the file system balances between competing nodes in a heavy-use period.}
}

@InProceedings{french:ipsc2io,
  author = {James C. French and Terrence W. Pratt and Mriganka Das},
  title = {Performance Measurement of a Parallel Input/Output System for the
  {Intel iPSC/2} Hypercube},
  booktitle = {Proceedings of the 1991 ACM Sigmetrics Conference on Measurement
  and Modeling of Computer Systems},
  year = {1991},
  pages = {178--187},
  keyword = {parallel I/O, Intel iPSC/2, pario bib},
  comment = {See french:ipsc2io-tr. Cite french:ipsc2io-jpdc.}
}

@Article{french:ipsc2io-jpdc,
  author = {James C. French and Terrence W. Pratt and Mriganka Das},
  title = {Performance Measurement of the {Concurrent File System} of the
  {Intel iPSC/2} Hypercube},
  journal = {Journal of Parallel and Distributed Computing},
  year = {1993},
  month = {January and February},
  volume = {17},
  number = {1--2},
  pages = {115--121},
  keyword = {parallel I/O, Intel iPSC/2, pario bib},
  comment = {See french:ipsc2io-tr.}
}

@TechReport{french:ipsc2io-tr,
  author = {James C. French and Terrence W. Pratt and Mriganka Das},
  title = {Performance Measurement of a Parallel Input/Output System for the
  {Intel iPSC/2} Hypercube},
  year = {1991},
  number = {IPC-TR-91-002},
  institution = {Institute for Parallel Computation, University of Virginia},
  note = {Appeared in Journal of Parallel and Distributed Computing},
  keyword = {parallel I/O, Intel iPSC/2, disk caching, prefetching, pario bib},
  comment = {Cite french:ipsc2io-jpdc. Really nice study of performance of
  existing CFS system on 32-node + 4 I/O-node iPSC/2. They show big
  improvements due to declustering, preallocation, caching, and prefetching.
  See also pratt:twofs.}
}

@Article{garcia:striping-reliability,
  author = {Hector Garcia-Molina and Kenneth Salem},
  title = {The Impact of Disk Striping on Reliability},
  journal = {{IEEE} Database Engineering Bulletin},
  year = {1988},
  month = {March},
  volume = {11},
  number = {1},
  pages = {26--39},
  keyword = {parallel I/O, disk striping, reliability, disk array, pario bib},
  comment = {Reliability of striped filesystems may not be as bad as you think.
  Parity disks help. Performance improvements limited to small number of disks
  ($n<10$). Good point: the efficiency of striping will increase as the gap
  between CPU/memory performance and disk speed widens and as file sizes grow.
  Reliability may be better if measured in terms of performing a task in time
  T, since the striped version may take less time, which gives the disks less
  opportunity to fail during that period. Also consider the CPU failure mode,
  and its use over less time.}
}

@Article{ghosh:hyper,
  author = {Joydeep Ghosh and Kelvin D. Goveas and Jeffrey T. Draper},
  title = {Performance Evaluation of a Parallel {I/O} Subsystem for Hypercube
  Multiprocessors},
  journal = {Journal of Parallel and Distributed Computing},
  year = {1993},
  month = {January and February},
  volume = {17},
  number = {1--2},
  pages = {90--106},
  keyword = {parallel I/O, MIMD, multiprocessor architecture, hypercube, pario
  bib},
  comment = {Given a hypercube that has I/O nodes scattered throughout, they
  compare a plain one to one that has the I/O nodes also interconnected with a
  half-size hypercube. They show that this has better performance because the
  I/O traffic does not interfere with normal inter-PE traffic.}
}

@Article{gibson:arrays,
  author = {Garth A. Gibson},
  title = {Designing Disk Arrays for High Data Reliability},
  journal = {Journal of Parallel and Distributed Computing},
  year = {1993},
  month = {January and February},
  volume = {17},
  number = {1--2},
  pages = {4--27},
  keyword = {parallel I/O, RAID, redundancy, reliability, pario bib}
}

@Book{gibson:book,
  author = {Garth A. Gibson},
  title = {Redundant Disk Arrays: Reliable, Parallel Secondary Storage},
  year = {1992},
  series = {ACM Distinguished Dissertations},
  publisher = {MIT Press},
  keyword = {parallel I/O, disk array, disk striping, reliability, RAID, pario
  bib},
  comment = {Excellent book. Good source for discussion of the access gap and
  transfer gap, disk lifetimes, parity methods, reliability analysis, and
  generally the case for RAIDs. Page 220 he briefly discusses multiprocessor
  I/O architecture.}
}

@InProceedings{gibson:failcorrect,
  author = {Garth A. Gibson and Lisa Hellerstein and Richard M. Karp and Randy
  H. Katz and David A. Patterson},
  title = {Failure Correction Techniques for Large Disk Arrays},
  booktitle = {Third International Conference on Architectural Support for
  Programming Languages and Operating Systems},
  year = {1989},
  month = {April},
  pages = {123--132},
  keyword = {parallel I/O, disk array, RAID, reliability, pario bib},
  comment = {See gibson:raid for comments since it is the same.}
}

@TechReport{gibson:raid,
  author = {Garth Gibson and Lisa Hellerstein and Richard Karp and Randy Katz
  and David Patterson},
  title = {Coding techniques for handling failures in large disk arrays},
  year = {1988},
  month = {December},
  number = {UCB/CSD 88/477},
  institution = {UC Berkeley},
  keyword = {parallel I/O, RAID, reliability, disk array, pario bib},
  comment = {Published as gibson:failcorrect. Design of parity encodings to
  handle more than one bit failure in any group. Their 2-bit correcting codes
  are good enough for 1000-disk RAIDs that 3-bit correction is not needed.}
}

@InProceedings{gray:stripe,
  author = {Jim Gray and Bob Horst and Mark Walker},
  title = {Parity Striping of Disk Arrays: Low-cost Reliable Storage with
  Acceptable Throughput},
  booktitle = {Proceedings of the 16th VLDB Conference},
  year = {1990},
  pages = {148--159},
  keyword = {disk striping, reliability, pario bib},
  comment = {Parity striping, a variation of RAID 5, is just a different way of
  mapping blocks to disks. It groups parity blocks into extents, and does not
  stripe the data blocks. A logical disk is mostly contained in one physical
  disk, plus a parity region in another disk. Good for transaction processing
  workloads. Has the low cost/GByte of RAID, the reliability of RAID, without
  the high transfer rate of RAID, but with much better requests/second
  throughput than RAID 5. (But 40\% worse than mirrors.) So it is a compromise
  between RAID and mirrors.}
}

@InProceedings{grimshaw:elfs,
  author = {Andrew S. Grimshaw and Loyot, Jr., Edmond C.},
  title = {{ELFS:} Object-oriented Extensible File Systems},
  booktitle = {Proceedings of the First International Conference on Parallel
  and Distributed Information Systems},
  year = {1991},
  pages = {177},
  keyword = {parallel I/O, parallel file system, object-oriented, file system
  interface, pario bib},
  comment = {Full paper is grimshaw:elfstr. Really neat idea. Uses an OO
  interface to the file system, which is mostly in user mode. The object classes
  represent particular access patterns (e.g., a 2-D matrix) in the file, and
  hide the actual structure of the file. The object knows enough to tailor the
  cache and prefetch algorithms to the semantics. Class inheritance allows
  layering.}
}

@TechReport{grimshaw:elfstr,
  author = {Andrew S. Grimshaw and Loyot, Jr., Edmond C.},
  title = {{ELFS:} Object-oriented Extensible File Systems},
  year = {1991},
  month = {July},
  number = {TR-91-14},
  institution = {Univ. of Virginia Computer Science Department},
  keyword = {parallel I/O, parallel file system, object-oriented, file system
  interface, Intel iPSC/2, pario bib},
  comment = {From uvacs.cs.virginia.edu. See also grimshaw:elfs. The goals are
  to provide high bandwidth and low latency, reduce the cognitive burden on the
  programmer, and manage the proliferation of data formats and architectural
  changes. Details of the plan to make an extensible OO interface to the file
  system. Objects each have a separate thread of control, so they can do
  asynchronous activity like prefetching and caching in the background, and
  support multiple outstanding requests. The Mentat object system makes it easy
  for them to support pipelining of I/O with I/O and computation in the user
  program. Let the user choose type of consistency needed. See grimshaw:objects
  for more results.}
}

@InProceedings{grimshaw:objects,
  author = {Andrew S. Grimshaw and Jeff Prem},
  title = {High Performance Parallel File Objects},
  booktitle = {Sixth Annual Distributed-Memory Computer Conference},
  year = {1991},
  pages = {720--723},
  keyword = {parallel I/O, multiprocessor file system, file system interface,
  pario bib},
  comment = {Not much new beyond the ELFS TR. A better citation than
  grimshaw:elfs though. Does give CFS performance results. Note on p.~721 he
  says that CFS
  prefetches into ``local memory from which to satisfy future user requests
  {\em that never come.}'' This happens if the local access pattern isn't
  purely sequential, as in an interleaved pattern.}
}

@InProceedings{hadimioglu:fs,
  author = {Haldun Hadimioglu and Robert J. Flynn},
  title = {The Architectural Design of a Tightly-Coupled Distributed Hypercube
  File System},
  booktitle = {Fourth Conference on Hypercube Concurrent Computers and
  Applications},
  year = {1989},
  pages = {147--150},
  keyword = {hypercube, multiprocessor file system, pario bib},
  comment = {An early paper describing a proposed file system for hypercubes.
  The writing is almost impenetrable and confusing; it is not at all clear what
  they propose. See also hadimioglu:hyperfs and flynn:hyper-fs.}
}

@InProceedings{hadimioglu:hyperfs,
  author = {Haldun Hadimioglu and Robert J. Flynn},
  title = {The Design and Analysis of a Tightly Coupled Hypercube File System},
  booktitle = {Fifth Annual Distributed-Memory Computer Conference},
  year = {1990},
  pages = {1405--1410},
  keyword = {multiprocessor file system, parallel I/O, hypercube, pario bib},
  comment = {Describes a hypercube file system based on I/O nodes and processor
  nodes. A few results from a hypercube simulator. See hadimioglu:fs and
  flynn:hyper-fs.}
}

@InProceedings{hartman:zebra,
  author = {John H. Hartman and John K. Ousterhout},
  title = {{Zebra: A} Striped Network File System},
  booktitle = {Proceedings of the Usenix File Systems Workshop},
  year = {1992},
  month = {May},
  pages = {71--78},
  keyword = {disk striping, distributed file system, pario bib},
  comment = {Not a parallel file system, but worth comparing to Swift.
  Certainly, a similar idea could be used in a multiprocessor.}
}

@InProceedings{hatcher:linda,
  author = {Philip J. Hatcher and Michael J. Quinn},
  title = {{C*-Linda:} {A} Programming Environment with Multiple Data-Parallel
  Modules and Parallel {I/O}},
  booktitle = {Proceedings of the Twenty-Fourth Annual Hawaii International
  Conference on System Sciences},
  year = {1991},
  pages = {382--389},
  keyword = {parallel I/O, Linda, data parallel, nCUBE, parallel graphics,
  heterogeneous computing, pario bib},
  comment = {C*-Linda is basically a combination of C* and C-Linda. The model
  is that of several SIMD modules interacting in a MIMD fashion through a Linda
  tuple space. The modules are created using {\tt eval}, as in Linda. In this
  case, the compiler statically assigns each eval to a separate subcube on an
  nCUBE 3200, although they also talk about multiprogramming several modules on
  a subcube (not supported by VERTEX). They envision having separate modules
  running on the nCUBE's graphics processors, or having the file system
  directly talk to the tuple space, to support I/O. They also envision talking
  to modules elsewhere on a network, e.g., a workstation, through the tuple
  space. They reject the idea of sharing memory between modules due to the lack
  of synchrony between modules, and message passing because it is error-prone.}
}

@InProceedings{hayes:ncube,
  author = {John P. Hayes and Trevor N. Mudge and Quentin F. Stout and Stephen
  Colley and John Palmer},
  title = {Architecture of a Hypercube Supercomputer},
  booktitle = {Proceedings of the 1986 International Conference on Parallel
  Processing},
  year = {1986},
  pages = {653--660},
  keyword = {hypercube, parallel architecture, nCUBE, pario bib},
  comment = {Description of the first nCUBE, the NCUBE/ten. Good historical
  background about hypercubes. Talks about their design choices. Says a little
  about the file system --- basically just a way of mounting disks on top of
  each other, within the nCUBE and to other nCUBEs.}
}

@Book{hennessy:arch,
  author = {John L. Hennessy and David A. Patterson},
  title = {Computer Architecture: A Quantitative Approach},
  year = {1990},
  publisher = {Morgan Kaufmann},
  keyword = {computer architecture, textbook, pario bib},
  comment = {Looks like great coverage of architecture. Of course there is a
  chapter on I/O (that mentions RAID).}
}

@Article{herbst:bottleneck,
  author = {Kris Herbst},
  title = {Trends in Mass Storage: vendors seek solutions to growing {I/O}
  bottleneck},
  journal = {Supercomputing Review},
  year = {1991},
  month = {March},
  pages = {46--49},
  keyword = {parallel I/O, disk media, optical disk, holographic storage,
  trends, tape storage, parallel transfer disk, disk striping, pario bib},
  comment = {A good overview of the current state of the art in March 1991,
  including particular numbers and vendor names. They discuss disk media
  (density, rotation, {\em etc.}), parallel transfer disks, disk arrays, parity and
  RAID, HiPPI, tape archives, optical memory, and holographic storage. Rotation
  speeds can increase as diameter goes down. Density increases are often offset
  by slower head settling times. Disk arrays will hit their ``heyday'' in the
  1990s. Trend toward network-attached storage devices, that don't need a
  computer as a server.}
}

@InProceedings{hersch:pixmap,
  author = {Roger D. Hersch},
  title = {Parallel Storage and Retrieval of Pixmap Images},
  booktitle = {Proceedings of the Twelfth IEEE Symposium on Mass Storage
  Systems},
  year = {1993},
  pages = {221--226},
  keyword = {parallel I/O, file system, pario bib},
  comment = {Ways to arrange 2-d images on disk arrays that have multiple
  processors (like Datamesh), so that retrieval time for images or subimages is
  minimized.}
}

@InProceedings{hou:disk,
  author = {Robert Y. Hou and Gregory R. Ganger and Yale N. Patt and Charles E.
  Gimarc},
  title = {Issues and Problems in the {I/O} Subsystem, Part {I} --- {The}
  Magnetic Disk},
  booktitle = {Proceedings of the Twenty-Fifth Annual Hawaii International
  Conference on System Sciences},
  year = {1992},
  pages = {48--57},
  keyword = {parallel I/O, pario bib},
  comment = {A short summary of disk I/O issues: disk technology, latency
  reduction, parallel I/O, {\em etc.}. Nothing new.}
}

@InProceedings{hsiao:decluster,
  author = {Hui-I Hsiao and David DeWitt},
  title = {{Chained Declustering}: {A} New Availability Strategy for
  Multiprocessor Database Machines},
  booktitle = {Proceedings of 6th International Data Engineering Conference},
  year = {1990},
  pages = {456--465},
  keyword = {disk array, reliability, parallel I/O, pario bib},
  comment = {Chained declustering has cost like mirroring, since it replicates
  each block, but has better load increase during failure than mirrors,
  interleaved declustering, or RAID. (Or parity striping (my guess)). Has
  reliability between that of mirrors and RAID, and much better than
  interleaved declustering. Would also be much easier in a distributed
  environment. See hsiao:diskrep.}
}
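
The placement behind chained declustering, as described in the entry above, is
simple to write down (standard formulation; hypothetical Python, not code from
the paper): every block is stored twice, with the primary copy on disk i and
the backup on disk (i+1) mod N, so when a disk fails its read load can be
shifted along the chain rather than dumped entirely on a single mirror.

    def chained_placement(block, ndisks):
        # primary copy on one disk, backup on the next disk in the chain
        primary = block % ndisks
        backup = (primary + 1) % ndisks
        return primary, backup

    for b in range(8):
        print(b, chained_placement(b, ndisks=4))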

@InProceedings{hsiao:diskrep,
  author = {Hui-I Hsiao and David DeWitt},
  title = {A Performance Study of Three High Availability Data Replication
  Strategies},
  booktitle = {Proceedings of the First International Conference on Parallel
  and Distributed Information Systems},
  year = {1991},
  pages = {18--28},
  keyword = {disk array, reliability, disk mirroring, parallel I/O, pario bib},
  comment = {Compares mirrored disks (MD) with interleaved declustering (ID)
  with chained declustering (CD). ID and CD found to have much better
  performance in normal and failure modes. See hsiao:decluster.}
}

@Article{hsiao:diskrep2,
  author = {Hui-I Hsiao and David DeWitt},
  title = {A Performance Study of Three High Availability Data Replication
  Strategies},
  journal = {Journal of Distributed and Parallel Databases},
  year = {1993},
  month = {January},
  volume = {1},
  number = {1},
  pages = {53--79},
  keyword = {disk array, reliability, disk mirroring, parallel I/O, pario bib},
  comment = {See hsiao:diskrep.}
}

@MastersThesis{husmann:format,
  author = {Harlan Edward Husmann},
  title = {High-Speed Format Conversion and Parallel {I/O} in Numerical
  Programs},
  year = {1984},
  month = {January},
  school = {Department of Computer Science, Univ. of Illinois at
  Urbana-Champaign},
  note = {Available as TR number UIUCDCS-R-84-1152.},
  keyword = {parallel I/O, I/O, pario bib},
  comment = {Does FORTRAN format conversion in software in parallel or in
  hardware, to obtain good speedups for many programs. However, he found that
  increasing the I/O bandwidth was the most significant change that could be
  made in the parallel program.}
}

@Booklet{intel:examples,
  key = {Intel},
  title = {Concurrent {I/O} Application Examples},
  year = {1989},
  howpublished = {Intel Corporation Background Information},
  keyword = {file access pattern, parallel I/O, Intel iPSC/2, hypercube, pario
  bib},
  comment = {Lists several examples and the amount and types of data they
  require, and how much bandwidth. Fluid flow modeling, Molecular modeling,
  Seismic processing, and Tactical and strategic systems.}
}

@Booklet{intel:ipsc2io,
  key = {Intel},
  title = {{iPSC/2} {I/O} Facilities},
  year = {1988},
  howpublished = {Intel Corporation},
  note = {Order number 280120-001},
  keyword = {parallel I/O, hypercube, Intel iPSC/2, pario bib},
  comment = {Simple overview, not much detail. See intel:ipsc2, pierce:pario,
  asbury:fortranio. Separate I/O nodes from compute nodes. Each I/O node has a
  SCSI bus to the disks, and communicates with other nodes in the system via
  Direct-Connect hypercube routing.}
}

@Booklet{intel:paragon,
  key = {Intel},
  title = {Paragon {XP/S} Product Overview},
  year = {1991},
  howpublished = {Intel Corporation},
  keyword = {parallel architecture, parallel I/O, Intel, pario bib},
  comment = {Not a bad glossy.}
}

@Article{intelio,
  key = {Intel},
  title = {Intel beefs up its {iPSC/2} supercomputer's {I/O} and memory
  capabilities},
  journal = {Electronics},
  year = {1988},
  month = {November},
  volume = {61},
  number = {11},
  pages = {24},
  keyword = {parallel I/O, hypercube, Intel iPSC/2, pario bib}
}

@Proceedings{ipps-io93,
  title = {IPPS~'93 Workshop on Input/Output in Parallel Computer Systems},
  editor = {Ravi Jain and John Werth and J. C. Browne},
  year = {1993},
  month = {April},
  address = {Newport Beach, CA},
  keyword = {parallel I/O, multiprocessor file system, pario bib}
}

@Article{jain:pario,
  author = {Ravi Jain and Kiran Somalwar and John Werth and J. C. Browne},
  title = {Scheduling Parallel {I/O} Operations in Multiple Bus Systems},
  journal = {Journal of Parallel and Distributed Computing},
  year = {1992},
  month = {December},
  volume = {16},
  number = {4},
  pages = {353--362},
  keyword = {parallel I/O, shared memory, scheduling, pario bib}
}

@PhdThesis{jensen:thesis,
  author = {David Wayne Jensen},
  title = {Disk {I/O} In High-Performance Computing Systems},
  year = {1993},
  school = {Univ. of Illinois at Urbana-Champaign},
  keyword = {parallel I/O, pario bib}
}

@InProceedings{johnson:insertions,
  author = {Theodore Johnson},
  title = {Supporting Insertions and Deletions in Striped Parallel
  Filesystems},
  booktitle = {Proceedings of the Seventh International Parallel Processing
  Symposium},
  year = {1993},
  pages = {425--433},
  keyword = {parallel I/O, multiprocessor file system, pario bib},
  comment = {If you insert blocks into a striped file, you mess up the nice
  striping. So he breaks the file into striped extents, and keeps track of the
  extents with a distributed B-tree index. Deletions also fit into the same
  scheme.}
}

@Article{johnson:wave,
  author = {Olin G. Johnson},
  title = {Three-dimensional Wave Equation Computations on Vector Computers},
  journal = {Proceedings of the IEEE},
  year = {1984},
  month = {January},
  volume = {72},
  number = {1},
  pages = {90--95},
  keyword = {computational physics, parallel I/O, pario bib},
  comment = {Old paper on the need for large memory and fast paging and I/O in
  out-of-core solutions to 3-d seismic modeling. They used 4-way parallel I/O
  to support their job. Needed to transfer a 3-d matrix in and out of memory by
  rows, columns, and vertical columns. Stored in block-structured form to
  improve locality on the disk.}
}

@Article{katz:diskarch,
  author = {Randy H. Katz and Garth A. Gibson and David A. Patterson},
  title = {Disk System Architectures for High Performance Computing},
  journal = {Proceedings of the IEEE},
  year = {1989},
  month = {December},
  volume = {77},
  number = {12},
  pages = {1842--1858},
  keyword = {parallel I/O, RAID, disk striping, pario bib},
  comment = {Good review of the background of disks and I/O architectures, but
  a shorter RAID presentation than patterson:RAID. Also addresses controller
  structure. Good ref for the I/O crisis background, though they don't use that
  term here. Good taxonomy of previous array techniques.}
}

@Article{katz:io-subsys,
  author = {Randy H. Katz and John K. Ousterhout and David A. Patterson and
  Michael R. Stonebraker},
  title = {A Project on High Performance {I/O} Subsystems},
  journal = {{IEEE} Database Engineering Bulletin},
  year = {1988},
  month = {March},
  volume = {11},
  number = {1},
  pages = {40--47},
  keyword = {parallel I/O, RAID, Sprite, reliability, disk striping, disk
  array, pario bib},
  comment = {Early RAID project paper. Describes the Berkeley team's plan to
  use an array of small (100M) hard disks as an I/O server for network file
  service, transaction processing, and supercomputer I/O. Considering
  performance, reliability, and flexibility. Initially hooked to their SPUR
  multiprocessor, using Sprite operating system, new filesystem. Either
  asynchronous striped or independent operation. Supercomputer I/O is
  characterized as sequential, minimum latency, low throughput. Use of parity
  disks to boost reliability. Files may be striped across one or more disks and
  extend over several sectors, thus a two-dimensional filesystem; striping need
  not involve all disks.}
}

@InProceedings{katz:netfs,
  author = {Randy H. Katz},
  title = {Network-Attached Storage Systems},
  booktitle = {Scalable High Performance Computing Conference},
  year = {1992},
  pages = {68--75},
  keyword = {distributed file system, supercomputer file system, file striping,
  RAID, parallel I/O, pario bib},
  comment = {Comments on the emerging trend of file systems for mainframes and
  supercomputers that are not attached directly to the computer, but instead to
  a network attached to the computer. Avoiding data copying seems to be a
  critical issue in the OS and controllers, for disk and network interfaces.
  Describes RAID-II prototype.}
}

@Article{katz:update,
  author = {Randy H. Katz and John K. Ousterhout and David A. Patterson and
  Peter Chen and Ann Chervenak and Rich Drewes and Garth Gibson and Ed Lee and
  Ken Lutz and Ethan Miller and Mendel Rosenblum},
  title = {A Project on High Performance {I/O} Subsystems},
  journal = {Computer Architecture News},
  year = {1989},
  month = {September},
  volume = {17},
  number = {5},
  pages = {24--31},
  keyword = {parallel I/O, RAID, reliability, disk array, pario bib},
  comment = {A short summary of the RAID project. Some more up-to-date info,
  like that they have completed the first prototype with 8 SCSI strings and 32
  disks.}
}

@InProceedings{keane:commercial,
  author = {J. A. Keane and T. N. Franklin and A. J. Grant and R. Sumner and M.
  Q. Xu},
  title = {Commercial Users' Requirements for Parallel Systems},
  booktitle = {Proceedings of the 1993 DAGS/PC Symposium},
  year = {1993},
  month = {June},
  pages = {15--25},
  organization = {Dartmouth Institute for Advanced Graduate Studies},
  address = {Hanover, NH},
  keyword = {parallel architecture, parallel I/O, databases, commercial
  requirements, pario bib},
  abstract = {This paper reports on part of an on-going analysis of parallel
  systems for commercial users. The particular focus of this paper is on the
  requirements that commercial users, in particular users with financial
  database systems, have of parallel systems. The issues of concern to such
  users differ from those of concern to science and engineering users.
  Performance of the parallel system is not the only, or indeed primary, reason
  for moving to such systems for commercial users. Infra-structure issues are
  important, such as system availability and inter-working with existing
  systems. These issues are discussed in the context of a banking customer's
  requirements. The various technical concerns that these requirements impose
  are discussed in terms of commercially available systems.}
}

@Article{kim:asynch,
  author = {Michelle Y. Kim and Asser N. Tantawi},
  title = {Asynchronous Disk Interleaving: {Approximating} Access Delays},
  journal = {IEEE Transactions on Computers},
  year = {1991},
  month = {July},
  volume = {40},
  number = {7},
  pages = {801--810},
  keyword = {disk interleaving, parallel I/O, performance modeling, pario bib},
  comment = {As opposed to synchronous disk interleaving, where disks are
  rotationally synchronous and one access is processed at a time. They develop
  a performance model and validate it with traces of a database system's disk
  accesses. Average access delay on each disk can be approximated by a normal
  distribution.}
}

@Article{kim:fft,
  author = {Michelle Y. Kim and Anil Nigam and George Paul and Robert H.
  Flynn},
  title = {Disk Interleaving and Very Large Fast {F}ourier Transforms},
  journal = {International Journal of Supercomputer Applications},
  year = {1987},
  volume = {1},
  number = {3},
  pages = {75--96},
  keyword = {parallel I/O, disk striping, scientific computing, algorithm,
  pario bib}
}

@PhdThesis{kim:interleave,
  author = {Michelle Y. Kim},
  title = {Synchronously Interleaved Disk Systems with their Application to the
  Very Large {FFT}},
  year = {1986},
  school = {IBM Thomas J. Watson Research Center},
  address = {Yorktown Heights, New York 10598},
  note = {IBM Report number RC12372},
  keyword = {parallel I/O, disk striping, file access pattern, disk array,
  pario bib},
  comment = {Uniprocessor interleaving techniques. Good case for interleaving.
  Probably better to reference kim:interleaving and kim:fft. Discusses a 3D FFT
  algorithm in which the matrix is broken into subblocks that are accessed in
  layers. The layers are stored so that each access is either contiguous or has
  a regular stride, in fairly large chunks.}
}

@Article{kim:interleaving,
  author = {Michelle Y. Kim},
  title = {Synchronized Disk Interleaving},
  journal = {IEEE Transactions on Computers},
  year = {1986},
  month = {November},
  volume = {C-35},
  number = {11},
  pages = {978--988},
  keyword = {parallel I/O, disk striping, disk array, pario bib},
  comment = {See kim:interleave.}
}

@TechReport{kotz:fsint,
  author = {David Kotz},
  title = {Multiprocessor File System Interfaces},
  year = {1992},
  month = {March},
  number = {PCS-TR92-179},
  institution = {Dept. of Math and Computer Science, Dartmouth College},
  note = {Revised version appeared in PDIS'93.},
  keyword = {dfk, parallel I/O, multiprocessor file system, file system
  interface, pario bib},
  abstract = {Increasingly, file systems for multiprocessors are designed with
  parallel access to multiple disks, to keep I/O from becoming a serious
  bottleneck for parallel applications. Although file system software can
  transparently provide high-performance access to parallel disks, a new file
  system interface is needed to facilitate parallel access to a file from a
  parallel application. We describe the difficulties faced when using the
  conventional (Unix-like) interface in parallel applications, and then outline
  ways to extend the conventional interface to provide convenient access to the
  file for parallel programs, while retaining the traditional interface for
  programs that have no need for explicitly parallel file access. Our interface
  includes a single naming scheme, a {\em multiopen\/} operation, local and
  global file pointers, mapped file pointers, logical records, {\em
  multifiles}, and logical coercion for backward compatibility.},
  comment = {Cite kotz:fsint2.}
}

@InProceedings{kotz:fsint2,
  author = {David Kotz},
  title = {Multiprocessor File System Interfaces},
  booktitle = {Proceedings of the Second International Conference on Parallel
  and Distributed Information Systems},
  year = {1993},
  pages = {194--201},
  keyword = {dfk, parallel I/O, multiprocessor file system, file system
  interface, pario bib},
  abstract = {Increasingly, file systems for multiprocessors are designed with
  parallel access to multiple disks, to keep I/O from becoming a serious
  bottleneck for parallel applications. Although file system software can
  transparently provide high-performance access to parallel disks, a new file
  system interface is needed to facilitate parallel access to a file from a
  parallel application. We describe the difficulties faced when using the
  conventional (Unix-like) interface in parallel applications, and then outline
  ways to extend the conventional interface to provide convenient access to the
  file for parallel programs, while retaining the traditional interface for
  programs that have no need for explicitly parallel file access. Our interface
  includes a single naming scheme, a {\em multiopen\/} operation, local and
  global file pointers, mapped file pointers, logical records, {\em
  multifiles}, and logical coercion for backward compatibility.}
}

@InProceedings{kotz:fsint2p,
  author = {David Kotz},
  title = {Multiprocessor File System Interfaces},
  booktitle = {Proceedings of the Usenix File Systems Workshop},
  year = {1992},
  month = {May},
  pages = {149--150},
  keyword = {dfk, parallel I/O, multiprocessor file system, file system
  interface, pario bib},
  comment = {Short paper (2 pages). See kotz:fsint2.}
}

@Article{kotz:jpractical,
  author = {David Kotz and Carla Schlatter Ellis},
  title = {Practical Prefetching Techniques for Multiprocessor File Systems},
  journal = {Journal of Distributed and Parallel Databases},
  year = {1993},
  month = {January},
  volume = {1},
  number = {1},
  pages = {33--51},
  keyword = {dfk, parallel file system, prefetching, disk caching, parallel
  I/O, MIMD, pario bib},
  abstract = {Improvements in the processing speed of multiprocessors are
  outpacing improvements in the speed of disk hardware. Parallel disk I/O
  subsystems have been proposed as one way to close the gap between processor
  and disk speeds. In a previous paper we showed that prefetching and caching
  have the potential to deliver the performance benefits of parallel file
  systems to parallel applications. In this paper we describe experiments with
  practical prefetching policies that base decisions only on on-line reference
  history, and that can be implemented efficiently. We also test the ability of
  these policies across a range of architectural parameters.},
  comment = {Journal version of kotz:practical. See also kotz:jwriteback,
  kotz:fsint2, cormen:integrate.}
}

@Article{kotz:jwriteback,
  author = {David Kotz and Carla Schlatter Ellis},
  title = {Caching and Writeback Policies in Parallel File Systems},
  journal = {Journal of Parallel and Distributed Computing},
  year = {1993},
  month = {January and February},
  volume = {17},
  number = {1--2},
  pages = {140--145},
  keyword = {dfk, parallel file system, disk caching, parallel I/O, MIMD, pario
  bib},
  abstract = {Improvements in the processing speed of multiprocessors are
  outpacing improvements in the speed of disk hardware. Parallel disk I/O
  subsystems have been proposed as one way to close the gap between processor
  and disk speeds. Such parallel disk systems require parallel file system
  software to avoid performance-limiting bottlenecks. We discuss cache
  management techniques that can be used in a parallel file system
  implementation for multiprocessors with scientific workloads. We examine
  several writeback policies, and give results of experiments that test their
  performance.},
  comment = {Journal version of kotz:writeback. See kotz:jpractical,
  kotz:fsint2, cormen:integrate.}
}

@InProceedings{kotz:practical,
  author = {David Kotz and Carla Schlatter Ellis},
  title = {Practical Prefetching Techniques for Parallel File Systems},
  booktitle = {Proceedings of the First International Conference on Parallel
  and Distributed Information Systems},
  year = {1991},
  month = {December},
  pages = {182--189},
  keyword = {dfk, parallel file system, prefetching, disk caching, parallel
  I/O, MIMD, pario bib},
  abstract = {Parallel disk subsystems have been proposed as one way to close
  the gap between processor and disk speeds. In a previous paper we showed that
  prefetching and caching have the potential to deliver the performance
  benefits of parallel file systems to parallel applications. In this paper we
  describe experiments with practical prefetching policies, and show that
  prefetching can be implemented efficiently even for the more complex parallel
  file access patterns. We test these policies across a range of architectural
  parameters.},
  comment = {Short form of primary thesis results. Cite kotz:jpractical. See
  kotz:jwriteback, kotz:fsint2, cormen:integrate.}
}

@Article{kotz:prefetch,
  author = {David Kotz and Carla Schlatter Ellis},
  title = {Prefetching in File Systems for {MIMD} Multiprocessors},
  journal = {IEEE Transactions on Parallel and Distributed Systems},
  year = {1990},
  month = {April},
  volume = {1},
  number = {2},
  pages = {218--230},
  keyword = {dfk, parallel file system, prefetching, MIMD, disk caching,
  parallel I/O, pario bib},
  abstract = {The problem of providing file I/O to parallel programs has been
  largely neglected in the development of multiprocessor systems. There are two
  essential elements of any file system design intended for a highly parallel
  environment: parallel I/O and effective caching schemes. This paper
  concentrates on the second aspect of file system design and specifically, on
  the question of whether prefetching blocks of the file into the block cache
  can effectively reduce overall execution time of a parallel computation, even
  under favorable assumptions. Experiments have been conducted with an
  interleaved file system testbed on the Butterfly Plus multiprocessor. Results
  of these experiments suggest that 1) the hit ratio, the accepted measure in
  traditional caching studies, may not be an adequate measure of performance
  when the workload consists of parallel computations and parallel file access
  patterns, 2) caching with prefetching can significantly improve the hit ratio
  and the average time to perform an I/O operation, and 3) an improvement in
  overall execution time has been observed in most cases. In spite of these
  gains, prefetching sometimes results in increased execution times (a negative
  result, given the optimistic nature of the study). We explore why it is not
  trivial to translate savings on individual I/O requests into consistently
  better overall performance and identify the key problems that need to be
  addressed in order to improve the potential of prefetching techniques in this
  environment.}
}

@PhdThesis{kotz:thesis,
  author = {David Kotz},
  title = {Prefetching and Caching Techniques in File Systems for {MIMD}
  Multiprocessors},
  year = {1991},
  month = {April},
  school = {Duke University},
  note = {Available as technical report CS-1991-016.},
  keyword = {dfk, parallel file system, prefetching, MIMD, disk caching,
  parallel I/O, pario bib},
  abstract = {The increasing speed of the most powerful computers, especially
  multiprocessors, makes it difficult to provide sufficient I/O bandwidth to
  keep them running at full speed for the largest problems. Trends show that
  the difference in the speed of disk hardware and the speed of processors is
  increasing, with I/O severely limiting the performance of otherwise fast
  machines. This widening access-time gap is known as the ``I/O bottleneck
  crisis.'' One solution to the crisis, suggested by many researchers, is to
  use many disks in parallel to increase the overall bandwidth. This
  dissertation studies some of the file system issues needed to get high
  performance from parallel disk systems, since parallel hardware alone cannot
  guarantee good performance. The target systems are large MIMD multiprocessors
  used for scientific applications, with large files spread over multiple disks
  attached in parallel. The focus is on automatic caching and prefetching
  techniques. We show that caching and prefetching can transparently provide
  the power of parallel disk hardware to both sequential and parallel
  applications using a conventional file system interface. We also propose a
  new file system interface (compatible with the conventional interface) that
  could make it easier to use parallel disks effectively. Our methodology is a
  mixture of implementation and simulation, using a software testbed that we
  built to run on a BBN GP1000 multiprocessor. The testbed simulates the disks
  and fully implements the caching and prefetching policies. Using a synthetic
  workload as input, we use the testbed in an extensive set of experiments. The
  results show that prefetching and caching improved the performance of
  parallel file systems, often dramatically.},
  comment = {Published as kotz:jwriteback, kotz:jpractical, kotz:fsint2.}
}

@TechReport{kotz:throughput,
  author = {David Kotz},
  title = {Throughput of Existing Multiprocessor File Systems},
  year = {1993},
  month = {May},
  number = {PCS-TR93-190},
  institution = {Dartmouth College},
  keyword = {parallel I/O, multiprocessor file system, performance, survey,
  dfk, pario bib},
  comment = {A brief note on the reported performance of existing file systems
  (Intel CFS, nCUBE, CM-2, CM-5, and Cray). Many have disappointingly low
  absolute throughput, in MB/s.}
}

@InProceedings{kotz:writeback,
  author = {David Kotz and Carla Schlatter Ellis},
  title = {Caching and Writeback Policies in Parallel File Systems},
  booktitle = {1991 IEEE Symposium on Parallel and Distributed Processing},
  year = {1991},
  month = {December},
  pages = {60--67},
  keyword = {dfk, parallel file system, disk caching, parallel I/O, MIMD, pario
  bib},
  abstract = {Improvements in the processing speed of multiprocessors are
  outpacing improvements in the speed of disk hardware. Parallel disk I/O
  subsystems have been proposed as one way to close the gap between processor
  and disk speeds. Such parallel disk systems require parallel file system
  software to avoid performance-limiting bottlenecks. We discuss cache
  management techniques that can be used in a parallel file system
  implementation. We examine several writeback policies, and give results of
  experiments that test their performance.},
  comment = {Cite kotz:jwriteback. See also kotz:jpractical, kotz:fsint2,
  cormen:integrate.}
}

@TechReport{krieger:asf,
  author = {Orran Krieger and Michael Stumm and Ronald Unrau},
  title = {The {Alloc Stream Facility}: A Redesign of Application-level Stream
  {I/O}},
  year = {1992},
  month = {October},
  number = {CSRI-275},
  institution = {Computer Systems Research Institute, University of Toronto},
  address = {Toronto, Canada, M5S 1A1},
  keyword = {memory-mapped file, file system, parallel I/O, pario bib},
  comment = {See also krieger:mapped. A 3-level interface structure: interface,
  backplane, and stream-specific modules. Different interfaces available: unix,
  stdio, ASI (theirs), C++. Common backplane. Stream-specific implementations
  that export operations like salloc and sfree, which return pointers to data
  buffers. ASI exports that interface to the user, for maximum efficiency.
  Performance is best when using mapped files as underlying implementation.
  Many stdio or unix apps get faster merely by relinking. ASI is even faster.
  In addition to better performance, also get multithreading support, multiple
  interfaces, and extensibility.}
}
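
A minimal sketch of the idea described in the comment above: instead of copying
data into a caller-supplied buffer, the stream hands back a pointer into its own
(here memory-mapped) buffer. The stream_t structure and the salloc_read/sfree
names below are hypothetical illustrations, not the actual Alloc Stream Facility
interface.

/* Hypothetical salloc-style interface (names invented for illustration):
 * expose bytes of the stream by returning a pointer into a mapped file,
 * avoiding a copy into a caller-supplied buffer. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

typedef struct {
    char  *base;   /* mapping of the whole file */
    size_t size;   /* file length */
    size_t pos;    /* current stream position */
} stream_t;

static char *salloc_read(stream_t *s, size_t len)   /* hypothetical */
{
    if (len == 0 || s->pos + len > s->size)
        return NULL;
    char *p = s->base + s->pos;
    s->pos += len;
    return p;                      /* no copy: pointer into the mapping */
}

static void sfree(stream_t *s, char *p)              /* hypothetical */
{
    (void)s; (void)p;              /* nothing to release in this toy version */
}

int main(int argc, char **argv)
{
    if (argc < 2) return 1;
    int fd = open(argv[1], O_RDONLY);
    struct stat st;
    if (fd < 0 || fstat(fd, &st) < 0 || st.st_size == 0) return 1;

    stream_t s = { NULL, (size_t)st.st_size, 0 };
    s.base = mmap(NULL, s.size, PROT_READ, MAP_SHARED, fd, 0);
    if (s.base == MAP_FAILED) return 1;

    size_t n = s.size < 64 ? s.size : 64;
    char *chunk = salloc_read(&s, n);     /* read without copying */
    if (chunk) {
        fwrite(chunk, 1, n, stdout);
        sfree(&s, chunk);
    }
    munmap(s.base, s.size);
    close(fd);
    return 0;
}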

@InProceedings{krieger:hfs,
  author = {Orran Krieger and Michael Stumm},
  title = {{HFS:} A Flexible File System for large-scale Multiprocessors},
  booktitle = {Proceedings of the 1993 DAGS/PC Symposium},
  year = {1993},
  month = {June},
  pages = {6--14},
  organization = {Dartmouth Institute for Advanced Graduate Studies},
  address = {Hanover, NH},
  keyword = {multiprocessor file system, parallel I/O, operating system, shared
  memory, pario bib},
  abstract = {The {H{\sc urricane}} File System (HFS) is a new file system
  being developed for large-scale shared memory multiprocessors with
  distributed disks. The main goal of this file system is scalability; that is,
  the file system is designed to handle demands that are expected to grow
  linearly with the number of processors in the system. To achieve this goal,
  HFS is designed using a new structuring technique called Hierarchical
  Clustering. HFS is also designed to be flexible in supporting a variety of
  policies for managing file data and for managing file system state. This
  flexibility is necessary to support in a scalable fashion the diverse
  workloads we expect for a multiprocessor file system.},
  comment = {Designed for scalability on the hierarchical clustering model (see
  unrau:cluster), the Hurricane File System for NUMA shared-memory MIMD
  machines. Each cluster has its own full file system, which communicates with
  those in other clusters. Pieces are name server, open-file server, and
  block-file server. On first access, the file is mapped into the application
  space. VM system calls BFS to arrange transfers. Open questions: policies for
  file state management, block distribution, caching, and prefetching.
  Object-oriented approach used to allow for flexibility and extendability.
  Local disk file systems are log-structured.}
}

@TechReport{krystynak:datavault,
  author = {John Krystynak},
  title = {{I/O} Performance on the {Connection Machine DataVault} System},
  year = {1992},
  month = {May},
  number = {RND-92-011},
  institution = {NAS Systems Division, NASA Ames},
  keyword = {parallel I/O, parallel file system, performance measurement,
  pario bib},
  comment = {Short measurements of CM-2 Datavault. Faster if you access through
  Paris. Can get nearly full 32 MB/s bandwidth. Problem in its ability to use
  multiple CMIO busses.}
}

@InProceedings{krystynak:pario,
  author = {John Krystynak and Bill Nitzberg},
  title = {Performance Characteristics of the {iPSC/860} and {CM-2} {I/O}
  Systems},
  booktitle = {Proceedings of the Seventh International Parallel Processing
  Symposium},
  year = {1993},
  pages = {837--841},
  keyword = {parallel I/O, multiprocessor file system, pario bib},
  comment = {Essentially a (short) combination of krystynak:datavault and
  nitzberg:cfs.}
}

@InProceedings{kucera:libc,
  author = {Julie Kucera},
  title = {Making {\em libc}\/ Suitable for Use by Parallel Programs},
  booktitle = {Proceedings of the Usenix Distributed and Multiprocessor Systems
  Workshop},
  year = {1989},
  pages = {145--152},
  keyword = {parallel file system interface, pario bib},
  comment = {Experience making libc reentrant, adding semaphores, {\em etc.}, on a
  Convex. Some problems with I/O. Added semaphores and private memory to make
  libc calls reentrant, i.e., callable in parallel by multiple threads.}
}

@PhdThesis{kwan:sort,
  author = {Sai Choi Kwan},
  title = {External Sorting: {I/O} Analysis and Parallel Processing
  Techniques},
  year = {1986},
  month = {January},
  school = {University of Washington},
  note = {Available as technical report 86--01--01},
  keyword = {parallel I/O, sorting, pario bib},
  comment = {Examines external sorting techniques such as merge sort, tag sort,
  multi-pass distribution sort, and one-pass distribution sort. The model is
  one where I/O complexity is included, assuming a linear seek time
  distribution and a cost of 1/2 rotation for each seek. Parallel I/O or
  computing are not considered until the distribution sorts. Architectural
  model on page 58.}
}

@InProceedings{lake:pario,
  author = {Brian Lake and Chris Gray},
  title = {Parallel {I/O} for {MIMD} Machines},
  booktitle = {Proceedings of SS'93: High Performance Computing},
  year = {1993},
  month = {June},
  pages = {301--308},
  address = {Calgary},
  keyword = {parallel I/O, MIMD, multiprocessor file system, pario bib}
}

@InProceedings{lautenbach:pfs,
  author = {Berin F. Lautenbach and Bradley M. Broom},
  title = {A Parallel File System for the {AP1000}},
  booktitle = {Proceedings of the Third Fujitsu-ANU CAP Workshop},
  year = {1992},
  month = {November},
  keyword = {distributed file system, multiprocessor file system, pario bib},
  comment = {See also broom:acacia, broom:impl, mutisya:cache, and broom:cap.}
}

@TechReport{lee:impl,
  author = {Edward K. Lee},
  title = {Software and Performance Issues in the Implementation of a {RAID}
  Prototype},
  year = {1990},
  month = {May},
  number = {UCB/CSD 90/573},
  institution = {EECS, Univ. California at Berkeley},
  keyword = {parallel I/O, disk striping, performance, pario bib},
  comment = {Details of their prototype. Defines terms like stripe unit.
  Explores ways to lay out parity. Does performance simulations. Describes ops
  needed in device driver. Good to read if you plan to implement a RAID.
  Results: small R+W, or high loads, don't care about parity placement; in low
  load, there are different best cases for large R+W. Best all-around is
  left-symmetric. See also lee:parity.}
}

@InProceedings{lee:parity,
  author = {Edward K. Lee and Randy H. Katz},
  title = {Performance Consequences of Parity Placement in Disk Arrays},
  booktitle = {Fourth International Conference on Architectural Support for
  Programming Languages and Operating Systems},
  year = {1991},
  pages = {190--199},
  keyword = {RAID, reliability, parallel I/O, pario bib},
  comment = {Interesting comparison of several parity placement schemes. Boils
  down to two basic choices, depending on whether read performance or write
  performance is more important to you.}
}

@InProceedings{livny:stripe,
  author = {M. Livny and S. Khoshafian and H. Boral},
  title = {Multi-Disk Management Algorithms},
  booktitle = {Proceedings of the 1987 ACM Sigmetrics Conference on Measurement
  and Modeling of Computer Systems},
  year = {1987},
  month = {May},
  pages = {69--77},
  keyword = {parallel I/O, disk striping, disk array, pario bib}
}

@TechReport{lo:disks,
  author = {Raymond Lo and Norman Matloff},
  title = {A Probabilistic Limit on the Virtual Size of Replicated File
  Systems},
  year = {1989},
  institution = {Department of EE and CS, UC Davis},
  keyword = {parallel I/O, replication, file system, disk mirroring, disk
  shadowing, pario bib},
  comment = {A look at shadowed disks. If you have $k$ disks set up to read
  from the disk with the shortest seek, but write to all disks, you have
  increased reliability, read time like the min of the seeks, and write time
  like the max of the seeks. It appears that with increasing $k$ you can get
  good performance. But this paper clearly shows, since writes move all disk
  heads to the same location, that the effective value of $k$ is actually quite
  low. Only 4--10 disks are likely to be useful for most traffic loads.}
}
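
To make the read-min / write-max tradeoff in the comment above concrete, here is
a tiny Monte Carlo sketch. It assumes independent seek distances uniform on
[0,1), which is an assumption of this sketch rather than the paper's model: with
k shadowed disks, a read is served by the shortest seek and a write waits for
the longest.

/* Monte Carlo estimate: with k shadowed disks, reads take the minimum of k
 * seek distances and writes take the maximum. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const int trials = 100000;
    for (int k = 1; k <= 16; k *= 2) {
        double sum_min = 0.0, sum_max = 0.0;
        for (int t = 0; t < trials; t++) {
            double mn = 1.0, mx = 0.0;
            for (int d = 0; d < k; d++) {
                double seek = (double)rand() / ((double)RAND_MAX + 1.0);
                if (seek < mn) mn = seek;
                if (seek > mx) mx = seek;
            }
            sum_min += mn;
            sum_max += mx;
        }
        /* Under this model E[min] = 1/(k+1) and E[max] = k/(k+1): read seeks
         * shrink with diminishing returns while write seeks grow toward a
         * full stroke. */
        printf("k=%2d  avg read seek=%.3f  avg write seek=%.3f\n",
               k, sum_min / trials, sum_max / trials);
    }
    return 0;
}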

@InProceedings{loverso:sfs,
  author = {Susan J. LoVerso and Marshall Isman and Andy Nanopoulos and William
  Nesheim and Ewan D. Milne and Richard Wheeler},
  title = {{\em sfs}: {A} Parallel File System for the {CM-5}},
  booktitle = {Proceedings of the 1993 Summer Usenix Conference},
  year = {1993},
  pages = {291--305},
  keyword = {parallel I/O, multiprocessor file system, pario bib},
  comment = {They took the Unix file system from SunOS and extended it to run
  on the CM-5. This involved handling non-power-of-two block sizes, parallel
  I/O calls, large file sizes, and more encouragement for extents to be
  allocated. The hardware is particularly suited to RAID~3 with a 16 byte
  striping unit, although in theory the software could do anything it wants.
  Geared to data-parallel model. Proc nodes (PNs) contact the timesharing
  daemon (TD) on the control processor (CP), who gets block lists from the file
  system, which runs on one of the CPs. The TD then arranges with the disk
  storage nodes (DSNs) to do the transfer directly with the PNs. Each DSN has
  8~MB of buffer space, 8 disk drives, 4 SCSI busses, and a SPARC as
  controller. Partition managers mount non-local sfs via NFS. Performance
  results good. Up to 185~MB/s on 118 (2~MB/s) disks.}
}

@Article{manuel:logjam,
  author = {Tom Manuel},
  title = {Breaking the Data-rate Logjam with arrays of small disk drives},
  journal = {Electronics},
  year = {1989},
  month = {February},
  volume = {62},
  number = {2},
  pages = {97--100},
  keyword = {parallel I/O, disk array, I/O bottleneck, pario bib},
  comment = {See also Electronics, Nov. 88 p 24, Dec. 88 p 112. Trade journal
  short on disk arrays. Very good intro. No new technical content. Concentrates
  on RAID project. Lists several commercial versions. Mostly concentrates on
  single-controller versions.}
}

@Misc{maspar:pario,
  key = {Mas},
  title = {Parallel File {I/O} Routines},
  year = {1992},
  howpublished = {MasPar Computer Corporation},
  keyword = {parallel I/O, multiprocessor file system interface, pario bib},
  comment = {Man pages for MasPar file system interface. They have either a
  single shared file pointer, after which all processors read or write in an
  interleaved pattern, or individual (plural) file pointer, allowing arbitrary
  access patterns. Updated in 1992 with many more features.}
}
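
One natural reading of the interleaved pattern mentioned in the comment above is
that record r written by processor p lands at file offset (r * P + p) * recsize.
The sketch below only illustrates that offset arithmetic; the MasPar library
calls themselves are not shown, and P and recsize are made-up example values.

/* Offset arithmetic for an interleaved shared-pointer access pattern. */
#include <stdio.h>

int main(void)
{
    const long P = 4;         /* number of processors (illustrative) */
    const long recsize = 512; /* bytes per record (illustrative) */

    for (long p = 0; p < P; p++)
        for (long r = 0; r < 3; r++)
            printf("proc %ld, record %ld -> file offset %ld\n",
                   p, r, (r * P + p) * recsize);
    return 0;
}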

@Article{masters:pario,
  author = {Del Masters},
  title = {Improve Disk Subsystem Performance with Multiple Serial Drives in
  Parallel},
  journal = {Computer Technology Review},
  year = {1987},
  month = {July},
  volume = {7},
  number = {9},
  pages = {76--77},
  keyword = {parallel I/O, pario bib},
  comment = {Information about the early Maximum Strategy disk array, which
  striped over 4 disk drives, apparently synchronously.}
}

@Article{matloff:multidisk,
  author = {Norman S. Matloff},
  title = {A Multiple-Disk System for both Fault Tolerance and Improved
  Performance},
  journal = {IEEE Transactions on Reliability},
  year = {1987},
  month = {June},
  volume = {R-36},
  number = {2},
  pages = {199--201},
  keyword = {parallel I/O, reliability, disk shadowing, disk mirroring, pario
  bib},
  comment = {Variation on mirrored disks using more than 2 disks, to spread the
  files around. Good performance increases.}
}

@InProceedings{meador:array,
  author = {Wes E. Meador},
  title = {Disk Array Systems},
  booktitle = {Proceedings of IEEE Compcon},
  year = {1989},
  month = {Spring},
  pages = {143--146},
  keyword = {parallel I/O, disk array, disk striping, pario bib},
  comment = {Describes {\em Strategy 2 Disk Array Controller}, which allows 4
  or 8 drives, hardware striped, with parity drive and 0-4 hot spares. Up to 4
  channels to cpu(s). Logical block interface. Defects, errors, formatting,
  drive failures all handled automatically. Peak 40 MB/s data transfer on each
  channel.}
}

@TechReport{milenkovic:model,
  author = {Milan Milenkovic},
  title = {A Model for Multiprocessor {I/O}},
  year = {1989},
  month = {July},
  number = {89-CSE-30},
  institution = {Dept. of Computer Science and Engineering, Southern Methodist
  University},
  keyword = {multiprocessor I/O, I/O architecture, distributed system, pario
  bib},
  comment = {Advocates using dedicated server processors for all I/O, e.g., disk
  server, terminal server, network server. Pass I/O requests and data via
  messages or RPC calls over the interconnect (here a shared bus). Server
  handles packaging, blocking, caching, errors, interrupts, and so forth,
  freeing the main processors and the interconnect from all this activity.
  Benefits: encapsulates I/O-related stuff in specific places, accommodates
  heterogeneity, improves performance. Nice idea, but allows for an I/O
  bottleneck, unless server can handle all the demand. Otherwise would need
  multiple servers, more expensive than just multiple controllers.}
}

@Article{miller:pario,
  author = {L. L. Miller and A. R. Hurson},
  title = {Multiprogramming and concurrency in parallel file environments},
  journal = {International Journal of Mini and Microcomputers},
  year = {1991},
  volume = {13},
  number = {2},
  pages = {37--45},
  keyword = {parallel file system, parallel I/O, database, pario bib},
  comment = {This is really for databases. They identify two types of file
  access: one where the file can be operated on as a set of subfiles, each
  independently by a processor (what they call MIMD mode), and another where
  the file must be operated on with a centralized control (SIMD mode), in their
  case to search a B-tree whose nodes span the set of processors. Basically it
  is a host connected to a controller, that is connected to a set of small I/O
  processors, each of which has access to disk. In many ways a uniprocessor
  perspective. Paper design, with simulation results.}
}

@InProceedings{miller:rama,
  author = {Ethan L. Miller and Randy H. Katz},
  title = {{RAMA:} A File System for Massively-Parallel Computers},
  booktitle = {Proceedings of the Twelfth IEEE Symposium on Mass Storage
  Systems},
  year = {1993},
  pages = {163--168},
  keyword = {parallel I/O, multiprocessor file system, pario bib},
  comment = {The multiprocessor's file system acts as a block cache for
  tertiary storage. Disk space is broken into ``lines'' of a few MB. Each line
  has a descriptor telling what blocks it has, and their status. (fileid,
  offset) hashed to find (disk, linenum). Intrinsic metadata stored at start of
  each file; positional metadata implicit in hashing, and line descriptors.
  Sequentiality parameter puts several blocks of a file in the same line, to
  improve medium-sized requests (otherwise generate lots of request-response
  net traffic). Not clear on best choice of size. No mention of atomicity wrt
  concurrent writes to same data. Blocks migrate to tertiary storage as they
  get old. Fetched on demand, by block (not file). Self-describing blocks have
  ids in block -- leads to screwy block sizes?}
}
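
A hedged sketch of the placement idea summarized above: hash a (fileid, offset)
pair to a (disk, line) pair so that positional metadata is implicit in the hash
rather than kept in a per-file block map. The hash function, disk count, and
line size below are invented for illustration and are not RAMA's.

/* Toy placement: hash (fileid, line-of-file) to (disk, line-on-disk). */
#include <stdint.h>
#include <stdio.h>

#define NDISKS         16
#define LINES_PER_DISK 1024
#define LINE_BYTES     (4u * 1024u * 1024u)   /* "lines" of a few MB */

static void place(uint32_t fileid, uint64_t offset, int *disk, int *line)
{
    uint64_t lineno = offset / LINE_BYTES;       /* which line of the file */
    uint64_t h = (fileid * 2654435761u) ^ (lineno * 40503u);  /* toy hash */
    *disk = (int)(h % NDISKS);
    *line = (int)((h / NDISKS) % LINES_PER_DISK);
}

int main(void)
{
    int disk, line;
    for (uint64_t off = 0; off < 3 * (uint64_t)LINE_BYTES; off += LINE_BYTES) {
        place(42, off, &disk, &line);
        printf("file 42, offset %llu -> disk %d, line %d\n",
               (unsigned long long)off, disk, line);
    }
    return 0;
}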

@Article{milligan:bifs,
  author = {P. Milligan and L. C. Waring and A. S. C. Lee},
  title = {{BIFS}: {A} filing system for multiprocessor based systems},
  journal = {Microprocessing and Microprogramming},
  year = {1991},
  volume = {31},
  pages = {9--12},
  note = {Euromicro~'90 conference, Amsterdam},
  keyword = {multiprocessor file system, pario bib},
  comment = {A simple file system for a transputer network, attached to a
  single disk device. Several procs are devoted to the file system, but really
  just act as buffers for the host processor that runs the disk. They provide
  sequential, random access, and indexed files, either byte- or
  record-oriented. Some prototypes; no results. They add buffering and double
  buffering, but don't really get into anything interesting.}
}

@Article{mokhoff:pario,
  author = {Nicholas Mokhoff},
  title = {Parallel Disk Assembly Packs 1.5 {GBytes}, runs at 4 {MBytes/s}},
  journal = {Electronic Design},
  year = {1987},
  month = {November},
  pages = {45--46},
  keyword = {parallel I/O, I/O, disk architecture, disk striping, reliability,
  pario bib},
  comment = {Commercially available: Micropolis Systems' Parallel Disk 1800
  series. Four disks plus one parity disk, synchronized and byte-interleaved.
  SCSI interface. Total capacity 1.5 GBytes, sustained transfer rate of 4
  MBytes/s. MTTF 140,000 hours. Hard and soft errors corrected in real-time.
  Failed drives can be replaced while system is running.}
}

@TechReport{montague:swift,
  author = {Bruce R. Montague},
  title = {The {Swift/RAID} Distributed Transaction Driver},
  year = {1993},
  month = {January},
  number = {UCSC-CRL-93-99},
  institution = {UC Santa Cruz},
  keyword = {RAID, parallel I/O, distributed file system, transaction, pario
  bib},
  comment = {See other Swift papers, e.g., cabrera:pario.}
}

@Article{moren:controllers,
  author = {William D. Moren},
  title = {Design of Controllers is Key Element in Disk Subsystem Throughput},
  journal = {Computer Technology Review},
  year = {1988},
  month = {Spring},
  pages = {71--73},
  keyword = {parallel I/O, disk architecture, pario bib},
  comment = {A short paper on some basic techniques used by disk controllers to
  improve throughput: seek optimization, request combining, request queuing,
  using multiple drives in parallel, scatter/gather DMA, data caching,
  read-ahead, cross-track read-ahead, write-back caching, segmented caching,
  reduced latency (track buffering), and format skewing. [Most of these are
  already handled in Unix file systems.]}
}

@InProceedings{muntz:failure,
  author = {Richard R. Muntz and John C. S. Lui},
  title = {Performance Analysis of Disk Arrays Under Failure},
  booktitle = {16th International Conference on Very Large Data Bases},
  year = {1990},
  pages = {162--173},
  keyword = {disk array, parallel, performance analysis, pario bib},
  comment = {Looked at RAID5 when in failure mode. For small-reads workload,
  could only get 50\% of normal. So they decouple cluster size and parity-group
  size, so that they decluster over more disks than group size; during failure,
  this causes less of a load increase on surviving disks.}
}

@InProceedings{mutisya:cache,
  author = {Gerald Mutisya and Bradley M. Broom},
  title = {Distributed File Caching for the {AP1000}},
  booktitle = {Proceedings of the Third Fujitsu-ANU CAP Workshop},
  year = {1992},
  month = {November},
  keyword = {distributed file system, multiprocessor file system, pario bib},
  comment = {See also broom:acacia, broom:impl, lautenbach:pfs, and broom:cap.}
}

@InProceedings{nagashima:pario,
  author = {Umpei Nagashima and Takashi Shibata and Hiroshi Itoh and Minoru
  Gotoh},
  title = {An Improvement of {I/O} Function for Auxiliary Storage: {Parallel
  I/O} for a Large Scale Supercomputing},
  booktitle = {1990 International Conference on Supercomputing},
  year = {1990},
  pages = {48--59},
  keyword = {parallel I/O, pario bib},
  comment = {Using parallel I/O channels to access striped disks, in parallel
  from a supercomputer. They {\em chain}\/ (i.e., combine) requests to a disk
  for large contiguous accesses.}
}

@TechReport{ncr:3600,
  key = {NCR},
  title = {{NCR 3600} Product Description},
  year = {1991},
  month = {September},
  number = {ST-2119-91},
  institution = {NCR},
  address = {San Diego},
  keyword = {multiprocessor architecture, MIMD, parallel I/O, pario bib},
  comment = {Has 1-32 50MHz Intel 486 processors. Parallel independent disks on
  the disk nodes, separate from the processor nodes. Tree interconnect. Aimed
  at database applications.}
}

@InProceedings{ng:diskarray,
  author = {Spencer Ng},
  title = {Some Design Issues of Disk Arrays},
  booktitle = {Proceedings of IEEE Compcon},
  year = {1989},
  month = {Spring},
  pages = {137--142},
  note = {San Francisco, CA},
  keyword = {parallel I/O, disk array, pario bib},
  comment = {Discusses disk arrays and striping. Transfer size is important to
  striping success: small size transfers are better off with independent disks.
  Synchronized rotation is especially important for small transfer sizes, since
  then the increased rotational delays dominate. Fine grain striping involves
  less assembly/disassembly delay, but coarse grain (block) striping allows for
  request parallelism. Fine grain striping wastes capacity due to fixed size
  formatting overhead. He also derives an exact MTTF equation for 1-failure
  tolerance and on-line repair.}
}

@InProceedings{ng:interleave,
  author = {S. Ng and D. Lang and R. Selinger},
  title = {Trade-offs Between Devices and Paths in Achieving Disk
  Interleaving},
  booktitle = {Proceedings of the 15th Annual International Symposium on
  Computer Architecture},
  year = {1988},
  pages = {196--201},
  keyword = {parallel I/O, disk architecture, disk caching, I/O bottleneck,
  pario bib},
  comment = {Compares four different ways of restructuring IBM disk controllers
  and channels to obtain more parallelism. They use parallel heads or parallel
  actuators. The best results come when they replicate the control electronics
  to maintain the number of data paths through the controller. Otherwise the
  controller bottleneck reduces performance. Generally, for large or small
  transfer sizes, parallel heads with replication gave better performance.}
}

@InProceedings{nishino:sfs,
  author = {H. Nishino and S. Naka and K. Ikumi},
  title = {High Performance File System for Supercomputing Environment},
  booktitle = {Proceedings of Supercomputing '89},
  year = {1989},
  pages = {747--756},
  keyword = {supercomputer, file system, parallel I/O, pario bib},
  comment = {A modification to the Unix file system to allow for supercomputer
  access. Workload: file size from few KB to few GB, I/O operation size from
  few bytes to hundreds of MB. Generally programs split into I/O-bound and
  CPU-bound parts. Sequential and random access. Needs: giant files (bigger
  than device), peak hardware performance for large files, NFS access. Their FS
  is built into Unix ``transparently''. Space allocated in clusters, rather
  than blocks; clusters might be as big as a cylinder. Allows for efficient,
  large files. Mentions parallel disks as part of a ``virtual volume'' but does
  not elaborate. Prefetching within a cluster.}
}

@TechReport{nitzberg:cfs,
  author = {Bill Nitzberg},
  title = {Performance of the {iPSC/860} Concurrent File System},
  year = {1992},
  month = {December},
  number = {RND-92-020},
  institution = {NAS Systems Division, NASA Ames},
  keyword = {Intel, parallel file system, performance measurement, parallel
  I/O, pario bib},
  comment = {Straightforward measurements of an iPSC/860 with 128 compute
  nodes, 10 I/O nodes, and 10 disks. This is a bigger system than has been
  measured before. Has some basic MB/s measurements for some features in Tables
  1--2. CFS bug prevents more than 2 asynch requests at a time. Another bug
  forced random-writes to use preallocated files. For low number of procs, they
  weren't able to pull the full disk bandwidth. Cache thrashing caused problems
  when they had a large number of procs, because each read prefetched 8 blocks,
  which were flushed by some other proc doing a subsequent read. Workaround by
  synchronizing procs to limit concurrency. Increasing cache size is the right
  answer, but is not scalable.}
}

@TechReport{nodine:greed,
  author = {Mark H. Nodine and Jeffrey Scott Vitter},
  title = {Greed Sort: An Optimal External Sorting Algorithm for Multiple
  Disks},
  year = {1992},
  number = {CS--91--20},
  institution = {Brown University},
  note = {A summary appears in SPAA~'91.},
  keyword = {parallel I/O algorithms, sorting, pario bib},
  comment = {Summary is nodine:sort.}
}

@InProceedings{nodine:loadbalance,
  author = {Mark H. Nodine and Jeffrey Vitter},
  title = {Load Balancing Paradigms for Optimal Use of Parallel Disks and
  Parallel Memory Hierarchies},
  booktitle = {Proceedings of the 1993 DAGS/PC Symposium},
  year = {1993},
  month = {June},
  pages = {26--39},
  organization = {Dartmouth Institute for Advanced Graduate Studies},
  address = {Hanover, NH},
  keyword = {parallel I/O algorithm, memory hierarchy, load balance, sorting,
  pario bib},
  abstract = {We present several load balancing paradigms pertinent to
  optimizing I/O performance with disk and processor parallelism. We use
  sorting as our canonical application to illustrate the paradigms, and we
  survey a wide variety of applications in computational geometry. The use of
  parallel disks can help overcome the I/O bottleneck in sorting if the records
  in each read or write are evenly balanced among the disks. There are three
  known load balancing paradigms that lead to optimal I/O algorithms: using
  randomness to assign blocks to disks, using the disks predominantly
  independently, and deterministically balancing the blocks by matching. In
  this report, we describe all of these techniques in detail and compare their
  relative advantages. We show how randomized and deterministic balancing can
  be extended to provide sorting algorithms that are optimal both in terms of
  the number of I/Os and the internal processing time for parallel-processing
  machines with scalable I/O subsystems and with parallel memory hierarchies.
  We also survey results achieving optimal performance in these models for
  a large range of online and batch problems in computational geometry.},
  comment = {Invited speaker: Jeffrey Vitter.}
}

@InProceedings{nodine:sort,
  author = {Mark H. Nodine and Jeffrey Scott Vitter},
  title = {Large-Scale Sorting in Parallel Memories},
  booktitle = {Proceedings of the 3rd Annual ACM Symposium on Parallel
  Algorithms and Architectures},
  year = {1991},
  pages = {29--39},
  keyword = {external sorting, file access pattern, parallel I/O, pario bib},
  comment = {Describes algorithms for external sorting that are optimal in the
  number of I/Os. Proposes a couple of fairly-realistic memory hierarchy
  models.}
}

@TechReport{nodine:sort2,
  author = {Mark H. Nodine and Jeffrey Scott Vitter},
  title = {Optimal Deterministic Sorting in Parallel Memory Hierarchies},
  year = {1992},
  month = {August},
  number = {CS--92--38},
  institution = {Brown University},
  note = {Submitted.},
  keyword = {parallel I/O algorithms, parallel memory, sorting, pario bib}
}

@TechReport{nodine:sortdisk,
  author = {Mark H. Nodine and Jeffrey Scott Vitter},
  title = {Optimal Deterministic Sorting on Parallel Disks},
  year = {1992},
  month = {August},
  number = {CS--92--08},
  institution = {Brown University},
  note = {Submitted.},
  keyword = {parallel I/O algorithms, sorting, pario bib}
}

@TechReport{ogata:diskarray,
  author = {Mikito Ogata and Michael J. Flynn},
  title = {A Queueing Analysis for Disk Array Systems},
  year = {1990},
  number = {CSL-TR-90-443},
  institution = {Stanford University},
  keyword = {disk array, performance analysis, pario bib},
  comment = {Fairly complex analysis of a multiprocessor attached to a disk
  array system through a central server that is the buffer. Assumes
  task-oriented model for parallel system, where tasks can be assigned to any
  CPU; this makes for an easy model. Like Reddy, they compare declustering and
  striping (they call them striped and synchronized disks).}
}

@Article{olson:random,
  author = {Thomas M. Olson},
  title = {Disk Array Performance in a Random {I/O} Environment},
  journal = {Computer Architecture News},
  year = {1989},
  month = {September},
  volume = {17},
  number = {5},
  pages = {71--77},
  keyword = {I/O benchmark, transaction processing, pario bib},
  comment = {See wolman:iobench. Used IOBENCH to compare normal disk
  configuration with striped disks, RAID level 1, and RAID level 5, under a
  random I/O workload. Multiple disks with files on different disks gave good
  performance (high throughput and low response time) when multiple users.
  Striping ensures balanced load, similar performance. RAID level 1 or level 5
  ensures reliability at performance cost over striping, but still good.
  Especially sensitive to write/read ratio --- performance lost for large
  number of writes.}
}

@InProceedings{oyang:m2io,
  author = {Yen-Jen Oyang},
  title = {Architecture, Operating System, and {I/O} Subsystem Design of the
  {$M^2$} Database Machine},
  booktitle = {Proceedings of the Parallel Systems Fair at the International
  Parallel Processing Symposium},
  year = {1993},
  pages = {31--38},
  keyword = {parallel I/O, multiprocessor file system, parallel database, pario
  bib},
  comment = {A custom multiprocessor with shared-memory clusters networked
  together and to shared disks. Runs Mach. Directory-based coherence protocol
  for the distributed file system. Background writeback.}
}

@TechReport{park:pario,
  author = {Arvin Park and K. Balasubramanian},
  title = {Providing Fault Tolerance in Parallel Secondary Storage Systems},
  year = {1986},
  month = {November},
  number = {CS-TR-057-86},
  institution = {Department of Computer Science, Princeton University},
  keyword = {parallel I/O, reliability, pario bib},
  comment = {They use ECC with one or more parity drives in bit-interleaved
  systems, and on-line regeneration of failed drives from spares. More
  cost-effective than mirrored disks. One of the earliest references to
  RAID-like concepts.}
}

@InProceedings{patterson:raid,
  author = {David Patterson and Garth Gibson and Randy Katz},
  title = {A case for redundant arrays of inexpensive disks {(RAID)}},
  booktitle = {ACM SIGMOD Conference},
  year = {1988},
  month = {June},
  pages = {109--116},
  keyword = {parallel I/O, RAID, reliability, cost analysis, I/O bottleneck,
  disk array, pario bib},
  comment = {Make a good case for the upcoming I/O crisis, compare single large
  expensive disks (SLED) with small cheap disks. Outline five levels of RAID
  the give different reliabilities, costs, and performances. Block-interleaved
  with a single check disk (level 4) or with check blocks interspersed (level
  5) seem to give best performance for supercomputer I/O or database I/O or
  both. Note: the TR by the same name (UCB/CSD 87/391) is essentially
  identical.}
}
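
For readers new to the parity schemes mentioned above, a small sketch of the
underlying arithmetic: the parity block of a stripe is the XOR of its data
blocks, any single lost block can be rebuilt by XORing the survivors, and in
level 5 the parity block rotates across the disks from stripe to stripe. Array
sizes and the rotation rule below are illustrative, not taken from the paper.

/* XOR parity and a level-5 style rotated parity placement, in miniature. */
#include <stdio.h>
#include <string.h>

#define NDISKS 5
#define BLOCK  8   /* tiny blocks, just to show the XOR */

int main(void)
{
    unsigned char stripe[NDISKS][BLOCK];

    /* Fill NDISKS-1 data blocks with arbitrary data. */
    for (int d = 0; d < NDISKS - 1; d++)
        for (int b = 0; b < BLOCK; b++)
            stripe[d][b] = (unsigned char)(d * 17 + b);

    /* Parity block = XOR of the data blocks. */
    memset(stripe[NDISKS - 1], 0, BLOCK);
    for (int d = 0; d < NDISKS - 1; d++)
        for (int b = 0; b < BLOCK; b++)
            stripe[NDISKS - 1][b] ^= stripe[d][b];

    /* Recover data block 2 as if its disk had failed, by XORing the rest. */
    unsigned char rebuilt[BLOCK];
    memset(rebuilt, 0, BLOCK);
    for (int d = 0; d < NDISKS; d++)
        if (d != 2)
            for (int b = 0; b < BLOCK; b++)
                rebuilt[b] ^= stripe[d][b];
    printf("block 2 rebuilt correctly: %s\n",
           memcmp(rebuilt, stripe[2], BLOCK) == 0 ? "yes" : "no");

    /* Level-5 style rotation: parity for stripe s moves from disk to disk. */
    for (int s = 0; s < 6; s++)
        printf("stripe %d: parity on disk %d\n", s, (NDISKS - 1) - (s % NDISKS));
    return 0;
}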

@InProceedings{patterson:raid2,
  author = {David Patterson and Peter Chen and Garth Gibson and Randy H. Katz},
  title = {Introduction to Redundant Arrays of Inexpensive Disks {(RAID)}},
  booktitle = {Proceedings of IEEE Compcon},
  year = {1989},
  month = {Spring},
  pages = {112--117},
  keyword = {parallel I/O, RAID, reliability, cost analysis, I/O bottleneck,
  disk array, pario bib},
  comment = {A short version of patterson:raid, with some slight updates.}
}

@InProceedings{philippsen:triton,
  author = {Michael Philippsen and Thomas M. Warschko and Walter F. Tichy and
  Christian G. Herter},
  title = {{Project Triton:} Towards improved Programmability of Parallel
  Machines},
  booktitle = {Proceedings of the Twenty-Sixth Annual Hawaii International
  Conference on System Sciences},
  year = {1993},
  volume = {I},
  pages = {192--201},
  keyword = {parallel programming, parallel architecture, parallel I/O, pario
  bib},
  comment = {A language- and application-driven proposal for parallel
  architecture, that mixes SIMD and MIMD, high-performance networking, large
  memory, shared address space, and so forth. Their prototype, the Triton, fits
  most of their claims. Fairly convincing arguments. One disk per node. Little
  mention of a file system though. Email from student Udo Boehm: ``The current
  version of Triton/1 with 256 PEs uses 72 disks (the filesystem is scalable
  up to 256 disks). These disks are divided into 8 groups of 9 disks; each
  group has one parity disk. Our implementation of the filesystem is a
  parallel version of RAID Level 3 with some extensions. We use so-called
  vector files for disk access. A file is always distributed over all disks
  of the disk array. A vector file is divided into logical blocks. A logical
  block consists of 72 physical blocks; each physical block is on one of the
  72 disks, and all 72 have the same block number on their disk. A logical
  block has 18432 bytes, of which 16384 bytes are data. The filesystem uses
  these logical blocks to store data. We do not use special PEs for I/O; all
  PEs can be (and are) used to do I/O. There is no central node that
  coordinates the PEs.''}
}
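
A quick consistency check on the quoted figures: 8 groups of 9 disks gives the
72 disks, and each disk contributes one 256-byte physical block to a logical
block, so 72 x 256 = 18432 bytes per logical block, of which the 64 data disks
supply 64 x 256 = 16384 bytes and the 8 per-group parity disks the remaining
2048.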

@InProceedings{pierce:pario,
  author = {Paul Pierce},
  title = {A Concurrent File System for a Highly Parallel Mass Storage System},
  booktitle = {Fourth Conference on Hypercube Concurrent Computers and
  Applications},
  year = {1989},
  pages = {155--160},
  keyword = {parallel I/O, hypercube, Intel iPSC/2, multiprocessor file system,
  pario bib},
  comment = {Intel iPSC/2 Concurrent File System. Chose to tailor system for
  high performance for large files, read in large chunks. Uniform logical file
  system view, Unix stdio interface. Blocks scattered over all disks, but not
  striped. Blocksize 4K optimizes message-passing performance without using
  blocks that are too big. Tree-directory is stored in ONE file and managed by
  ONE process, so opens are bottlenecked, but that is not their emphasis. File
  headers, however, are scattered. The file header info contains a list of
  blocks. File header is managed by disk process on its I/O node. Data caching
  is done only at the I/O node of the originating disk drive. Read-ahead is
  used but not detailed here.}
}

@InProceedings{poston:hpfs,
  author = {Alan Poston},
  title = {A High Performance File System for {UNIX}},
  booktitle = {Proceedings of the Usenix Workshop on UNIX and Supercomputers},
  year = {1988},
  pages = {215--226},
  keyword = {file system, unix, parallel I/O, disk striping, pario bib},
  comment = {A new file system for Unix based on striped files. Better
  performance for sequential access, better for large-file random access and
  about the same for small-file random access. Allows full striping track
  prefetch, or even volume prefetch. Needs a little bit of buffer management
  change. Talks about buffer management and parity blocks.}
}

@InProceedings{pratt:twofs,
  author = {Terrence W. Pratt and James C. French and Phillip M. Dickens and
  Janet, Jr., Stanley A.},
  title = {A Comparison of the Architecture and Performance of Two Parallel
  File Systems},
  booktitle = {Fourth Conference on Hypercube Concurrent Computers and
  Applications},
  year = {1989},
  pages = {161--166},
  keyword = {parallel I/O, Intel iPSC/2, nCUBE, pario bib},
  comment = {Simple comparison of the iPSC/2 and nCUBE/10 parallel I/O systems.
  Short description of each system, with simple transfer rate measurements. See
  also french:ipsc2io-tr.}
}

@InProceedings{reddy:hyperio1,
  author = {A. L. Reddy and P. Banerjee and Santosh G. Abraham},
  title = {{I/O} Embedding in Hypercubes},
  booktitle = {Proceedings of the 1988 International Conference on Parallel
  Processing},
  year = {1988},
  volume = {1},
  pages = {331--338},
  keyword = {parallel I/O, hypercube, pario bib},
  comment = {Emphasis is on adjacency. It also implies (and they assume) that
  data is distributed well across the disks so no data needs to move beyond the
  neighbors of an I/O node. Still, the idea of adjacency is good since it
  allows for good data distribution while not requiring it, and for balancing
  I/O procs among procs in a good way. Also avoids messing up the hypercube
  regularity with (embedded) dedicated I/O nodes.}
}

@InProceedings{reddy:hyperio2,
  author = {A. L. Reddy and P. Banerjee},
  title = {{I/O} issues for hypercubes},
  booktitle = {International Conference on Supercomputing},
  year = {1989},
  pages = {72--81},
  keyword = {parallel I/O, hypercube, pario bib},
  comment = {See reddy:hyperio3 for extended version.}
}

@Article{reddy:hyperio3,
  author = {A. L. Narasimha Reddy and Prithviraj Banerjee},
  title = {Design, Analysis, and Simulation of {I/O} Architectures for
  Hypercube Multiprocessors},
  journal = {IEEE Transactions on Parallel and Distributed Systems},
  year = {1990},
  month = {April},
  volume = {1},
  number = {2},
  pages = {140--151},
  keyword = {parallel I/O, hypercube, pario bib},
  comment = {An overall paper restating their embedding technique from
  reddy:hyperio1, plus a little bit of evaluation along the lines of
  reddy:pario2, plus some ideas about matrix layout on the disks. They claim
  that declustering is important, since synchronized disks do not provide
  enough parallelism, especially in the communication across the hypercube
  (since the synchronized disks must hang off one node).}
}

@InProceedings{reddy:pario,
  author = {A. Reddy and P. Banerjee},
  title = {An Evaluation of multiple-disk {I/O} systems},
  booktitle = {Proceedings of the 1989 International Conference on Parallel
  Processing},
  year = {1989},
  pages = {I:315--322},
  keyword = {parallel I/O, disk array, disk striping, pario bib},
  comment = {See also the expanded version, reddy:pario2.}
}

@Article{reddy:pario2,
  author = {A. Reddy and P. Banerjee},
  title = {Evaluation of multiple-disk {I/O} systems},
  journal = {IEEE Transactions on Computers},
  year = {1989},
  month = {December},
  volume = {38},
  pages = {1680--1690},
  keyword = {parallel I/O, disk array, disk striping, pario bib},
  comment = {See also reddy:pario and reddy:pario3. Compares declustered disks
  (sort of MIMD-like) to synchronized-interleaved (SIMD-like). Declustering
  needed for scalability, and is better for scientific workloads. Handles large
  parallelism needed for scientific workloads and for RAID-like architectures.
  Synchronized interleaving is better for general file system workloads due to
  better utilization and reduction of seek overhead.}
}

@Article{reddy:pario3,
  author = {A. L. Reddy and Prithviraj Banerjee},
  title = {A Study of Parallel Disk Organizations},
  journal = {Computer Architecture News},
  year = {1989},
  month = {September},
  volume = {17},
  number = {5},
  pages = {40--47},
  keyword = {parallel I/O, disk array, disk striping, pario bib},
  comment = {Nothing new over the expanded version reddy:pario2; little
  different from reddy:pario.}
}

@InProceedings{reddy:perfectio,
  author = {A. L. Narasimha Reddy and Prithviraj Banerjee},
  title = {A Study of {I/O} Behavior of {Perfect} Benchmarks on a
  Multiprocessor},
  booktitle = {Proceedings of the 17th Annual International Symposium on
  Computer Architecture},
  year = {1990},
  pages = {312--321},
  keyword = {parallel I/O, file access pattern, workload, multiprocessor file
  system, benchmark, pario bib},
  comment = {Using five applications from the Perfect benchmark suite, they
  studied both implicit (paging) and explicit (file) I/O activity. They found
  that the paging activity was relatively small and that sequential access to
  VM was common. All access to files was sequential, though this may be due to
  the programmer's belief that the file system is sequential. Buffered I/O
  would help to make transfers bigger and more efficient, but there wasn't
  enough rereferencing to make caching useful.}
}

@PhdThesis{reddy:thesis,
  author = {Narasimha {Reddy L. Annapareddy}},
  title = {Parallel Input/Output Architectures for Multiprocessors},
  year = {1990},
  month = {August},
  school = {University of Illinois at Urbana-Champaign},
  note = {Available as UILU-ENG-90-2235 or CRHC-90-5.},
  keyword = {parallel I/O, multiprocessor architecture, pario bib}
}

@Article{rettberg:monarch,
  author = {Randall D. Rettberg and William R. Crowther and Philip P. Carvey
  and Raymond S. Tomlinson},
  title = {The {Monarch Parallel Processor} Hardware Design},
  journal = {IEEE Computer},
  year = {1990},
  month = {April},
  volume = {23},
  number = {4},
  pages = {18--30},
  keyword = {MIMD, parallel architecture, shared memory, parallel I/O, pario
  bib},
  comment = {This describes the Monarch computer from BBN. It was never built.
  65K processors and memory modules. 65GB RAM. Bfly-style switch in dance-hall
  layout. Switch is synchronous; one switch time is a {\em frame} (one
  microsecond, equal to 3 processor cycles) and all processors may reference
  memory in one frame time. Local I-cache only. Contention reduces full
  bandwidth by 16 percent. Full 64-bit machine. Custom VLSI. Each memory
  location has 8 tag bits. One allows for a location to be locked by a
  processor. Thus, any FetchAndOp or full/empty model can be supported. I/O is
  done by adding I/O processors (up to 2K in a 65K-proc machine) in the switch.
  They plan 200 disks, each with an I/O processor, for 65K nodes. They would
  spread each block over 9 disks, including one for parity (essentially RAID).}
}

@InProceedings{rothnie:ksr,
  author = {James Rothnie},
  title = {{Kendall Square Research:} Introduction to the {KSR1}},
  booktitle = {Proceedings of the 1992 DAGS/PC Symposium},
  year = {1992},
  month = {June 23--27},
  pages = {200--210},
  organization = {Dartmouth Institute for Advanced Graduate Studies},
  address = {Hanover, NH},
  keyword = {parallel architecture, shared memory, MIMD, interconnection
  network, parallel I/O, memory-mapped files, pario bib},
  comment = {Overview of the KSR1.}
}

@InProceedings{salem:diskstripe,
  author = {Kenneth Salem and Hector Garcia-Molina},
  title = {Disk Striping},
  booktitle = {IEEE 1986 Conference on Data Engineering},
  year = {1986},
  pages = {336--342},
  keyword = {parallel I/O, disk striping, disk array, pario bib},
  comment = {See the techreport salem:striping for a nearly identical but more
  detailed version.}
}

@TechReport{salem:striping,
  author = {Kenneth Salem and Hector Garcia-Molina},
  title = {Disk Striping},
  year = {1984},
  month = {December},
  number = {332},
  institution = {EECS Dept. Princeton Univ.},
  keyword = {parallel I/O, disk striping, disk array, pario bib},
  comment = {Cite salem:diskstripe instead. Basic paper on striping. For
  uniprocessor, single-user machine. Interleaving asynchronous, even without
  matching disk locations though this is discussed. All done with models.}
}

@InProceedings{salmon:cubix,
  author = {John Salmon},
  title = {{CUBIX: Programming} Hypercubes without Programming Hosts},
  booktitle = {Proceedings of the Second Conference on Hypercube
  Multiprocessors},
  year = {1986},
  pages = {3--9},
  keyword = {hypercube, multiprocessor file system interface, pario bib},
  comment = {Previously, hypercubes were programmed as a combination of host
  and node programs. Salmon proposes to use a universal host program that acts
  essentially as a file server, responding to requests from the node programs.
  Two modes: crystalline, where node programs run in loose synchrony, and
  amorphous, where node programs are asynchronous. In the crystalline case,
  files have a single file pointer and are either single- or multiple- access;
  single access means all nodes must simultaneously issue the same request;
  multiple access means they all simultaneously issue the same request with
  different parameters, giving an interleaved pattern. Amorphous allows
  asynchronous activity, with separate file pointers per node.}
}

@TechReport{schulze:raid,
  author = {Martin Schulze},
  title = {Considerations in the Design of a {RAID} Prototype},
  year = {1988},
  month = {August},
  number = {UCB/CSD 88/448},
  institution = {UC Berkeley},
  keyword = {parallel I/O, RAID, disk array, disk architecture, pario bib},
  comment = {Very practical description of the RAID I prototype.}
}

@InProceedings{schulze:raid2,
  author = {Martin Schulze and Garth Gibson and Randy Katz and David
  Patterson},
  title = {How Reliable is a {RAID}?},
  booktitle = {Proceedings of IEEE Compcon},
  year = {1989},
  month = {Spring},
  keyword = {parallel I/O, reliability, RAID, disk array, disk architecture,
  pario bib},
  comment = {Published version of second paper in chen:raid. Some overlap with
  schulze:raid, though that paper has more detail.}
}

@InProceedings{scott:matrix,
  author = {David S. Scott},
  title = {Parallel {I/O} and Solving Out of Core Systems of Linear Equations},
  booktitle = {Proceedings of the 1993 DAGS/PC Symposium},
  year = {1993},
  month = {June},
  pages = {123--130},
  organization = {Dartmouth Institute for Advanced Graduate Studies},
  address = {Hanover, NH},
  keyword = {parallel I/O, scientific computing, matrix factorization, Intel,
  pario bib},
  abstract = {Large systems of linear equations arise in a number of scientific
  and engineering applications. In this paper we describe the implementation of
  a family of disk based linear equation solvers and the required
  characteristics of the I/O system needed to support them.},
  comment = {Invited speaker.}
}

@InProceedings{shin:hartsio,
  author = {Kang G. Shin and Greg Dykema},
  title = {A Distributed {I/O} Architecture for {HARTS}},
  booktitle = {Proceedings of the 17th Annual International Symposium on
  Computer Architecture},
  year = {1990},
  pages = {332--342},
  keyword = {parallel I/O, multiprocessor architecture, MIMD, fault tolerance,
  pario bib},
  comment = {HARTS is a multicomputer connected with a wrapped hexagonal mesh,
  with an emphasis on real-time and fault tolerance. The mesh consists of
  network routing chips. Hanging off each is a small bus-based multiprocessor
  ``node''. They consider how to integrate I/O devices into this architecture:
  attach device controllers to processors, to network routers, to node busses,
  or via a separate network. They decided to compromise and hang each I/O
  controller off three network routers, in the triangles of the hexagonal mesh.
  This keeps the traffic off of the node busses, and allows multiple paths to
  each controller. They discuss the reachability and hop count in the presence
  of failed nodes and links.}
}

@Article{smotherman:taxonomy,
  author = {Mark Smotherman},
  title = {A Sequencing-based Taxonomy of {I/O} Systems and Review of
  Historical Machines},
  journal = {Computer Architecture News},
  year = {1989},
  month = {September},
  volume = {17},
  number = {5},
  pages = {5--15},
  keyword = {I/O architecture, historical summary, pario bib},
  comment = {Classifies I/O systems by how they initiate and terminate I/O.
  Uniprocessor and Multiprocessor systems.}
}

@Misc{snir:hpfio,
  author = {Marc Snir},
  title = {Proposal for {IO}},
  year = {1992},
  month = {August 31,},
  howpublished = {Posted to HPFF I/O Forum},
  note = {Second Draft.},
  keyword = {parallel I/O, multiprocessor file system interface, pario bib},
  comment = {An outline of two possible ways to specify mappings of arrays to
  storage nodes in a multiprocessor, and to make unformatted parallel transfers
  of multiple records. Seems to apply only to arrays, and to files that hold
  only arrays. It keeps the linear structure of files as sequences of records,
  but in some cases does not preserve the order of data items or of fields
  within subrecords. Tricky to understand unless you know HPF and Fortran 90.}
}

@InProceedings{solworth:mirror,
  author = {John A. Solworth and Cyril U. Orji},
  title = {Distorted Mirrors},
  booktitle = {Proceedings of the First International Conference on Parallel
  and Distributed Information Systems},
  year = {1991},
  pages = {10--17},
  keyword = {disk mirroring, parallel I/O, pario bib},
  comment = {Write one disk (the master) in the usual way, and write the slave
  disk at the closest free block. Actually, logically partition the two disks
  so that each disk has a master partition and a slave partition. Up to 80\%
  improvement in small-write performance, while retaining good sequential read
  performance.}
}
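
A minimal sketch of the distorted write described above: the master copy goes to
the block's home location, while the slave copy is placed at whichever free
block in the slave partition is closest to that disk's current head position.
The free list, head model, and sizes below are invented for illustration.

/* Choosing the "distorted" slave location nearest the current head. */
#include <stdio.h>
#include <stdlib.h>

#define SLAVE_BLOCKS 32

static int slave_free[SLAVE_BLOCKS];   /* 1 = free */
static int slave_head = 0;             /* current head position on slave disk */

/* Pick the free slave block with the smallest seek from the current head. */
static int pick_slave_block(void)
{
    int best = -1, best_dist = SLAVE_BLOCKS + 1;
    for (int b = 0; b < SLAVE_BLOCKS; b++) {
        int dist = abs(b - slave_head);
        if (slave_free[b] && dist < best_dist) {
            best = b;
            best_dist = dist;
        }
    }
    return best;
}

int main(void)
{
    for (int b = 0; b < SLAVE_BLOCKS; b++) slave_free[b] = 1;
    slave_head = 12;

    /* Write three logical blocks: master location fixed, slave "distorted". */
    int masters[] = {3, 27, 14};
    for (int i = 0; i < 3; i++) {
        int s = pick_slave_block();
        slave_free[s] = 0;
        slave_head = s;
        printf("logical block %d: master at %d, slave copy at %d\n",
               masters[i], masters[i], s);
    }
    return 0;
}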

@Article{solworth:mirror2,
  author = {John A. Solworth and Cyril U. Orji},
  title = {Distorted Mirrors},
  journal = {Journal of Distributed and Parallel Databases},
  year = {1993},
  month = {January},
  volume = {1},
  number = {1},
  pages = {81--102},
  keyword = {disk mirroring, parallel I/O, pario bib},
  comment = {See solworth:mirror.}
}

@MastersThesis{stabile:disks,
  author = {James Joseph Stabile},
  title = {Disk Scheduling Algorithms for a Multiple Disk System},
  year = {1988},
  school = {UC Davis},
  keyword = {parallel I/O, parallel file system, disk mirroring, disk
  scheduling, pario bib},
  comment = {Describes simulation based on model of disk access pattern.
  Multiple-disk system, much like in matloff:multidisk. Files stored in two
  copies, each on a separate disk, but there are more than two disks, so this
  differs from mirroring. He compares several disk scheduling algorithms. A
  variant of SCAN seems to be the best.}
}

@InProceedings{stodolsky:logging,
  author = {Daniel Stodolsky and Garth Gibson and Mark Holland},
  title = {Parity Logging: Overcoming the Small Write Problem in Redundant Disk
  Arrays},
  booktitle = {Proceedings of the 20th Annual International Symposium on
  Computer Architecture},
  year = {1993},
  pages = {64--75},
  keyword = {parallel I/O, RAID, redundancy, reliability, pario bib},
  comment = {Parity logging to improve small writes. Log all parity updates;
  when it fills, go redo parity disk. Actually distribute the parity and log
  across all disks. Performance is comparable to, or exceeding, mirroring. Also
  handling double failures.}
}

@Article{stone:query,
  author = {Harold S. Stone},
  title = {Parallel Querying of Large Databases: {A} Case Study},
  journal = {IEEE Computer},
  year = {1987},
  month = {October},
  volume = {20},
  number = {10},
  pages = {11--21},
  keyword = {parallel I/O, database, SIMD, connection machine, pario bib},
  comment = {See also IEEE Computer, Jan 1988, p. 8 and 10. Examines a database
  query that is parallelized for the Connection Machine. He shows that in many
  cases, a smarter serial algorithm that reads only a portion of the database
  (through an index) will be faster than 64K processors reading the whole
  database. Uses a simple model for the machines to show this. Reemphasizes the
  point of Boral and DeWitt that I/O is the bottleneck of a database machine,
  and that parallelizing the processing will not necessarily help a great
  deal.}
}

@InProceedings{stonebraker:radd,
  author = {Michael Stonebraker and Gerhard A. Schloss},
  title = {Distributed {RAID} --- {A} New Multiple Copy Algorithm},
  booktitle = {Proceedings of 6th International Data Engineering Conference},
  year = {1990},
  pages = {430--437},
  keyword = {disk striping, reliability, pario bib},
  comment = {This is about ``RADD'', a distributed form of RAID. Meant for
  cases where the disks are physically distributed around several sites, and no
  one controller controls them all. Much lower space overhead than any
  mirroring technique, with comparable normal-mode performance at the expense
  of failure-mode performance.}
}

@TechReport{stonebraker:xprs,
  author = {Michael Stonebraker and Randy Katz and David Patterson and John
  Ousterhout},
  title = {The Design of {XPRS}},
  year = {1988},
  month = {March},
  number = {UCB/ERL M88/19},
  institution = {UC Berkeley},
  keyword = {parallel I/O, disk array, RAID, Sprite, disk architecture,
  database, pario bib},
  comment = {Designing a DBMS for Sprite and RAID. High availability, high
  performance. Shared memory multiprocessor. Allocates extents to files that
  are interleaved over a variable number of disks, and over a contiguous set
  of tracks on those disks.}
}

@Unpublished{taber:metadisk,
  author = {David Taber},
  title = {{MetaDisk} Driver Technical Description},
  year = {1990},
  month = {October},
  note = {SunFlash electronic mailing list 22(9)},
  keyword = {disk mirroring, parallel I/O, pario bib},
  comment = {MetaDisk is an addition to the Sun SPARCstation server kernel. It
  allows disk mirroring between any two local disk partitions, or concatenation
  of several disk partitions into one larger partition. Can span up to 4
  partitions simultaneously. Appears not to be striped, just allows bigger
  partitions, and (by chance) some parallel I/O for large files.}
}

@TechReport{think:cm-2,
  key = {TMC},
  title = {{Connection Machine} Model {CM-2} Technical Summary},
  year = {1987},
  month = {April},
  number = {HA87-4},
  institution = {Thinking Machines},
  keyword = {parallel I/O, connection machine, disk array, disk architecture,
  SIMD, pario bib},
  comment = {I/O and Data Vault, pp. 27--30}
}

@Book{think:cm5,
  key = {TMC},
  title = {The {Connection Machine} {CM-5} Technical Summary},
  year = {1991},
  month = {October},
  publisher = {Thinking Machines Corporation},
  keyword = {computer architecture, connection machine, MIMD, SIMD, parallel
  I/O, pario bib},
  comment = {Some detail, but still skips over some key aspects (like
  communication topology). Neat communications support makes for user-mode
  message-passing, broadcasting, reductions, all built in. Lots of info here.
  File system calls allow data to be transferred in parallel directly from I/O
  node to processing node, bypassing the partition and I/O management nodes.
  Multiple I/O devices (even DataVaults) can be logically striped. See also
  best:cmmdio, loverso:sfs, think:cmmd, think:sda.}
}

@Manual{think:cmmd,
  key = {TMC},
  title = {{CMMD} User's Guide},
  year = {1992},
  month = {January},
  organization = {Thinking Machines Corporation},
  keyword = {MIMD, parallel programming, parallel I/O, message-passing, pario
  bib}
}

@Misc{think:sda,
  key = {TMC},
  title = {{CM-5} Scalable Disk Array},
  year = {1992},
  month = {November},
  howpublished = {Thinking Machines Corporation glossy},
  keyword = {parallel I/O, disk array, striping, RAID, pario bib},
  comment = {Disk storage nodes (processor, network interface, buffer, 4 SCSI
  controllers, 8 disks) attach individually to the CM-5 network. The software
  stripes across all nodes in the system. Thus, the collection of nodes is
  called a disk array. Multiple file systems across the array. Flexible
  redundancy. RAID~3 is used, i.e., bit-striped and a single parity disk. Remote
  access via NFS supported. Files stored in canonical order, with special
  hardware to help distribute data across processors. See best:cmmdio.}
}

@Manual{tmc:cmio,
  key = {TMC},
  title = {Programming the {CM I/O} System},
  year = {1990},
  month = {November},
  organization = {Thinking Machines Corporation},
  keyword = {parallel I/O, file system interface, multiprocessor file system,
  pario bib},
  comment = {Have two types of files, parallel and serial, differing in the way
  data is laid out internally. Also have three modes for reading the file:
  synchronous, streaming (asynchronous), and buffered.}
}

@InProceedings{towsley:cpuio,
  author = {Donald F. Towsley},
  title = {The Effects of {CPU: I/O} Overlap in Computer System
  Configurations},
  booktitle = {Proceedings of the 5th Annual International Symposium on
  Computer Architecture},
  year = {1978},
  month = {April},
  pages = {238--241},
  keyword = {parallel processing, parallel I/O, pario bib},
  comment = {Difficult to follow since it is missing its figures. ``Our most
  important result is that multiprocessor systems can benefit considerably more
  than single processor systems with the introduction of CPU: I/O overlap.''
  They overlap I/O needed by some future CPU sequence with the current CPU
  operation. They claim it looks good for large numbers of processors. Their
  orientation seems to be for multiprocessors operating on independent tasks.}
}

@Article{towsley:cpuio-parallel,
  author = {D. Towsley and K. M. Chandy and J. C. Browne},
  title = {Models for Parallel Processing within Programs: {Application} to
  {CPU: I/O} and {I/O: I/O} Overlap},
  journal = {Communications of the ACM},
  year = {1978},
  month = {October},
  volume = {21},
  number = {10},
  pages = {821--831},
  keyword = {parallel processing, parallel I/O, pario bib},
  comment = {Models CPU:I/O and I/O:I/O overlap within a program. ``Overlapping
  is helpful only when it allows a device to be utilized which would not be
  utilized without overlapping.'' In general the overlapping seems to help.}
}

@MastersThesis{vaitzblit:media,
  author = {Lev Vaitzblit},
  title = {The Design and Implementation of a High-Bandwidth File Service for
  Continuous Media},
  year = {1991},
  month = {September},
  school = {MIT},
  keyword = {multimedia, distributed file system, disk striping, pario bib},
  comment = {A DFS for multimedia. Expect large files, read-mostly, highly
  sequential. Temporal synchronization is key. An administration server handles
  opens and closes, and provides guarantees on performance (like Swift). The
  interface at the client nodes talks to the admin server transparently, and
  stripes requests over all storage nodes. Storage nodes may internally use
  RAIDs, I suppose. Files are a series of frames, rather than bytes. Each frame
  has a time offset in seconds. Seeks can be by frame number or time offset.
  File containers contain several files, and have attributes that specify
  performance requirements. Interface does prefetching, based on read direction
  (forward or backward) and any frame skips. But frames are not transmitted
  from storage server to client node until requested (client pacing). Claim
  that synchronous disk interleaving with a striping unit of one frame is best.
  Could get 30 frames/sec (3.5MB/s) with 2 DECstation 5000s and 4 disks,
  serving a client DEC 5000.}
}

@InProceedings{vandegoor:unixio,
  author = {A. J. {van de Goor} and A. Moolenaar},
  title = {{UNIX I/O} in a Multiprocessor System},
  booktitle = {Proceedings of the 1988 Winter Usenix Conference},
  year = {1988},
  pages = {251--258},
  keyword = {unix, multiprocessor file system, pario bib},
  comment = {How to split up the internals of the Unix I/O system to run on a
  shared-memory multiprocessor in a non-symmetric OS. They decided to split the
  functionality just above the buffer cache level, putting the buffer cache
  management and device drivers on the special I/O processors.}
}

@InProceedings{vitter:optimal,
  author = {Jeffrey Scott Vitter and Elizabeth A.~M. Shriver},
  title = {Optimal Disk {I/O} with Parallel Block Transfer},
  booktitle = {Proceedings of the 22nd Annual ACM Symposium on Theory of
  Computing (STOC~'90)},
  year = {1990},
  month = {May},
  pages = {159--169},
  note = {A summary appears in STOC '90},
  keyword = {parallel I/O algorithms, parallel memory, pario bib},
  comment = {Summary of vitter:parmem1 and vitter:parmem2.}
}

@TechReport{vitter:parmem1,
  author = {Jeffrey Scott Vitter and Elizabeth A. M. Shriver},
  title = {Algorithms for Parallel Memory {I}: Two-level Memories},
  year = {1992},
  month = {August},
  number = {CS--92--04},
  institution = {Brown University},
  keyword = {parallel I/O algorithms, parallel memory, pario bib},
  comment = {Summarized in vitter:optimal.}
}

@TechReport{vitter:parmem2,
  author = {Jeffrey Scott Vitter and Elizabeth A. M. Shriver},
  title = {Algorithms for Parallel Memory {II}: Hierarchical Multilevel
  Memories},
  year = {1992},
  month = {August},
  number = {CS--92--05},
  institution = {Brown University},
  note = {A summary appears in STOC '90},
  keyword = {parallel I/O algorithms, parallel memory, pario bib},
  comment = {Summarized in vitter:optimal.}
}

@TechReport{vitter:prefetch,
  author = {Jeffrey Scott Vitter and P. Krishnan},
  title = {Optimal Prefetching via Data Compression},
  year = {1991},
  month = {July},
  number = {CS--91--46},
  institution = {Brown University},
  note = {A summary appears in FOCS '91.},
  keyword = {parallel I/O algorithms, disk prefetching, pario bib},
  comment = {``This... is on prefetching, but I think the ideas will have a lot
  of use with parallel disks. The implementations we have now are doing
  amazingly well compared to LRU.'' [Vitter]}
}

@InProceedings{vitter:summary,
  author = {Jeffrey Scott Vitter},
  title = {Efficient Memory Access in Large-Scale Computation},
  booktitle = {Proceedings of the 1991 Symposium on Theoretical Aspects of
  Computer Science (STACS~'91)},
  year = {1991},
  pages = {26--41},
  publisher = {Springer-Verlag},
  address = {Berlin},
  note = {Published as {\em Lecture Notes in Computer Science}\/ volume 480},
  keyword = {parallel I/O algorithms, sorting, pario bib},
  comment = {Good overview of all the other papers.}
}

@Article{vitter:uniform,
  author = {Jeffrey Scott Vitter and Mark H. Nodine},
  title = {Large-Scale Sorting in Uniform Memory Hierarchies},
  journal = {Journal of Parallel and Distributed Computing},
  year = {1993},
  month = {January and February},
  volume = {17},
  number = {1--2},
  pages = {107--114},
  keyword = {parallel I/O algorithm, sorting, pario bib},
  comment = {Summary is nodine:sort.}
}

@Manual{vms:stripe,
  key = {DEC},
  title = {{VAX} Disk Striping Driver for {VMS}},
  year = {1989},
  month = {December},
  organization = {Digital Equipment Corporation},
  note = {Order Number AA-NY13A-TE},
  keyword = {disk striping, pario bib},
  comment = {Describes the VAX disk striping driver. Stripes an apparently
  arbitrary number of disk devices. All devices must be the same type, and
  apparently completely used. Manager can specify ``chunksize'', the number of
  logical blocks per striped block. They suggest using the track size of the
  device as the chunk size. They also point out that multiple controllers
  should be used in order to gain parallelism.}
}

@InProceedings{waltz:database,
  author = {David L. Waltz},
  title = {Innovative Massively Parallel {AI} Applications},
  booktitle = {Proceedings of the 1993 DAGS/PC Symposium},
  year = {1993},
  month = {June},
  pages = {132--138},
  organization = {Dartmouth Institute for Advanced Graduate Studies},
  address = {Hanover, NH},
  keyword = {database, AI, artificial intelligence, pario bib},
  abstract = {Massively parallel applications must address problems that will
  be too large for workstations for the next several years, or else it will not
  make sense to expend development costs on them. Suitable applications include
  one or more of the following properties: 1) large amounts of data; 2)
  intensive computations; 3) requirement for very fast response times; 4) ways
  to trade computations for human effort, as in developing applications using
  learning methods. Most of the suitable applications that we have found come
  from the general area of very large databases. Massively parallel machines
  have proved to be important not only in being able to run large applications,
  but in accelerating development (allowing the use of simpler algorithms,
  cutting the time to test performance on realistic databases) and allowing
  many different algorithms and parameter settings to be tried and compared for
  a particular task. This paper summarizes four such applications. The
  applications described are: 1) prediction of credit card "defaulters"
  (non-payers) and "attritters" (people who didn't renew their cards) from a
  credit card database; 2) prediction of the continuation of time series, e.g.
  stock price movements; 3) automatic keyword assignment for news articles; and
  4) protein secondary structure prediction. These add to a list identified in
  an earlier paper [Waltz 90] including: 5) automatic classification of U.S.
  Census Bureau long forms, using MBR -- Memory-Based Reasoning [Creecy et al
  92, Waltz 89, Stanfill \& Waltz 86]; 6) generating catalogs for a mail order
  company that maximize expected net returns (revenues from orders minus cost
  of the catalogs and mailings) using genetically-inspired methods; and 7)
  text-based intelligent systems for information retrieval, decision support,
  etc.},
  comment = {Invited speaker.}
}

@TechReport{wilkes:datamesh,
  author = {John Wilkes},
  title = {{DataMesh} --- scope and objectives: a commentary},
  year = {1989},
  month = {July},
  number = {HP-DSD-89-44},
  institution = {Hewlett-Packard},
  keyword = {parallel I/O, distributed file system, disk caching, heterogeneous
  file system, pario bib},
  comment = {Hooks a heterogeneous set of storage devices together over a fast
  interconnect, each with its own identical processor. The whole would then act
  as a file server for a network. Data storage devices would range from fast to
  slow (e.g. optical jukebox), varying availability, {\em etc.}. Many ideas here but
  few concrete suggestions. Very little mention of algorithms they might use to
  control the thing. See also wilkes:datamesh1, cao:tickertaip, chao:datamesh,
  wilkes:houses, wilkes:lessons.}
}

@InProceedings{wilkes:datamesh1,
  author = {John Wilkes},
  title = {{DataMesh} Research Project, Phase 1},
  booktitle = {Proceedings of the Usenix File Systems Workshop},
  year = {1992},
  month = {May},
  pages = {63--69},
  keyword = {distributed file system, parallel I/O, disk scheduling, disk
  layout, pario bib},
  comment = {See chao:datamesh}
}

@Article{wilkes:houses,
  author = {John Wilkes},
  title = {{DataMesh}, house-building, and distributed systems technology},
  journal = {ACM Operating Systems Review},
  year = {1993},
  month = {April},
  volume = {27},
  number = {2},
  pages = {104--108},
  keyword = {file system, distributed computing, pario bib},
  comment = {Same as wilkes:lessons. See that for comments.}
}

@InProceedings{wilkes:lessons,
  author = {John Wilkes},
  title = {{DataMesh}, house-building, and distributed systems technology},
  booktitle = {Proceedings of the 1993 DAGS/PC Symposium},
  year = {1993},
  month = {June},
  pages = {1--5},
  organization = {Dartmouth Institute for Advanced Graduate Studies},
  address = {Hanover, NH},
  keyword = {file system, parallel I/O, RAID, disk array, pario bib},
  comment = {Invited speaker. Also appeared in ACM OSR April 1993
  (wilkes:houses). See also cao:tickertaip, chao:datamesh, wilkes:datamesh1,
  wilkes:datamesh, wilkes:houses.}
}

@InProceedings{willeman:pario,
  author = {Ray Willeman and Susan Phillips and Ron Fargason},
  title = {An Integrated Library For Parallel Processing: The Input/Output
  Component},
  booktitle = {Fourth Conference on Hypercube Concurrent Computers and
  Applications},
  year = {1989},
  pages = {573--575},
  keyword = {parallel I/O, pario bib},
  comment = {Like the CUBIX interface, in some ways. Meant for parallel access
  to non-striped (sequential) file. Self-describing format so that the reader
  can read the formatting information and distribute data accordingly.}
}

@InProceedings{witkowski:hyper-fs,
  author = {Andrew Witkowski and Kumar Chandrakumar and Greg Macchio},
  title = {Concurrent {I/O} System for the Hypercube Multiprocessor},
  booktitle = {Third Conference on Hypercube Concurrent Computers and
  Applications},
  year = {1988},
  pages = {1398--1407},
  keyword = {parallel I/O, hypercube, parallel file system, pario bib},
  comment = {Concrete system for the hypercube. Files resident on one disk
  only. Little support for cooperation except for sequentialized access to
  parts of the file, or broadcast. No mention of random-access files. I/O nodes
  are distinguished from computation nodes. I/O nodes have separate comm.
  network. No parallel access. I/O hooked to front-end too.}
}

@Article{wolman:iobench,
  author = {Barry L. Wolman and Thomas M. Olson},
  title = {{IOBENCH:} A System Independent {IO} Benchmark},
  journal = {Computer Architecture News},
  year = {1989},
  month = {September},
  volume = {17},
  number = {5},
  pages = {55--70},
  keyword = {I/O benchmark, transaction processing, pario bib},
  comment = {Not about parallel I/O, but see olson:random. Defines a new I/O
  benchmark that is fairly system-independent. Focus is for transaction
  processing systems. Cranks up many tasks (users) all doing repetitive
  read/writes for a specified time, using optional locking, and optional
  computation. Whole suite of results for comparison with others.}
}

@InProceedings{womble:pario,
  author = {David Womble and David Greenberg and Stephen Wheat and Rolf
  Riesen},
  title = {Beyond Core: Making Parallel Computer {I/O} Practical},
  booktitle = {Proceedings of the 1993 DAGS/PC Symposium},
  year = {1993},
  month = {June},
  pages = {56--63},
  organization = {Dartmouth Institute for Advanced Graduate Studies},
  address = {Hanover, NH},
  keyword = {parallel I/O, out-of-core, parallel algorithm, scientific
  computing, multiprocessor file system, pario bib},
  abstract = {The solution of Grand Challenge Problems will require
  computations which are too large to fit in the memories of even the largest
  machines. Inevitably, new designs of I/O systems will be necessary to support
  them. Through our implementations of an out-of-core LU factorization we have
  learned several important lessons about what I/O systems should be like. In
  particular we believe that the I/O system must provide the programmer with
  the ability to explicitly manage storage. One method of doing so is to have a
  partitioned secondary storage in which each processor owns a logical disk.
  Along with operating system enhancements which allow overheads such as buffer
  copying to be avoided, this sort of I/O system meets the needs of high
  performance computing.}
}

@InProceedings{wu:thrashing,
  author = {Kun-Lung Wu and Philip S. Yu and James Z. Teng},
  title = {Performance Comparison of Thrashing Control Policies for Concurrent
  Mergesorts with Parallel Prefetching},
  booktitle = {Proceedings of the 1993 ACM Sigmetrics Conference on Measurement
  and Modeling of Computer Systems},
  year = {1993},
  pages = {171--182},
  keyword = {disk prefetching, parallel I/O, disk caching, sorting, pario bib}
}


From choudhar@cat.syr.edu  Tue May 10 07:55:07 1994
Received: from cat.syr.edu (fruit.ece.syr.edu) by cs.rice.edu (AA14880); Tue, 10 May 94 07:55:07 CDT
Date: Tue, 10 May 94 08:55:04 EDT
From: choudhar@cat.syr.edu (Alok Choudhary)
Received: by cat.syr.edu (4.1/1.0-6/5/90)
	id AA03396; Tue, 10 May 94 08:55:04 EDT
Message-Id: <9405101255.AA03396@cat.syr.edu>
To: hpff-io@cs.rice.edu
Subject: time to consider options for requirements
Cc: choudhar@cat.syr.edu


We have two basic goals.

1. Writing a requirements report.

2. Coming up with simple additions/changes in current HPF. E.G.,

    a) 64 bit pointers
    b) Out-of-core directives added to arrays to indicate large arrays
       and guiding the compiler to handle such arrays. Also, it is
       possible to specify amount of memory with this directive but
       that may be very machine dependent.

    c) Asynchronous I/O (Posix)

 We need to address 2 by next meeting for some sort of straw vote and further
 action.

 Things to consider wrt 1 above include distributions of files, mapping
 on disks etc.

 Please provide your input ASAP.

thanks
Alok

From shapiro@think.com  Tue May 10 08:46:03 1994
Received: from mail.think.com by cs.rice.edu (AA15909); Tue, 10 May 94 08:46:03 CDT
Received: from Zeus.Think.COM by mail.think.com; Tue, 10 May 94 09:46:02 -0400
From: Richard Shapiro <shapiro@think.com>
Received: by zeus.think.com (4.1/Think-1.2)
	id AA01739; Tue, 10 May 94 09:46:01 EDT
Date: Tue, 10 May 94 09:46:01 EDT
Message-Id: <9405101346.AA01739@zeus.think.com>
To: choudhar@cat.syr.edu
Cc: hpff-io@cs.rice.edu, choudhar@cat.syr.edu
In-Reply-To: Alok Choudhary's message of Tue, 10 May 94 08:55:04 EDT <9405101255.AA03396@cat.syr.edu>
Subject: time to consider options for requirements

   Date: Tue, 10 May 94 08:55:04 EDT
   From: choudhar@cat.syr.edu (Alok Choudhary)


   We have two basic goals.

   1. Writing a requirements report.

   2. Coming up with simple additions/changes in current HPF. E.G.,

       a) 64 bit pointers

This can be handled by the KIND mechanism in F90. Why invent yet another mechanism?

       b) Out-of-core directives added to arrays to indicate large arrays
	  and guiding the compiler to handle such arrays. Also, it is
	  possible to specify amount of memory with this directive but
	  that may be very machine dependent.

       c) Asynchronous I/O (Posix)

    We need to address 2 by next meeting for some sort of straw vote and further
    action.

    Things to consider wrt 1 above include distributions of files, mapping
    on disks etc.

    Please provide your input ASAP.

   thanks
   Alok


From chk@cs.rice.edu  Tue May 10 10:55:57 1994
Received: from [128.42.1.227] by cs.rice.edu (AB19980); Tue, 10 May 94 10:55:57 CDT
Message-Id: <9405101555.AB19980@cs.rice.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Tue, 10 May 1994 10:58:19 -0600
To: hpff-io
From: chk@cs.rice.edu (Chuck Koelbel)
Subject: Forwarded message

Due to mailer problems at Rice, many people did not see this the first time
around.  Apologies to those of you who see it twice.

                                                Chuck

>Date: Tue, 10 May 94 09:46:01 EDT
>Message-Id: <9405101346.AA01739@zeus.think.com>
>To: choudhar@cat.syr.edu
>Cc: hpff-io@cs.rice.edu, choudhar@cat.syr.edu
>In-Reply-To: Alok Choudhary's message of Tue, 10 May 94 08:55:04 EDT
><9405101255.AA03396@cat.syr.edu>
>Subject: time to consider options for requirements
>
>   Date: Tue, 10 May 94 08:55:04 EDT
>   From: choudhar@cat.syr.edu (Alok Choudhary)
>
>
>   We have two basic goals.
>
>   1. Writing a requirements report.
>
>   2. Coming up with simple additions/changes in current HPF. E.G.,
>
>       a) 64 bit pointers
>
>This can be handled by the KIND mechanism in F90. Why invent yet another
>mechanism?
>
>       b) Out-of-core directives added to arrays to indicate large arrays
>          and guiding the compiler to handle such arrays. Also, it is
>          possible to specify amount of memory with this directive but
>          that may be very machine dependent.
>
>       c) Asynchronous I/O (Posix)
>
>    We need to address 2 by next meeting for some sort of straw vote and
>further
>    action.
>
>    Things to consider wrt 1 above include distributions of files, mapping
>    on disks etc.
>
>    Please provide your input ASAP.
>
>   thanks
>   Alok
>
>
>



From chk@cs.rice.edu  Wed May 11 16:25:16 1994
Received: from [128.42.1.227] by cs.rice.edu (AB02769); Wed, 11 May 94 16:25:16 CDT
Message-Id: <9405112125.AB02769@cs.rice.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Wed, 11 May 1994 16:27:01 -0600
To: hpff-io@cs.rice.edu
From: chk@cs.rice.edu (Chuck Koelbel)
Subject: Re: time to consider options for requirements

At  8:55 5/10/94 -0400, Alok Choudhary wrote:
>We have two basic goals.
>
>1. Writing a requirements report.

See below.

>2. Coming up with simple additions/changes in current HPF. E.G.,
>
>    a) 64 bit pointers

As has already been pointed out, KIND type parameters allow this for
Fortran 90 already.
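
For example, a minimal sketch using only standard Fortran 90 (note that
SELECTED_INT_KIND returns -1 if the processor has no such kind):

      INTEGER, PARAMETER :: LONG = SELECTED_INT_KIND(18)
!     range >= 10**18, i.e. a 64-bit integer kind on most systems
      INTEGER(KIND=LONG) :: OFFSET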

>    b) Out-of-core directives added to arrays to indicate large arrays
>       and guiding the compiler to handle such arrays. Also, it is
>       possible to specify amount of memory with this directive but
>       that may be very machine dependent.

This does not sound like a simple addition.  For starters, what _exactly_
does an out-of-core directive mean?  Do any vendors already have one?

>    c) Asynchronous I/O (Posix)

Saying "Use Posix" would be a simple addition.  Extending Posix in some way
would not.  I'm not familiar enough with Posix to say whether adopting it
would be a good idea.  Could somebody provide a pointer to the standard, or
summarize it for the group?

In short, I don't believe there are any quick fixes for HPF I/O, except
possibly for adopting Posix asynchronous I/O.

> We need to address 2 by next meeting for some sort of straw vote and further
> action.
>
> Things to consider wrt 1 above include distributions of files, mapping
> on disks etc.
>
> Please provide your input ASAP.
>
>thanks
>Alok

I believe the following are reasonable goals for the requirements document:

0. A definition of the "parallel I/O problem"
        (or problems, if user needs are sufficiently diverse)

1. A survey of user needs

2. A survey of models for parallel I/O

3. (Assuming we agree) Recommendations:
        I/O capabilities for HPF to standardize
        One HPF-level model for describing I/O
        A high-level strategy for HPF I/O features
        (e.g. "define a new parallel file type")

Slipping in some reasons that standard Fortran I/O statements are not
sufficient would also be a good idea.

                                                Chuck



From jim@meiko.co.uk  Thu May 12 05:55:35 1994
Received: from hub.meiko.co.uk by cs.rice.edu (AA16046); Thu, 12 May 94 05:55:35 CDT
Received: from tycho.co.uk (tycho.meiko.co.uk) by hub.meiko.co.uk with SMTP id AA03676
  (5.65c/IDA-1.4.4 for chk@cs.rice.edu); Thu, 12 May 1994 11:54:27 +0100
Received: by tycho.co.uk (5.0/SMI-SVR4)
	id AA15496; Thu, 12 May 1994 11:53:59 +0000
Date: Thu, 12 May 1994 11:53:59 +0000
From: jim@meiko.co.uk (James Cownie)
Message-Id: <9405121053.AA15496@tycho.co.uk>
To: chk@cs.rice.edu
Cc: hpff-io@cs.rice.edu
In-Reply-To: <9405112125.AB02769@cs.rice.edu> (chk@cs.rice.edu)
Subject: Re: time to consider options for requirements
Content-Length: 2562


> >    c) Asynchronous I/O (Posix)
> 
> Saying "Use Posix" would be a simple addition.  Extending Posix in some way
> would not.  I'm not familiar enough with Posix to say whether adopting it
> would be a good idea.  Could somebody provide a pointer to the standard, or
> summarize it for the group?
> 
> In short, I don't believe there are any quick fixes for HPF I/O, except
> possibly for adopting Posix asynchronous I/O.

There are some potentially serious problems with binding asynchronous
operations into F90. (I have observed these dangers in MPI, but the
same thing applies here too.)

The fundamental issue is that there is no way in F90 (or in any other
language to the best of my knowledge) of expressing the fact that the
arguments to a subroutine will continue to be accessed AFTER the
subroutine has returned.

This is exactly the behaviour of all of these asynchronous operations
(e.g. Async I/O, non-blocking message passing), where one routine
"passes a pointer to the data" and then returns. The data is actually
accessed behind the programmer's back, and finally a call is made to
check that the subversive mechanism has finished with the data.

This is not such a problem in C, where the argument passing mechanisms
are low level, and defined in terms of machine level semantics and
there are no compiler generated temporaries. It is a significant issue
in F90, where the argument passing mechanisms are not defined at this
level at all.

Consider for instance, something like this

	integer ibuffer(10000)

	call start_async_op(ibuffer(1:1000:3))
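	! NOTE: the strided section may be compressed into a compiler-generated
	! temporary, and that temporary is all that start_async_op ever sees.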

	...

If start_async_op is an external routine which appears to the F90
compiler to be written in F77 (though it's probably written in C !),
the compiler will very reasonably generate a temporary to hold the
array slice, and compress the data into it. BUT the compiler is
entirely at liberty to free this temporary immediately start_async_op
returns. Unfortunately that's all the data which is being used by the
low level asynchronous mechanism at exactly the same time as it is
being freed and reused by the compiler.

I don't have a solution to this. I am flagging a warning though. Don't
think that just saying "Adopt POSIX asynchronous I/O" is necessarily
"a simple addition".

-- Jim 
James Cownie 
Meiko Limited			Meiko Inc.
650 Aztec West			130C Baker Avenue Ext.
Bristol BS12 4SD		Concord
England				MA 01742

Phone : +44 454 616171		+1 508 371 0088
FAX   : +44 454 618188		+1 508 371 7516
E-Mail: jim@meiko.co.uk   or    jim@meiko.com
WWW   : http://www.meiko.com:8080/welcome.html

From loveman@hpc.pko.dec.com  Thu May 12 07:42:35 1994
Received: from inet-gw-1.pa.dec.com by cs.rice.edu (AA16591); Thu, 12 May 94 07:42:35 CDT
Received: from ovid.hpc.pko.dec.com by inet-gw-1.pa.dec.com (5.65/21Mar94)
	id AA08900; Thu, 12 May 94 05:38:54 -0700
Received: by ovid.hpc.pko.dec.com; id AA17905; Thu, 12 May 1994 08:37:27 -0400
Message-Id: <9405121237.AA17905@ovid.hpc.pko.dec.com>
To: hpff-io@cs.rice.edu
Cc: loveman@hpc.pko.dec.com
Subject: Asynchronous I/O 
Date: Thu, 12 May 94 08:31:56 -0400
From: loveman@hpc.pko.dec.com
X-Mts: smtp


I agree with Chuck's goals for an i/o requirements document.  I would
explicitly include his tag line ("Slipping in some reasons that
standard Fortran I/O statements are not sufficient would also be a good
idea.") as a part of his item 0.  Presumably a part of the "parallel
I/O problem" is that there are things that real users need to do that
can not be done (or are difficult or awkward to do) with existing
constructs.  (I know - my Turing machine is just as powerful as your
Turing machine, but it might be easier to use.)

I think there is a requirement for asynchronous i/o, but...

1. Just because a Fortran i/o statement is synchronous in its
definition, there is no prohibition against a compiler using code
motion, etc. to move all or part of the i/o to more convenient
locations in the program.  So even if the programmer doesn't directly
have a language facility, the compiler can arrange to start the i/o
early, and then later wait for its completion.

2. The "asynchronous i/o" question seem to have two parts:  surface
syntax plus user semantics, and underlying mechanism.  It might be
worthwhile to add a requirement to the tasking group that their model
allow for the effect of asynchronous i/o such as start a task
containing "ordinary" Fortran i/o, synchronizing appropriately, and
making the results available.  A second issue, for the i/o group, might
then be whether any syntactic sugaring would be a user benefit or an
aid to the compiler in identifying a particular cliche for
optimization.  For example, FORALL is really just a sugaring of a
cliched use of DO loops with !HPF$INDEPENDENT, but it sure makes life
easier for the programmer and the compiler.
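
For example (a minimal illustration; the two forms coincide for a simple
case with no cross-iteration dependences):

!HPF$ INDEPENDENT
      DO I = 1, N
         A(I) = B(I) + C(I)
      END DO

is essentially a sugared version of

      FORALL (I = 1:N) A(I) = B(I) + C(I)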

-David

From choudhar@cat.syr.edu  Sun Sep 25 14:38:58 1994
Received: from cat.syr.edu (peach.ece.syr.EDU) by cs.rice.edu (AA23686); Sun, 25 Sep 94 14:38:58 CDT
Date: Sun, 25 Sep 94 15:33:37 EDT
From: choudhar@cat.syr.edu (Alok Choudhary)
Received: by cat.syr.edu (4.1/1.0-6/5/90)
	id AA16020; Sun, 25 Sep 94 15:33:37 EDT
Message-Id: <9409251933.AA16020@cat.syr.edu>
To: itf@mcs.anl.gov, pm@icase.edu, hpff-io@cs.rice.edu, zosel@llnl.gov
Subject: IO Requirements Summary
Cc: choudhar@cat.syr.edu


% I was not clear who is collecting these (Ian or Piyush?). Please provide
% feedback and suggestions. I believe these will be sections of
% an overall requirements section? Alok

\documentstyle[11pt]{article}

% Prepared by : Alok Choudhary
% To add bibliography, if needed
% Only a brief discussion of requirements and questions these requirement
% pose are presented. No specific proposal for language features
% is presented at this time.

% SUGGESTIONS TO ADD and MODIFY THE MATERIAL are most welcome. Organizational
% suggestions are welcome as well.
% 

\begin{document}

\section{A Summary of Input-Output Requirements}

Incorporating features in HPF to allow High-Performance Input-Output
is considered very important in order to provide scalable performance
for accessing secondary storage and beyond. Several basic requirements have
been identified; namely,
\begin{itemize}
\item transparent parallel accesses to files,
\item out-of-core arrays and persistent arrays,
\item real-time output,
\item checkpointing and restart,
\item an ability to stripe files over disks, and
\item asynchronous I/O for overlapping computation with I/O.
\end{itemize}

The following summarizes each of these requirements.

\subsection{Transparent Parallel Accesses to Files}

From a user's perspective, high-performance access to files is
important, whether the underlying mechanism involves parallel accesses or
something else. It is clear, however, that given the technological trend,
extending parallelism to I/O is the only possible way to achieve
high performance in I/O.

Parallelism in accessing files has implications for the language
in which the parallel accesses will be expressed.
Fortran (on which HPF is based) file I/O has sequential semantics. 
There are several questions to be resolved for implementing
parallel accesses to files from HPF. Is it necessary or important
to provide an explicit mechanism to perform parallel accesses to
a file using new constructs such as parallel read and write statements?
If yes, how do these statements interact with traditional (sequential)
I/O access semantics?
Given that parallel access to files is needed, is it necessary to
define new types of files for this purpose? 

\subsection{Striping/Distributing Files over Disks}

Given that parallelism is crucial to achieving high-performance in I/O,
files are and will be striped over disks. The important questions
to address are what controls a user has in specifying
such distributions. Does the user see the individual stripes (or portions)
of files from within an HPF program, and are the Fortran/HPF semantics
preserved in such a file? These questions are obviously related.
Most parallel machines implement some sort of striped file system, and
many allow control on stripe size, data distribution, number of disks etc.


To take advantage of the above features, it has been argued that future
extensions to HPF should allow the option of breaking away from the
traditional sequential semantics of Fortran I/O, at least for some special
types of files.
This argument is similar to the one on which distribution of data
over processor memories is based in the HPF model.
Examples of these files include scratch files which
may exist during the execution of a program to store temporary data.
Furthermore, in many applications, it is desired to write data in one
distribution and read it in another (potentially by a different number of
processors). Following the current language semantics, accesses in most
cases will have to be sequentialized.

Various proposals for distributing files over disks were made during the
HPF-I round. These proposals are summarized in the Journal of Development
of the HPF-I language specification.

\subsection{Out-of-Core and Persistent Arrays}
%Model?

Many potential applications of HPF operate on large quantities of data.
Primary data structures for these applications reside on disks.
These data structures are termed out-of-core. In order not to
restrict application developers to problem sizes that fit in the memory
of a system,
providing a way to specify out-of-core data structures from HPF
is considered important. The mechanisms for providing this feature
include extending the directives in HPF to declare out-of-core arrays
and their distributions on disks. 
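
As an illustration only (the OUT\_OF\_CORE directive below is hypothetical,
not a proposal), such a declaration might look like:
\begin{verbatim}
      REAL A(50000,50000)
!HPF$ DISTRIBUTE A(BLOCK,BLOCK)
!HPF$ OUT_OF_CORE A
C Hypothetical directive: A is disk resident, and the compiler stages
C in-core sections of A as the computation requires them.
\end{verbatim}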

Persistent arrays are those that persist beyond a program's execution.
For example, data produced by one program and later needed by another
may be provided using this technique. Metadata associated with these
arrays may describe the distribution and access mechanisms for
persistent arrays. Mechanisms for doing this from HPF need to be considered.

\subsection{Real-Time Output}

This requirement essentially deals with obtaining output data in real time
from a program while the program is executing. The main application
of this requirement is on-line visualization and interpretation
of the data produced by a program, which could be used
to check the results or status and for tuning or navigating a program's
execution (i.e., modifying input parameters to a program to guide
its execution).

This requirement is also related to the task-parallel requirements
in HPF.
One way of incorporating real-time output would be to view
the HPF program as one data-parallel task and the visualization task
as another task. This would require, therefore, communication between
two tasks, an issue being considered by the task-parallel group as well.

\subsection{Checkpointing/Restart}

A capability to checkpoint an HPF program's state and restart the
program later
(potentially on a different number of processors) is deemed quite
important.
This is especially critical for applications that execute for
a very long time, where the possibility of machine or software failure
is significant. Users want a capability to restart their applications later,
but from an intermediate state to avoid recomputing from the beginning.
Specifically, the checkpointing time should be minimized, which in turn
means that efficient parallel I/O for checkpointing is required.
Another important requirement is the ability to restart the
computation from a previously saved checkpoint on a different number of
processors than the number of processors on which the program was
executing when the last checkpoint was taken.
The reason for desiring this may be the availability of fewer processors
(due to failures or other reasons) or of more nodes.
This has clear implications
for the language, the compiler, and the runtime system. If the checkpoint
information depends on the data distribution and the number of processors,
then restarting on a different number of processors (which will have
a different distribution of data, if not in shape, then certainly in size)
may be very difficult, if not impossible.

Language features or directives can help compilers determine good places
in programs to perform checkpoints. Just like users can use their knowledge
of the application domain to provide information on data mapping and
interactions using directives, users can also provide information
on which data to checkpoint and where. This can not only reduce the
burden on the compiler but it can help reduce the amount of information
to be saved in a checkpoint.
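
As an illustration only (the directive and the subroutine name below are
hypothetical), a user-guided checkpoint might look like:
\begin{verbatim}
      DO ITER = 1, MAXITER
         CALL ADVANCE_SOLUTION(U, ITER)
!HPF$    CHECKPOINT (U, ITER)
C Hypothetical directive naming the data (and only the data) needed
C to restart the computation from this point.
      END DO
\end{verbatim}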

\subsection{Asynchronous I/O and Prefetching}

Given that I/O is very slow compared to the computation and even communication,
it is obviously attractive to overlap computation with I/O as much as possible.
This requires an ability to perform asynchronous I/O and 
user-level prefetching of data. There are three possible approaches to
providing prefetching and asynchronous I/O capabilities: runtime
libraries, language extensions, or directives.

The idea of using runtime libraries for asynchronous I/O
is similar to the usage of libraries for irregular computations
and reductions as incorporated in the HPF library. This requires the
fewest extensions to the language. However, since
with asynchronous I/O control returns to the program while
the I/O is still in progress, the mechanism for signaling completion and
synchronization is not clear and needs to be addressed.
Furthermore, ensuring that the buffers containing data for I/O
are not polluted must be resolved within the constraints of
HPF (and thus within the constraints of the base language F90).
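
For illustration only, a library-based form might resemble the following
(all routine names are hypothetical):
\begin{verbatim}
      INTEGER HANDLE
      CALL ASYNC_READ(7, A, HANDLE)
C Hypothetical routine: starts reading A from unit 7 and returns
C immediately; HANDLE identifies the pending operation.
      CALL UNRELATED_CRUNCHING()
C Computation that does not touch A.
      CALL ASYNC_WAIT(HANDLE)
C Hypothetical routine: blocks until the read into A has completed.
\end{verbatim}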

Using directives for asynchronous I/O is another option. However,
issues of signaling and data pollution need to be resolved here
as well. The following example illustrates the use of directives.
The basic idea is for the user to specify a region in which 
the user guarantees not to access the buffers involved in the I/O.
Thus, asynchronous I/O may be performed while computation within
the region is going on. If the I/O does not finish within this
region, then a mechanism is required to block further computation
until the I/O completes.

\begin{verbatim}

      READ (7,30) A(:,:), Q,Z
!HPF$ I PROMISE NOT TO TOUCH THE I/O VARIABLES
C Specifying that the variables involved in the preceding read operation
C will not be accessed between this directive and its corresponding
C end of region directive

      B = TRANSPOSE(MATMUL(C, D))
      CALL TEDIOUS_CRUNCHING
!HPF$   NOT!
C A directive to specify the end of the user guarantee.
      Z = SUM(A) * Z

\end{verbatim}

There are many open issues to be addressed if this mechanism is to work.
These include how to handle error conditions and, if the I/O
is performed in subroutines and control returns to the calling
routine, how to specify completion of the I/O or how to inquire
about its completion.

\end{document}

From dfk@wildcat.dartmouth.edu  Sat Feb  4 12:36:18 1995
Received: from wildcat.dartmouth.edu by cs.rice.edu (AA26344); Sat, 4 Feb 95 12:36:18 CST
Received: from localhost by wildcat.dartmouth.edu (8.6.5/4.2)
	id NAA19662; Sat, 4 Feb 1995 13:36:17 -0500
Date: Sat, 4 Feb 1995 13:36:17 -0500
From: dfk@wildcat.dartmouth.edu (David Kotz)
Message-Id: <199502041836.NAA19662@wildcat.dartmouth.edu>
To: scale-io@delilah.ccsf.caltech.edu, pio@nersc.gov, ciosig@ssd.intel.com,
        hpff-io@cs.rice.edu
Subject: parallel I/O bibliography updated

Hello parallel-I/O friends,

[Sorry if you receive this via multiple lists.]

I have just released the sixth edition of my parallel-I/O
bibliography.  You can find it in the usual places: 
in HTML or BibTeX form at URL 
	http://www.cs.dartmouth.edu/cs_archive/pario/bib.html
or in BibTeX form for anonymous ftp at
	ftp.cs.dartmouth.edu  as  pub/pario/pario.bib

As always, the parallel-I/O web page is a handy place to find things:
	http://www.cs.dartmouth.edu/pario.html
This includes a "what's new" page that is easy to check periodically.

Thanks for all your contributions.

Since I last wrote to you all I have released some
other items of interest:
 - simulation code for HP 97560 SCSI disk drive
 - STARFISH, my disk-directed I/O simulator (runs on DECstations)
 - RAPID-Transit simulator, which I used 1987-1991 for
	prefetching and caching studies (runs on the BBN Butterfly)
 - full version of disk-directed I/O paper (TOCS submission), kotz:jdiskdir
 - a paper on using disk-directed I/O for out-of-core LU-decomposition, kotz:lu
 - a paper sketching a new 'nested-strided' interface, nieuwejaar:strided
 - full version of the CHARISMA tracing results for the CM-5, ap:workload
 - full version of the CHARISMA tracing results for the iPSC, kotz:workload

You can find all of these things through my web page below, or by
contacting me.

See you around.

-dave

-----------------
Department of Computer Science    [on sabbatical at Syracuse University]
Dartmouth College, 6211 Sudikoff Laboratory, Hanover NH 03755-3510
email dfk@cs.dartmouth.edu   URL http://www.cs.dartmouth.edu/~dfk/

From help@cs.rice.edu  Sun Feb 26 11:53:43 1995
Return-Path: <dfk@wildcat.dartmouth.edu>
Received: from wildcat.dartmouth.edu by cs.rice.edu (LAA29433); Sun, 26 Feb 1995 11:53:42 -0600
Received: by wildcat.dartmouth.edu (8.6.10/4.2)
	id MAA20803; Sun, 26 Feb 1995 12:53:50 -0500
Date: Sun, 26 Feb 1995 12:53:50 -0500
From: dfk@wildcat.dartmouth.edu (David Kotz)
Message-Id: <199502261753.MAA20803@wildcat.dartmouth.edu>
To: hpff-io@cs.rice.edu
Subject: new parallel-I/O mailing list

I'd like to invite you to join a new parallel-I/O mailing list.
Several lists about parallel I/O exist, but they are all specific to
some effort (MPI, HPF, parallel transport protocol, etc.); there is
nothing for general research discussions.  This new list is intended
for individuals interested in research about parallel-I/O systems and
algorithms, including file systems, architectures, out-of-core
algorithms, compiler techniques, language extensions, run-time
support, APIs, application requirements and workload characterization,
disk arrays, tape striping, and perhaps graphics and networking.  I
hope people will use it to announce new papers, projects, web pages,
workshops, software packages, and so forth.  I also expect people will
use it as a way to discuss topics of common interest.  It is
unmoderated, so the usual rules of courtesy apply.  Please note that
lots of relevant information can be found at

                http://www.cs.dartmouth.edu/pario.html


To subscribe, send a message to majordomo@dartmouth.edu with the
following BODY (Subject ignored):
	subscribe parallel-io

To post a message to the list, just mail it to parallel-io@dartmouth.edu

No archive is currently planned, but might be considered.

[Apologies to those who see this invitation several times.]

Thanks,

David Kotz
Assistant Professor
Department of Computer Science
Dartmouth College
6211 Sudikoff Laboratory
Hanover, NH  03755-3510 USA
email: dfk@cs.dartmouth.edu
URL: http://www.cs.dartmouth.edu/~dfk/
603-646-1439


From help@cs.rice.edu  Wed Jul 12 01:24:21 1995
Return-Path: <jdwang@ds2.cs.ccu.edu.tw>
Received: from cs.ccu.edu.tw by cs.rice.edu (BAA21664); Wed, 12 Jul 1995 01:22:04 -0500
Received: from ds2.cs.ccu.edu.tw by cs.ccu.edu.tw (4.1/SMI-4.1)
	id AA08553; Wed, 12 Jul 95 14:06:00 CST
Received: from ds3.cs.ccu.edu.tw by ds2.cs.ccu.edu.tw (4.1/SMI-4.1)
	id AA00458; Wed, 12 Jul 95 14:05:58 CST
Date: Wed, 12 Jul 95 14:05:58 CST
From: jdwang@ds2.cs.ccu.edu.tw (Wang Jing-Do)
Message-Id: <9507120605.AA00458@ds2.cs.ccu.edu.tw>
To: hpff-io@cs.rice.edu

subscribe

