A DATA DEFINITION AND MAPPING LANGUAGE FOR NUMERICAL DATA BASES<br />
Ola-Olu A. Daini and Peter Scheuermann<br />
Electrical Engineering and Computer Science Department<br />
Northwestern University<br />
Evanston, Illinois 60201<br />
Abstract<br />
Numerical data bases arise in many scientific<br />
applications to keep track of large sparse and<br />
dense matrices. Unlike the many matrix data storage<br />
techniques available for incore manipulation,<br />
very large matrices are currently limited to a few<br />
compact storage schemes on secondary devices, due<br />
to the complex underlying data management facilities.<br />
This paper proposes an approach for generalized<br />
numerical database management that would promote<br />
physical data independence by relieving users<br />
from the need for knowledge of the physical data<br />
organization on the secondary devices.<br />
Our approach is to describe each of the storage<br />
techniques for dense and sparse matrices by a<br />
physical schema, which encompasses the corresponding<br />
access path, the encoding to storage structures,<br />
and the file access method. A generalized<br />
facility for describing any kind of numerical database<br />
and its mapping to storage is provided via<br />
nonprocedural Stored-Data Description and Mapping<br />
Languages (SDDL and SDML). The languages are processed<br />
by a Generalized Syntax-Directed Translation<br />
Scheme (GSDTS) to automatically generate FORTRAN<br />
conversion programs for creating or translating a numerical<br />
database from one compact storage scheme<br />
to another. The feasibility of the generalized approach<br />
with regard to our current implementation<br />
is also discussed.<br />
I. Introduction<br />
The problem of storage representation for<br />
dense/sparse matrices in main core, in order to<br />
optimize storage costs or processing time, has received<br />
considerable attention in the literature [6,10,12].<br />
A variety of compact storage schemes have<br />
been developed, and facilities for incore data manipulation<br />
using these schemes are available in a<br />
number of the software packages currently in use<br />
at any computing center. However, only a few matrix<br />
compact storage schemes are currently being<br />
implemented for the manipulation of large dense or<br />
sparse matrices residing on secondary devices, and<br />
these are not readily available [3,8,9]. This is<br />
because some of these methods employ<br />
quite complex data structures, such as threaded<br />
linked lists [11], which require complex programs<br />
for their implementation on secondary devices.<br />
In addition, an application user faces the added<br />
difficulty of accessing the compact matrix<br />
data residing on secondary devices.<br />
* This research is supported by the University<br />
of Ife, Ile-Ife, Nigeria.<br />
**On study leave from Computer Science Department,<br />
University of Ife, Nigeria.<br />
Permission to copy without fee all or part of this material is granted<br />
provided that the copies are not made or distributed for direct<br />
commercial advantage, the ACM copyright notice and the title of the<br />
publication and its date appear, and notice is given that copying is by<br />
permission of the Association for Computing Machinery. To copy<br />
otherwise, or to republish, requires a fee and/or specific permission.<br />
©1980 ACM 0-89791-028-1/80/1000/0418 $00.75<br />
Numerical data bases are the data bases needed<br />
to process numerical applications; they<br />
reside on secondary storage devices in matrix<br />
compact storage forms. A numerical application<br />
database may consist of one to three interrelated<br />
sets of files because pseudo data, e.g. distance<br />
from the diagonal and row beginning in the<br />
data item vector, is usually kept on separate<br />
files from the data item file. In addition, the<br />
set of files may also be processed by different<br />
file access methods, e.g. sequential for the pseudo data<br />
file, and indexed sequential or direct for the<br />
index and data item files.<br />
While there recently have been important advances<br />
in the use of very large data bases in commercial<br />
applications, little has been done in the<br />
area of numerical applications because the current<br />
facilities of database management systems (DBMS)<br />
are not suitable for processing numerical data<br />
bases in the majority of the matrix compact storage<br />
schemes. In order to address this problem,<br />
there is a need for a generalized approach to numerical<br />
database management whereby the numerical<br />
application users have facilities for data definition<br />
and mapping as well as data access to numerical<br />
data bases in any matrix compact storage<br />
scheme by means of simple high-level nonprocedural<br />
languages that relieve them from the need for<br />
knowledge of low-level details of physical implementation.<br />
The main advantage of data definition and<br />
mapping facilities is that the information that<br />
usually resides in an application program on any<br />
storage structure is removed into a schema which<br />
provides information on the physical storage organization<br />
and its mapping interface to the operating<br />
system such that the user only provides information<br />
about logical data descriptions. These facilities<br />
are usually provided by a data definition<br />
language (DDL) or by stored-data description and<br />
mapping languages (SDDL and SDML) [7]. Similarly,<br />
data access facilities are provided by a data manipulation<br />
language (DML) which promotes physical<br />
data independence.<br />
Our investigation of data language facilities<br />
reported in [2,7,13,17,18] reveals that none<br />
are suitable for numerical data management, which<br />
usually requires different kinds of indexing and<br />
ordering capabilities. Therefore, we have designed<br />
the data language facilities (SDDL, SDML, and<br />
DML) which can provide a generalized approach to<br />
numerical database management. In view of the<br />
limited number of compact storage schemes currently<br />
in use for numerical database management, we<br />
are implementing a generalized data translator<br />
that will automatically restructure any numerical<br />
database from one compact storage scheme to another<br />
by means of SDDL and SDML facilities. This<br />
satisfies an important goal of data portability,<br />
and in addition the methodology developed for the<br />
data translator is an essential part for the support<br />
of a DML, which will be implemented in the<br />
second phase of our project.<br />
Our current approach provides the following<br />
features:<br />
1. Each dense or sparse matrix compact storage<br />
scheme can be described by a physical<br />
schema, which comprises the corresponding<br />
data access path, the encoding to storage<br />
structures and the file access method.<br />
2. A generalized facility for describing any<br />
kind of numerical database and its mapping<br />
to secondary storage, i.e. the physical<br />
schema, is provided via nonprocedural<br />
Stored-Data Description and Mapping Languages<br />
(SDDL and SDML).<br />
3. A generalized data translator that will<br />
enable application users to create or to<br />
restructure their numerical database from<br />
one compact storage scheme to another, by<br />
supplying the SDDL and SDML statements of<br />
the source and target database descriptions.<br />
We begin by describing some relevant concepts<br />
from numerical analysis and DBMS in Section 2.<br />
Next, the numerical physical schemas and the SDDL<br />
and SDML facilities are described in Sections 3<br />
and 4. The feasibility of the SDDL and SDML in<br />
numerical database management and their implementation<br />
by a Generalized Syntax-Directed Translation<br />
Scheme (GSDTS) as part of our generalized data<br />
translator is discussed in Section 5.<br />
2. Overview of Numerical Analysis and DBMS<br />
Concepts<br />
Numerical data are usually generated in both<br />
quantitative and qualitative problem solving operations<br />
in the social sciences, physical sciences,<br />
engineering, etc. Numerical application data usually<br />
corresponds to dense or sparse matrices, and<br />
any such data necessary to process a numerical application<br />
which is residing on secondary storage<br />
is called here a numerical database. We discuss<br />
matrix features which provide guidelines towards<br />
minimization of storage space and storage data<br />
representation as well as DBMS concepts, such as<br />
schema and the data language facilities, which enable<br />
our generalized approach.<br />
2.1. Dense and Sparse Matrix Compact Storage<br />
Schemes<br />
Two major types of matrices, dense and sparse<br />
matrices, will be considered. A dense matrix has<br />
a high proportion of nonzero elements, while a<br />
sparse matrix has few nonzero elements. The two<br />
basic features for promoting compact matrix stored data<br />
are symmetry and bandwidth. Different compact<br />
storage schemes for symmetric and band matrices<br />
as well as several other sparse matrix indexing<br />
schemes are identified in the literature [6,<br />
9-12]. These compact storage schemes are described<br />
by the corresponding numerical physical<br />
schemas in our generalized approach, as is described<br />
in section 3.<br />
2.2. Schema<br />
The term schema was originally coined in connection<br />
with the logical database description,<br />
i.e. the definition of the objects, roles and<br />
properties of interest to a given enterprise. The<br />
term was first brought into usage by the CODASYL<br />
Database Task Group [4,5]. However, it is now<br />
used in a broader sense to stand for data descriptions<br />
in database systems at the logical or physical<br />
level. Since for our numerical databases the<br />
logical structure is relatively simple, the role<br />
of the physical schema which describes the mapping<br />
to storage becomes predominant. Each type of<br />
matrix data organization, such as a square, lower<br />
triangular or band matrix, could be viewed as<br />
corresponding to a logical schema, while any compact<br />
storage scheme can be viewed as a storage<br />
model with a corresponding physical schema. The<br />
physical schema describes completely the mapping<br />
to storage in terms of: (1) access path organization,<br />
(2) encoding of storage structures, and<br />
(3) operating system accessing methods [15].<br />
2.3. Data Language Facilities<br />
Data definition and mapping facilities are<br />
important features of a DBMS which support the<br />
concept of data independence. These facilities<br />
are provided either in the form of a self-contained<br />
language like the data definition language<br />
(DDL) or as two languages which are a stored-data<br />
description language (SDDL) and a stored-data<br />
mapping language (SDML). A DDL is generally a<br />
declarative language for specifying logical data<br />
structures and a data mapping language specifies<br />
the mapping of the logical data structure to the<br />
storage space.<br />
The Database Task Group in [5] proposed a<br />
schema DDL as a language for defining a data model<br />
together with its mapping to storage so that<br />
it would meet the requirements of many distinct<br />
programming languages. Another CODASYL group,<br />
the Stored-Data Definition and Translation Task<br />
Group [7], proposed a stored-data and data translation<br />
model and language for describing and translating<br />
among a wide class of logical and physical<br />
structures. Additional data definition and mapping<br />
languages have been proposed, with prototype<br />
implementations, for database reorganization, e.g.<br />
[2,13,17,18]. The language facilities are usually<br />
designed for operating on the traditional database<br />
schemas of relational, hierarchical and network<br />
data models.<br />
The matrix compact storage schemes which represent<br />
our model cannot be suitably defined using<br />
the data language facilities mentioned above because<br />
of the requirements for different kinds of<br />
indexing and data ordering capabilities. Therefore,<br />
we decided to develop nonprocedural stored-data<br />
description and mapping languages (SDDL and<br />
SDML) which provide a generalized approach for<br />
describing and mapping any numerical database to<br />
secondary storage. The two languages are discussed<br />
in section 4.<br />
Another important feature of a DBMS is a data<br />
manipulation language (DML) which provides the interface<br />
between the application users and the DBMS<br />
via a set of higher-level commands. We have designed<br />
a DML which contains commands embedded in<br />
FORTRAN, corresponding to the operations performed<br />
on numerical databases. However, the DML will not<br />
be discussed further, since its implementation<br />
will be considered only in a future project.<br />
3. Numerical Physical Schemas<br />
As we mentioned previously, the various storage<br />
techniques for dense and sparse matrices suggested<br />
in the literature can each be represented by a corresponding<br />
physical schema, which depicts not only<br />
the access path, but also the encoding of storage<br />
structures and the file access method. In order<br />
to generalize the description of the physical<br />
schemas, we investigated their access paths for<br />
similarities. Our investigation reveals three<br />
groups which have direct, indirect and linked access<br />
paths respectively. The direct access path<br />
corresponds to dense array realization, the indirect<br />
to the technique of going through an index to<br />
access a data item (non-zero element) and the<br />
linked to the technique of accessing a data item<br />
through other data items connected to it by pointers.<br />
Formal definitions of the access paths will<br />
be presented later.<br />
Since in our case the access paths are closely<br />
related to the actual encodings of the storage<br />
structures, which specify mappings into a linear<br />
address space [15], we identify the groups as direct,<br />
indirect and linked encodings respectively.<br />
We shall assume that in our approach the linear<br />
address space refers to storage space on secondary<br />
devices.<br />
3.1. Direct Encoding Group<br />
Numerical physical schemas in this group describe<br />
compact storage schemes for dense matrices.<br />
Their logical schema comprises the dense m x n,<br />
lower-/upper-triangular or band matrices, and<br />
their storage is either an m x n matrix or a vector.<br />
The stored-data organization is in row or<br />
column major order and the access path is direct.<br />
Each of these storage schemes requires a single<br />
external file and those with a nonsymmetric dataset<br />
are usually processed by a sequential file access<br />
method. But indexed sequential or direct file<br />
access methods may be appropriate for symmetric<br />
matrices in order to reduce the access time involved<br />
in reconstructing the data items for a row/<br />
column. We identify the following storage schemes<br />
in this category (albeit, close to their logical<br />
counterparts).<br />
1. address-polynomial (regular m x n matrix)<br />
2. lower- or upper-triangular<br />
3. symmetric-band<br />
4. nonsymmetric-band<br />
An illustration of one of the schemas is shown<br />
below in Figure 1.<br />
Logical schema (source dataset):<br />
4 2 0 0 0<br />
3 5 6 0 0<br />
0 1 4 3 0<br />
0 0 9 1 7<br />
0 0 0 2 3<br />
Storage scheme:<br />
3 5 6<br />
1 4 3<br />
9 1 7<br />
2 3 0<br />
Figure 1 - Dense nonsymmetric-band matrix data<br />
structure.<br />
The group's access path is direct because the<br />
search technique uses computed-access array storage<br />
mapping which is defined as follows [14]:<br />
Definition: Let N denote the set of positive integers<br />
and A be a two-dimensional array scheme. A<br />
computed access storage mapping for A is a total<br />
function $f: N \times N \to N$ such that: (1) $f(1,1) = 1$,<br />
and (2) $f$ is one-to-one on array scheme A.<br />
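The definition above can be illustrated with a minimal Python sketch. It assumes 1-based subscripts and row-major ordering; the function names `address_polynomial` and `lower_triangular_address` are hypothetical and are not taken from the paper. Both mappings send element (1,1) to address 1 and are one-to-one over the stored elements, as the definition requires.<br />

```python
def address_polynomial(i, j, n):
    """Row-major address of element (i, j) of a regular m x n matrix.

    Subscripts and addresses are 1-based, so f(1, 1) == 1.
    """
    return (i - 1) * n + j


def lower_triangular_address(i, j):
    """Row-major address of element (i, j), j <= i, when only the lower
    triangle is stored as a vector (an assumed packing, for illustration)."""
    if j > i:
        raise ValueError("element lies above the diagonal and is not stored")
    return i * (i - 1) // 2 + j


# Addresses of every stored element, visited in row-major order.
dense = [address_polynomial(i, j, 4) for i in range(1, 4) for j in range(1, 5)]
tri = [lower_triangular_address(i, j) for i in range(1, 5) for j in range(1, i + 1)]
```

In both cases the addresses come out as consecutive integers starting at 1, so no two stored elements share a location and no storage is wasted.<br />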
3.2. Indirect Encoding Group<br />
This group of numerical physical schemas describes<br />
the storage structures for all the sparse<br />
matrix indexing techniques whose access paths include<br />
reference data separately from the data-items<br />
themselves. Their logical schema is an m x n<br />
sparse matrix or a lower/upper diagonal matrix.<br />
Their storage scheme consists of vectors of data<br />
items, i.e., the non-zero elements, in row- or column-major<br />
order, with corresponding row and/or<br />
column indices and/or reference data. Reference<br />
data, i.e. pseudo data, refers to the location of<br />
data items within the source matrix; row/column<br />
beginning in the data item vector; or distance<br />
from the diagonal. These schemas usually require<br />
interrelated sets of two or three files respectively<br />
and their choice of file access method depends<br />
on the type of expected row/column retrieval.<br />
For sequential row/column retrieval, a sequential<br />
file access method is adequate; for random row/column<br />
retrieval, we can choose either indexed sequential/direct<br />
for all files or a combination of sequential<br />
for the reference data file and indexed/direct<br />
for the index and data item files. The schemas<br />
we identify in this group are:<br />
1. single-indexing<br />
2. double-indexing-1 (row-column-1)<br />
3. double-indexing-2 (row-column-2)<br />
4. bit-map<br />
5. address-map<br />
An illustration of one of the schemas is shown<br />
in Figure 2.<br />
Logical schema:<br />
1 0 0 2<br />
0 0 3 0<br />
0 4 0 0<br />
5 0 1 2<br />
Storage scheme:<br />
Row beginning in data item vector (rows 1 to 4): 1 3 4 5<br />
Column index vector (positions 1 to 7): 1 4 3 2 1 3 4<br />
Data item vector M(i,j) (positions 1 to 7): 1 2 3 4 5 1 2<br />
Figure 2 - Double-indexing-2 (Row-column-2)<br />
Their access path is indirect because the<br />
search technique uses a composite storage mapping<br />
which may be defined by the following [11]:<br />
Definition: Let $i$ and $j$ represent the row and column<br />
data item subscripts; $M(i,j)$ the data item location;<br />
$\pi_i$ the beginning relative address of indices<br />
for row $i$; and $\pi_j$ the relative address of element $j$ in<br />
the column index vector, as illustrated in Figure<br />
2. Data ordering is assumed rowwise; for columnwise<br />
ordering we can just interchange $i$ and $j$.<br />
Let $f$ represent any storage mapping function such<br />
that $f(i) = \pi_i$. A search function, $\sigma_f$, is defined<br />
as follows:<br />
$(\sigma_f)(j, \pi_i) = \pi_j$, iff $f(\pi_j) = j$ and $\forall \pi_{j'}$ s.t. $\pi_i \le \pi_{j'} < \pi_j$, $f(\pi_{j'}) \ne j$;<br />
$(\sigma_f)(j, \pi_i) = \emptyset$, iff $\forall \pi_j \in N$, $f(\pi_j) \ne j$.<br />
A composite mapping function, $h$, on a search function,<br />
$\sigma_f$, is defined as follows: $h(\sigma_f(j, f(i))) = M(i,j)$.<br />
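As a rough illustration of this composite mapping, the following Python sketch looks up an element of the 4 x 4 logical schema of Figure 2 through the row-beginning and column-index vectors. The vector contents are derived here from that logical schema under rowwise ordering, and the names `ROW_BEGIN`, `COL_INDEX`, `DATA_ITEM`, `search` and `lookup` are hypothetical. The search is bounded to the addresses of row i, which is an assumption about how the end of a row is detected.<br />

```python
# Vectors for the matrix [[1,0,0,2],[0,0,3,0],[0,4,0,0],[5,0,1,2]],
# stored rowwise with 1-based relative addresses.
ROW_BEGIN = [1, 3, 4, 5]            # beginning address of each row
COL_INDEX = [1, 4, 3, 2, 1, 3, 4]   # column subscript of each data item
DATA_ITEM = [1, 2, 3, 4, 5, 1, 2]   # the non-zero elements


def search(j, start, stop, col_index):
    """Search function: first address in [start, stop) whose column
    index equals j, or None when row i holds no element in column j."""
    for addr in range(start, stop):
        if col_index[addr - 1] == j:
            return addr
    return None


def lookup(i, j):
    """Composite mapping: row beginning, then search, then data item.
    Returns 0 for an element that is not stored."""
    start = ROW_BEGIN[i - 1]
    # The row ends where the next row begins, or after the last item.
    stop = ROW_BEGIN[i] if i < len(ROW_BEGIN) else len(COL_INDEX) + 1
    addr = search(j, start, stop, COL_INDEX)
    return 0 if addr is None else DATA_ITEM[addr - 1]


matrix = [[lookup(i, j) for j in range(1, 5)] for i in range(1, 5)]
```

Rebuilding every element with `lookup` reproduces the logical schema, which is a quick way to check that the three vectors are mutually consistent.<br />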
3.3. Linked Encoding Group<br />
The linked encoding group consists of numerical<br />
physical schemas for all the sparse indexing<br />
schemes with linked list data structures.<br />
Their logical schema is the m x n sparse matrix<br />
and their storage scheme consists of lists of nodes.<br />
Each node has a format which might consist of data<br />
item, row and column indices and pointer fields.<br />
The schemas usually require a single file with indexed<br />
sequential or direct file access method.<br />
These schemas are further classified as:<br />
1. linear-linked-list<br />
2. doubly-linked-list<br />
3. threaded-linked-list<br />
Figure 3 shows an illustration of one of them.<br />
Their access path is called linked because the<br />
search technique uses a mapping defined through<br />
pointer linkage.<br />
It may be defined as follows:<br />
Definition: Let $D = (X, R)$ be a storage structure<br />
with nodes $x_1, \ldots, x_n$ and relations $(r_1, r_2) \subset R$<br />
such that $r_1$ represents a row equivalence relation<br />
and $r_2$ represents a column equivalence relation.<br />
In addition, let $\pi x_i$ represent the address of node<br />
$x_i$; $\lambda_i x$ the value of the $i$th pointer field of node $x$,<br />
i.e., the row pointer value; $\lambda_j x$ the value of the $j$th pointer<br />
field of node $x$, i.e., the column pointer value; $X/r_1$ the<br />
row equivalence class and $X/r_2$ the column equivalence<br />
class. A linked mapping is a linked realization<br />
of a relation from the header pointer node, if at<br />
least one of the following holds:<br />
1. The relation $r_1$ is realized as a linked<br />
structure (relative to the $i$th pointer<br />
field), i.e., for every pair of nodes<br />
$(x_1, x_2) \in X/r_1$, $\pi x_2 \in \lambda_i x_1$ holds, or<br />
similarly $r_2$ is realized as a linked<br />
structure.<br />
2. For every ordered three nodes such that $(x_1, x_2) \in X/r_1$ and $(x_1, x_3)$<br />
$\in X/r_2$, $\pi x_2 \in \lambda_i x_1$ and $\pi x_3 \in \lambda_j x_1$ hold.<br />
In addition, it is possible that the relation $r$ is<br />
realized as a linked structure and the end node $x_n$<br />
points to the header node $x'$, i.e. $\lambda x_n = \pi x'$.<br />
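A minimal Python sketch of such a linked realization follows, restricted to one row relation and using the circular variant in which the end node points back to the header node. The `Node` class, the integer addresses and the helper names are all assumptions made for illustration; the paper does not prescribe them.<br />

```python
from dataclasses import dataclass


@dataclass
class Node:
    row: int
    col: int
    value: float
    row_ptr: int  # address of the next node in the same row


def make_row_list(row, entries, header_addr, first_addr):
    """Store one row as a circular list. `entries` is a list of
    (column, value) pairs; the result maps address -> Node."""
    store = {}
    addrs = [first_addr + k for k in range(len(entries))]
    # The header node carries no data; it addresses the first node,
    # or itself when the row is empty.
    store[header_addr] = Node(row, 0, 0.0, addrs[0] if addrs else header_addr)
    for k, (col, value) in enumerate(entries):
        nxt = addrs[k + 1] if k + 1 < len(addrs) else header_addr
        store[addrs[k]] = Node(row, col, value, nxt)
    return store


def find_in_row(store, header_addr, col):
    """Follow row pointers from the header until the column is found
    or the list wraps around to the header again."""
    addr = store[header_addr].row_ptr
    while addr != header_addr:
        node = store[addr]
        if node.col == col:
            return node.value
        addr = node.row_ptr
    return 0.0


# Row 4 of the logical schema used in Figures 2 and 3: 5 0 1 2
row4 = make_row_list(4, [(1, 5.0), (3, 1.0), (4, 2.0)], header_addr=100, first_addr=101)
```

Every access goes through the pointer fields of the preceding nodes, which is why the paper calls this access path linked.<br />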
4. Data Language Facilities<br />
The data language facilities provide a generalized<br />
approach for describing any numerical database<br />
and its mapping to storage. They consist<br />
of a stored-data description language (SDDL) and a<br />
stored-data mapping language (SDML). The two<br />
languages are similar to other data definition and<br />
mapping languages [7,17,18]. We have attempted as<br />
much as possible to make them user friendly, by<br />
including simple, self-explanatory language constructs.<br />
The choice of only one of the alternatives<br />
is represented by {} (braces) and an optional<br />
phrase by [] (square brackets). Language keywords<br />
appear in capital letters and user-defined words<br />
in lower case. Sample SDDL and SDML statements<br />
of both source and target numerical databases are<br />
shown in Figures 4 and 4.1 respectively. Other<br />
features of the two languages will be revealed as<br />
they are described below.<br />
4.1. Stored-Data Description Language (SDDL)<br />
The SDDL is intended mainly for the user to<br />
describe the logical characteristics of his numerical<br />
database and the associated type of file organization<br />
on secondary storage devices, or alternatively<br />
the card input format. Therefore, the<br />
language is divided into three parts which are<br />
(1) matrix structure, (2) file control, and (3)<br />
input format.<br />
The matrix structure describes the logical<br />
characteristics of the data and it also indicates<br />
if dynamic storage management is required. The<br />
basic matrix format is specified using the self-explanatory<br />
keywords: {DENSE | SPARSE}, {SYMMETRIC | NONSYMMETRIC}, and<br />
{BANDED | NONBANDED}. If the matrix is symmetric, the<br />
statement will include {UPPER-DIAGONAL | LOWER-DIAGONAL}<br />
in order to specify the partition of the dataset<br />
Logical schema:<br />
1 0 0 2<br />
0 0 3 0<br />
0 4 0 0<br />
5 0 1 2<br />
Storage scheme: linked nodes reached from row and column header pointers (diagram not reproduced).<br />
Node format: Node Key, Row Index, Column Index, Data Item, Column Node Pointer, Row Node Pointer<br />
Figure 3. Doubly-linked-list<br />
DATA-DESCRIPTION:<br />
MATRIX-STRUCTURE:<br />
TYPE = SPARSE, NONSYMMETRIC, STATIC;<br />
FILE-CONTROL:<br />
TYPE = SOURCE;<br />
FILE-UNIT = 21, 22, 23;<br />
MEDIUM = DISK;<br />
RECORD: REC-KEY = integer;<br />
SIZE = 512, FIXED, UNBLOCKED;<br />
DATA-MAPPING (double-indexing-2);<br />
ACCESS-PATH-ENCODING:<br />
ACCESS-PATH = INDIRECT-ENCODING<br />
(REF-DATA-ORG);<br />
INDIRECT-ENCODING:<br />
REF-DATA-ORG: (REF-ORG-1,<br />
REF-ORG-2, DATA-ORG);<br />
REF-ORG-1: SET(LOC);<br />
LOC: integer, TYPE = ROW BEGINNING;<br />
REF-ORG-2: SET(INDEX);<br />
INDEX: integer, TYPE = COLUMN INDEX;<br />
DATA-ORG: DIMENSION = (5000,5000);<br />
ORDERING = ROWWISE;<br />
SET(DATA-ITEM);<br />
DATA-ITEM: real, REAL-PRECISION = DOUBLE;<br />
ENCODED-FILE:<br />
FILE-NAME = datfile, indfile, locfile;<br />
ORGANIZATION = RANDOM, RANDOM, SEQUENTIAL;<br />
ENCODED-DATA = DATA-ORG, REF-ORG-2, REF-ORG-1;<br />
Figure 4. Sample SDDL & SDML statements of a<br />
source numerical database for a<br />
double-indexing-2 schema.<br />
to be processed. Similarly, a bandwidth statement<br />
which specifies the size of the band is required<br />
for a band matrix and a density statement giving<br />
an estimated density of a sparse matrix is necessary<br />
for creating a database with random file organization.<br />
Some statements in the matrix structure<br />
section are shown in the example below.<br />
MATRIX-STRUCTURE:<br />
TYPE = SPARSE, BANDED, SYMMETRIC, LOWER-DIAGONAL, STATIC;<br />
BANDWIDTH = (250, 250);<br />
The file control specifies the file organization<br />
of a numerical database already residing<br />
on a secondary device or to be created, by listing<br />
the type of file, device medium, file unit etc.<br />
The file control statements depend on the device<br />
medium selected for processing as specified by the<br />
device medium keyword, CARD, TAPE, or DISK. If data<br />
is to be processed from a card input stream, only<br />
the file-type, file-unit and device-medium statements<br />
are required, but in addition to these three<br />
statements, both disk and tape files require record<br />
statements.<br />
The file-type statement identifies the source/<br />
target file and the file-unit statement gives a<br />
set of FORTRAN READ/WRITE unit numbers for processing<br />
the files in the database. The record statement<br />
lists the record properties like record-size,<br />
DATA-DESCRIPTION:<br />
MATRIX-STRUCTURE:<br />
TYPE = SPARSE, NONSYMMETRIC, STATIC;<br />
FILE-CONTROL:<br />
TYPE = TARGET;<br />
FILE-UNIT = 4;<br />
MEDIUM = DISK;<br />
RECORD: REC-KEY = integer;<br />
SIZE = 1024, FIXED, UNBLOCKED;<br />
DATA-MAPPING: (doubly-linked-list);<br />
ACCESS-PATH-ENCODING:<br />
ACCESS-PATH = LINKED-ENCODING:<br />
(LINKED-DATA-ORG);<br />
LINKED-DATA-ORG: (COL-HEAD-NODE,<br />
ROW-HEAD-NODE,<br />
DATA-ITEM-NODE);<br />
COL-HEAD-NODE: (PTR-ITEM, FIELD-LINKAGE);<br />
PTR-ITEM: integer, TYPE = COL PTR;<br />
FIELD-LINKAGE = FIRST COL NODE;<br />
ROW-HEAD-NODE: (PTR-ITEM, FIELD-LINKAGE);<br />
PTR-ITEM: integer, TYPE = ROW PTR;<br />
FIELD-LINKAGE = FIRST ROW NODE;<br />
DATA-ITEM-NODE: (KEY-FIELD, ROW-FIELD,<br />
COL-FIELD,<br />
DATA-FIELD, COL-PTR-FIELD, ROW-PTR-FIELD);<br />
KEY-FIELD: NODE-KEY = integer;<br />
ROW-FIELD: REF-ITEM = INDEX;<br />
INDEX: integer, TYPE = ROW INDEX;<br />
COL-FIELD: INDEX: integer, TYPE = COL INDEX;<br />
DATA-FIELD: ORDERING = NONE;<br />
DATA-ITEM = real, REAL-PRECISION;<br />
REAL-PRECISION = DOUBLE;<br />
COL-PTR-FIELD: PTR-ITEM, FIELD-LINKAGE;<br />
PTR-ITEM: integer, TYPE = COL PTR;<br />
FIELD-LINKAGE = NEXT COL NODE;<br />
ROW-PTR-FIELD: PTR-ITEM, FIELD-LINKAGE;<br />
PTR-ITEM: integer, TYPE = ROW PTR;<br />
FIELD-LINKAGE = NEXT ROW NODE;<br />
ENCODED-FILE:<br />
FILE-NAME = NODFILE;<br />
ORGANIZATION = RANDOM;<br />
ENCODED-DATA = SET(LINKED-DATA-ORG);<br />
Figure 4.1. Sample SDDL & SDML statements of a<br />
target numerical database for a<br />
doubly-linked-list schema.<br />
{FIXED | VARIABLE} and {BLOCKED | UNBLOCKED}. In addition, the file<br />
control section may include any of the following<br />
optional statements: (1) a record-key statement<br />
to specify either integer or alphanumeric key<br />
for random file organization; (2) a block-size<br />
statement required for blocked records; and (3)<br />
a format statement (similar to FORTRAN) for formatted<br />
records. Some of these statements are<br />
illustrated under FILE-CONTROL in Figure 4.<br />
The input-format section provides facilities<br />
for processing an unstructured database from cards.<br />
The section comprises the dimension, the data<br />
ordering and format statements respectively.<br />
The dimension statement, shown below, specifies<br />
the numbers of rows and columns in the matrix.<br />
DIMENSION = {ROW | COLUMN}, integer, {COLUMN | ROW}, integer;<br />
The data ordering<br />
statement specifies a rowwise/columnwise/none ordering.<br />
The data-format statement:<br />
DATA-FORMAT = {SPARSE-TYPE-1 | SPARSE-TYPE-2 | DENSE};<br />
gives users three choices of format specifications.<br />
Both SPARSE-TYPE-1 and SPARSE-TYPE-2 are for sparse<br />
matrix input format specifications of only nonzero<br />
elements and the DENSE is for all the matrix elements.<br />
SPARSE-TYPE-1 is for ordered input data, so<br />
that one row or column of the input data stream is processed<br />
at a time. As shown below,<br />
SPARSE-TYPE-1: CONTROL-DATA = {ROW | COLUMN}, data-type;<br />
FORMAT = SET(data-type,<br />
data-type);<br />
it requires control data to specify the row or<br />
column to be processed, so that the format becomes<br />
a set of pairs of column/row and data item data-types.<br />
A data-type is any valid FORTRAN format<br />
specification for spacing, alphanumeric, integer<br />
or real variable, e.g. 5X, I6, F10.4 and E20.12.<br />
SPARSE-TYPE-2 is for unordered input data,<br />
so that the format is a set of row, column, and<br />
data item data-types as follows:<br />
SPARSE-TYPE-2 = SET([ROW], data-type,<br />
[COLUMN], data-type,<br />
data-type);<br />
Finally, DENSE = SET(data-type); provides for<br />
a set of regular FORTRAN-type format specifications.<br />
An example of a SPARSE-TYPE-1 input format is shown<br />
below.<br />
INPUT-FORMAT:<br />
DIMENSION = ROW, 5000, COLUMN, 5000;<br />
ORDERING = ROWWISE;<br />
SPARSE-TYPE-1: CONTROL-DATA = ROW, I4;<br />
FORMAT = 5(I4,2X,F10.6);<br />
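To make the SPARSE-TYPE-1 example concrete, the Python sketch below decodes card images, reading the control data as I4 and the format as 5(I4,2X,F10.6), i.e. up to five pairs of a column index and a data item per card. It assumes that the control data arrives on its own card ahead of the data cards and that blank trailing fields mean the card holds fewer than five pairs; neither detail is stated in the paper. The function names are hypothetical, and FORTRAN's implied-decimal rule for F fields without a decimal point is ignored.<br />

```python
def parse_control(card, width=4):
    """Read the I4 control field: the row (or column) being processed."""
    return int(card[:width])


def parse_pairs(card, repeat=5, int_width=4, skip=2, real_width=10):
    """Decode one card laid out as repeat(I4, 2X, F10.6) into a list of
    (index, value) pairs using fixed column positions."""
    pairs = []
    pos = 0
    for _ in range(repeat):
        int_field = card[pos:pos + int_width]
        pos += int_width + skip            # the 2X descriptor skips two columns
        real_field = card[pos:pos + real_width]
        pos += real_width
        if not int_field.strip():
            break                          # assumed: blank field ends the card
        pairs.append((int(int_field), float(real_field)))
    return pairs


# Card images for row 1 of the matrix in Figure 2: elements (1,1)=1 and (1,4)=2.
control_card = f"{1:4d}"
data_card = f"{1:4d}  {1.0:10.6f}{4:4d}  {2.0:10.6f}"
row = parse_control(control_card)
items = parse_pairs(data_card)
```

Because the control card fixes the row, each data card only needs to carry column indices, which is what makes the ordered format more compact than SPARSE-TYPE-2.<br />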
4.2. Stored-Data Mapping Language (SDML)<br />
The SDML has two functions: (1) to describe<br />
the different types of mapping which the<br />
system can make between a logical schema and a<br />
target storage space, and (2) to describe the encoding<br />
to storage structures. The major structure<br />
of the language is comprised of the access path encoding<br />
and the encoded file. The major emphasis<br />
of the language is on the access path encoding,<br />
which represents the most difficult part of the<br />
mapping description. The encoded file section enables<br />
the assignment of encoded data (data items<br />
and pseudo data) to the files in the database according<br />
to the corresponding definitions of filenames<br />
and file accessing methods.<br />
The access path encoding section enables the<br />
selection of an appropriate mapping subsection and<br />
relates its subsections to the mapping descriptions<br />
of the direct, indirect and linked schema<br />
encoding groups. Reference to mapping descriptions<br />
defined in one encoding group by another is<br />
a common feature of the language, e.g., the REF-ITEM<br />
definition of pseudo data in the indirect encoding<br />
subsection is referenced by the linked encoding<br />
subsection.<br />
The direct encoding, implied by the DATA-ORG:<br />
subsection, describes the data item with its properties,<br />
like data ordering and type. It also provides<br />
for an optional definition of dimension and<br />
bandwidth for a source database description. The<br />
indirect encoding provides a choice of mapping alternatives<br />
for encoding pseudo data and data items<br />
to separate encoded files by the mapping descriptions<br />
identified by MAP-ORG: and REF-ORG: (see<br />
Figure 4). In addition, an ordered combination<br />
of pseudo data and data items may be mapped to an<br />
encoded file by the MIXED-ORG: mapping description as<br />
follows:<br />
MIXED-ORG: SET(ORDERED {(REF-ITEM, DATA-ORG) |<br />
(REF-ITEM, REF-ITEM, DATA-ORG)});<br />
The linked encoding enables the mapping of<br />
any set of nodes to an encoded file. Each node<br />
is identified by a user-defined node-name and<br />
consists of a set of fields. Each field is described<br />
by an optional field-name and a field identifier<br />
which may be a node key, pseudo data, or<br />
data item. An example of linked encoding mapping<br />
is illustrated in Figure 4.1.<br />
The mapping description consists of definitions<br />
of both primitive and nonprimitive data<br />
structures. A structure of<br />
primitive type is usually represented by an assignment statement,<br />
while a nonprimitive one is represented by a descriptive<br />
statement consisting of a set or group name<br />
and a set or group definition [16]. We provide<br />
the following constructs in the language to specify<br />
data, ordering and linkage definitions:<br />
1. ordering definition types: rowwise, columnwise<br />
and none;<br />
2. basic data types: integer, real, and alphanumeric;<br />
3. linkage definition types: header, first,<br />
next, prior, last, row, column, node,<br />
field, and null.<br />
A valid and meaningful linkage definition,<br />
other than the NULL keyword, requires an ordered combination<br />
of the following: (1) a pointer linkage<br />
keyword, (2) row or column, and (3) node or field.<br />
The pointer linkage keywords are header, first,<br />
next, prior, and last. An example of a valid<br />
definition is FIRST ROW NODE.<br />
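The rule for a valid linkage definition can be checked mechanically. The sketch<br />
below is an illustration in Python rather than part of the SDML processor; the<br />
function name and the choice to accept lower-case input are assumptions.<br />

```python
POINTER_KEYWORDS = {"HEADER", "FIRST", "NEXT", "PRIOR", "LAST"}
AXIS_KEYWORDS = {"ROW", "COLUMN"}
UNIT_KEYWORDS = {"NODE", "FIELD"}


def is_valid_linkage(definition):
    """Return True if the phrase is NULL or an ordered
    (pointer keyword, row/column, node/field) combination."""
    words = definition.upper().split()
    if words == ["NULL"]:
        return True
    if len(words) != 3:
        return False
    pointer, axis, unit = words
    return (pointer in POINTER_KEYWORDS
            and axis in AXIS_KEYWORDS
            and unit in UNIT_KEYWORDS)


print(is_valid_linkage("FIRST ROW NODE"))
print(is_valid_linkage("ROW FIRST NODE"))
```

Because the combination is ordered, a phrase such as ROW FIRST NODE is rejected even<br />
though every word in it is a linkage keyword.<br />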
An example of a primitive type data structure is:<br />
DATA-ITEM = {integer | real | alpha};<br />
An example of a nonprimitive type data structure<br />
illustrating a SET definition is:<br />
DATA-ORG: SET(DATA-ITEM), ORDERING = {ROWWISE | COLUMNWISE | NONE};<br />
A primitive type data structure which is semantically<br />
ambiguous, e.g., index and pointer, becomes<br />
a nonprimitive structure by qualifying the<br />
basic data definition with a semantic phrase definition<br />
as follows:<br />
INDEX: {integer | alpha}, TYPE = {ROW INDEX | COLUMN INDEX |<br />
CONCAT(ROW INDEX, COLUMN INDEX)};<br />
An access path is described by ORDERING and<br />
LINKAGE phrases. ORDERING describes the matrix<br />
data access path as by row, by column, or none. It is assumed<br />
that the ORDERING of reference items, i.e.,<br />
indices and locations (within the matrix or from<br />
diagonal elements), corresponds to that of the matrix<br />
data items. LINKAGE describes linked list structure<br />
connectivity by a combination of linkage keywords,<br />
as in the following example:<br />
PTR-ORG: SET(PTR-ITEM), LINKAGE=NEXT COLUMN FIELD;<br />
5. The Feasibility of SDDL and SDML in a Numerical<br />
Database System<br />
The current approach to numerical database<br />
management is restricted to a few compact matrix<br />
storage schemes. The most common compact storage<br />
scheme for processing sparse matrices residing on<br />
secondary devices is the double-indexing (row-column)<br />
technique, but this is not the best technique<br />
for many applications. A few research<br />
groups, e.g., [9], have tried the linked list<br />
technique in programs tailored to their applications;<br />
however, these programs are not always available for<br />
public distribution.<br />
Our investigation of the implementation of<br />
a generalized approach to numerical database management<br />
reveals two basic requirements. The<br />
first requirement is for the numerical database to<br />
reside on secondary storage using the storage<br />
scheme that is best fitted for its application.<br />
The second requirement is to provide tools for<br />
data access that will promote physical data independence<br />
through the implementation of a DML.<br />
It is obvious that the first requirement is<br />
a prerequisite to the second and that there are<br />
two options for its realization. The first option<br />
is for each user to be responsible for structuring<br />
his numerical database according to the physical<br />
schema best suited to his application. This option<br />
is not practical because a user may not know<br />
how to structure his database to suit his objective.<br />
The second option is to have a generalized data<br />
translator that will automatically restructure any<br />
numerical database from one physical schema to another,<br />
or convert unstructured raw data, not in a<br />
compact storage form, into a form corresponding to a physical<br />
schema. It is essential for this option to be integrated<br />
into any effective generalized approach<br />
to numerical database management.<br />
Our first priority then is to develop a generalized<br />
data translator for numerical databases<br />
that will isolate the users from the underlying<br />
data management through stored-data description<br />
and mapping language facilities.<br />
5.1. A Generalized Data Translator for Numerical<br />
Databases<br />
We are currently developing a generalized data<br />
translator for numerical databases as a first<br />
step towards developing a generalized numerical<br />
database management system. The generalized data<br />
translator is focused on the implementation of our<br />
nonprocedural Stored-Data Description and Mapping<br />
Languages (SDDL and SDML). Its function is to automatically<br />
create or restructure a numerical database<br />
from one schema to another in two consecutive<br />
processes, compilation and data translation<br />
(discussed below). Its input, supplied<br />
by the user, consists of the source and target<br />
SDDL and SDML statements (see Figure 4) and a<br />
source numerical database. Its output is the target<br />
numerical database. The overall functions are<br />
illustrated in Figure 5.<br />
During the compilation process, the user-supplied<br />
SDDL and SDML statements are converted by a<br />
lexical analyzer into a token stream, which is<br />
translated by a Generalized Syntax-Directed Translation<br />
Scheme (GSDTS) into FORTRAN source programs<br />
for the reader, the restructurer, and the writer<br />
subroutines. After compilation by a FORTRAN compiler,<br />
the subroutines become the major components<br />
of the translator subsystem. The translator subsystem<br />
also includes common data table information,<br />
shown in Figure 6, and utility functions and routines<br />
to compute mapping functions, e.g., symmetric<br />
and band address locations, and to execute<br />
search and reordering algorithms.<br />
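As an illustration of what such mapping functions might compute, the Python sketch<br />
below gives two textbook address calculations. The formulas are assumptions chosen for<br />
illustration and are not taken from the translator itself: the symmetric case<br />
assumes a lower triangle packed row by row, and the band case assumes that every row<br />
reserves 2m + 1 slots for a half bandwidth m. Both use 1-based indices.<br />

```python
def symmetric_address(i, j):
    """1-based address of A(i, j) in a lower triangle packed row by row."""
    if i < j:                      # symmetric: A(i, j) is stored as A(j, i)
        i, j = j, i
    return i * (i - 1) // 2 + j


def band_address(i, j, half_bandwidth):
    """1-based address of A(i, j) when each row keeps 2*m + 1 band slots.

    Returns None for elements outside the band, which are not stored.
    """
    m = half_bandwidth
    if abs(i - j) > m:
        return None
    return (i - 1) * (2 * m + 1) + (j - i + m) + 1


print(symmetric_address(3, 1), symmetric_address(1, 3))
print(band_address(2, 1, 1), band_address(1, 3, 1))
```

Either function turns a (row, column) pair directly into a storage location, which is<br />
why schemas of this kind need no index data to be stored alongside the data items.<br />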
5.2. Data Translation Process<br />
The data translation process of the translator<br />
subsystem starts with the encoding of the record(s)<br />
of the source database into a translator<br />
internal form (TIF), followed by the decoding of<br />
TIF data to encoded record(s), and ends with the<br />
writing of the record(s) on the storage devices. The<br />
components of the TIF are (1) the row/column identifier,<br />
(2) the index buffer for the column/row indices,<br />
and (3) the data item buffer for the row/column data<br />
items. The translation process is controlled by<br />
the translation supervisor, which activates the<br />
reader to encode the source database record(s) to<br />
TIF data, followed by the restructurer to decode<br />
the TIF data to encoded record(s), and then the<br />
writer to convert the encoded record(s) to physical<br />
record(s) and to write them on the storage device.<br />
Each subroutine returns control to the supervisor,<br />
which activates the next subroutine accordingly,<br />
and the process is repeated until all<br />
the records of the source database have been processed.<br />
Figure 6.1 illustrates the data translation<br />
process from a double-index-2 source database to a doubly-linked-list<br />
target database.<br />
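The control flow described above can be sketched in a few lines of Python. This is<br />
only an illustration of the supervisor cycle: the names TIF, reader, restructurer,<br />
writer and supervisor are hypothetical, a list stands in for the target file, and<br />
the "encoded record" is simplified to a (row, column, value) triple rather than any of<br />
the paper's physical schemas.<br />

```python
from collections import namedtuple

# The three TIF components named in the text.
TIF = namedtuple("TIF", ["identifier", "index_buffer", "data_buffer"])


def reader(source, position):
    """Encode the next source record into TIF, or return None at end of data."""
    if position >= len(source):
        return None
    row_id, entries = source[position]
    return TIF(row_id,
               [column for column, _ in entries],
               [value for _, value in entries])


def restructurer(tif):
    """Decode TIF into encoded records (simplified to coordinate triples)."""
    return [(tif.identifier, column, value)
            for column, value in zip(tif.index_buffer, tif.data_buffer)]


def writer(encoded_records, target):
    """Write the encoded records; a Python list stands in for the target file."""
    target.extend(encoded_records)


def supervisor(source):
    """Activate reader, restructurer and writer in turn until the source ends."""
    target, position = [], 0
    while True:
        tif = reader(source, position)
        if tif is None:
            break
        writer(restructurer(tif), target)
        position += 1
    return target


source = [(1, [(1, 1.0), (4, 2.0)]), (2, [(3, 3.0)])]
print(supervisor(source))
```

Each routine returns to the supervisor after one step, so the three phases never call<br />
one another directly, which matches the iteration described in the text.<br />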
5.2.1. Reader Module<br />
The reader encodes both unstructured matrix<br />
data, i.e., raw data not in any compact storage<br />
form, and the numerical database. In both cases,<br />
the information in the source file control<br />
table and either the input format or the physical<br />
schema table (see Figure 6) is used by the reader<br />
to read source data from cards or secondary devices<br />
and encode it into translator internal<br />
form (TIF) data. The source data is processed by<br />
row/column according to the input format or physical<br />
schema specification. In order to produce the<br />
TIF data, each encode step of the translation iteration<br />
does the following: (1) fills in the appropriate<br />
row/column identifier, and (2) fills in<br />
the corresponding index and data buffers for that<br />
row/column (see Step 1a of Figure 6.1). For example,<br />
with the row identifier equal to 1, we have 1 and<br />
4 in the column index buffer, as well as 1 and 2 in<br />
the data item buffer. On completion, control is returned<br />
to the supervisor for the next step of the<br />
translation iteration, i.e., the decode step by<br />
the restructurer.<br />
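The encode step for a double-index source can be sketched as follows, using the 4 x 4<br />
matrix of Figure 6.1. The Python below is a minimal illustration: the function names<br />
are hypothetical, and the extra sentinel entry at the end of the row beginning array is<br />
an assumption made so that the last row can be delimited.<br />

```python
def build_double_index(matrix):
    """Build row beginning, column index and data item arrays (1-based positions)."""
    row_begin, col_index, data_items = [], [], []
    for row in matrix:
        row_begin.append(len(data_items) + 1)
        for j, value in enumerate(row, start=1):
            if value != 0:
                col_index.append(j)
                data_items.append(value)
    row_begin.append(len(data_items) + 1)   # assumed sentinel: one past the last item
    return row_begin, col_index, data_items


def encode_row(row_id, row_begin, col_index, data_items):
    """Encode one row into (row identifier, index buffer, data item buffer)."""
    start = row_begin[row_id - 1] - 1
    stop = row_begin[row_id] - 1
    return row_id, col_index[start:stop], data_items[start:stop]


matrix = [
    [1, 0, 0, 2],
    [0, 0, 3, 0],
    [0, 4, 0, 0],
    [5, 0, 1, 2],
]
row_begin, col_index, data_items = build_double_index(matrix)
print(encode_row(1, row_begin, col_index, data_items))
```

For row 1 this yields the index buffer 1, 4 and the data item buffer 1, 2, in agreement<br />
with the example in the text.<br />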
5.2.2. Restructurer Module<br />
If the source ordering is different from the<br />
target ordering, the TIF data of the entire database<br />
is temporarily stored in one or more workfiles to be<br />
reordered before it is decoded; otherwise, the TIF<br />
data is decoded, as received, into encoded data corresponding to<br />
the target schema. Each decode step<br />
of the translation iteration from the TIF data to<br />
a direct encoding group discards the index buffer<br />
and reorganizes the data items into the appropriate<br />
encoded data. For the indirect encoding<br />
group, both the data items and the index, which is<br />
converted to the appropriate pseudo data, become<br />
the encoded data. The linked encoding<br />
group, however, requires the supervisor to create null head<br />
nodes during initialization. Data item nodes with<br />
the appropriate pointers are then created to form the<br />
encoded data at each decode step. For example, in<br />
Step 1b of Figure 6.1, two data item nodes for the<br />
first row are created to correspond to the TIF data<br />
in Step 1a. In addition, "1" in the row and column<br />
head nodes represents the pointer to the first data<br />
item node, and "2" in the column head and the first<br />
data item nodes respectively represents the column<br />
pointer to the second data item node. At the end<br />
of this step, control is returned to the supervisor<br />
for the last phase of the translation iteration,<br />
i.e., writing the encoded data on the secondary devices<br />
by the writer.<br />
Figure 5.<br />
Usage and functions of the generalized data translator.<br />
Figure 6.<br />
Major components of the translator subsystem.<br />
* Either Input Format (unstructured raw source matrix data)<br />
or Physical Schema Table (source database in compact storage form).<br />
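The reordering that is needed when the source and target orderings differ can be<br />
sketched as below. This Python fragment is an illustration only: an in-memory<br />
dictionary stands in for the workfile(s), and the function name and tuple layout<br />
(identifier, index buffer, data item buffer) are assumptions.<br />

```python
def reorder_tif(row_tifs):
    """Turn row-wise TIF entries into column-wise TIF entries."""
    columns = {}                                   # stands in for the workfile
    for row_id, col_indices, values in row_tifs:
        for column, value in zip(col_indices, values):
            columns.setdefault(column, []).append((row_id, value))
    reordered = []
    for column in sorted(columns):
        pairs = sorted(columns[column])            # ascending row index
        reordered.append((column,
                          [row for row, _ in pairs],
                          [value for _, value in pairs]))
    return reordered


row_wise = [
    (1, [1, 4], [1, 2]),
    (2, [3], [3]),
    (3, [2], [4]),
    (4, [1, 3, 4], [5, 1, 2]),
]
for tif in reorder_tif(row_wise):
    print(tif)
```

The whole database has to be collected before the first column can be emitted, which<br />
is why the text stores the TIF data of the entire database before decoding.<br />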
5.2.3. Writer Module<br />
The writer uses the information in the target<br />
file control table to open the file(s) of the target<br />
database during initialization and closes them<br />
after the entire database has been processed. It<br />
performs the last phase of each translation iteration<br />
by converting the encoded data into physical<br />
record(s) to be written on the secondary devices<br />
according to the user-defined target file access<br />
method. For example, with regard to the encoded<br />
data of Step 1b in Figure 6.1, the head node records<br />
are updated records which are rewritten in<br />
place, and each data item node record is written<br />
as a new record on the secondary device. On completion,<br />
control is returned to the supervisor for<br />
another translation iteration to begin with the<br />
reader.<br />
5.3. Compilation Process<br />
The compilation process is the sequence of<br />
operations necessary to automatically produce the<br />
reader, the restructurer, and the writer subroutine<br />
programs from the SDDL and SDML statements<br />
supplied by the user. Our investigation of automatic<br />
data conversion techniques [2,13,17,18] reveals<br />
that compiler-compiler techniques are generally<br />
used. In order to be able to perform a<br />
broad, useful and syntactically valid class of<br />
translations, we decided that a generalized syntax-directed<br />
translation scheme (GSDTS) is the best model<br />
for our application. Because FORTRAN is the<br />
programming language of the majority of numerical<br />
application users, we decided to write the translation<br />
software in portable FORTRAN so that it can<br />
be distributed generally with little or no modification<br />
of the source programs from one computer<br />
system to another.<br />
Figure 6.1<br />
An illustration of a data translation process.<br />
Step 0. Source database of Figure 2 (logical schema):<br />
1 0 0 2<br />
0 0 3 0<br />
0 4 0 0<br />
5 0 1 2<br />
stored as a row beginning file, a column index file, and a data item file.<br />
Source record size = 4; source file org. = sequential for all files.<br />
Target database of Figure 3 (partial data description): target record size = 14;<br />
no. of rows = 4; no. of columns = 4; target file org. = random; buffer size = 4;<br />
record key = integer.<br />
Translation start, initialization operation: create null row-head and column-head node records.<br />
Step 1a (1st translation iteration), source data to TIF data: row identifier = 1;<br />
index buffer = 1 4 0 0; data buffer = 1 2 0 0.<br />
Step 1b, TIF data to encoded data: row-head, column-head, and data-item node records.<br />
A GSDTS requires an underlying LR(k) context-free<br />
grammar. Therefore, we had to construct LR(k)<br />
grammars for our SDDL and SDML, and in order to<br />
minimize the compilation time, we have constructed<br />
SLR(1) grammars for the SDDL and SDML such that the<br />
terminal symbols are single digits or letters, except for<br />
the user-defined variables and constants.<br />
The grammars, and the LR(1) automatic parser generator<br />
which is used to validate them as part of the<br />
system initialization process, are discussed below.<br />
A token stream of single digits or letters<br />
for keywords, together with user-defined variables and constants,<br />
is the output from the conversion of the<br />
SDDL and SDML statements by the lexical analyzer<br />
[1]. For example, "TYPE = SOURCE;" is converted<br />
to "1", "TYPE = TARGET;" becomes "2", and "FILE-NAME<br />
= SAMPLE;" becomes "SAMPLE". The token stream is<br />
the input to the GSDTS, which produces the source<br />
FORTRAN subroutine programs to be compiled by the<br />
FORTRAN compiler into object decks as the final<br />
output of the compilation process.<br />
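The conversion performed by the lexical analyzer can be sketched in Python as follows.<br />
Only the three statement-to-token examples given in the text are taken from the paper;<br />
the table layout, the function name, and the rule that any unrecognized value passes<br />
through unchanged are assumptions made for illustration.<br />

```python
# Keyword codes known from the text; a full table would hold one entry per keyword.
KEYWORD_TOKENS = {
    ("TYPE", "SOURCE"): "1",
    ("TYPE", "TARGET"): "2",
}


def tokenize(statements):
    """Convert 'NAME = VALUE;' statements into a flat token stream."""
    tokens = []
    for statement in statements.split(";"):
        if "=" not in statement:
            continue                              # skip empty trailing pieces
        name, _, value = statement.partition("=")
        name, value = name.strip(), value.strip()
        code = KEYWORD_TOKENS.get((name, value))
        if code is not None:
            tokens.append(code)                   # keyword: single-symbol code
        else:
            # assumption: user-defined variables and constants pass through
            tokens.extend(part.strip() for part in value.split(","))
    return tokens


print(tokenize("TYPE = SOURCE; FILE-NAME = SAMPLE;"))
```

Reducing every keyword to a single symbol keeps the terminal alphabet of the grammar<br />
small, which is the property the SLR(1) grammars described above rely on.<br />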
An illustration of the compilation process is<br />
shown in Figure 6.2. The SDDL statements of Figure<br />
4 are input to the lexical analyzer, which<br />
produces an output token stream that becomes<br />
the input to the GSDTS. The token stream is processed<br />
by the GSDTS in a concurrent operation of<br />
LR(1) parsing and semantic analysis. If no error<br />
is encountered during parsing, then on successful<br />
reduction to the final state the Semantic Analyzer<br />
outputs the generated FORTRAN statements.<br />
We would like to mention that all data declarations<br />
are made in the Translator Subsystem so<br />
that the routines have access to the common<br />
variables, even if there is an overlay operation.<br />
This explains why only the Translator Subsystem<br />
declarative statements are generated in Figure 6.2:<br />
the Reader routine FORTRAN statements of<br />
a structured database are generated by processing<br />
the SDML statements. On the other hand, an<br />
unstructured source database has no SDML statements,<br />
so in this case the Reader routine FORTRAN<br />
statements are generated along with the Translator<br />
Subsystem declarative statements by processing the<br />
SDDL statements.<br />
5.3.1. SLR(1) Grammars for SDDL and SDML<br />
We have constructed one SLR(1) grammar for<br />
the SDDL such that the terminal symbols for keywords<br />
are generally numerical codes, with single letters<br />
wherever it is necessary to provide one unique<br />
lookahead symbol for consistency resolution. In<br />
order to maintain a modular programming approach<br />
and provide for execution time storage overlay<br />
should the need arise, we constructed two SLR(1)<br />
grammars for the SDML: one for the Direct<br />
and Indirect Encoding Sections, and another<br />
for the Linked Encoding Section, with the Encoded<br />
File Section included in each grammar. The two<br />
SLR(1) grammars are similar to that of the SDDL.<br />
The nonterminals of the grammars are in self-explanatory<br />
BNF. One advantage<br />
of the modular SLR(1) grammar approach is that new<br />
features, like additional pointer linkage definitions,<br />
could be added to the language with easy<br />
modification of the corresponding grammar. All<br />
the grammars have been proved to be SLR(1) by the<br />
LR(1) automatic parser generator.<br />
Figure 6.2<br />
An illustration of the Compilation Process<br />
Input statements (conversion of SDDL statements of Figure 4 to tokens):<br />
DATA-DESCRIPTION:<br />
MATRIX-STRUCTURE:<br />
TYPE = SPARSE, NONSYMMETRIC, STATIC;<br />
FILE-CONTROL:<br />
TYPE = SOURCE;<br />
FILE-UNIT = 21, 22, 23;<br />
MEDIUM = DISK;<br />
RECORD: REC-KEY = integer;<br />
SIZE = 512,<br />
FIXED,<br />
UNBLOCKED;<br />
Token stream (output from lexical analyzer, input to GSDTS):<br />
S 21N 22N 23N 3 1 512 1 2<br />
GSDTS output (FORTRAN declarative statements for the Translator Subsystem):<br />
INTEGER ROWID, COLID, BUFSZE, SDATOG, UPRCOD<br />
INTEGER RCOSTA, RECSZE, FLEUNT<br />
INTEGER DIAGID, DENSTY, FLENAM, BLKSZE<br />
DIMENSION INDROW(500), INDCOL(500), DATA(500),<br />
1 INDEX(500),FLEUNT(3)<br />
DIMENSION DATBUF(500), INDBUF(500)<br />
DIMENSION FLEUNT(3), FLEID(3), FLENAM(42)<br />
COMMON/GLOBAL/NOROW, NOCOL, ROWID, COLID, LWRCOD,<br />
1 BUFSZE, IERROR, SDATOG, UPRCOD, DATBUF, INDBUF<br />
COMMON/ENCCOM/RCOSTA, INDPTR, KONTRL, RECSZE,<br />
1 DATA, INDROW, INDCOL, FLEUNT<br />
DATA BUFSZE/500/<br />
DATA FLEUNT(1), FLEUNT(2), FLEUNT(3) / 21,22,23/<br />
DATA RECSZE,BLKSZE,RECKEY /512,0,1/<br />
5.3.2. LR(1) Automatic Parser Generator<br />
The LR(1) automatic parser generator, developed<br />
by Wetherell and Shannon [19], is written<br />
entirely in portable ANSI Standard FORTRAN 66 and<br />
has been operating successfully on a number of<br />
computers. It generates a space-efficient parser<br />
for any LR(1) grammar. It reads a context-free<br />
grammar in a modified BNF format and produces tables<br />
which describe an LR(1) parsing automaton. It<br />
has been used to validate our SDDL and SDML grammars<br />
and to produce the corresponding tables<br />
describing their LR(1) parsing automata. The tables<br />
consist of dimension and data statements to be<br />
embedded into the LR(1) parser subroutines<br />
described later. The procedure is performed once<br />
as part of our system initialization operation for<br />
the development of the GSDTS for the SDDL and the<br />
SDML, discussed below.<br />
5.3.3. GSDTS for the SDDL and the SDML<br />
Generalized syntax-directed translation<br />
schemes (GSDTS) are well defined in the literature, and<br />
we chose to implement a bottom-up execution of<br />
GSDTS [1]. The major components of the GSDTS<br />
for the SDDL and the SDML are, as illustrated in<br />
Figure 7, the following: (1) LR(1) parser, (2)<br />
LR(1) tables, (3) Semantic Analyzer, and (4) SDDL<br />
and SDML Semantic Tables. Its input is the SDDL<br />
and SDML token stream generated by the lexical<br />
analyzer, whose tokens are assigned values from the LR(1) tables<br />
by the LR(1) parser's internal scanner. The<br />
outputs produced by the GSDTS are the reader, the<br />
restructurer and the writer FORTRAN source subroutines,<br />
produced from the tokens of the source<br />
SDDL and SDML, the target SDML, and the target<br />
SDDL respectively.<br />
The LR(1) parser is a set of subroutines<br />
which interpret the LR(1) tables to construct a<br />
parse of the SDDL and SDML token stream. Some of<br />
the subroutines were part of the software developed<br />
by Wetherell and Shannon [19], but they have<br />
been modified and tested to suit our application.<br />
We have developed three LR(1) parsers, for the<br />
SDDL, the direct and indirect encodings, and the<br />
linked encoding SLR(1) grammars respectively.<br />
Figure 7.<br />
GSDTS for SDDL and SDML.<br />
The Semantic Analyzer consists of two major<br />
routines which perform the semantic analysis and<br />
the output production. The SDDL and SDML Semantic<br />
Tables contain the semantic rules corresponding<br />
to the SLR(1) grammar production rules. However,<br />
we are currently restricting our implementation to<br />
a few physical schemas which are representative of<br />
the three encoding groups. Therefore, the current<br />
semantic tables contain semantic rules corresponding<br />
to only those physical schemas, with null<br />
rules for the others so that they can be easily<br />
extended after the completion of the current development<br />
process.<br />
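The role of the semantic tables, including the null rules, can be illustrated with a<br />
small Python sketch. Everything named here is hypothetical: the production names, the<br />
table layout and the emit functions are assumptions, and only the form of the two<br />
generated DATA statements follows the FORTRAN output shown in Figure 6.2.<br />

```python
def emit_file_units(units):
    """Emit a DATA statement that initializes the FLEUNT array."""
    names = ", ".join(f"FLEUNT({k})" for k in range(1, len(units) + 1))
    values = ",".join(str(unit) for unit in units)
    return f"      DATA {names} / {values}/"


def emit_buffer_size(values):
    """Emit a DATA statement that initializes BUFSZE."""
    return f"      DATA BUFSZE/{values[0]}/"


# Hypothetical semantic table: production name -> semantic rule.
# None is a null rule for a physical schema that is not implemented yet.
SEMANTIC_TABLE = {
    "file-unit": emit_file_units,
    "buffer-size": emit_buffer_size,
    "unimplemented-schema": None,
}


def translate(reductions):
    """Apply the semantic rule of each reduced production, in parse order."""
    output = []
    for production, values in reductions:
        rule = SEMANTIC_TABLE.get(production)
        if rule is not None:                 # null rules produce no output
            output.append(rule(values))
    return output


reductions = [
    ("file-unit", [21, 22, 23]),
    ("unimplemented-schema", []),
    ("buffer-size", [500]),
]
for line in translate(reductions):
    print(line)
```

Adding support for another physical schema then amounts to replacing a null entry in<br />
the table with a rule, without touching the parser.<br />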
6. Future Directions and Developments<br />
In this paper, we have provided a model of a<br />
generalized approach for describing and mapping<br />
any numerical database to secondary storage by nonprocedural<br />
Stored-Data Description and Mapping<br />
Languages (SDDL and SDML). We have also shown how<br />
DBMS concepts like schema and data language<br />
facilities are applicable to databases<br />
residing on secondary devices that are needed<br />
to process numerical applications. In addition, we<br />
have discussed the feasibility of our model<br />
as a valuable tool in numerical database management,<br />
as described in the current implementation<br />
of our generalized data translator for numerical<br />
databases.<br />
An area for the extension of this research<br />
is the implementation of a data manipulation<br />
language (DML). As previously mentioned, we have<br />
already designed a DML which consists of certain<br />
primitive statements that correspond to the operations<br />
permitted on the numerical database and are embedded<br />
in FORTRAN. The file control and the<br />
physical schema tables, and some of the conversion<br />
utility subroutines of our model, would be of use<br />
in the implementation of the DML at a later date.<br />
Another area of research is the performance<br />
evaluation of the numerical physical schemas<br />
with regard to specific applications or numerical<br />
operations. MacVeigh has reported in [10] the<br />
effect of data representation on the cost of<br />
sparse matrix operations in primary storage. It<br />
is desirable to extend this work to secondary storage<br />
and to develop a performance evaluation model<br />
for matching the numerical database of an application<br />
to the best-fit physical schema on secondary storage.<br />
Finally, we would like to identify some physical<br />
schemas of our model that have already<br />
proved to be of practical use in numerical<br />
database management. The threaded-linked-list<br />
structure has been successfully implemented in the<br />
WARDEN system in use at the University of Warwick<br />
[9] for Computer-Aided Design. In addition, secondary<br />
storage implementations that are similar to our<br />
direct encoding group are identified in EASY,<br />
an Engineering Analysis System of Utility Programs<br />
[8], while a row-column schema is used in Vectorized<br />
General Sparsity Algorithms with Backing<br />
Store [3]. Since the need for secondary storage<br />
backup is relative to the size of the primary<br />
storage, our model will be of great advantage in<br />
institutions with small or medium-size computing<br />
facilities.<br />
REFERENCES<br />
1. Aho, A.V. & Ullman, J.D., "The Theory of Parsing,<br />
Translation and Compiling, Volume II:<br />
Compiling," Prentice-Hall, Inc., Englewood<br />
Cliffs, N.J., 1973.<br />
2. Bach, M.J., et al., "The ADAPT System: A Generalized<br />
Approach Towards Data Conversion,"<br />
Proc. 5th Int. Conf. Very Large Data Bases,<br />
ACM, N.Y., Oct. 1979, pp. 183-193.<br />
3. Calahan, D.A., et al., "Vectorized General<br />
Sparsity Algorithms with Backing Store," Systems<br />
Eng. Lab., University of Michigan, Ann<br />
Arbor, SEL Report #96, Jan. 15, 1977.<br />
4. CODASYL Data Base Task Group Report, Conf.<br />
Data System Languages, April 1971, ACM, New<br />
York.<br />
5. CODASYL Data Description Language Journal of<br />
Development, June 1973 Report.<br />
6. Duff, I.S., "A Survey of Sparse Matrix Research,"<br />
Proc. of the IEEE, Vol. 65, No. 4,<br />
April 1977, pp. 500-535.<br />
7. Fry, J.P., et al., "Stored-Data Description<br />
and Data Translation: A Model and Language,"<br />
Information Systems, Vol. 2(3), 1977, pp.<br />
95-147.<br />
8. Jensen, Paul S., "An Engineering Analysis System,"<br />
Proc. ACM 1978 Annual Conference, Washington,<br />
D.C., Vol. 1 of 2, Dec. 4-5-6, 1978,<br />
pp. 490-495.<br />
9. Larcombe, M.H.E., "A List Processing Approach<br />
to the Solution of Large Sparse Sets of Matrix<br />
Equations and the Factorization of the<br />
Overall Matrix," Proc. Oxford Conf. on "Large<br />
Sparse Sets of Linear Equations," J.K. Reid,<br />
Editor, April 1970, Academic Press, New York,<br />
1971, pp. 25-40.<br />
10. MacVeigh, Donald T., "Effect of Data Representation<br />
on Cost of Sparse Matrix Operations,"<br />
Acta Informatica, Vol. 7, 1977,<br />
pp. 361-394.<br />
11. Maurer, Herman H., "Data Structures and Programming<br />
Techniques," Translated by Camille C.<br />
Price, Prentice-Hall, Inc., Englewood Cliffs,<br />
N.J., 1977.<br />
12. Pooch, U.W. and Nieder, A., "A Survey of Indexing<br />
Techniques for Sparse Matrices," ACM<br />
Computing Surveys, Vol. 5, No. 2,<br />
June 1973, pp. 109-133.<br />
13. Ramirez, J., "Automatic Generation of Data<br />
Conversion Programs Using a Data Description<br />
Language (DDL)," Ph.D. Dissertation, University<br />
of Pennsylvania, 1973.<br />
14. Rosenberg, A.L. and Stockmeyer, L., "Storage<br />
Schemes for Boundedly Extendible Arrays,"<br />
Acta Informatica, 7, 1977, pp. 289-303.<br />
15. Scheuermann, Peter, "On the Design and Evaluation<br />
of Data Bases," IEEE Computer, Feb. 1978,<br />
pp. 46-54.<br />
16. Scheuermann, Peter, "Concepts of a Data Base<br />
Simulation Language," Proc. ACM SIGMOD Int'l.<br />
Conf. on Management of Data, 1977, pp. 144-156.<br />
17. Shu, N.C., et al., "EXPRESS: A Data EXtraction,<br />
Processing and REStructuring System," ACM<br />
Trans. Database Systems, Vol. 2, No. 2,<br />
June 1977, pp. 134-174.<br />
18. Taylor, Robert W., "Generalized Data Base<br />
Management System Data Structures and their<br />
Mapping to Physical Storage," Ph.D. dissertation,<br />
Univ. of Michigan, 1971.<br />
19. Wetherell, C. and Shannon, A., "LR Automatic<br />
Parser Generator and LR(1) Parser," Lawrence<br />
Livermore Lab., University of California,<br />
P.O. Box 808, Livermore, CA 94550, June 14,<br />
1979.<br />