
A DATA DEFINITION AND MAPPING LANGUAGE FOR NUMERICAL DATA BASES<br />

Ola-Olu A. Daini and Peter Scheuermann<br />

Electrical Engineering and Computer Science Department<br />

Northwestern University<br />

Evanston, Illinois 60201<br />

Abstract<br />

Numerical data bases arise in many scientific<br />

applications to keep track of large sparse and<br />

dense matrices. Unlike the many matrix data storage<br />

techniques available for incore manipulation,<br />

very large matrices are currently limited to a few<br />

compact storage schemes on secondary devices, due<br />

to the complex underlying data management facilities.<br />

This paper proposes an approach for generalized<br />

numerical database management that would promote<br />

physical data independence by relieving users<br />

from the need for knowledge of the physical data<br />

organization on the secondary devices.<br />

Our approach is to describe each of the storage<br />

techniques for dense and sparse matrices by a<br />

physical schema, which encompasses the corresponding<br />

access path, the encoding to storage structures,<br />

and the file access method. A generalized<br />

facility for describing any kind of numerical database<br />

and its mapping to storage is provided via<br />

nonprocedural Stored-Data Description and Mapping<br />

Languages (SDDL and SDML). The languages are processed<br />

by a Generalized Syntax-Directed Translation<br />

Scheme (GSDTS) to automatically generate FORTRAN<br />

conversion programs for creating or translating a numerical<br />

database from one compact storage scheme<br />

to another. The feasibility of the generalized approach<br />

with regard to our current implementation<br />

is also discussed.<br />

* This research is supported by the University of Ife, Ile-Ife, Nigeria.<br />

**On study leave from Computer Science Department, University of Ife, Nigeria.<br />

Permission to copy without fee all or part of this material is granted<br />

provided that the copies are not made or distributed for direct<br />

commercial advantage, the ACM copyright notice and the title of the<br />

publication and its date appear, and notice is given that copying is by<br />

permission of the Association for Computing Machinery. To copy<br />

otherwise, or to republish, requires a fee and/or specific permission.<br />

©1980 ACM 0-89791-028-1/80/1000/0418 $00.75<br />

1. Introduction<br />

The problem of storage representation for<br />

dense/sparse matrices in main core, in order to<br />

optimize storage costs or processing time, has received<br />

considerable attention in the literature [6,10,12].<br />

A variety of compact storage schemes have<br />

been developed and facilities for incore data manipulation<br />

using these schemes are available in a<br />

number of the software packages currently in use<br />

at any computing center. However, only a few matrix<br />

compact storage schemes are currently being<br />

implemented for the manipulation of large dense or<br />

sparse matrices residing on secondary devices and<br />

these are not readily available [3,8,9]. This is<br />

due to the fact that some of these methods employ<br />

quite complex data structures, such as threaded<br />

linked lists [11], which require complex programs<br />

for their implementation on secondary devices.<br />

In addition, there is also the added difficulty to<br />

an application user in accessing the compact matrix<br />

data residing on secondary devices.<br />

Numerical data bases refer to data bases necessary<br />

to process numerical applications, that are<br />

residing on secondary storage devices in matrix<br />

compact storage forms. A numerical application<br />

database may consist of one to three interrelated<br />

sets of files because pseudo data, e.g. distance<br />

from the diagonal and row beginning in the<br />

data item vector, are usually kept on separate<br />

files from the data item file. In addition, the<br />

set of files may also be processed by different<br />

file access methods e.g. sequential for pseudo data<br />

file, and indexed sequential or direct for the<br />

index and data item files.<br />

While there recently have been important advances<br />

in the use of very large data bases in commercial<br />

applications, little has been done in the<br />

area of numerical applications because the current<br />

facilities of database management systems (DBMS)<br />

are not suitable for processing numerical data<br />

bases in the majority of the matrix compact storage<br />

schemes. In order to address this problem,<br />

there is a need for a generalized approach to numerical<br />

database management whereby the numerical<br />

application users have facilities for data definition<br />

and mapping as well as data access to numerical<br />

data bases in any matrix compact storage<br />

scheme by means of simple high-level nonprocedural<br />

languages that relieve them from the need for<br />

knowledge of low-level details of physical implementation.<br />

The main advantage of data definition and<br />

mapping facilities is that the information that<br />

usually resides in an application program on any<br />

storage structure is removed into a schema which<br />

provides information on the physical storage organization<br />

and its mapping interface to the operating<br />

system such that the user only provides information<br />

about logical data descriptions. These facilities<br />

are usually provided by a data definition<br />

language (DDL) or by stored-data description and<br />

mapping languages (SDDL and SDML) [7]. Similarly,<br />

data access facilities are provided by a data manipulation<br />

language (DML) which promotes physical<br />

data independence.<br />

Our investigation of data language facilities<br />

reported in [2,7,13,17,18] reveals that none<br />

are suitable for numerical data management, which<br />

usually requires different kinds of indexing and<br />

ordering capabilities. Therefore, we have designed<br />

the data language facilities (SDDL, SDML, and<br />

DML) which can provide a generalized approach to<br />

numerical database management. In view of the<br />

limited number of compact storage schemes currently<br />

in use for numerical database management, we<br />

are implementing a generalized data translator<br />

that will automatically restructure any numerical<br />

database from one compact storage scheme to another<br />

by means of SDDL and SDML facilities. This<br />

satisfies an important goal of data portability,<br />

and in addition the methodology developed for the<br />

data translator is an essential part for the support<br />

of a DML, which will be implemented in the<br />

second phase of our project.<br />

Our current approach provides the following<br />

features:<br />

1. Each dense or sparse matrix compact storage<br />

scheme can be described by a physical<br />

schema, which comprises the corresponding<br />

data access path, the encoding to storage<br />

structures and the file access method.<br />

2. A generalized facility for describing any<br />

kind of numerical database and its mapping<br />

to secondary storage, i.e. the physical<br />

schema, is provided via nonprocedural<br />

Stored-Data Description and Mapping Languages<br />

(SDDL and SDML).<br />

3. A generalized data translator that will<br />

enable application users to create or to<br />

restructure their numerical database from<br />

one compact storage scheme to another, by<br />

supplying the SDDL and SDML statements of<br />

the source and target database descriptions.<br />
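The restructuring task named in feature 3 can be illustrated by hand. The sketch below is not the generated FORTRAN conversion program the paper describes; it is a minimal Python illustration, assuming a rowwise indexed layout with 1-based positions and hypothetical function names, of how a database held as row-beginning, column-index and data-item vectors can be expanded into scheme-neutral (row, column, value) triples from which any target scheme could be built.

```python
def indexed_to_triples(row_begin, col_index, data):
    """Expand rowwise indexed vectors into (row, column, value) triples.

    row_begin[r] is the 1-based position in `data` of the first stored
    element of row r + 1 (hypothetical layout used for illustration).
    """
    triples = []
    for r in range(len(row_begin)):
        start = row_begin[r]
        # A row ends where the next row begins, or at the end of the data.
        end = row_begin[r + 1] if r + 1 < len(row_begin) else len(data) + 1
        for slot in range(start, end):
            triples.append((r + 1, col_index[slot - 1], data[slot - 1]))
    return triples


def triples_to_dense(triples, m, n):
    """Rebuild an m x n dense matrix (list of rows) from the triples."""
    dense = [[0] * n for _ in range(m)]
    for i, j, value in triples:
        dense[i - 1][j - 1] = value
    return dense
```

Going through neutral triples keeps the source and target descriptions independent of each other, which is the property the translator relies on.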

We begin by describing some relevant concepts<br />

from numerical analysis and DBMS in Section 2.<br />

Next, the numerical physical schemas and the SDDL<br />

and SDML facilities are described in Sections 3<br />

and 4. The feasibility of the SDDL and SDML in<br />

numerical database management and their implementation<br />

by a Generalized Syntax-Directed Translation<br />

Scheme (GSDTS) as part of our generalized data<br />

translator is discussed in Section 5.<br />

2. Overview of Numerical Analysis and DBMS<br />

Concepts<br />

Numerical data are usually generated in both<br />

quantitative and qualitative problem solving operations<br />

in the social sciences, physical sciences,<br />

engineering, etc. Numerical application data usually<br />

corresponds to dense or sparse matrices, and<br />

any such data necessary to process a numerical application<br />

which is residing on secondary storage<br />

is called here a numerical database. We discuss<br />

matrix features which provide guidelines towards<br />

minimization of storage space and storage data<br />

representation as well as DBMS concepts, such as<br />

schema and the data language facilities, which enable<br />

our generalized approach.<br />

2.1. Dense and Sparse Matrix Compact Storage<br />

Schemes<br />

Two major types of matrices, dense and sparse<br />

matrices, will be considered. A dense matrix has<br />

a high proportion of nonzero elements, while a<br />

sparse matrix has few nonzero elements. The two<br />

basic features for promoting compact matrix stored-data<br />

are symmetry and bandwidth. Different compact<br />

storage schemes for symmetric and band matrices<br />

as well as several other sparse matrix indexing<br />

schemes are identified in literature [6,<br />

9-12]. These compact storage schemes are described<br />

by the corresponding numerical physical<br />

schemas in our generalized approach, as is described<br />

in section 3.<br />

2.2. Schema<br />

The term schema was originally coined in connection<br />

with the logical database description,<br />

i.e. the definition of the objects, roles and<br />

properties of interest to a given enterprise. The<br />

term was first brought into usage by the CODASYL<br />

Database Task Group [4,5]. However, it is now<br />

used in a broader sense to stand for data descriptions<br />

in database systems at the logical or physical<br />

level. Since for our numerical databases the<br />

logical structure is relatively simple, the role<br />

of the physical schema which describes the mapping<br />

to storage becomes predominant. Each type of<br />

matrix data organization, such as a square, lower<br />

triangular or band matrix, could be viewed as<br />

corresponding to a logical schema, while any compact<br />

storage scheme can be viewed as a storage<br />

model with a corresponding physical schema. The<br />

physical schema describes completely the mapping<br />

to storage in terms of: (1) access path organization,<br />

(2) encoding of storage structures, and<br />

(3) operating system accessing methods [15].<br />

2.3. Data Language Facilities<br />

Data definition and mapping facilities are<br />

important features of a DBMS which support the<br />

concept of data independence. These facilities<br />

are provided either in the form of a self-contained<br />

language like the data definition language<br />

(DDL) or as two languages which are a stored-data<br />

description language (SDDL) and a stored-data<br />

mapping language (SDML). A DDL is generally a<br />

declarative language for specifying logical data<br />

structures and a data mapping language specifies<br />

the mapping of the logical data structure to the<br />

storage space.<br />

The Database Task Group in [5] proposed a<br />

schema DDL as a language for defining a data model<br />

together with its mapping to storage so that<br />

it would meet the requirements of many distinct<br />

programming languages. Another CODASYL group,<br />


the Stored-Data Definition and Translation Task<br />

Group [7], proposed a stored-data and data translation<br />

model and language for describing and translating<br />

among a wide class of logical and physical<br />

structures. Additional data definition and mapping<br />

languages have been proposed, with prototype<br />

implementations, for database reorganization, e.g.<br />

[2,13,17,18]. The language facilities are usually<br />

designed for operating on the traditional database<br />

schemas of relational, hierarchical and network<br />

data models.<br />

The matrix compact storage schemes which represent<br />

our model cannot be suitably defined using<br />

the data language facilities mentioned above because<br />

of the requirements for different kinds of<br />

indexing and data ordering capabilities. Therefore,<br />

we decided to develop nonprocedural stored-data<br />

description and mapping languages (SDDL and<br />

SDML) which provide a generalized approach for<br />

describing and mapping any numerical database to<br />

secondary storage. The two languages are discussed<br />

in section 4.<br />

Another important feature of a DBMS is a data<br />

manipulation language (DML) which provides the interface<br />

between the application users and the DBMS<br />

via a set of higher-level commands. We have designed<br />

a DML which contains commands embedded in<br />

FORTRAN, corresponding to the operations performed<br />

on numerical databases. However, the DML will not<br />

be discussed further, since its implementation<br />

will be considered only in a future project.<br />

3. Numerical Physical Schemas<br />

As we mentioned previously, the various storage<br />

techniques for dense and sparse matrices suggested<br />

in literature can be represented by a corresponding<br />

physical schema, which depicts not only<br />

the access path, but also the encoding of storage<br />

structures and the file access method. In order<br />

to generalize the description of the physical<br />

schemas, we investigated their access paths for<br />

similarities. Our investigation reveals three<br />

groups which have direct, indirect and linked access<br />

paths respectively. The direct access path<br />

corresponds to dense array realization, the indirect<br />

to the technique of going through an index to<br />

access a data item (non-zero element) and the<br />

linked to the technique of accessing a data item<br />

through other data items connected to it by pointers.<br />

Formal definitions of the access paths will<br />

be presented later.<br />

Since in our case the access paths are closely<br />

related to the actual encodings of the storage<br />

structures, which specify mappings into a linear<br />

address space [15], we identify the groups as direct,<br />

indirect and linked encodings respectively.<br />

We shall assume that in our approach the linear<br />

address space refers to storage space on secondary<br />

devices.<br />

3.1. Direct Encoding Group<br />

Numerical physical schemas in this group describe<br />

compact storage schemes for dense matrices.<br />

Their logical schema comprises the dense m x n,<br />

lower-/upper-triangular or band matrices, and<br />

their storage is either an m x n matrix or a vector.<br />

The stored-data organization is in row or<br />

column major order and the access path is direct.<br />

Each of these storage schemes requires a single<br />

external file and those with a non-symmetric dataset<br />

are usually processed by a sequential file access<br />

method. But indexed sequential or direct file<br />

access methods may be appropriate for symmetric<br />

matrices in order to reduce the access time involved<br />

in reconstructing the data items for a row/<br />

column. We identify the following storage schemes<br />

in this category (albeit, close to their logical<br />

counterparts).<br />

1. address-polynomial (regular m x n matrix)<br />

2. lower- or upper-triangular<br />

3. symmetric-band<br />

4. nonsymmetric-band<br />

An illustration of one of the schemas is shown<br />

below in Figure 1.<br />

Source Dataset (Logical Schema):<br />
4 2 0 0 0<br />
3 5 6 0 0<br />
0 1 4 3 0<br />
0 0 9 1 7<br />
0 0 0 2 3<br />

Storage Scheme:<br />
0 4 2<br />
3 5 6<br />
1 4 3<br />
9 1 7<br />
2 3 0<br />

Figure 1 - Dense nonsymmetric-band matrix data structure.<br />

The group's access path is direct because the<br />

search technique uses computed-access array storage<br />

mapping which is defined as follows [14]:<br />

Definition: Let $N$ denote the set of positive integers<br />

and $A$ be a two-dimensional array scheme. A<br />

computed access storage mapping for $A$ is a total<br />

function $f: N \times N \to N$ such that: (1) $f(1,1) = 1$,<br />

and (2) $f$ is one-to-one on array scheme $A$.<br />
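Two computed-access mappings that satisfy this definition can be written down directly. The Python sketch below is illustrative only; the function names are hypothetical, positions are 1-based, and row-major ordering is assumed. It covers the address-polynomial scheme and the lower-triangular scheme from the list above.

```python
def address_polynomial(i, j, n):
    """Row-major slot of element (i, j) in a dense m x n array (1-based)."""
    return (i - 1) * n + j


def lower_triangular(i, j):
    """Slot of element (i, j), j <= i, when only the lower triangle is
    stored row by row in a vector (1-based)."""
    if j > i:
        raise ValueError("element lies above the diagonal and is not stored")
    return i * (i - 1) // 2 + j


# Both mappings send (1, 1) to slot 1 and never assign one slot twice.
slots = [lower_triangular(i, j) for i in range(1, 5) for j in range(1, i + 1)]
assert slots == list(range(1, 11))
```

Because the slot is computed from the subscripts alone, no index or pointer has to be read before the data item, which is what makes the access path direct.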

3.2. Indirect Encoding Group<br />

This group of numerical physical schemas describes<br />

the storage structures for all the sparse<br />

matrix indexing techniques whose access paths include<br />

reference data separately from the data-items<br />

themselves. Their logical schema is a m x n<br />

sparse matrix or a lower/upper diagonal matrix.<br />

Their storage scheme consists of vectors of data<br />

items, i.e., the non-zero elements, in row or column-major<br />

order, with corresponding row and/or<br />

column indices and/or reference data. Reference<br />

data, i.e. pseudo data, refers to the location of<br />

data items within the source matrix; row/column<br />

beginning in the data item vector; or distance<br />

from the diagonal. These schemas usually require<br />

interrelated sets of two or three files respectively<br />

and their choice of file access method depends<br />

on the type of expected row/column retrieval.<br />

For sequential row/column retrieval, a sequential<br />

file access method is adequate; for random row/column<br />

retrieval, we can choose either indexed sequential/direct<br />

for all files or a combination of sequential<br />

for reference data file and indexed/direct<br />

for index and data item files. The schemas<br />

we identify in this group are:<br />

1. single-indexing<br />

2. double-indexing-1 (row-column-1)<br />

3. double-indexing-2 (row-column-2)<br />

4. bit-map<br />

5. address-map<br />

An illustration of one of the schemas is shown<br />

in Figure 2.<br />

Logical Schema:<br />
1 0 0 2<br />
0 0 3 0<br />
0 4 0 0<br />
5 0 1 2<br />

Storage Scheme:<br />
Row beginning in data item vector (positions 1 to 4): 1 3 4 5<br />
Column index vector (positions 1 to 7): 1 4 3 2 1 3 4<br />
Data item vector M(i,j) (positions 1 to 7): 1 2 3 4 5 1 2<br />

Figure 2 - Double-indexing-2 (Row-column-2)<br />

Their access path is indirect because the<br />

search technique uses a composite storage mapping<br />

which may be defined by the following [11]:<br />

Definition: Let $i$ and $j$ represent the row and column<br />

data item subscripts; $M(i,j)$ the data item location;<br />

$\lambda_i$ the beginning relative address of indices<br />

for row $i$; and $\eta_j$ the relative address of element $j$ in the<br />

column index vector, as illustrated in Figure<br />

2. Data ordering is assumed rowwise; for columnwise<br />

ordering we can just interchange $i$ and $j$.<br />

Let $f$ represent any storage mapping function such<br />

that $f(i) = \lambda_i$. A search function, $\mu f$, is defined<br />

as follows:<br />

$(\mu f)(j, \lambda_i) = \eta_j$, iff $f(\eta_j) = j$ and $\forall \eta_{j'}$ s.t. $\lambda_i \le \eta_{j'} < \eta_j$, $f(\eta_{j'}) \ne j$;<br />

$(\mu f)(j, \lambda_i) = \emptyset$, iff $\forall \eta_j \in N$, $f(\eta_j) \ne j$.<br />

A composite mapping function, $h$, on a search function,<br />

$\mu f$, is defined as follows: $h(\mu f(j, f(i))) = M(i,j)$.<br />
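The composite mapping can be read as a two-step lookup: find where row i begins, then search the column indices from that point for column j. The Python sketch below is a minimal illustration under stated assumptions, not the paper's implementation: the function name is hypothetical, positions are 1-based, and the search is bounded by the start of the next row, which the definition does not state explicitly. The vectors are derived from the 4 x 4 matrix of Figure 2.

```python
def find_element(i, j, row_begin, col_index, data):
    """Return M(i, j) from a rowwise double-indexing layout (1-based i, j)."""
    start = row_begin[i - 1]            # first slot of row i
    # Assumption: row i ends where row i + 1 begins (or at the end of data).
    end = row_begin[i] if i < len(row_begin) else len(data) + 1
    for slot in range(start, end):      # search the column index vector
        if col_index[slot - 1] == j:
            return data[slot - 1]       # same slot in the data item vector
    return 0                            # not stored, so an implicit zero


# Vectors for the matrix  1 0 0 2 / 0 0 3 0 / 0 4 0 0 / 5 0 1 2.
ROW_BEGIN = [1, 3, 4, 5]
COL_INDEX = [1, 4, 3, 2, 1, 3, 4]
DATA = [1, 2, 3, 4, 5, 1, 2]

assert find_element(4, 3, ROW_BEGIN, COL_INDEX, DATA) == 1
assert find_element(2, 2, ROW_BEGIN, COL_INDEX, DATA) == 0
```

The search never touches the data item file until the column index matches, which is why the reference data can sit on a separate, differently organized file.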

3.3. Linked Encoding Group<br />

The linked encoding group consists of numerical<br />

physical schemas for all the sparse indexing<br />

schemes with linked list data structures.<br />

Their logical schema is the m x n sparse matrix<br />

and their storage scheme consists of lists of nodes.<br />

Each node has a format which might consist of data<br />

item, row and column indices and pointer fields.<br />

The schemas usually require a single file with indexed<br />

sequential or direct file access method.<br />

These schemas are further classified as:<br />

1. linear-linked-list<br />

2. doubly-linked-list<br />

3. threaded-linked-list<br />

Figure 3 shows an illustration of one of them.<br />

Their access path is called linked because the<br />

search technique uses a mapping defined through<br />

pointer linkage.<br />

It may be defined as follows:<br />

Definition: Let $D = (X, R)$ be a storage structure<br />

with nodes $x_1, \ldots, x_n$ and relations $(r_1, r_2) \in R$<br />

such that $r_1$ represents a row equivalence relation<br />

and $r_2$ represents a column equivalence relation.<br />

In addition, let $\eta x_i$ represent the address of node<br />

$x_i$; $\kappa_i x$ the value of the $i$th pointer field of node $x$,<br />

i.e., row pointer value; $\kappa_j x$ the value of the $j$th pointer<br />

field of node $x$, i.e., column pointer value; $X/r_1$ the<br />

row equivalence class and $X/r_2$ the column equivalence<br />

class. A linked mapping is a linked realization<br />

of a relation from the header pointer node, if at<br />

least one of the following holds:<br />

1. The relation $r_1$ is realized as a linked<br />

structure (relative to the $i$th pointer<br />

field), i.e., for every pair of nodes<br />

$(x_1, x_2) \in X/r_1$, $\eta x_2 \in \kappa_i x_1$ holds, or<br />

similarly $r_2$ is realized as a linked<br />

structure.<br />

2. For every three ordered nodes such that $(x_1, x_2) \in X/r_1$ and $(x_1, x_3) \in X/r_2$,<br />

$\eta x_2 \in \kappa_i x_1$ and $\eta x_3 \in \kappa_j x_1$ hold.<br />

In addition, it is possible that the relation $r$ is<br />

realized as a linked structure and the end node $x_n$<br />

points to the header node $x'$, i.e. $\kappa x_n = \eta x'$.<br />
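A linked realization in which both the row and the column relation are followed through pointer fields can be sketched as follows. This Python example is an illustration under stated assumptions rather than the paper's storage layout: node keys stand in for record addresses, the field and function names are hypothetical, and the lists are not circular (the end node holds no pointer back to the header).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int                            # stands in for the node's address
    row: int
    col: int
    value: float
    next_in_col: Optional[int] = None   # column pointer field
    next_in_row: Optional[int] = None   # row pointer field


def build(matrix):
    """Create one node per nonzero element and link rows and columns."""
    nodes, row_head, col_head = {}, {}, {}
    last_in_row, last_in_col = {}, {}
    key = 0
    for i, row in enumerate(matrix, start=1):
        for j, value in enumerate(row, start=1):
            if value == 0:
                continue
            key += 1
            nodes[key] = Node(key, i, j, value)
            if i in last_in_row:
                nodes[last_in_row[i]].next_in_row = key
            else:
                row_head[i] = key       # header pointer for row i
            if j in last_in_col:
                nodes[last_in_col[j]].next_in_col = key
            else:
                col_head[j] = key       # header pointer for column j
            last_in_row[i] = key
            last_in_col[j] = key
    return nodes, row_head, col_head


def walk(head_key, nodes, field):
    """Follow one pointer field from a header and collect the nodes."""
    out, key = [], head_key
    while key is not None:
        out.append(nodes[key])
        key = getattr(nodes[key], field)
    return out


MATRIX = [[1, 0, 0, 2], [0, 0, 3, 0], [0, 4, 0, 0], [5, 0, 1, 2]]
NODES, ROW_HEAD, COL_HEAD = build(MATRIX)
row4 = [(n.col, n.value) for n in walk(ROW_HEAD[4], NODES, "next_in_row")]
assert row4 == [(1, 5), (3, 1), (4, 2)]
```

Every data item is reached only through other nodes, starting from a header pointer, which is the sense in which the access path is linked.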

4. Data Language Facilities<br />

The data language facilities provide a generalized<br />

approach for describing any numerical database<br />

and its mapping to storage. They consist<br />

of a stored-data description language (SDDL) and a<br />

stored-data mapping language (SDML). The two<br />

languages are similar to other data definition and<br />

mapping languages [7,17,18]. We have attempted as<br />

much as possible to make them user friendly, by<br />

including simple, self-explanatory language constructs.<br />

The choice of only one of the alternatives<br />

is represented by {} (braces) and an optional<br />

phrase by [] (square brackets). Language keywords<br />

appear in capital letters and user-defined words<br />

in lower case. Sample SDDL and SDML statements<br />

of both source and target numerical databases are<br />

shown in Figures 4 and 4.1 respectively. Other<br />

features of the two languages will be revealed as<br />

they are described below.<br />

4.1. Stored-Data Description Language (SDDL)<br />

The SDDL is intended mainly for the user to<br />

describe the logical characteristics of his numerical<br />

database and the associated type of file organization<br />

on secondary storage devices, or alternatively<br />

the card input format. Therefore, the<br />

language is divided into three parts which are<br />

(1) matrix structure, (2) file control, and (3)<br />

input format.<br />

The matrix structure describes the logical<br />

characteristics of the data and it also indicates<br />

if dynamic storage management is required. The<br />

basic matrix format is specified using the self-explanatory<br />

keywords: {DENSE | SPARSE}, {SYMMETRIC | NONSYMMETRIC}, and<br />

{BANDED | NONBANDED}. If the matrix is symmetric, the<br />

statement will include {UPPER-DIAGONAL | LOWER-DIAGONAL}<br />

in order to specify the partition of the dataset<br />


Logical Schema:<br />
1 0 0 2<br />
0 0 3 0<br />
0 4 0 0<br />
5 0 1 2<br />

Storage Scheme: [diagram of row and column header pointers leading to linked lists of nodes]<br />

Node Format: Node Key | Row Index | Column Index | Data Item | Column Node Pointer | Row Node Pointer<br />

Figure 3. Doubly-linked-list<br />

DATA-DESCRIPTION:<br />
  MATRIX-STRUCTURE:<br />
    TYPE = SPARSE, NONSYMMETRIC, STATIC;<br />
  FILE-CONTROL:<br />
    TYPE = SOURCE;<br />
    FILE-UNIT = 21, 22, 23;<br />
    MEDIUM = DISK;<br />
    RECORD: REC-KEY = integer;<br />
      SIZE = 512, FIXED, UNBLOCKED;<br />
DATA-MAPPING: (double-indexing-2);<br />
  ACCESS-PATH-ENCODING:<br />
    ACCESS-PATH = INDIRECT-ENCODING (REF-DATA-ORG);<br />
    INDIRECT-ENCODING:<br />
      REF-DATA-ORG: (REF-ORG-1, REF-ORG-2, DATA-ORG);<br />
      REF-ORG-1: SET(LOC);<br />
        LOC: integer, TYPE = ROW BEGINNING;<br />
      REF-ORG-2: SET(INDEX);<br />
        INDEX: integer, TYPE = COLUMN INDEX;<br />
      DATA-ORG: DIMENSION = (5000,5000);<br />
        ORDERING = ROWWISE;<br />
        SET(DATA-ITEM);<br />
        DATA-ITEM: real, REAL-PRECISION = DOUBLE;<br />
  ENCODED-FILE:<br />
    FILE-NAME = datfile, indfile, locfile;<br />
    ORGANIZATION = RANDOM, RANDOM, SEQUENTIAL;<br />
    ENCODED-DATA = DATA-ORG, REF-ORG-2, REF-ORG-1;<br />

Figure 4. Sample SDDL & SDML statements of a source numerical database for a double-index-2 schema.<br />

to be processed. Similarly, a bandwidth statement<br />

which specifies the size of the band is required<br />

for a band matrix and a density statement giving<br />

an estimated density of a sparse matrix is necessary<br />

for creating a database with random file organization.<br />

Some statements in the matrix structure<br />

section are shown in the example below.<br />

MATRIX-STRUCTURE:<br />

TYPE = SPARSE, BANDED, SYMMETRIC, LOWER-DIAGONAL, STATIC;<br />

BANDWIDTH = (250, 250);<br />

The file control specifies the file organization<br />

of a numerical database already residing<br />

on a secondary device or to be created, by listing<br />

the type of file, device medium, file unit etc.<br />

The file control statements depend on the device<br />

medium selected for processing as specified by the<br />

device medium keyword, CARD, TAPE, or DISK. If data<br />

is to be processed from card input stream, only<br />

the file-type, file-unit and device-medium statements<br />

are required, but in addition to these three<br />

statements, both disk and tape files require record<br />

statements.<br />

The file-type statement identifies the source/<br />

target file and the file-unit statement gives a<br />

set of FORTRAN READ/WRITE unit numbers for processing<br />

the files in the database. The record statement<br />

DATA-DESCRIPTION:<br />
  MATRIX-STRUCTURE:<br />
    TYPE = SPARSE, NONSYMMETRIC, STATIC;<br />
  FILE-CONTROL:<br />
    TYPE = TARGET;<br />
    FILE-UNIT = 4;<br />
    MEDIUM = DISK;<br />
    RECORD: REC-KEY = integer;<br />
      SIZE = 1024, FIXED, UNBLOCKED;<br />
DATA-MAPPING: (doubly-linked-list);<br />
  ACCESS-PATH-ENCODING:<br />
    ACCESS-PATH = LINKED-ENCODING: (LINKED-DATA-ORG);<br />
    LINKED-DATA-ORG: (COL-HEAD-NODE, ROW-HEAD-NODE, DATA-ITEM-NODE);<br />
    COL-HEAD-NODE: (PTR-ITEM, FIELD-LINKAGE);<br />
      PTR-ITEM: integer, TYPE = COL PTR;<br />
      FIELD-LINKAGE = FIRST COL NODE;<br />
    ROW-HEAD-NODE: (PTR-ITEM, FIELD-LINKAGE);<br />
      PTR-ITEM: integer, TYPE = ROW PTR;<br />
      FIELD-LINKAGE = FIRST ROW NODE;<br />
    DATA-ITEM-NODE: (KEY-FIELD, ROW-FIELD, COL-FIELD, DATA-FIELD, COL-PTR-FIELD, ROW-PTR-FIELD);<br />
      KEY-FIELD: NODE-KEY = integer;<br />
      ROW-FIELD: REF-ITEM = INDEX;<br />
        INDEX: integer, TYPE = ROW INDEX;<br />
      COL-FIELD: INDEX: integer, TYPE = COL INDEX;<br />
      DATA-FIELD: ORDERING = NONE;<br />
        DATA-ITEM = real, REAL-PRECISION;<br />
        REAL-PRECISION = DOUBLE;<br />
      COL-PTR-FIELD: PTR-ITEM, FIELD-LINKAGE;<br />
        PTR-ITEM: integer, TYPE = COL PTR;<br />
        FIELD-LINKAGE = NEXT COL NODE;<br />
      ROW-PTR-FIELD: PTR-ITEM, FIELD-LINKAGE;<br />
        PTR-ITEM: integer, TYPE = ROW PTR;<br />
        FIELD-LINKAGE = NEXT ROW NODE;<br />
  ENCODED-FILE:<br />
    FILE-NAME = NODFILE;<br />
    ORGANIZATION = RANDOM;<br />
    ENCODED-DATA = SET(LINKED-DATA-ORG);<br />

Figure 4.1. Sample SDDL & SDML statements of a target numerical database for a doubly-linked-list schema.<br />

lists the record properties like record-size, {FIXED | VARIABLE} and {BLOCKED | UNBLOCKED}. In addition, the file<br />

control section may include any of the following<br />

optional statements: (I) a record-key statement<br />

to specify either integer or alphanumeric key<br />

for random file organization; (2) a block-size<br />

statement required for blocked records; and (3)<br />

a format statement (similar to FORTRAN) for formatted<br />

records. Some of these statements are<br />

illustrated under FILE-CONTROL in Figure 4.<br />


The input-format section provides facilities<br />

for processing an unstructured database from cards.<br />

The section is comprised of the dimension, the data<br />

ordering and format statements respectively.<br />

The dimension statement, shown below, specifies<br />

the numbers of rows and columns in the matrix:<br />

DIMENSION = {ROW | COLUMN}, integer, {COLUMN | ROW}, integer;<br />

The data ordering statement specifies a rowwise/columnwise/none ordering.<br />

The data-format statement:<br />

DATA-FORMAT = {SPARSE-TYPE-1 | SPARSE-TYPE-2 | DENSE};<br />

gives users three choices of format specifications.<br />

Both SPARSE-TYPE-I and SPARSE-TYPE-2 are for sparse<br />

matrix input format specifications of only nonzero<br />

elements and the DENSE is for all the matrix elements.<br />

SPARSE-TYPE-1 is for an ordered input data so<br />

that a row or column input data stream is processed<br />

at a time. As shown below,<br />

SPARSE-TYPE-1: CONTROL-DATA = {ROW | COLUMN}, data-type;<br />

FORMAT = SET(data-type, data-type);<br />

it requires a control data to specify the row or<br />

column to be processed so that the format becomes<br />

a set of pairs of column/row and data item data-types.<br />

A data-type is any valid FORTRAN format<br />

specification for spacing, alphanumeric, integer<br />

or real variable e.g. 5X, I6, F10.4 and E20.12.<br />
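To make the pair format concrete, the Python sketch below reads one data card laid out as repeated (column index, value) pairs in fixed-width fields. It is only an illustration: the function name and its parameters are hypothetical, the field widths follow an assumed format of the form 5(I4,2X,F10.6), and FORTRAN rules for embedded blanks are not modelled.

```python
def parse_pairs(card, repeat=5, int_width=4, skip=2, real_width=10):
    """Split one card image into (column index, value) pairs.

    Each pair occupies int_width + skip + real_width characters, which
    mirrors a repeated I4,2X,F10.6 group (an assumed layout).
    """
    pairs = []
    field = int_width + skip + real_width
    for k in range(repeat):
        chunk = card[k * field:(k + 1) * field]
        if not chunk.strip():           # the rest of the card is blank
            break
        column = int(chunk[:int_width])
        value = float(chunk[int_width + skip:])
        pairs.append((column, value))
    return pairs


# A card for one row holding nonzero elements in columns 1 and 4.
card = f"{1:4d}  {1.0:10.6f}{4:4d}  {2.0:10.6f}"
assert parse_pairs(card) == [(1, 1.0), (4, 2.0)]
```

The row (or column) to which the pairs belong is not on the card itself; in this format it would come from the separate control datum.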

SPARSE-TYPE-2 is for an unordered input data<br />

so that the format is a set of row, column, and<br />

data item data-types as follows:<br />

SPARSE-TYPE-2 = SET([ROW], data-type,<br />

[COLUMN], data-type,<br />

data-type);<br />

Finally, DENSE = SET(data-type); provides for<br />

a set of regular FORTRAN-type format specifications.<br />

An example of a SPARSE-TYPE-1 input format is shown<br />

below.<br />

INPUT-<strong>FOR</strong>MAT:<br />

DIMENSION = ROW, 5000, COLUMN, 5000;<br />

ORDERING = ROWWISE;<br />

SPARSE-TYPE-l: CONTROL-<strong>DATA</strong> = ROW, 14;<br />

<strong>FOR</strong>MAT =5(14,2X,FI0.6) ;<br />
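To make the example format concrete, the following Python sketch reads one control card (I4) and one data card laid out as 5(I4,2X,F10.6), i.e. five 16-column pairs filling an 80-column card. It is an illustration only, not the generated FORTRAN reader: the function names are hypothetical, values are assumed to carry an explicit decimal point, and a blank index field is assumed to mark an unused slot.<br />

```python
from typing import List, Tuple

INDEX_WIDTH = 4      # I4
GAP_WIDTH = 2        # 2X
VALUE_WIDTH = 10     # F10.6
PAIR_WIDTH = INDEX_WIDTH + GAP_WIDTH + VALUE_WIDTH
PAIRS_PER_CARD = 5   # repeat count in 5(I4,2X,F10.6)


def parse_control_card(card: str) -> int:
    """Return the row number from a control card laid out as I4."""
    return int(card[:INDEX_WIDTH])


def parse_data_card(card: str) -> List[Tuple[int, float]]:
    """Return the (column index, value) pairs found on one data card."""
    card = card.ljust(PAIR_WIDTH * PAIRS_PER_CARD)
    pairs = []
    for k in range(PAIRS_PER_CARD):
        field = card[k * PAIR_WIDTH:(k + 1) * PAIR_WIDTH]
        index_text = field[:INDEX_WIDTH]
        value_text = field[INDEX_WIDTH + GAP_WIDTH:]
        if not index_text.strip():
            continue  # assumption: a blank index field is an unused slot
        pairs.append((int(index_text), float(value_text)))
    return pairs


# Build a sample card for row 1 with nonzeros in columns 1 and 4.
sample_card = "".join("%4d  %10.6f" % pair for pair in [(1, 1.0), (4, 2.0)])
row_number = parse_control_card("   1")
row_pairs = parse_data_card(sample_card)
print(row_number, row_pairs)
```

Under these assumptions, a rowwise SPARSE-TYPE-1 stream is a control card naming the row followed by data cards of column/value pairs for that row.<br />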

4.2. Stored-Data Mapping Language (SDML)<br />

The SDML has two functions: (1) to describe<br />

the different types of mapping which the<br />

system can make between a logical schema and a<br />

target storage space, and (2) to describe the encoding<br />

to storage structures. The language has two major<br />

sections: the access path encoding and the encoded<br />

file. The emphasis of the language is on the access<br />

path encoding, which represents the most difficult<br />

part of the mapping description. The encoded file<br />

section enables the assignment of encoded data (data<br />

items and pseudo data) to the files in the database<br />

according to the corresponding definitions of filenames<br />

and file accessing methods.<br />

The access path encoding section enables the<br />

selection of an appropriate mapping subsection and<br />

relates its subsections to the mapping descriptions<br />

of the direct, indirect and linked schema<br />

encoding groups. Reference to mapping descriptions<br />

defined in one encoding group by another is<br />

a common feature of the language, e.g. the REF-ITEM<br />

definition of pseudo data in the indirect encoding<br />

subsection is referenced by the linked encoding<br />

subsection.<br />

The direct encoding, implied by the DATA-ORG:<br />

subsection, describes the data item with its properties<br />

like data ordering and type. It also provides<br />

for an optional definition of dimension and<br />

bandwidth for a source database description. The<br />

indirect encoding provides a choice of mapping alternatives<br />

for encoding pseudo data and data items<br />

to separate encoded files by the mapping descriptions<br />

identified by MAP-ORG: and REF-ORG: (see<br />

Figure 4). In addition, an ordered combination<br />

of pseudo data and data items may be mapped to an<br />

encoded file by the MIXED-ORG: mapping description as<br />

follows:<br />

MIXED-ORG: SET(ORDERED {(REF-ITEM, DATA-ORG) | (REF-ITEM, REF-ITEM, DATA-ORG)});<br />

The linked encoding enables the mapping of<br />

any set of nodes to an encoded file. Each node<br />

is identified by a user defined node-name and<br />

consists of a set of fields. Each field is described<br />

by an optional field-name and a field identifier<br />

which may be a node key, pseudo data, or<br />

data item. An example of linked encoding mapping<br />

is illustrated in Figure 4.1.<br />

The mapping description consists of definitions<br />

of both primitive and nonprimitive data<br />

structures. The representation of structures of<br />

primitive type is usually by an assignment statement,<br />

while that of nonprimitive type is by a descriptive<br />

statement consisting of a set or group name,<br />

and a set or group definition [16]. We provide<br />

the following constructs in the language to specify<br />

data, ordering and linkage definitions:<br />

1. ordering definition types--rowwise, columnwise<br />

and none;<br />

2. basic data types--integer, real, and alphanumeric;<br />

3. linkage definition types--header, first,<br />

next, prior, last, row, column, node,<br />

field, and null.<br />

A valid and meaningful linkage definition,<br />

except the NULL keyword, requires an ordered combination<br />

of the following: (1) a pointer linkage<br />

keyword, (2) row or column, and (3) node or field.<br />

The pointer linkage keywords are header, first,<br />

next, prior, and last. An example of a valid<br />

definition is FIRST ROW NODE.<br />
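The rule for a valid linkage definition can be sketched as a small check in Python. This is only an illustration of the stated rule (NULL alone, or pointer keyword, then row/column, then node/field); the function name is hypothetical and the real check is performed by the SDML grammar, not by code like this.<br />

```python
POINTER_KEYWORDS = {"HEADER", "FIRST", "NEXT", "PRIOR", "LAST"}
AXIS_KEYWORDS = {"ROW", "COLUMN"}
UNIT_KEYWORDS = {"NODE", "FIELD"}


def is_valid_linkage(definition: str) -> bool:
    """Check the ordered combination: pointer keyword, ROW/COLUMN, NODE/FIELD."""
    words = definition.upper().split()
    if words == ["NULL"]:
        return True  # NULL is the one keyword allowed on its own
    if len(words) != 3:
        return False
    pointer, axis, unit = words
    return (pointer in POINTER_KEYWORDS
            and axis in AXIS_KEYWORDS
            and unit in UNIT_KEYWORDS)


print(is_valid_linkage("FIRST ROW NODE"))     # valid example from the text
print(is_valid_linkage("NEXT COLUMN FIELD"))  # combination used for PTR-ORG
print(is_valid_linkage("ROW FIRST NODE"))     # wrong order
```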

An example of a primitive type data structure is:<br />

DATA-ITEM = {integer | real | alpha};<br />




An example of a nonprimitive type data structure<br />

illustrating a SET definition is:<br />

DATA-ORG: SET(DATA-ITEM), ORDERING = {ROWWISE | COLUMNWISE | NONE};<br />

A primitive type data structure which is semantically<br />

ambiguous, e.g. index and pointer, becomes<br />

a nonprimitive structure by qualifying the<br />

basic data definition with a semantic phrase definition<br />

as follows:<br />

INDEX: {integer | alpha}, TYPE = {ROW INDEX | COLUMN INDEX | CONCAT(ROW INDEX, COLUMN INDEX)};<br />

An access path is described by ORDERING and<br />

LINKAGE phrases. ORDERING describes the matrix<br />

data access path by row, column or none. It is assumed<br />

that the ORDERING of reference items, i.e.,<br />

indices and locations (within the matrix or from<br />

diagonal elements) corresponds to that of matrix<br />

data items. LINKAGE describes linked list structure<br />

connectivity by a combination of linkage keywords<br />

as in the following example:<br />

PTR-ORG: SET(PTR-ITEM), LINKAGE=NEXT COLUMN FIELD;<br />

5. The Feasibility of SDDL and SDML in a Numerical<br />

Database System<br />

The current approach to numerical database<br />

management is restricted to a few matrix compact<br />

storage schemes. The most common compact storage<br />

scheme for processing sparse matrices residing on<br />

secondary devices is the double-indexing (row-column)<br />

technique, but this is not the best technique<br />

for many applications. A few research<br />

groups, e.g., [9], have tried the linked list<br />

technique in programs tailored to their applications;<br />

however, such programs are not always available for<br />

public distribution.<br />

Our investigation of the implementation of<br />

a generalized approach to numerical database management<br />

reveals two basic requirements. The<br />

first requirement is for the numerical database to<br />

reside on secondary storage using the storage<br />

scheme that is best fitted for its application.<br />

The second requirement is to provide tools for<br />

data access that will promote physical data independence<br />

through the implementation of a DML.<br />

It is obvious that the first requirement is<br />

a prerequisite to the second and that there are<br />

two options for its realization. The first option<br />

is for each user to be responsible for structuring<br />

his numerical database according to the physical<br />

schema best suited to his application. This option<br />

is not practical because a user may not know<br />

how to structure his database to suit his objective.<br />

The second option is to have a generalized data<br />

translator that will automatically restructure any<br />

numerical database from one physical schema to another,<br />

or convert unstructured raw data, not in a<br />

compact storage form, into the form corresponding to<br />

a physical schema. It is essential for this option<br />

to be integrated into any effective generalized approach<br />

to numerical database management.<br />

Our first priority then is to develop a generalized<br />

data translator for numerical databases<br />

that will isolate the users from the underlying<br />

data management through stored-data description<br />

and mapping language facilities.<br />

5.1. A Generalized Data Translator for Numerical<br />

Databases<br />

We are currently developing a generalized data<br />

translator for numerical databases as a first<br />

step towards developing a generalized numerical<br />

database management system. The generalized data<br />

translator is focused on the implementation of our<br />

nonprocedural Stored-Data Description and Mapping<br />

Languages (SDDL and SDML). Its function is to automatically<br />

create or restructure a numerical database<br />

from one schema to another in two consecutive<br />

processes of compilation and data translation<br />

(to be discussed later). Its input, supplied<br />

by the user, consists of the source and target<br />

SDDL and SDML statements (see Figure 4), and a<br />

source numerical database. Its output is the target<br />

numerical database. The overall functions are<br />

illustrated in Figure 5.<br />

During the compilation process, the user-supplied<br />

SDDL and SDML statements are converted by a<br />

lexical analyzer into a token stream, which is<br />

translated by a Generalized Syntax-Directed Translation<br />

Scheme (GSDTS) into FORTRAN source programs<br />

for the reader, the restructurer, and the writer<br />

subroutines. After compilation by a FORTRAN compiler,<br />

the subroutines become the major components<br />

of the translator subsystem. The translator subsystem<br />

also includes common data table information,<br />

shown in Figure 6, and utility functions and routines<br />

to compute mapping functions, e.g., symmetric<br />

and band address locations, and to execute<br />

search and reordering algorithms.<br />
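As an illustration of what such mapping functions compute, the Python sketch below gives one common convention for each case. These formulas are assumptions chosen for illustration (lower triangle stored row by row for the symmetric case, fixed-width rows for the band case, all positions 1-based); the paper does not state which conventions its utility routines use, and the function names are hypothetical.<br />

```python
from typing import Optional


def symmetric_address(i: int, j: int) -> int:
    """Position of element (i, j) when only the lower triangle is stored row by row."""
    if i < j:
        i, j = j, i  # (i, j) and (j, i) share one stored element
    return i * (i - 1) // 2 + j


def band_address(i: int, j: int, lower: int, upper: int) -> Optional[int]:
    """Position of element (i, j) in a band matrix stored in fixed-width rows.

    Returns None for elements outside the band, which are implicitly zero.
    """
    offset = j - i
    if offset < -lower or offset > upper:
        return None
    width = lower + upper + 1
    return (i - 1) * width + (offset + lower) + 1


print(symmetric_address(3, 1), symmetric_address(1, 3))
print(band_address(2, 1, lower=1, upper=1), band_address(1, 3, lower=1, upper=1))
```

With the band convention above, the first slot of the first row is padding; other layouts avoid that at the cost of a less regular formula.<br />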

5.2. Data Translation Process<br />

The data translation process of the translator<br />

subsystem starts with the encoding of the record(s)<br />

of the source database into a translator<br />

internal form (TIF), followed by the decoding of<br />

TIF data to encoded record(s), and ends with the<br />

writing of record(s) on the storage devices. The<br />

components of the TIF are (1) the row/column identifier,<br />

(2) the index buffer for column/row index,<br />

and (3) the data item buffer for row/column data<br />

item. The translation process is controlled by<br />

the translation supervisor, which activates the<br />

reader to encode the source database record(s) to<br />

TIF data, followed by the restructurer to decode<br />

the TIF data to encoded record(s), and then the<br />

writer to convert the encoded record(s) to physical<br />

record(s) and to write them on the storage device.<br />

Each subroutine returns control to the supervisor,<br />

which activates the next subroutine accordingly,<br />

and the process is repeated until all<br />

the records of the source database have been processed.<br />

Figure 6.1 illustrates a data translation<br />

process from a double-index-2 source database to a<br />

doubly-linked-list target database.<br />
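The control flow of the supervisor can be sketched in Python as a loop over three callables. This is a simplified illustration, not the generated FORTRAN: the names and the dictionary used as "encoded data" are hypothetical, and the sketch covers only the case where no reordering workfile is needed.<br />

```python
from typing import Callable, Iterable, List, Tuple

# TIF: (row/column identifier, index buffer, data item buffer)
TIF = Tuple[int, List[int], List[float]]


def translate(source_records: Iterable,
              reader: Callable,
              restructurer: Callable,
              writer: Callable) -> int:
    """Run one reader, restructurer, writer cycle per source record."""
    iterations = 0
    for record in source_records:
        tif = reader(record)          # encode source record(s) to TIF data
        encoded = restructurer(tif)   # decode TIF data to encoded record(s)
        writer(encoded)               # write physical record(s)
        iterations += 1
    return iterations


# Stand-in components, for illustration only.
written = []
count = translate(
    source_records=[(1, [1, 4], [1.0, 2.0]), (2, [3], [3.0])],
    reader=lambda record: record,
    restructurer=lambda tif: {"id": tif[0], "index": tif[1], "data": tif[2]},
    writer=written.append,
)
print(count, written)
```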

5.2.1. Reader Module<br />

The reader encodes both the unstructured matrix<br />

data, i.e., raw data not in any compact storage<br />

form, and the numerical database. In both cases,<br />

the information in the source file control<br />

table and either the input format or the physical<br />

schema table (see Figure 6) is used by the reader<br />

to read source data from cards or secondary devices<br />

and encode it into the translator internal<br />

form (TIF) data. The source data is processed by<br />

row/column according to the input format or physical<br />

schema specification. In order to produce the<br />

TIF data, each encode step of the translation iteration<br />

does the following: (1) fills in the appropriate<br />

row/column identifier, and (2) fills in<br />

the corresponding index and data buffers for that<br />

row/column (see Step 1a of Figure 6.1). For example,<br />

with the row identifier equal to 1, we have 1 and<br />

4 in the column index buffer, as well as 1 and 2 in<br />

the data item buffer. On completion, control is returned<br />

to the supervisor for the next step of the<br />

translation iteration, i.e., the decode step by<br />

the restructurer.<br />
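The encode step for a rowwise double-index source can be sketched in Python as follows. The three arrays are the row beginning, column index and data item files as read from Figure 6.1 with their zero padding removed; the function name and the zero-padding of the buffers to the buffer size are assumptions made for the illustration.<br />

```python
from typing import Iterator, List, Tuple


def read_rows(row_begin: List[int], col_index: List[int], data: List[int],
              buffer_size: int) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (row identifier, index buffer, data buffer) for each row."""
    n_rows = len(row_begin)
    for r in range(n_rows):
        start = row_begin[r] - 1  # the files use 1-based positions
        stop = row_begin[r + 1] - 1 if r + 1 < n_rows else len(data)
        cols = col_index[start:stop]
        vals = data[start:stop]
        if len(cols) > buffer_size:
            raise ValueError("row %d does not fit in the buffer" % (r + 1))
        pad = [0] * (buffer_size - len(cols))
        yield r + 1, cols + pad, vals + pad


# Source database values taken from Figure 6.1.
row_begin = [1, 3, 4, 5]
col_index = [1, 4, 3, 2, 1, 3, 4]
data_items = [1, 2, 3, 4, 5, 1, 2]
tif_rows = list(read_rows(row_begin, col_index, data_items, buffer_size=4))
print(tif_rows[0])
```

The first TIF record produced this way has row identifier 1, index buffer 1 4 0 0 and data buffer 1 2 0 0, which agrees with the example in the text.<br />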

[Figure 5. Usage and functions of the generalized data translator. Compilation: the source and the target SDDL and SDML statements each pass through a lexical analyzer, and the resulting tokens are processed by the GSDTS for SDDL and SDML into FORTRAN conversion programs, which are compiled by the FORTRAN compiler. Translation: the translator subsystem converts the source numerical database, through internal form data, into the target numerical database.]<br />

[Figure 6. Major components of the translator subsystem: the source file control table, the source input format or physical schema table (*), the target physical schema table, the target file control table, the reader, the restructurer, the writer, the source numerical database, the translator internal form data, and the target numerical database. (*) Either Input Format, for unstructured (raw) source matrix data, or Physical Schema Table, for a source database in compact storage form. The legend distinguishes data descriptions, data flow, and processing sequence.]<br />

5.2.2. Restructurer Module<br />

If the source ordering is different from the<br />

target ordering, the TIF data of the entire database<br />

is temporarily stored in one or more workfiles to be<br />

reordered before it is decoded; otherwise, the TIF<br />

data is decoded as received into encoded data<br />

corresponding to the target schema. Each decode step<br />

of the translation iteration from the TIF data to<br />

a direct encoding group discards the index buffer<br />

and reorganizes the data items into the appropriate<br />

encoded data. For the indirect encoding<br />

group, both the data items and the index, which is<br />

converted to the appropriate pseudo data, become<br />

the encoded data. However, the linked encoding<br />

group requires the supervisor to create null head<br />

nodes during initialization. Data item nodes with<br />

any appropriate pointers are created to form the<br />

encoded data at each decode step. For example, in<br />

Step 1b of Figure 6.1, two data item nodes for the<br />

first row are created to correspond to the TIF data<br />

in Step 1a. In addition, "1" in the row and column<br />

head nodes represents the pointer to the first data<br />

item node, and "2" in the column head and the first<br />

data item nodes respectively represents the column<br />

pointer to the second data item node. At the end<br />

of this step, control is returned to the supervisor<br />

for the last phase of the translation iteration,<br />

i.e. writing the encoded data on the secondary devices<br />

by the writer.<br />
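Two of the restructurer's tasks can be sketched in Python: reordering rowwise TIF data into columnwise TIF data when the source and target orderings differ, and decoding one TIF record for a direct encoding. Both functions are hypothetical illustrations. In particular, the direct decode shown here is one interpretation of the text: it uses the indices only to place the data items in a full row and then drops them. An in-memory dictionary stands in for the workfile.<br />

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

TIF = Tuple[int, List[int], List[int]]  # (identifier, indices, data items), unpadded


def reorder_by_column(row_tifs: Iterable[TIF]) -> List[TIF]:
    """Turn rowwise TIF records into columnwise TIF records."""
    columns = defaultdict(lambda: ([], []))
    for row_id, cols, vals in row_tifs:
        for c, v in zip(cols, vals):
            rows_of_c, vals_of_c = columns[c]
            rows_of_c.append(row_id)
            vals_of_c.append(v)
    return [(c, columns[c][0], columns[c][1]) for c in sorted(columns)]


def decode_direct(tif: TIF, length: int) -> List[int]:
    """Expand one TIF record into a full row/column without index data."""
    _, indices, values = tif
    dense = [0] * length
    for k, v in zip(indices, values):
        dense[k - 1] = v  # indices are 1-based
    return dense


# The matrix of Figure 6.1 as rowwise TIF records.
row_tifs = [(1, [1, 4], [1, 2]), (2, [3], [3]), (3, [2], [4]), (4, [1, 3, 4], [5, 1, 2])]
col_tifs = reorder_by_column(row_tifs)
first_row = decode_direct(row_tifs[0], 4)
print(col_tifs[0], first_row)
```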

5.2.3. Writer Module<br />

The writer uses the information in the target<br />

file control table to open the file(s) of the target<br />

database during initialization and closes them<br />

after the entire database has been processed. It<br />

performs the last phase of each translation iteration<br />

by converting the encoded data into physical<br />

record(s) to be written on the secondary devices<br />

according to the user-defined target file access<br />

method. For example, with regard to the encoded<br />

data of Step 1b in Figure 6.1, the head node records<br />

are updated records which are rewritten in<br />

place, and the data item node record is written<br />

as a new record on the secondary device. On completion,<br />

control is returned to the supervisor for<br />

another translation iteration to begin with the<br />

reader.<br />
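Rewriting a record in place and appending a new one on a random-access file can be sketched in Python with fixed-size records addressed by an integer key. The record size of 14 follows Figure 6.1; everything else is an assumption for illustration, including the 32-bit integer layout, the in-memory file, the function names, and the record contents.<br />

```python
import io
import struct

RECORD_SIZE = 14                       # integers per record, as in Figure 6.1
RECORD_FORMAT = "<%di" % RECORD_SIZE   # assumption: 32-bit little-endian integers
RECORD_BYTES = struct.calcsize(RECORD_FORMAT)


def write_record(f, key: int, values) -> None:
    """Write, or rewrite in place, the record with the given 1-based key."""
    padded = list(values) + [0] * (RECORD_SIZE - len(values))
    f.seek((key - 1) * RECORD_BYTES)
    f.write(struct.pack(RECORD_FORMAT, *padded))


def read_record(f, key: int):
    """Read back the record with the given 1-based key."""
    f.seek((key - 1) * RECORD_BYTES)
    return list(struct.unpack(RECORD_FORMAT, f.read(RECORD_BYTES)))


target = io.BytesIO()               # stands in for the random-access target file
write_record(target, 1, [])         # initialization: null row-head record
write_record(target, 2, [])         # initialization: null column-head record
write_record(target, 1, [1])        # head record updated and rewritten in place
write_record(target, 3, [1, 1, 1])  # new data item record (contents illustrative)
print(read_record(target, 1)[:4], len(target.getvalue()) // RECORD_BYTES)
```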

[Figure 6.1. An illustration of a data translation process.<br />

Step 0. Source database of figure 2. Logical schema (4 x 4 matrix, by rows): 1 0 0 2; 0 0 3 0; 0 4 0 0; 5 0 1 2. Row beginning file: 1 3 4 5 | 0 0 0 0. Column index file: 1 4 3 2 | 1 3 4 0. Data item file: 1 2 3 4 | 5 1 2 0. Source record size = 4; source file org. = sequential for all files.<br />

Target database of figure 3 (partial data description): target record size = 14; buffer size = 4; no. of rows = 4; no. of columns = 4; target file org. = random; record key = integer.<br />

Translation start, initialization operation: create null head node records (row-head node record with record key 1, column-head node record with record key 2, all fields 0).<br />

Step 1a, 1st translation iteration, source data to TIF (translator internal form) data: row identifier = 1; index buffer = 1 4 0 0; data buffer = 1 2 0 0.<br />

Step 1b, TIF data to encoded data: row-head node record (key 1) = 1 0 0 0 0 ...; column-head node record (key 2) = 1 0 0 2 0 ...; data-item node record (key 3) holds the two data item nodes with their node keys, whose field values are not legible here.]<br />

5.3. Compilation Process<br />

The compilation process is the sequence of<br />

operations necessary to automatically produce the<br />

reader, the restructurer, and the writer subroutine<br />

programs from the SDDL and SDML statements<br />

supplied by the user. Our investigation of automatic<br />

data conversion techniques [2,13,17,18] reveals<br />

that compiler-compiler techniques are generally<br />

used. In order to be able to perform a<br />

broad, useful and syntactically valid class of<br />

translations, we decided that a generalized syntax-directed<br />

translation scheme (GSDTS) is the best model<br />

for our application. Because FORTRAN is the<br />

programming language of the majority of numerical<br />

application users, we decided to write the translation<br />

software in portable FORTRAN so that it can<br />

be generally distributed with little or no modification<br />

of the source programs from one computer<br />

system to another.<br />

A GSDTS requires an underlying LR(k) context-free<br />

grammar. Therefore, we had to construct LR(k)<br />

grammars for our SDDL and SDML, and in order to<br />

minimize the compilation time, we have constructed<br />

SLR(1) grammars for the SDDL and SDML such that the<br />

terminal symbols are single digits or letters, except<br />

for the user-defined variables and constants.<br />

The grammars and the LR(1) automatic parser generator,<br />

which is used to validate them as part of the<br />

system initialization process, are discussed below.<br />

A token stream of single digits or letters<br />

for keywords, and user-defined variables and constants,<br />

is the output from the conversion of the<br />

SDDL and SDML statements by the lexical analyzer<br />

[1]. For example, "TYPE = SOURCE;" is converted<br />

to "1", "TYPE = TARGET;" becomes "2", and "FILE-NAME<br />

= SAMPLE;" becomes "SAMPLE". The token stream is<br />

the input to the GSDTS, which produces the source<br />

FORTRAN subroutine programs to be compiled by the<br />

FORTRAN compiler into object decks as the final<br />

output of the compilation process.<br />
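The three conversions quoted above can be reproduced with a very small Python sketch. Only those three examples come from the text; the lookup table, the function name, and the rule that an unrecognized value passes through unchanged as a user-defined constant are assumptions for illustration, not the actual lexical analyzer.<br />

```python
# Keyword codes taken from the examples in the text; the table is otherwise hypothetical.
KEYWORD_TOKENS = {
    ("TYPE", "SOURCE"): "1",
    ("TYPE", "TARGET"): "2",
}


def tokenize(statement: str) -> str:
    """Convert one 'NAME = VALUE;' statement into a single token."""
    name, _, value = statement.rstrip().rstrip(";").partition("=")
    key = (name.strip().upper(), value.strip().upper())
    # A keyword value becomes its code; anything else is kept as a
    # user-defined variable or constant.
    return KEYWORD_TOKENS.get(key, value.strip())


tokens = [tokenize(s) for s in
          ("TYPE = SOURCE;", "TYPE = TARGET;", "FILE-NAME = SAMPLE;")]
print(tokens)
```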

An illustration of the compilation process is<br />

shown in figure 6.2. The SDDL statements of figure<br />

4 are input to the lexical analyzer. The<br />

statements are processed by the lexical analyzer<br />

to produce an output token stream, which becomes<br />

an input to the GSDTS. The token stream is processed<br />

by the GSDTS in a concurrent operation of<br />

LR(1) parsing and semantic analysis. If no error<br />

is encountered during parsing, then on successful<br />

reduction to the final state the Semantic Analyzer<br />

outputs the generated FORTRAN statements.<br />

We would like to mention that all data declarations<br />

are made in the Translator Subsystem so<br />

that the routines have access to the common<br />

variables, even if there is an overlay operation.<br />

This explains why only the Translator Subsystem<br />

declarative statements are generated in figure 6.2:<br />

the Reader routine FORTRAN statements of<br />

a structured database are generated by processing<br />

the SDML statements. On the other hand, since an<br />

unstructured source database has no SDML statements,<br />

in this case the Reader routine FORTRAN<br />

statements are generated along with the Translator<br />

Subsystem declarative statements by processing the<br />

SDDL statements.<br />
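The output production step can be illustrated with a Python sketch that formats one FORTRAN DATA statement from values collected during parsing. The statement shape follows the FLEUNT line of figure 6.2; the function name, the six-blank statement prefix, and the idea of building statements by string formatting are assumptions for illustration and say nothing about how the Semantic Analyzer is actually coded.<br />

```python
from typing import Sequence


def emit_data_statement(array_name: str, values: Sequence[int]) -> str:
    """Build a fixed-form FORTRAN DATA statement initializing array elements."""
    targets = ", ".join("%s(%d)" % (array_name, k)
                        for k in range(1, len(values) + 1))
    constants = ",".join(str(v) for v in values)
    # Six leading blanks place the statement in column 7 of a fixed-form line.
    return "      DATA %s / %s/" % (targets, constants)


# FILE-UNIT = 21, 22, 23; from the SDDL statements of figure 6.2.
statement = emit_data_statement("FLEUNT", [21, 22, 23])
print(statement)
```

A fuller generator would also have to split statements longer than 72 columns into continuation lines, which this sketch does not attempt.<br />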

[Figure 6.2. An illustration of the Compilation Process: conversion of SDDL statements of figure 4 to tokens.<br />

Input statements:<br />

DATA-DESCRIPTION:<br />

MATRIX-STRUCTURE:<br />

TYPE = SPARSE, NONSYMMETRIC, STATIC;<br />

FILE-CONTROL:<br />

TYPE = SOURCE;<br />

FILE-UNIT = 21, 22, 23;<br />

MEDIUM = DISK;<br />

RECORD: REC-KEY = integer;<br />

SIZE = 512, FIXED, UNBLOCKED;<br />

Token stream (output from lexical analyzer, input to GSDTS), as far as it is legible: S 21N 22N 23N 3 1 512 1 2<br />

GSDTS output, FORTRAN declarative statements for the Translator Subsystem:<br />

INTEGER ROWID, COLID, BUFSZE, SDATOG, UPRCOD<br />

INTEGER RCOSTA, RECSZE, FLEUNT<br />

INTEGER DIAGID, DENSTY, FLENAM, BLKSZE<br />

DIMENSION INDROW(500), INDCOL(500), DATA(500),<br />

1 INDEX(500), FLEUNT(3)<br />

DIMENSION DATBUF(500), INDBUF(500)<br />

DIMENSION FLEUNT(3), FLEID(3), FLENAM(42)<br />

COMMON/GLOBAL/NOROW, NOCOL, ROWID, COLID, LWRCOD,<br />

1 BUFSZE, IERROR, SDATOG, UPRCOD, DATBUF, INDBUF<br />

COMMON/ENCCOM/RCOSTA, INDPTR, KONTRL, RECSZE,<br />

1 DATA, INDROW, INDCOL, FLEUNT<br />

DATA BUFSZE/500/<br />

DATA FLEUNT(1), FLEUNT(2), FLEUNT(3) / 21,22,23/<br />

DATA RECSZE,BLKSZE,RECKEY /512,0,1/]<br />

5.3.1. SLR(1) Grammars for SDDL and SDML<br />

We have constructed one SLR(1) grammar for<br />

the SDDL such that terminal symbols for keywords<br />

are generally numerical codes, with single letters<br />

wherever it is necessary to provide one unique<br />

lookahead symbol for consistency resolution. In<br />

order to maintain a modular programming approach<br />

and provide for execution-time storage overlay<br />

should the need arise, we constructed two SLR(1)<br />

grammars for the SDML: one for the Direct<br />

and Indirect Encoding Sections, and another<br />

for the Linked Encoding Section, with the Encoded<br />

File Section included in each grammar. The two<br />

SLR(1) grammars are similar to that of SDDL.<br />

The nonterminals of the grammars are in self-explanatory<br />

BNF. One advantage<br />

of the modular SLR(1) grammar approach is that new<br />

features, like additional pointer linkage definitions,<br />

could be added to the language with easy<br />

modification of the corresponding grammar. All<br />

the grammars have been proved to be SLR(1) by the<br />

LR(1) automatic parser generator.<br />

5.3.2. LR(1) Automatic Parser Generator<br />

The LR(1) automatic parser generator, developed<br />

by Wetherell and Shannon in [19], is written<br />

entirely in portable ANSI Standard FORTRAN 66 and<br />

it has been successfully operating on a number of<br />

computers. It generates a space efficient parser<br />

for any LR(1) grammar. It reads a context-free<br />

grammar in a modified BNF format and produces tables<br />

which describe an LR(1) parsing automaton. It<br />

has been used to validate our SDDL and SDML grammars<br />

and to produce the corresponding tables for<br />

describing their LR(1) parsing automata. The tables<br />

consist of dimension and data statements to be<br />

embedded into the LR(1) parser subroutines to be<br />

described later. The procedure is performed once<br />

as part of our system initialization operation for<br />

the development of the GSDTS--for the SDDL and the<br />

SDML to be discussed below.<br />

5.3.3. GSDTS--for the SDDL and the SDML<br />

Generalized syntax-directed translation<br />

schemes (GSDTS) are well defined in the literature, and<br />

we chose to implement a bottom-up execution of<br />

GSDTS [1]. The major components of the GSDTS--for<br />

the SDDL and the SDML are, as illustrated in<br />

Figure 7, the following: (1) LR(1) parser, (2)<br />

LR(1) tables, (3) Semantic Analyzer, and (4) SDDL<br />

and SDML Semantic Tables. Its input is the SDDL<br />

and SDML token stream generated by the lexical<br />

analyzer and assigned token values from LR(1) tables<br />

by the LR(1) parser's internal scanner. The<br />

outputs produced by the GSDTS are the reader, the<br />

restructurer and the writer FORTRAN source subroutines,<br />

produced from the tokens of the source<br />

SDDL and SDML, the target SDML, and the target<br />

SDDL respectively.<br />

The LR(1) parser is a set of subroutines<br />

which interpret the LR(1) tables to construct a<br />

parse of the SDDL and SDML token stream. Some of<br />

the subroutines were part of the software developed<br />

by Wetherell and Shannon in [19], but they have<br />

been modified and tested to suit our application.<br />

We have developed three LR(1) parsers for the<br />

SDDL, the direct and indirect encodings, and the<br />

linked encoding SLR(1) grammars respectively.<br />

[Figure 7. GSDTS for SDDL and SDML: the SDDL and SDML token stream enters the LR(1) parser, which uses the LR(1) tables; the Semantic Analyzer, which uses the semantic rules, outputs the FORTRAN conversion program.]<br />

The Semantic Analyzer consists of two major<br />

routines which perform the semantic analysis and<br />

the output production. The SDDL and SDML Semantic<br />

Tables contain the semantic rules corresponding<br />

to the SLR(1) grammar production rules. However,<br />

we are currently restricting our implementation to<br />

a few physical schemas which are representative of<br />

the three encoding groups. Therefore, the current<br />

semantic tables contain semantic rules corresponding<br />

to only those physical schemas, with null<br />

rules for the others so that they could be easily<br />

extended after the completion of the current development<br />

process.<br />

6. Future Directions and Developments<br />

In this paper, we have provided a model of a<br />

generalized approach for describing and mapping<br />

any numerical database to secondary storage by nonprocedural<br />

Stored-Data Description and Mapping<br />

Languages (SDDL and SDML). We have also shown how<br />

DBMS concepts like schema and data language<br />

facilities apply to the databases that numerical<br />

applications need to process, which<br />

reside on secondary devices. In addition, we<br />

have also discussed the feasibility of our model<br />

as a valuable tool in numerical database management<br />

as described in the current implementation<br />

of our generalized data translator for numerical<br />

databases.<br />

An area for the extension of this research<br />

is in the implementation of a data manipulation<br />

language (DML). As previously mentioned, we have<br />

already designed a DML which consists of certain<br />

primitive statements that correspond to the operations<br />

permitted on the numerical database and are embedded<br />

into FORTRAN. The file control and the<br />

physical schema tables, and some of the conversion<br />

utility subroutines of our model would be of use<br />

in the implementation of the DML at a later date.<br />

Another area of research is in the performance<br />

evaluation of the numerical physical schemas<br />

with regard to specific applications or numerical<br />

operations. MacVeigh has reported in [10] the<br />

effect of data representation on the cost of<br />

sparse matrix operations in primary storage. It<br />

is desirable to extend this work to secondary storage<br />

and to develop a performance evaluation model<br />

for matching the numerical database of an application<br />

to the best-fit physical schema on secondary storage.<br />

Finally, we would like to identify some physical<br />

schemas of our model that have already<br />

proved to be of practical use in numerical<br />

database management. The threaded-linked-list<br />

structure has been successfully implemented in the<br />

WARDEN system in use at the University of Warwick<br />

[9] for Computer-Aided Design. In addition, secondary<br />

storage implementations that are similar to our<br />

direct encoding group are identified in EASY--<br />

an Engineering Analysis System of Utility Programs<br />

[8], while a row-column schema is used in Vectorized<br />

General Sparsity Algorithms with Backing<br />

Store [3]. Since the need for secondary storage<br />

backup is relative to the size of the primary<br />

storage, our model will be of great advantage in<br />

institutions with small or medium size computing<br />

facilities.<br />

REFERENCES<br />

1. Aho, A.V. and Ullman, J.D., "The Theory of Parsing, Translation and Compiling, Volume II: Compiling," Prentice-Hall, Inc., Englewood Cliffs, N.J., 1973.<br />

2. Bach, M.J., et al., "The ADAPT System: A Generalized Approach Towards Data Conversion," Proc. 5th Int. Conf. Very Large Data Bases, ACM, N.Y., Oct. 1979, pp. 183-193.<br />

3. Calahan, D.A., et al., "Vectorized General Sparsity Algorithms with Backing Store," Systems Eng. Lab., University of Michigan, Ann Arbor, SEL Report #96, Jan. 15, 1977.<br />

4. CODASYL Data Base Task Group Report, Conf. Data System Languages, April 1971, ACM, New York.<br />

5. CODASYL Data Description Language Journal of Development, June 1973 Report.<br />

6. Duff, I.S., "A Survey of Sparse Matrix Research," Proc. of the IEEE, Vol. 65, No. 4, April 1977, pp. 500-535.<br />

7. Fry, J.P., et al., "Stored-Data Description and Data Translation: A Model and Language," Information Systems, Vol. 2(3), 1977, pp. 95-147.<br />

8. Jensen, Paul S., "An Engineering Analysis System," Proc. ACM 1978 Annual Conference, Washington, D.C., Vol. 1 of 2, Dec. 4-5-6, 1978, pp. 490-495.<br />

9. Larcombe, M.H.E., "A List Processing Approach to the Solution of Large Sparse Sets of Matrix Equations and the Factorization of the Overall Matrix," Proc. Oxford Conf. on "Large Sparse Sets of Linear Equations," J.K. Reid, Editor, April 1970, Academic Press, New York, 1971, pp. 25-40.<br />

10. MacVeigh, Donald T., "Effect of Data Representation on Cost of Sparse Matrix Operations," Acta Informatica, Vol. 7, 1977, pp. 361-394.<br />

11. Maurer, Herman H., "Data Structures and Programming Techniques," Translated by Camille C. Price, Prentice-Hall, Inc., Englewood Cliffs, N.J., 1977.<br />

12. Pooch, U.W. and Nieder, A., "A Survey of Indexing Techniques for Sparse Matrices," ACM Computing Surveys, Vol. 5, No. 2, June 1973, pp. 109-133.<br />

13. Ramirez, J., "Automatic Generation of Data Conversion-Programs Using a Data Description Language (DDL)," Ph.D. Dissertation, University of Pennsylvania, 1973.<br />

14. Rosenberg, A.L. and Stockmeyer, L., "Storage Schemes for Boundedly Extendible Arrays," Acta Informatica, 7, 1977, pp. 289-303.<br />

15. Scheuermann, Peter, "On the Design and Evaluation of Data Bases," IEEE Computer, Feb. 1978, pp. 46-54.<br />

16. Scheuermann, Peter, "Concepts of a Data Base Simulation Language," Proc. ACM SIGMOD Int'l. Conf. on Management of Data, 1977, pp. 144-156.<br />

17. Shu, N.C., et al., "EXPRESS: A Data EXtraction, Processing and REStructuring System," ACM Trans. Database Systems, Vol. 2, No. 2, June 1977, pp. 134-174.<br />

18. Taylor, Robert W., "Generalized Data Base Management System Data Structures and their Mapping to Physical Storage," Ph.D. dissertation, Univ. of Michigan, 1971.<br />

19. Wetherell, C. and Shannon, A., "LR Automatic Parser Generator and LR(1) Parser," Lawrence Livermore Lab., University of California, P.O. Box 808, Livermore, CA 94550, June 14, 1979.<br />
