XSP TECHNOLOGY
Set Processing Data Accessing Technology
A Systems Integration Technology
Possibly the greatest deficiency in
software systems development technology
is the lack of any mathematical guidance
for integrating data across system components.
INTRODUCTION
New, more demanding software applications and new, more powerful hardware platforms are stressing the capabilities of traditional system data access strategies.
Business demands for integrated hardware-software products that deliver far greater performance, with dramatically shorter installation and data load times and bankable system reliability, cannot be accommodated by traditional system development models. These demands are further exacerbated by next-generation enterprise expectations for business analytics, cloud computing, fault tolerance, rapid access to staggeringly large databases, and a soaring volume of queries.
Access Operations, Not Access Paths
Traditional system architectures are constrained by an element-at-a-time data access strategy: the von Neumann bottleneck. This strategy is poorly suited to diverse applications simultaneously accessing large quantities of distributed data, which are best served by a collection-at-a-time data access strategy.
Nor can traditional element-at-a-time data access strategies support the next-generation requirements for reliable, high-performance data access; collection-at-a-time set processing data access strategies can.
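To make the element-at-a-time versus collection-at-a-time distinction concrete, here is a minimal Python sketch. The `CountingStore` class and its `get`/`get_many` methods are hypothetical, invented only for illustration; the sketch counts how many requests reach storage under each strategy, which is a rough proxy and not a measurement of real I/O cost.

```python
class CountingStore:
    """Hypothetical in-memory store that counts the requests reaching it."""

    def __init__(self, records):
        self._records = dict(records)
        self.requests = 0

    def get(self, key):
        # Element-at-a-time: one request per record.
        self.requests += 1
        return self._records.get(key)

    def get_many(self, keys):
        # Collection-at-a-time: one request for the whole collection;
        # the set intersection keeps only keys that actually exist.
        self.requests += 1
        wanted = self._records.keys() & set(keys)
        return {k: self._records[k] for k in wanted}


records = {i: f"record-{i}" for i in range(10_000)}
wanted_keys = range(0, 10_000, 10)  # 1,000 keys of interest

one_at_a_time = CountingStore(records)
rows_a = {k: one_at_a_time.get(k) for k in wanted_keys}

all_at_once = CountingStore(records)
rows_b = all_at_once.get_many(wanted_keys)

print(one_at_a_time.requests, all_at_once.requests)  # 1000 1
print(rows_a == rows_b)                              # True
```

Both strategies return the same records; only the number of round trips to storage differs.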
Traditional Data Access Strategies
- Very Poor I/O Performance: RECORD ACCESS.
- Endangered Data Integrity: DESTRUCTIVE UPDATES.
- Structured-Data Access Paths: PHYSICAL DEPENDENCE.
Set Processing Data Access Strategies
- Highly Optimized I/O Performance: SET ACCESS.
- Inviolate Data Integrity: CONSTRUCTIVE UPDATES.
- Relevant-Data Access Operations: LOGICAL DEPENDENCE.
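One way to read "constructive updates" is that an update builds a new version of a collection instead of overwriting the old one. The Python sketch below assumes that interpretation; the function names and the (id, name, quantity) record layout are hypothetical, and this is not presented as the XSP update mechanism.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical record layout for the example: (id, name, quantity).
Record = Tuple[int, str, int]


def destructive_update(rows: List[Record], old: Record, new: Record) -> None:
    """Overwrite `old` in place; the previous value is lost."""
    rows[rows.index(old)] = new


def constructive_update(version: FrozenSet[Record], old: Record, new: Record) -> FrozenSet[Record]:
    """Build a new version with `old` replaced by `new`.

    frozensets are immutable, so the input version is never modified
    and remains readable alongside the new one.
    """
    if old not in version:
        raise KeyError(old)
    return (version - {old}) | {new}


rows = [(1, "bolt", 10), (2, "nut", 25)]
destructive_update(rows, (2, "nut", 25), (2, "nut", 30))
print(rows)          # [(1, 'bolt', 10), (2, 'nut', 30)]

v1 = frozenset({(1, "bolt", 10), (2, "nut", 25)})
v2 = constructive_update(v1, (2, "nut", 25), (2, "nut", 30))
print(sorted(v1))    # [(1, 'bolt', 10), (2, 'nut', 25)]
print(sorted(v2))    # [(1, 'bolt', 10), (2, 'nut', 30)]
```

After the destructive update the quantity 25 no longer exists anywhere; after the constructive update both versions can still be read.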
Traditional architectures use structures
to find relevant data.
Set processing architectures use operations
to extract relevant data.
The primary advantage of set processing data access operations over traditional data access structures is I/O performance. Where accessing data 100 to 1,000 times faster yields a valuable, time-critical response, set processing should be considered.
Side benefits of the mathematical foundations of set processing are protection of data integrity and flexibility in restructuring data for different application needs.
I/O Performance Potential
The I/O performance constraints imposed by the von Neumann style data access strategy have forced developers to avoid I/O at all costs. Yet even just switching from 16KB I/O buffers, with a sustained data transfer rate (DTR) of 7.6MB/s, to 32MB I/O buffers, with a sustained DTR of 305MB/s, could improve I/O throughput by a factor of 40.
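The factor of 40 is simply the ratio of the two sustained transfer rates. The short Python check below reproduces it; the 1,000 MB transfer volume is an assumed example, not a figure from the text.

```python
# Sustained data transfer rates quoted above, in MB/s.
dtr_16kb_buffer = 7.6
dtr_32mb_buffer = 305.0

speedup = dtr_32mb_buffer / dtr_16kb_buffer
print(f"throughput ratio: {speedup:.1f}x")   # throughput ratio: 40.1x

# Time to move an assumed 1,000 MB at each rate.
volume_mb = 1_000
for label, rate in [("16KB buffers", dtr_16kb_buffer), ("32MB buffers", dtr_32mb_buffer)]:
    print(f"{label}: {volume_mb / rate:.1f} s")
# 16KB buffers: 131.6 s
# 32MB buffers: 3.3 s
```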
Consider a record-access-oriented I/O strategy that relies on secondary data index structures to locate records. Typically, half of an I/O buffer must be reserved for overflow, updates, and other system support requirements, doubling the number of I/O transfers. Of the records in the other half, only about 50% are likely to be of application interest, and only about 10% of those will be relevant, multiplying the required I/O transfers by another factor of 20. Thus a typical record-access I/O buffer may contain only 2.5% relevant data, requiring 40 I/O transfers to obtain one full I/O buffer of relevant data.
If a 32MB I/O buffer containing 100% relevant data were to replace a 16KB I/O buffer containing 2.5% relevant data, the performance improvement would be a factor of 1600 (40 from the transfer rate times 40 from the relevant-data density), about 3.2 orders of magnitude.
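The chain of fractions above multiplies out as follows. This Python sketch only restates the text's own numbers: it multiplies the three fractions to get the relevant-data density, then combines the resulting factor of 40 with the factor of 40 from the transfer-rate comparison.

```python
import math

# Fractions quoted in the text for a record-access 16KB I/O buffer.
usable = 0.5        # the other half is reserved for overflow, updates, etc.
of_interest = 0.5   # usable records that are of application interest
relevant = 0.10     # records of interest that are actually relevant

relevant_density = usable * of_interest * relevant   # 0.025, i.e. 2.5%
density_gain = 1 / relevant_density                  # transfers per full buffer of relevant data
dtr_gain = 305 / 7.6                                 # 32MB vs 16KB buffer transfer rates
total_gain = round(dtr_gain) * round(density_gain)   # the text rounds each factor to 40

print(f"relevant data per buffer: {relevant_density:.1%}")
print(f"transfers per full buffer of relevant data: {density_gain:.0f}")
print(f"combined improvement: {total_gain}x, about 10^{math.log10(total_gain):.1f}")
# relevant data per buffer: 2.5%
# transfers per full buffer of relevant data: 40
# combined improvement: 1600x, about 10^3.2
```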
Experience using set-processing-oriented I/O strategies on commercial applications has shown that nearly 100% of an I/O buffer contains relevant data. Experiments with multiple parallel access paths (24 to 96) have shown that I/O throughput can be improved to the point where system processing, rather than I/O, becomes the performance bottleneck.
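The text does not describe how the parallel access paths are implemented. As a rough illustration only, the Python sketch below splits one file into contiguous byte ranges and reads them through independent file handles on a thread pool; `parallel_read` and its `streams` parameter are hypothetical names, and whether concurrency actually raises throughput depends entirely on the storage hardware.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def read_range(path, offset, length):
    """Read `length` bytes starting at `offset` through an independent file handle."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def parallel_read(path, streams=4):
    """Split a file into up to `streams` contiguous ranges and read them concurrently.

    Each range gets its own handle, standing in for one access path.
    Executor.map yields results in submission order, so the parts
    are reassembled in file order.
    """
    size = os.path.getsize(path)
    chunk = -(-size // streams)  # ceiling division
    ranges = [(off, min(chunk, size - off)) for off in range(0, size, chunk or 1)]
    with ThreadPoolExecutor(max_workers=streams) as pool:
        parts = pool.map(lambda r: read_range(path, *r), ranges)
        return b"".join(parts)
```

Usage is simply `data = parallel_read("some.dat", streams=8)`, which returns the same bytes as a single sequential read.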
Future Systems
To tap the performance potential latent in existing and future hardware platforms, pre-structured data access paths (a mechanical data model) should be replaced with adaptive data access operations (a mathematical data model). Replacing traditional mechanical data modeling with mathematical data modeling can allow developers of future systems to approach the performance potential of any given hardware platform for any given mix of applications.
SYSTEM ARCHITECTURES
It is generally assumed that the performance constraints
imposed by the von Neumann I/O bottleneck can only be
eliminated by a redesign and replacement of existing hardware.
However, a case can be made that the von Neumann bottleneck,
as it manifests itself in today's architectures,
is not a hardware design issue,
but rather a hardware usage issue.
Consider the basic components of a system architecture
and how these components
interact in the exchange of data.
An application processing component declares what data needs to be extracted from some collection of available data. The performance challenge is in invoking a data access strategy (DAS) that best fulfills the application request. Ideally, such a DAS would provide informationally dense I/O data transfers, delivering to the application just the data needed, pre-structured for best application processing.
Since the structurings imposed on data for best execution are seldom the best structurings for preserving and rapidly accessing repository data, any DAS that stores data in the form required by an application has to make a performance compromise. Without the ability to separate and independently manage application processing structures and storage preservation and access structures, the DAS will remain the performance bottleneck of system architectures.
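The separation argued for above can be pictured as an interface: the application declares which data it needs and in what shape, and only the DAS knows how storage is organized. In the Python sketch below, the `access` function, the column-oriented storage layout, and the predicate/field-list request format are all assumptions made for illustration; the row loop favors readability over performance.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Storage representation (hypothetical): column-oriented, chosen for preservation.
ColumnStore = Dict[str, List[object]]


def access(store: ColumnStore,
           where: Callable[[Dict[str, object]], bool],
           fields: Sequence[str]) -> List[Tuple[object, ...]]:
    """Hypothetical DAS request: reduce the stored data to the rows that
    satisfy `where`, then transform them into the tuple layout the
    application asked for (the order given by `fields`)."""
    names = list(store)
    n_rows = len(store[names[0]]) if names else 0
    result = []
    for i in range(n_rows):
        row = {name: store[name][i] for name in names}
        if where(row):
            result.append(tuple(row[f] for f in fields))
    return result


store = {
    "part": ["bolt", "nut", "washer", "screw"],
    "qty": [10, 25, 0, 7],
    "bin": ["A1", "A2", "B1", "B2"],
}
print(access(store, lambda r: r["qty"] > 5, ["bin", "part"]))
# [('A1', 'bolt'), ('A2', 'nut'), ('B2', 'screw')]
```

The application never sees the column layout, so storage could be reorganized without changing the request.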
System Data Models
Though there are three distinct and independent system data management components, the prevailing system data models lump all three under a single database representation model, such as the relational data model (RDM). Failing to distinguish data access models from storage organization models and from application processing models ensures data representation dependence among all three, severely limiting DAS performance options. Only by modeling the three system components independently can truly data-independent, high-performance systems be built.
- Data Processing Models: Application programmers need to manipulate data as conveniently, efficiently, and reliably as possible. Requisite data needs to be delivered in the form best suited to a specific application's processing needs. Application data processing models are abundant and mature.
- Data Accessing Models: In principle, data accessing is quite simple: return to the requesting application the relevant portion of the available data, in the form best suited for application processing. Modeling this data reduction and transformation process requires an ability to distinguish data content from data structuring. Since such a modeling capability has not yet been adopted by developers, the only available models are those supporting one-to-one archiving and retrieval of application-formatted data.
- Data Storage: There is no particular requirement for the representation and organization of stored data, so long as it is persistent, can be easily located, and can be efficiently trundled off to a new location.
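Distinguishing data content from data structuring can be illustrated with two physical layouts of the same table. The Python sketch below maps a row-oriented and a column-oriented layout onto one neutral set of (position, attribute, value) triples; this triple encoding is an assumption chosen for the example, not the XST formalism itself.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[int, str, object]  # (row position, attribute, value)


def content_of_rows(rows: List[Dict[str, object]]) -> Set[Triple]:
    """Content of a row-oriented layout: one triple per stored value."""
    return {(i, attr, value) for i, row in enumerate(rows) for attr, value in row.items()}


def content_of_columns(columns: Dict[str, List[object]]) -> Set[Triple]:
    """Content of a column-oriented layout, expressed as the same triples."""
    return {(i, attr, value) for attr, values in columns.items() for i, value in enumerate(values)}


rows = [{"part": "bolt", "qty": 10}, {"part": "nut", "qty": 25}]
columns = {"qty": [10, 25], "part": ["bolt", "nut"]}

print(content_of_rows(rows) == content_of_columns(columns))  # True
```

Because the two sets are equal, the layouts carry identical content and differ only in structure.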
Record Accessing Architectures
Traditional record-accessing architectures
rely on a single data model to
dictate how data is represented and organized for
application processing, data accessing, and data storage.
These models show no discernible recognition of the need to preserve data independence between basic system components. Without such recognition, it is very difficult to develop a DAS that best serves the needs of both the application and the storage components.
Set-Processing Architectures
Set-processing architectures are distinguished from record-accessing
architectures only by the mathematical nature of the DAS model.
Processing data and storage data can be
represented and organized in any way desired by developers.
The only caveat is that the mathematical identity of these data representations be known to the DAS model. Since the prevailing application model is the RDM, and since record arrays have a well-defined mathematical identity under extended set theory (XST), current application programs and existing data storage repositories could be reunited for better performance under an XSP-DAS.
Set-Store Data Access
Though record arrays are readily compatible with an XSP-DAS,
they are less than ideal for supporting informationally
dense I/O data transfers.
Even though column-stores can provide dramatic performance improvements over row-store architectures, they can in turn be dramatically outperformed by set-store architectures. When a set-store architecture implementation was pitted against both an IBM and an Oracle DBMS in a Rapid Information Access comparison, the set-store architecture performed quite favorably.
SSDAM: Set-Store Data Access Model
RDBMSs are proof that mathematical modeling of data has practical applications. However, the mathematical modeling they employ is intended only to model abstract relationships as perceived by users, not to acknowledge the underlying system data access mechanisms in any way. Without a companion data access model, developers have had to rely on array storage models based on familiar row-column record access technologies. Thus every application is currently wedded to a specific store-and-fetch data access mechanism.
To support robust future cloud computing, data fetching networks need to be replaced with information accessing networks. This can only happen if individual applications are divorced from their dependence on specifically tailored data access engines and instead share a universal engine that provides just the data required, in exactly the form required, within the time required. Of course, no such data access engine currently exists, but if one ever does, a Set-Store Data Access Model would provide a good architectural foundation.
CURRENT RESEARCH
The formal modeling of system architectures involves two synergistic activities. First, the modeling tool, the foundations of XST, needs to be continually explored to discover mathematical truths that can be translated into operations and algorithms for more productive access and processing in system architectures. Second, practical application of XSP operations and access strategies requires the development of proof-of-concept implementations.
Lambda Calculus & Category Theory
Though the Lambda calculus and Category theory have been the academic's faithful consorts for many years, their results have had little influence on the implementation of high-performance DAS. Both have fundamental deficiencies that preclude either of them from supporting a set-theoretic modeling of the functionality and projected performance of system architectures.
Showing that the conditions and rules for defining categories can be expressed under XST allows category-derived precepts to carry over into set-theoretic system models. Defining functions as the behavior of sets, instead of as binary-set objects, allows f(x) both for "x" a value and for "x" a behavior. This allows f(f) to be defined set-theoretically. Lambda calculus also requires Currying to support f(a, b, ..., z), but with a definition of n-tuples that addresses Skolem's concern, such functions can be defined directly in XST.
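The point about Currying can be illustrated by comparing a curried function with one that takes a single n-tuple argument. The Python sketch below assumes the XST convention that the n-tuple ⟨x1, ..., xn⟩ is the set {x1^1, ..., xn^n}, each element carrying its position as a scope, and models that set as a frozenset of (element, scope) pairs; `xst_tuple`, `components`, and `volume` are hypothetical helpers, not part of any XSP software.

```python
from typing import FrozenSet, Tuple

# An n-tuple modeled as a set of (element, scope) pairs, scope = position.
ScopedSet = FrozenSet[Tuple[object, int]]


def xst_tuple(*items: object) -> ScopedSet:
    """Model <x1, ..., xn> as {x1^1, ..., xn^n}: each element tagged with its position."""
    return frozenset((item, pos) for pos, item in enumerate(items, start=1))


def components(t: ScopedSet) -> Tuple[object, ...]:
    """Recover the ordered components by sorting on scope."""
    return tuple(item for item, _ in sorted(t, key=lambda pair: pair[1]))


def volume(t: ScopedSet):
    """A three-argument function applied directly to a single tuple-set."""
    length, width, height = components(t)
    return length * width * height


# The curried form instead takes one argument at a time.
volume_curried = lambda length: lambda width: lambda height: length * width * height

print(volume(xst_tuple(2, 3, 4)))                                  # 24
print(volume_curried(2)(3)(4))                                     # 24
# Member order is irrelevant; the scopes alone carry the ordering.
print(xst_tuple(2, 3, 4) == frozenset({(4, 3), (2, 1), (3, 2)}))   # True
```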
XSP Data Access Engines
Current XSP software recognizes every physical data structure as having an XST identity, so XSP-defined operations can access already existing databases. An experimental commercial off-the-shelf (COTS) hardware platform with 6 parallel I/O ports, 10TB of storage, and a cost under $20,000 is projected to load the 1000GB TPC-H database in under 15 minutes and to execute the 22-query suite in about 1.5 minutes.
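A quick back-of-the-envelope check shows what the load-time projection implies for sustained I/O. The Python sketch below assumes decimal units (1 GB = 1,000 MB) and that the load is spread evenly across all six ports; neither assumption comes from the text.

```python
# Figures from the projection above; decimal units and an even split are assumed.
database_gb = 1_000
load_seconds = 15 * 60
io_ports = 6

aggregate_mb_s = database_gb * 1_000 / load_seconds
per_port_mb_s = aggregate_mb_s / io_ports

print(f"aggregate load rate: {aggregate_mb_s:.0f} MB/s")   # aggregate load rate: 1111 MB/s
print(f"per I/O port: {per_port_mb_s:.0f} MB/s")           # per I/O port: 185 MB/s
```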
The XSP-DAS operations used form a small system I/O
kernel that can be implemented for any popular
operating system and COTS platform.
It is hoped that these operations find enough interest in the
computer community that they eventually become
integral to future system architectures.
REFERENCES
MICRO & STDS
MICRO was the first DBMS to use set-theoretic operations to create and manage stored data. Set-Theoretic Data Structure (STDS) storage management software was provided by the STIS corporation. The software provided a mathematical link between applications and stored data. The ability to adaptively reorganize stored data to fit application demands provided very responsive data access. MICRO used a natural language interface for non-programmers to access data. The system supported timesharing production use from 1972 through 1998.
XSP TECHNOLOGY Theory & Practice
Formal Modeling & Practical Implementation of XML & RDM Systems:
Every technology must have a sound underlying theory to support the consistency and predictability
of the methods promoted by the technology. In this context, the term theory is respected as an
articulation of a body of rules governing the relationships and behavior of objects in a specific
system of interest. - D L Childs
DATA REPRESENTATIONS AS MATHEMATICAL OBJECTS
Considering Content Compatibility of Relational & XML Data Representations:
The theme of this paper is to treat all data representations
as mathematical objects instead of as physical structures. - D L Childs
VLDB 1977 (Invited paper, abstract)
Extended Set Theory: A General Model For Very Large, Distributed, Backend Information Systems:
As databases become very large and as distributed systems become desirable the need for inherent (not
superficial) data independence becomes crucial. This paper is intended as a tutorial and will describe
conditions for data independence and summarize the concepts of Extended Set Theory as a general
model for expressing information systems embodying data independence. This generality will be
demonstrated by considering some major problems pertinent to the design and support of very large,
distributed, backend information systems. - D L Childs
SET PROCESSING AT THE I/O LEVEL
A Performance Alternative to Traditional Index Structures:
It is generally believed that index structures are essential for high-performance information
access. This belief is false. For, though indexing is a venerable, valuable, and mathematically
sound identification mechanism, its logical potential for identifying unique data items is
restricted by structure-dependent implementations that are extremely inefficient, costly,
functionally restrictive, information destructive, resource demanding, and, most importantly, that
preclude data independence. A low-level logical data access alternative to physical indexed data
access is set processing. System I/O level set processing minimizes the overall I/O workload
by more efficiently locating relevant data to be transferred, and by greatly increasing the
information transfer efficiency over that of traditional indexed record access strategies. Instead
of accessing records through imposed locations, the set processing alternative accesses records
by their intrinsic mathematical identity. By optimizing I/O traffic with informationally dense
data transfers, using no physical indexes of any kind, low-level set processing has demonstrated
a substantial, scalable performance improvement over location-dependent index structures.
- D L Childs
Introduction To A MATHEMATICAL FOUNDATION FOR SYSTEMS DEVELOPMENT
A Hypermodel Syntax for Precision Modeling of Arbitrarily Complex Systems:
This paper focuses on resolving three specific system development issues. The approach is to introduce the
concept of a Function Space Architecture as a new methodology to system design. The basic
architectural unit of this new methodology is a Function Space which can provide as much or as
little detail as a specific instance requires. Coverage will include: the Function Space as a unit of
architecture for general communication and design detail; Structure Independent Architectures
as an architectural design guide for reliable and productive systems; the Hypermodel to provide
the Function Space continuum with explosive resolution; and Extended Set Notation to provide
generality and rigor to the concept of a Hypermodel.
NATO-ASI Series, Vol. F24, 1986. - D L Childs
AXIOMS AND MODELS FOR AN EXTENDED SET THEORY
This
draft presents the axioms of extended set theory (XST) and the ideas underlying
the axioms. It also presents an interpretation of XST in ZFC plus “there exist
arbitrarily large inaccessible cardinals,” thereby proving the consistency of XST
relative to this mild large-cardinal extension of ZFC. - Andreas Blass and D L Childs
Available from: iis
SET-STORE DATA ACCESS ARCHITECTURES
Data Access Architectures for Cloud Computing Environments
Row-store and column-store architectures
rely on DATA ACCESS PATHS for accessing and manipulating data by its physical properties.
Set-store architectures rely on DATA ACCESS OPERATIONS for accessing and
manipulating data by mathematically distinguishing between DATA CONTENT
and DATA REPRESENTATION.
Traditional architectures link applications and storage physically.
Set-Store architectures link applications and storage mathematically.
Set-Store architectures provide dynamic restructuring of storage to supply
applications with just the right data,
in just the right format,
at just the right time. - D L Childs
FUNCTIONS DEFINED BY SET BEHAVIOR
A Formal Foundation Based On Extended Set Axioms
ABSTRACT.
The term function seems to connote a sense of action or process
or behavior of something applied to something.
Within the framework of extended set theory, XST, the concept
of a function is defined as a behavior of sets in
terms of how specific sets react subject to their interaction
with other sets. In particular,
f(σ): A → B
asserts
that the set ‘f ’ behaves as a function under set ‘σ’ in
relating an individual member of the function domain, set ‘A’,
to exactly one member of the function codomain, set ‘B’.
It is shown that all Classical set theory, CST, graph
based function behavior can be expressed in terms of XST
function non-graph based behavior; that the behavior of
functions applied to themselves is supported; and that
the concepts of Category theory can be subsumed under XST.
A notable consequence of this approach is that the
mathematical properties of functions need no longer be dependent
on the mathematical properties of a Cartesian product. - D L Childs
TWO REMARKS ON SET THEORY
(The ordered n-tuples as sets)
Skolem concludes:
"I shall not pursue these considerations here, but only emphasize that
it is still a problem how the ordered n-tuple can be defined in the
most suitable way."
Math. Scand. 5 (1957), 40-46. - T. Skolem
SET-STORE ACCESS ARCHITECTURE Performance Comparison
A slide presentation of an industry performance comparison of IBM and Oracle
row-store based RDBMSs with iXSP, an interactive
set-store data access system.
Two benchmarks were performed.
One showing a "40-FOLD SPEED INCREASE".
The other showing a 76-98 fold performance improvement.
MANAGING DATA MATHEMATICALLY:
Data As A Mathematical Object
"Using Extended Set Theory for High Performance Database Management"
Video of presentation given at Microsoft Research Labs. with an introduction by Phil Bernstein.
(duration 1:10:52)
WHY SETS?
ABSTRACT: Sets play a key role in foundations of mathematics. Why?
To what extent is it an accident of history? Imagine that you have a
chance to talk to mathematicians from a far-away planet. Would their
mathematics be set-based? What are the alternatives to the set-theoretic
foundation of mathematics? Besides,
set theory seems to play a significant role in computer science;
is there a good justification for that? We
discuss these and some related issues.
- A. Blass, University of Michigan, Ann Arbor, MI;
Y. Gurevich, Microsoft Research, Redmond, WA
WHY NOT SETS?
ABSTRACT: Sets are well defined collections of uniquely identifiable items.
Data used by computers are well
defined collections of items representing situations of interest. Computers themselves are just
well defined collections of bits that change value over time. It would seem that all computer
processing is highly set oriented. Why are sets not more widely used in modeling the behavior
and assisting the development of computing systems? The following dialogue will attempt to
amplify this question, though neither of the participants has a clue to the answer.
- D L Childs
Information Access Intensive Systems
ABSTRACT: Performance demands of information access systems differ greatly from
those of traditional record retrieval systems.
The underlying technology supporting database management
systems is
antithetical to the needs and requirements of
information access systems.
This short paper will provide a summary of the necessary
performance requirements for
information access systems emphasizing why and how they differ from those of
record retrieval systems.
- D L Childs
Copyright © 2012
INTEGRATED INFORMATION SYSTEMS
« Last modified on 05/01/2012 »