Introduction
Today the problems hindering the performance of data management systems
are virtually the same as they were in the early
1970s: accessing data.^{[Cer10]}
Cloud computing and Big Data applications are challenging the performance
of traditional data management
strategies.^{[Sto+14]}
Cloud computing refers to the practice of transitioning computer services such as computation or data storage to multiple redundant offsite locations available on the Internet, which allows application software to be operated using Internet-enabled devices.
As for Big Data: "The term BigData is not well-defined, but is
generally used to refer to the type of data that breaks the limits of
traditional data storage and management state-of-the-art."^{[Fay13]}
Scalability, diverse data, schema development overhead, and access to large volumes of distributed data are a few of the concerns facing developers of future data management strategies.
Some original pioneering developers are even suggesting the need for
a complete system
overhaul.^{[Sto7]}
A system overhaul is not required. Existing systems can easily be upgraded
and new systems developed to provide optimal performance for all applications.
What is required is evolving physically dependent data management strategies
into mathematically reliable data management strategies.
Physically dependent data management strategies (those that manage data
using physical properties of data representations)
HAVE to use index structures to access data.
Mathematically reliable data management strategies (those that manage data
using a mathematical identity of data) HAVE
to use operations to access data.
Though the industry has long believed that
index structures
are essential for
best performance, the exact opposite is
true.^{[D]}^{[Lar09]}^{[Lig07]}
A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure.
RDM: Relational Data Model
The need to access large amounts of diverse distributed data
was not a concern when traditional data management was being developed.
The primary concern, forty years ago, was improving user performance.
Since all computer data was managed by routines
using physical properties of data
representations,^{[*]}
users were required to know the physical organization of
data in
storage.^{[*]}
In 1970
Codd introduced the
Relational Data Model,
RDM.^{[*]}
(The relational data model is the primary data model, used widely around the world for data storage and processing. The model is simple, and it has all the properties and capabilities required to process data with storage efficiency.)
Codd's approach was to replace physically dependent routines
with mathematically reliable operations, at the application level.
Storage data and data access were still managed by physically dependent
routines,
but these physical dependencies were hidden from application users.
Codd's contribution
to data management
was introducing the concept of
a mathematical identity of data
manipulated by set-theoretic operations.
The RDM was successful because it
provided operations to
eliminate the physical dependency
between the user's view of data
and the system's representation of data.
Today physically dependent routines
for managing storage data are even more difficult to
develop,^{[*]}
difficult to use,^{[*]}
and difficult to
maintain.^{[*]}
Codd's solution to improve system performance in the 1970s,
by replacing physically dependent data management strategies
with mathematically reliable strategies,
will also work today for supporting performance of
existing and
future Big Data applications.
In his Turing acceptance paper in 1981,^{[Cod81]}
Codd provided some design objectives for
building future data management systems.
Codd's Data Management Objectives
(1) Data Independence Objective:
Separate logical/physical aspects of data management.
(2) Communicability Objective:
Make the model structurally simple.
(3) Set-processing Objective:
Provide a foundation for set-oriented processing.
(4) Foundation Objective:
Provide a sound theoretical foundation for data management.
In 1981 the foundations of set theory were inadequate to support
the requirements for faithfully modeling data representations,
making it impossible for any implementation to satisfy all four
data management conditions.
Recently, extensions to the foundations of set theory
have been completed and proven
consistent.^{[Bla11]}
Extended set theory, XST, provides a sound theoretical foundation
for data management by supporting a mathematical identity
for all data
representations (RDM,
XML,
Xrel1,
Xrel2).
Most importantly, XST allows all of Codd's data management objectives to be achieved.
This means that the physical dependencies that distinguish Relational (SQL)
from non-Relational (NoSQL) implementations can be isolated and removed.
RDM vs. RDBMS vs. NoSQL
The Relational Data Model, RDM, is an abstract model
presenting a strategy for
modeling data as mathematical objects and manipulating these objects with mathematically well-defined
operations.
A Relational Database Management System, RDBMS, is an implementation combining
the data management specifications of the RDM with some strategy for
data access and storage organization.
Unfortunately, all commercial RDBMS implementations cripple the performance
potential of the RDM.
The RDM presents a mathematically reliable data management strategy for
describing results of operations applied to sets.
All commercial RDBMS implementations support RDM applications
with physically dependent, transaction-oriented
searches for individual records.
RDM: mathematically reliable, declarative, set-processing.
RDBMS: physically dependent, procedural, record-processing.
Despite the fundamental mismatch between
RDM and RDBMS data management strategies,
the ability of RDBMS implementations to mathematically manipulate data representations
has allowed Relational systems to dominate the industry for
over thirty years.
Unfortunately the mathematical muscle
of the RDM
currently supported by
RDBMS installations is not sufficient to
support requirements for Big Data applications.
Future Challenges (Physically dependent data management)
Big Data applications introduce issues of
scalability,
data diversity, and
distributed data access,
all of which challenge traditional
physically dependent data management capabilities.
The current battle seems to be forcing a choice
between the traditional RDBMS offerings represented by "SQL"
and extended capabilities of scalability, diverse data,
and volume data access represented by "NoSQL".
SQL: Set-theoretic Query Language
Though SQL is commonly interpreted as "Structured Query Language",
it is more appropriately an acronym for "Set-theoretic Query Language".
The roots of SQL are firmly planted in classical set theory, CST.
SQUARE, published in 1973, was the precursor to SEQUEL, published
in 1974, which in turn was the precursor to SQL.
SQUARE presented a powerful collection of CST operations,
the most important of which was the "Image" operation, the set-theoretic
foundation for functions (i.e. mappings).
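As a rough illustration of why an image operation can serve as a foundation for functions, the sketch below uses the standard classical definition: the image of a set A under a relation R is the set of all y such that (x, y) is in R for some x in A. This is not SQUARE's syntax; the function name image and the sample relation works_in are assumptions made only for this example.

```python
def image(relation, subset):
    """Return the image of `subset` under `relation`.

    `relation` is a classical set of (x, y) pairs; the result is the set
    of every y paired with some x that belongs to `subset`.
    """
    return {y for (x, y) in relation if x in subset}


# Hypothetical sample relation: employee -> department.
works_in = {("ann", "sales"), ("bob", "sales"), ("cy", "ops")}

# Applying the relation to a whole set of arguments at once is the
# set-level analogue of calling a function on a single argument.
print(sorted(image(works_in, {"ann", "cy"})))  # ['ops', 'sales']
```

When the relation pairs each x with exactly one y, the image of a one-element set is the ordinary function value, which is the sense in which the operation underlies mappings.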
NoSQL: No Set-theoretic Query Language
There seems to be no generally accepted translation of the
acronym NoSQL.
The only commonality among the large variety of NoSQL systems
seems to be a lack of mathematically reliable operations.
The distinction between SQL (implementations)
and NoSQL (implementations) is largely artificial.
Both approaches rely on physically dependent data management
strategies,
each having distinguished advantages
and disadvantages.
By replacing physically dependent data management strategies
with mathematically reliable data management strategies
the best of both offerings can be integrated into
a highly productive, inherently reliable, efficient to use system.
SSDM: Structured-Set Data Model
The definition of a structured-set data model, SSDM,
is deceptively simple:
SSDM: Any collection of data representations and operations
on them that are well-defined under the axioms of
extended set theory, XST.
There is no restriction on how many operations are defined,
nor on what the operations do.
There is no restriction on how many data representations
are defined,
nor on how they are defined.
The only condition is that all operations and
data representations be defined using
extended set notation, XSN.
A structured-set
is a set as defined under the axioms of
extended set theory.
Conceptually an extended set
is just a classical
set with an extended membership
that provides two conditions for
set membership
instead of just one.
The particulars are rather boring, but the utility
of the extension
is that it allows a set-theoretic dimension for structure.
The only difference between classical sets
and structured-sets is that classical sets have only
one condition for membership
while structured-sets require two conditions.
The structure component of an extended set membership
can be used to distinguish the container part
of a data representation
from the content part.
Though set theory has been tried many times as a formal data model,
it has always failed to provide the ability
to suitably define data records as unambiguous sets.
Structured-sets provide an additional component to classical set membership
allowing a
formally defined representation of
data that uniquely distinguishes the logical data relationships (content)
from the physical data representation (container).
All Data Representations ARE^{[a]} Structured-Sets.
Thus, All Data Can Be Managed Using Set Operations.
Since structured-sets can formally represent any and all
application and storage data
with the ability to distinguish data content from data
container, structured-set based access strategies can
manipulate data content and data containers independently
to provide near-optimal access performance for each and every application.
With structured-sets the distinction between content and structure is an innate property
of extended set membership. This property makes structured-sets a natural choice for
modeling representations of data.
Under a structured-set data model
all logical and physical representations of data are structured-sets.
All manipulations of logical and physical data representations
can be modeled by set operations.
For presentation convenience or performance
considerations, extended
set operations can be defined that
map the content of one structured-set to another structured-set having a totally different structure.
Thus a
structured-set data model
is an ideal choice for modeling data-independent access systems.
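A minimal sketch of such a content-preserving mapping, under stated assumptions: an extended set is modeled here as a Python frozenset of (element, scope) pairs, and the helper names rescope and content are hypothetical, chosen only for this illustration. The operation moves the same content from a positional tuple structure into a named-field record structure.

```python
def rescope(xset, scope_map):
    """Return a new extended set in which every scope is replaced
    according to `scope_map`; the elements (content) are untouched."""
    return frozenset((elem, scope_map[scope]) for (elem, scope) in xset)


def content(xset):
    """Return only the elements of an extended set, ignoring structure."""
    return {elem for (elem, _scope) in xset}


# A positional tuple <"Ann", "Sales", 1970> as {Ann^1, Sales^2, 1970^3}.
row = frozenset({("Ann", 1), ("Sales", 2), (1970, 3)})

# The same content carried by a record whose scopes are field names.
record = rescope(row, {1: "name", 2: "dept", 3: "hired"})

print(content(row) == content(record))  # True
```

Because structure lives entirely in the scopes, the mapping can change the container without ever inspecting or copying the content differently.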
SQL & Structured-Sets
The SQL SELECT statement was
originally^{[Cha74]}
introduced as a set operation.
The language provided applications the means for specifying the membership of a result set
as derived from a given collection of sets.
There was no need for applications to know how data was actually stored.
Originally the sets were restricted to being arrays, since that was
the only structure reasonably supported by classical set theory.
Now that structured-sets are available for modeling
any representation of data, there is
no need to restrict SELECT statements to just arrays.
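To make the idea of SELECT as a set operation concrete, here is a small sketch, assuming records are modeled as generalized tuples: frozensets of (value, field name) pairs, with the field names acting as scopes. The helpers record, field, and select are hypothetical names for this example and do not correspond to any SQL implementation.

```python
def record(**fields):
    """Build a generalized tuple: a frozenset of (value, field_name) pairs."""
    return frozenset((value, name) for name, value in fields.items())


def field(rec, name):
    """Return the value whose scope is `name` in the record `rec`."""
    for value, scope in rec:
        if scope == name:
            return value
    raise KeyError(name)


def select(relation, fields, where):
    """Specify a result set: for each record satisfying `where`,
    keep only the members whose scope is in `fields`."""
    return {
        frozenset((v, s) for (v, s) in rec if s in fields)
        for rec in relation
        if where(rec)
    }


employees = {
    record(name="Ann", dept="Sales"),
    record(name="Bob", dept="Ops"),
    record(name="Cy", dept="Sales"),
}

# Roughly: SELECT name FROM employees WHERE dept = 'Sales'
sales_names = select(employees, {"name"}, lambda r: field(r, "dept") == "Sales")
```

The caller states only the membership condition of the result; nothing in the call depends on how the records are laid out in storage.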
STDS & iXSP: Mathematically Reliable Systems
In 1968 two papers were published suggesting the potential
advantages of
recognizing data as a mathematical object,
managing data
with set operations,
and organizing data on disks using
a set-theoretic data structure,
STDS.^{[A]}^{[B]}
RDBs vs. XSP
Current RDBs, based loosely on CST, in which there is no explicit structure or order, must rely on indexes to efficiently find the requested data. However, this means that indexes must be planned for, designed, implemented, and maintained in order to be useful. This consumes considerable nonrecurring engineering resources, as well as recurring processing and system administration overhead. In a high-rate, low-latency data processing system, there may not be time to execute the index updates and related statistical updates needed to use the indexes effectively. In large database instances with hundreds of millions or billions of records, the indexes themselves can get very large and deep (in B-tree structures), often requiring secondary indexing structures to manage the primary indexes.
In addition, RDBs do not have a means to avoid transfer of significant irrelevant data across the I/O barrier. The entire table (or tables) involved in a query must be considered each time the query is executed. All columns of a table must be included in any record retrieval before performing any requested projections to obtain the desired output columns. Furthermore, records are often stored in randomly accessed pages, which can result in whole records being transferred over the I/O barrier for no other reason than that they share a page with a desired record. All of these conditions result in a highly inefficient use of the storage system and exacerbate storage I/O limitations.
In XSP, processing is done at the set level, not the record level, so there is no need for record-level indexes. Instead, set-theoretic operations are performed on sets as operands, and the resulting sets, along with their identity and relationship to their parent sets, are saved into a continually evolving set universe. Each requested set operation is first reduced algebraically to its lowest-cost equivalent, using already computed sets where possible. This step is extremely fast as it uses set metadata and requires no I/O accesses. The resulting set equation is then executed and the results returned and saved.
Typical queries are looking for specific relevant information and result in sets that are several orders of magnitude smaller in size than the original. In addition, it is typical for multiple queries to be related in one or more attributes as analysis proceeds along a line of investigation or "drill down". This results in the creation of multiple, small data sets containing highly relevant data related to the recent queries. Instead of relying on indexes to improve efficiency in finding requested records, XSP uses these small set partitions of the full data set to rapidly and efficiently find the equivalent result set using minimum data transfer across the storage I/O boundary.
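The drill-down behavior described above can be sketched in a few lines. This is only an illustration of the general idea of reusing saved result sets, not XSP's actual algorithm: the class SetUniverse, its restrict method, and the records_scanned counter are all assumptions introduced here, and the sketch further assumes that a predicate name always denotes the same predicate.

```python
class SetUniverse:
    """Toy set universe: every derived set is saved under the names of
    the predicates that define it, so later queries can start from it."""

    def __init__(self, base):
        self.base = frozenset(base)
        self.derived = {}          # frozenset of predicate names -> result set
        self.records_scanned = 0   # stand-in for data moved across the I/O boundary

    def restrict(self, predicates):
        """Return the members of the base set satisfying every predicate.

        `predicates` maps a name to a one-argument boolean function.
        """
        wanted = frozenset(predicates)
        # Metadata-only step: pick the smallest saved set whose defining
        # predicates are all among the requested ones.
        source_keys, source = frozenset(), self.base
        for keys, saved in self.derived.items():
            if keys <= wanted and len(saved) < len(source):
                source_keys, source = keys, saved
        # Only the predicates not already reflected in `source` remain.
        remaining = [predicates[name] for name in wanted - source_keys]
        self.records_scanned += len(source)
        result = frozenset(r for r in source if all(p(r) for p in remaining))
        self.derived[wanted] = result
        return result


def is_even(n):
    return n % 2 == 0


def is_large(n):
    return n >= 900


universe = SetUniverse(range(1000))
evens = universe.restrict({"even": is_even})                          # scans 1000 records
large_evens = universe.restrict({"even": is_even, "large": is_large})  # scans only 500
```

In this toy run the second, narrower query touches half as many records as a scan of the full set would, because it starts from the saved result of the first query rather than from the base data.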
Conclusion
In 1970 Codd introduced the advantage of set processing over record processing.
His influence has dominated data management systems for more than forty years.
Though Codd advocated using set processing for both application data and
system data, classical set theory, CST, was not suitable for
faithfully modeling physical data representations.
Recent developments in extending the foundations of set theory
now allow all data representations (both abstract and physical) to be faithfully modeled by set theory.
It is now possible to fulfill Codd's data management objectives by developing
mathematically reliable data management systems that provide all of the following:
(1) Data Independence Objective:
Separate logical/physical aspects of data management.
(2) Communicability Objective:
Make the model structurally simple.
(3) Set-processing Objective:
Provide a foundation for set-oriented processing.
(4) Foundation Objective:
Provide a sound theoretical foundation for data management.
Since any and all data representations have a mathematical identity under extended set theory, XST,
all existing data representations can be mathematically identified and
manipulated by set operations.
This allows all existing systems to be upgraded
from physically dependent systems to mathematically reliable systems.
This paper presents an extended set theory (XST) and proves its consistency relative to the classical Zermelo-Fraenkel set theory with the axiom of choice (ZFC) and an axiom asserting the existence of arbitrarily large inaccessible cardinals (also known as Grothendieck’s axiom of universes). The original motivation for this development of XST was to provide an improved, flexible mathematical foundation for certain constructions, like tuples and functions, that play important roles in computing. Subsequent developments revealed that XST also provides a smooth way of handling the set-class distinction that turns up especially in category theory.
If A is a structured-set and if Γ_{A}(x,s) is true, then x is an s-member of A.
If Γ_{A}(x,s) is false, x is not an s-member of A.
For example: let A = <a,b,c> = {a^{1}, b^{2}, c^{3}}, then Γ_{A}(b,2) is true, while Γ_{A}(a,2) is false.
Classical sets have no structure.
The membership test for any Classical set A is Γ_{A}(x,∅).
Thus, all Classical sets are structured-sets with null structure.
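The membership test above can be mirrored directly in code. The following is a minimal sketch, assuming an extended set is stored as a set of (element, scope) pairs; the class XSet, the method gamma, the constructors from_tuple and from_classical, and the use of an empty frozenset to stand for the null scope ∅ are all choices made for this illustration only.

```python
EMPTY = frozenset()  # stands in for the null scope, written ∅ in the text


class XSet:
    """Extended set modeled as a frozenset of (element, scope) pairs."""

    def __init__(self, pairs):
        self.pairs = frozenset(pairs)

    def gamma(self, x, s):
        """Two-condition membership test: is x a member with scope s?"""
        return (x, s) in self.pairs


def from_tuple(*items):
    """<a, b, c> becomes {a^1, b^2, c^3}: positions are used as scopes."""
    return XSet((item, i) for i, item in enumerate(items, start=1))


def from_classical(items):
    """A classical set is an extended set whose every scope is null."""
    return XSet((item, EMPTY) for item in items)


A = from_tuple("a", "b", "c")
print(A.gamma("b", 2))  # True
print(A.gamma("a", 2))  # False

C = from_classical({"a", "b"})
print(C.gamma("a", EMPTY))  # True
```

The classical case falls out as the special case in which the second membership condition is always the null scope, matching the closing remark above.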
Notes

⋏
"faithful" means isomorphic representation of content, structure, and behavior.

⋏
MICRO (1972-1998) used mathematically well-defined operations
in a time-sharing environment
to manage application data,
storage data,
and all transformations between the two.
References

[Bla11]⋏
Blass, A., Childs, D L:
Axioms and Models for an Extended Set Theory  2011
♦
This paper presents the formal foundation for supporting "structured-sets".
5.1 TUPLES:
Traditional set theory has several ways of coding ordered tuples
< a_{1}, a_{2}, .. , a_{n} >
as sets, none of which is really
canonical^{[Sk57]}.
XST provides a simple and natural way to represent tuples,
namely to use natural numbers as scopes. Thus, a tuple
< a_{1}, a_{2}, .. , a_{n} >
is identified with the (extended) set
{ a_{1}^{1}, a_{2}^{2}, .. , a_{n}^{n} }.
The natural numbers used here as
scopes can be represented by the traditional von Neumann coding (with
scope ∅),
5.2 GENERALIZED TUPLES:
Instead of indexing the components of a tuple by (consecutive)
natural numbers, one could index them by arbitrary, distinct labels.
The same XST representation still works; use the labels as scopes.
This provides, for example, a convenient way to deal with what are
often called records in computing. Records have fields, in
which data are inserted. We can represent them set-theoretically by
taking the field names as scopes with the data as elements.
Similarly, we can represent relations, in the sense of relational
databases, by sets of generalized tuples, one for each row of the
relation. The scopes for such a generalized tuple would be the
attribute names, while the corresponding elements would be the values,
in that row, of the attributes.
5.3 FUNCTIONS and MULTIFUNCTIONS (functions defined by set behavior):
In general, a set f of generalized tuples, all having the same scope
set D, can be regarded as describing several (possibly multivalued)
operations, called the behaviors of f.

[Boy73]⋏
Boyce, R. F.; Chamberlin, D. D.; King, W. F.; Hammer, M. M.:
Specifying Queries as Relational Expressions: SQUARE
 IBM Technical Report RJ 1291, 1973
♦
This paper presents a data sublanguage called SQUARE, intended for use
in ad hoc, interactive problem solving by non-computer specialists.
SQUARE is based on the relational model of data, and is shown to be
relationally complete; however, it avoids the quantifiers and bound
variables required by languages based on the relational calculus.
Facilities for query, insertion, deletion, and update on tabular data
bases are described. A syntax is given, and suggestions are made for
alternative syntaxes, including a syntax based on English key words for
users with limited mathematical background.

[Cer10]↑
Cerf, V.:
"It's like 1973 for Moving Data Around in the Cloud" - 2010
♦
Using a cloud computing service may sound enticing, but you better
consider how that data can be moved around if you want to switch to a
different provider. It's a big problem that now has the attention of
Vint Cerf, who is calling for standards to define how customer data gets
passed between different cloud service providers.

[Cha74]⋏
Chamberlin, D. D.; Boyce, R. F.:
SEQUEL: A Structured English Query Language  IBM Research Laboratory, 1974
♦
ABSTRACT: In this paper we present the data manipulation facility for a
structured English query language (SEQUEL) which can be used for accessing
data in an integrated relational data base. Without resorting to the concepts
of bound variables and quantifiers SEQUEL identifies a set of simple operations
on tabular structures, which can be shown to be of equivalent power to
the first order predicate calculus. A SEQUEL user is presented with a consistent
set of keyword English templates which reflect how people use tables to
obtain information. Moreover, the SEQUEL user is able to compose these basic
templates in a structured manner in order to form more complex queries.
SEQUEL is intended as a data base sub language for both the professional programmer
and the more infrequent data base user.

[Cha01]⋏
Champion, M.:
XSP: An Integration Technology for Systems Development and Evolution
 Software AG  2001
♦
The mathematics of the relational model is based on classical set theory,
CST, and this is both its strength and its weakness.
An "extended set theory", XST, can be used to model
ordering and containment relationships
that are simply too "messy" to handle in classical set theory and the
formalisms (such as relational algebra) that are based on it.

[Cod70]
^{⋏}^{a}^{b}
Codd, E. F.:
A Relational Model of Data for Large Shared Data Banks CACM 13, No. 6 (June) 1970
♦
Abstract: Future users of large data banks must be protected from having to know
how the data is organized in the machine (the internal representation).

[Cod81]⋏
Codd, E. F.:
Relational Database: A Practical Foundation for Productivity CACM 25, No. 2 (February) 1982
♦
The 1981 ACM Turing Award Lecture:
"The relational model calls not only for relational
structures (which can be thought of as tables), but also
for a particular kind of set processing called relational
processing. Relational processing entails treating whole
relations as operands."

[Ell15]⋏
Ellis, T.:
Extended Set Theory: A Summary  2015
♦
Extended Set Theory (XST) was originally developed under an ARPA contract to address the limitations of Classical Set Theory (CST). At the root of the problem is the set membership definition of CST, which defines sets and the results of set operations based on a single membership condition: content.

[Fay13]⋏
Fayyad, U. M.:
Big Data Everywhere, and No SQL in Sight
SIGKDD Explorations, Volume 14, Issue 2  2013
♦
"The term BigData is not well-defined, but is
generally used to refer to the type of data that breaks the limits of
traditional data storage and management state-of-the-art."

[Har08]⋏
Harizopoulos, S.; Madden, S.; Abadi, D.; Stonebraker, M.:
OLTP Through the Looking Glass, and What We Found There
SIGMOD'08, June 9-12, 2008
♦
Over 90% of an application process is indexed-access overhead.

[Lar09]⋏
Larsen, S. M.:
The Business Value of Intelligent Data Access  March 2009
♦
Article provides an excellent description on how difficult
it is to optimize data access paths.
"For a wide range of reasons, designing for and maintaining optimal data
access poses a genuine challenge to even the most sophisticated
enterprises." p. 2

[Lig07]⋏
^{a}
^{b}
^{c}
Lightstone, S.; Teorey, T.; Nadeau, T.:
Physical Database Design
 Morgan Kaufmann, 2007
♦
A comprehensive analysis of how complicated the physical database
design process can be
without the guidance of a formal data access model.
Without such formal support physical file data access structures
typically impede system performance. [excerpts below]
a) File data access strategies are extremely difficult to optimize.
"Some computing professionals currently run their own consulting
businesses doing little else than
helping customers improve their table indexing design."
Their efforts can improve query performance by as much as 50 times. (p. 2)
b) Files are physical representations of data.
Tables are logical representations of data.
"Tables are Files"? (p. 7)
c) An index is data organization set up to speed up the retrieval
of data from tables. In database management systems, indexes can be specified
by database application programmers. (p. 8)
d) "It is important to be able to analyze the different paths for the
quality of the result, in other words, the performance of the system to
get you the correct result and choose the best path to get you there."
(p. 31)
e) A block (or page) has been the basic unit of I/O from disk to fast
memory (RAM), typically 4 KB in size.
In recent years, prefetch buffers (typically 64 KB, as in DB2) have been used to
increase I/O efficiency. (p. 371)
f) The total I/O time for a full table scan is computed simply
as the I/O time for a single block, or prefetch buffer (64 KB),
times the total number of those I/O transfers in the table. (p. 372)

[Man14]⋏
Manoochehri, M.:
Data Just Right: Introduction to Large-Scale Data & Analytics
Addison-Wesley, 2014.
♦
Large-scale data analysis is now vitally important to virtually every business. Mobile and social technologies are generating massive datasets; distributed cloud computing offers the resources to store and analyze them; and professionals have radically new technologies at their command, including NoSQL databases.

[MICRO]
⋏
MICRO RDBMS 1972-1998; User Manual:
MICRO A Relational Database Management System 1992
♦
MICRO supported time-sharing commercial applications from 1972 to 1998.
It was the first system to use set-theoretic operations to create and
manage stored data.
MICRO managed data with a mathematically reliable data storage and
access system (STDS*) which used no index structures,
required no schema design, provided non-destructive updates,
and supported both structured and semi-structured data.

[MSQL]
^{⋏}
SQL Server Performance Team:
Great New TPC-H Results with SQL Server 2008 17 Aug. 2009
♦
"HP also published a 300GB result on their ProLiant DL785 platform with SQL Server 2008. This publication illustrates the high performance and solid price/performance using industry standard components from HP and Microsoft." (Load time: 300GB in 13.33 hours)

[Nor10]⋏
North, K.:
♦
Three articles
presenting a short historical perspective on the
role of set theory,
mathematically sound data models,
and the importance of data independence.  2010
PART I:
Sets, Data Models and Data Independence
PART II:
Laying the Foundation: Revolution, Math for Databases and Big Data
PART III:
Information Density, Mathematical Identity, Set Stores and Big Data

[Sk57]⋏
^{a}
^{b}
^{c}
Skolem, T.:
Two Remarks on Set Theory (The ordered n-tuples as sets)
MATH. SCAND, 5 (1957) 40-46
♦
Skolem concludes:
"I shall not pursue these considerations here, but only emphasize that
it is still a problem how the ordered n-tuple can be defined in the
most suitable way."

[Sto5]⋏
Stout, R.:
Information Access Accelerator
Information Builders Inc.  2005
♦
Slide presentation explaining a 40 to 1 performance improvement over commercial DBMSs by using a structured set access interface between applications and storage.

[Sto7]⋏
Stonebraker, M.; Madden, S.; Abadi, D.; Harizopoulos, S.; Hachem, N.; Helland, P.:
The End of an Architectural Era (It's Time for a Complete Rewrite)
33rd International Conference on Very Large Data Bases, 2007.
♦
"The DBMS vendors (and the research community) should start with a clean sheet of paper and design systems for tomorrow’s requirements, not continue to push code lines and architectures designed for yesterday’s needs."

[Sto+14]⋏
Stonebraker, M., et al.:
Enterprise Data Applications and the Cloud: A Difficult Road Ahead
Proc IEEE IC2E, Boston, Ma., March 2014
♦
Paper succinctly delineates potential growing pains for future DBMS,
if developers continue to rely on physically dependent data access technologies.
"There is considerable interest in moving DBMS
applications from inside enterprise data centers to the cloud,
both to reduce cost and to increase flexibility and elasticity.
In some circumstances, achieving good DBMS performance on
current cloud architectures and future hardware technologies will
be nontrivial.
In summary, there is a difficult road ahead for
enterprise database applications."

[Sto14]⋏
Stonebraker, M.:
Hadoop at a Crossroads?
BLOG@CACM, August, 2014.
♦
Persistent use of file systems perpetuating use of physical data location access strategies
will present serious challenges for future system developers.
"Hiding the (physical) location of data from the DBMS is death, and the DBMS will go to great lengths to circumvent this feature."

[Teo11]⋏
Teorey, T.; Lightstone, S.; Nadeau, T.; Jagadish, H. V.:
Database Modeling and Design
Morgan Kaufmann, 2011, Fifth Edition.
♦
Many in the industry consider this to be the best book available on classic database design and for explaining how to build database applications, complemented with objective commentary. For example, in Chapt. 8: "In short, transferring data between a database and an application program is an onerous process, because of both difficulty of programming and performance overhead."
Bibliography

⋏
Feasibility of a Set-theoretic Data Structure
A general structure based on a reconstituted definition of relation
IFIP Cong., Edinburgh Scotland, August 1968
♦
This antique paper presented the thesis that mathematical control over
the representation, management, and access of data was critical for the
functional freedom of applications and I/O performance of future
systems.

⋏
Description of a Set-theoretic Data Structure:
AFIPS fall joint computer conference San Francisco CA, December 1968
♦
Presents early development of STDS,
a machine-independent set-theoretic data structure allowing rapid processing of
data related by arbitrary assignment.

⋏
Extended Set Theory:
A General Model For Very Large, Distributed, Backend Information Systems
VLDB 1977 (Invited paper)
♦
ABSTRACT Three distinct components comprise an Information System: INFORMATION MANAGEMENT, DATA MANAGEMENT, and STORAGE MANAGEMENT. Until recently, all three have been subsumed under data management. As applications become more demanding, as support criteria become more complex, and as storage capacity becomes very large, the need for functional independence of these three management areas has become more apparent. Recognition of this situation has been popularized through the phrase, "data independence", or more precisely, "data independence from information" and "data independence from storage".
The difficulty in achieving data independence arises through the incompatibility of a complex information space being supported by a simple storage space. The popular, but limiting approach, has been to force the information space into a restrictive record space. This achieves a deceptive compatibility allowing only the appearance of data independence at the user level. This record oriented approach has become pervasive for small databases even though it constrains user applications, requires substantial storage overhead, and imposes inherent processing inefficiencies.
As databases become very large and as distributed systems become desirable, the need for inherent (not superficial) data independence becomes crucial. This paper is intended as a tutorial and will describe conditions for data independence and summarize the concepts of Extended Set Theory as a general model for expressing information systems embodying data independence. This generality will be demonstrated by considering some major problems pertinent to the design and support of very large, distributed, backend information systems.
It should be emphasized that Extended Set Theory is a formalism for expressing solutions and is not a specific solution in itself. Though "redundant membership condition", "distributed membership condition", and "set-theoretic interface" may be new concepts, Extended Set Theory does not preclude any current DBMS concepts, data structures, or existing implementations. Rather, Extended Set Theory embraces them all under a unifying model.

⋏
PEBBLE PILES & INDEX STRUCTURES: A Parable
 2005
♦
Imagine trying to convince ancient sheepherders (who calculated and compared their wealth by equating piles of pebbles to the size of herds of sheep) that a piece of parchment with numbers on it would give the same results as a pile of pebbles, but that would be more accurate, easier to use, faster in processing, less resource intensive, and more portable? Now imagine trying to convince modern database researchers that the mathematical identity of records can be used to replace the physical indexing of records.

⋏
1984 VLDB Panel: Inexpensive Large Capacity Storage Will Revolutionize
The Design Of Database Management Systems
Proceedings of the Tenth International Conference on Very Large Data Bases.
Singapore, August, 1984
♦
As secondary storage devices increase in capacity and decrease in cost,
current DBMS design philosophies become less adequate for addressing
the demands to be imposed by very large database environments. Future
database management systems must be designed to allow dynamic
optimization of the I/O overhead, while providing more sophisticated
applications involving increasingly complex data relationships.

⋏
A Mathematical Foundation for Systems Development  NATO-ASI Series, Vol. F24, 1986
♦
Paper presents a Hypermodel syntax for precision modeling of arbitrarily complex systems
by providing
a function space continuum with explosive resolution and extended set notation to provide
generality and rigor to the concept of a Hypermodel.

⋏
Managing Data Mathematically:
Data As A Mathematical Object:
♦
"Using Extended Set Theory for High Performance Database Management"
Presentation given at Microsoft Research Labs. with an introduction by Phil Bernstein.
(video: duration 1:10:52)  2006

⋏
Why Not Sets?  2010
♦
Sets are well defined collections of uniquely identifiable items. Data used by computers are well defined collections of items representing situations of interest. Computers themselves are just well defined collections of bits that change value over time. It would seem that all computer processing is highly set oriented. Why are sets not more widely used in modeling the behavior and assisting the development of computing systems?

⋏
Functions as Set Behavior: A Formal Foundation Based On Extended Set Axioms  2011
♦
ABSTRACT: The term function suggests an action or process or behavior of something applied to something. Within the framework of extended set theory, XST, the concept of a function will be defined in terms of a liberal definition of morphism, which in turn will be equated with the behavior of how specific sets interact with each other. It will be shown that all Classical set theory, CST, graph-based function behavior can be expressed in terms of XST non-graph-based function behavior; that the behavior of functions applied to themselves is supported; and that the concepts of Category theory can be subsumed under XST. A notable consequence of this approach is that the use of functions need no longer be constrained by properties of a Cartesian product.

⋏
XSP TECHNOLOGY: Theory & Practice:
Formal Modeling & Practical Implementation of XML & RDM Systems
 2011
♦
INTRODUCTION XSP (extended set processing) Technology introduces: [1] a formal mathematical foundation, extended set theory, XST, for defining and mapping user application expectations into internal machine representations; and [2] practical software implementations based on XST that assist in the design, development, and use of XML and RDM systems.

⋏
SET-PROCESSING AT THE I/O LEVEL:
A Performance Alternative to Traditional Index Structures
 2011
♦
ABSTRACT It is generally believed that index structures are essential for high-performance information access. This belief is false. For, though indexing is a venerable, valuable, and mathematically sound identification mechanism, its logical potential for identifying unique data items is restricted by structure-dependent implementations that are extremely inefficient, costly, functionally restrictive, information destructive, resource demanding, and, most importantly, that preclude data independence. A low-level logical data access alternative to physical indexed data access is set-processing. System I/O level set-processing minimizes the overall I/O workload by more efficiently locating relevant data to be transferred, and by greatly increasing the information transfer efficiency over that of traditional indexed record access strategies. Instead of accessing records through imposed locations, the set-processing alternative accesses records by their intrinsic mathematical identity. By optimizing I/O traffic with informationally dense data transfers, using no physical indexes of any kind, low-level set-processing has demonstrated a substantial, scalable performance improvement over location-dependent index structures.

⋏
SET-STORE DATA ACCESS ARCHITECTURES:
For High Performance Informationally Dense I/O Transfers
 2011
♦
ABSTRACT For high performance analytic processing of vast amounts of data buried in secondary storage, traditional performance strategies generally advocate minimizing system I/O. This paper advocates the converse supported by the use of set-store architectures. Traditional row-store and column-store architectures rely on mechanical data models (based on an imposed physical representation of data) for accessing and manipulating system data. Set-store architectures rely on a mathematical data model (based solely on the intrinsic mathematical identity of data) to control the representation, organization, manipulation and access of data for high performance informationally dense parallel I/O transfers.

⋏
iXSP Interactive Extended Set Processor:
Programmer’s Manual, User’s Tutorial and Mathematical Backgrounds
 v1.7 (44 pages) Draft 2013
♦
This manual has been produced for those interested in an introduction to the Extended
Set Processor from IIS and the commands that the processor supports.

⋏
I/O Technology For Big Data:
Massively Parallel Data Access of Big Data Environments  2011
♦
Traditional I/O technology is based on the storage and retrieval of records, records that are physically preserved in storage. Set processing I/O technology is based on the exchange of collections (sets) of records, records which may or may not physically exist in storage. Given the advances in hardware platforms, set processing I/O technology can offer one to three orders of magnitude better system performance than traditional I/O technology.
Copyright © 2015
INTEGRATED INFORMATION SYSTEMS
« Last modified on 07/27/2015 »
