XSP TECHNOLOGY    
Extended Set Processing Technology


Future Data Management
Leveraging the Mathematical Identity of Data
D L Childs

Big Data system performance objectives (scaling, volume data, diverse and distributed data) are hindered by the physical dependencies required to access data. In 1970 Codd showed how to improve system performance by giving data a mathematical identity (tables) and by replacing physically dependent routines with mathematically reliable operations. The same replacement of physically dependent routines with mathematically reliable operations can greatly improve the performance of Big Data systems today.

Introduction
Today the problems hindering the performance of data management systems are virtually the same as they were in the early 1970s: accessing data.[Cer10] Cloud computing and Big Data applications are challenging the performance of traditional data management strategies.[Sto+14] Scalability, diverse data, schema development overhead, and access to large volumes of distributed data are a few of the concerns facing developers of future data management strategies. Some of the original pioneering developers are even suggesting the need for a complete system overhaul.[Sto7]

A system overhaul is not required. Existing systems can easily be upgraded and new systems developed to provide optimal performance for all applications. What is required is evolving physically dependent data management strategies into mathematically reliable data management strategies.

Physically dependent data management strategies (those that manage data using physical properties of data representations) HAVE to use index structures to access data.

Mathematically reliable data management strategies (those that manage data using a mathematical identity of data) HAVE to use operations to access data.

Though the industry has long believed that index structures are essential for best performance, the exact opposite is true.[D] [Lar09] [Lig07]

RDM: Relational Data Model
The need to access large amounts of diverse distributed data was not a concern when traditional data management was being developed. The primary concern, forty years ago, was improving user performance. Since all computer data was managed by routines using physical properties of data representations,[*] users were required to know the physical organization of data in storage.[*]

In 1970 Codd introduced the Relational Data Model, RDM.[*] Codd's approach was to replace physically dependent routines with mathematically reliable operations, at the application level. Storage data and data access were still managed by physically dependent routines, but these physical dependencies were hidden from application users.

Codd's contribution to data management was introducing the concept of a mathematical identity of data manipulated by set-theoretic operations.

The RDM was successful because it provided operations to eliminate the physical dependency between the user's view of data and the system's representation of data. Today physically dependent routines for managing storage data are even more difficult to develop,[*] difficult to use,[*] and difficult to maintain.[*] Codd's solution to improve system performance in the 1970s, by replacing physically dependent data management strategies with mathematically reliable strategies, will also work today for supporting performance of existing and future Big Data applications.

In his 1981 Turing Award lecture,[Cod81] Codd provided some design objectives for building future data management systems.

Codd's Data Management Objectives
     (1) Data Independence Objective: Separate logical/physical aspects of data management.
     (2) Communicability Objective: Make the model structurally simple.
     (3) Set-processing Objective: Provide a foundation for set-oriented processing.
     (4) Foundation Objective: Provide a sound theoretical foundation for data management.

In 1981 the foundations of set theory were inadequate to support the requirements for faithfully modeling data representations, making it impossible for any implementation to satisfy all four data management conditions. Recently, extensions to the foundations of set theory have been completed and proven consistent.[Bla11] Extended set theory, XST, provides a sound theoretical foundation for data management by supporting a mathematical identity for all data representations, (RDM, XML, Xrel1, Xrel2).

Most importantly, XST allows all of Codd's data management objectives to be achieved. This means that the physical dependencies that distinguish Relational (SQL) from non-Relational (NoSQL) implementations can be isolated and removed.

RDM vs. RDBMS vs. NoSQL
The Relational Data Model, RDM, is an abstract model presenting a strategy for modeling data as mathematical objects and manipulating these objects with mathematically well defined operations.

A Relational Database Management System, RDBMS, is an implementation combining the data management specifications of the RDM with a particular data access and storage organization strategy.

Unfortunately, all commercial RDBMS implementations cripple the performance potential of the RDM. The RDM presents a mathematically reliable data management strategy for describing results of operations applied to sets. All commercial RDBMS implementations support RDM applications with physically dependent, transaction oriented searches for individual records.

RDM: mathematically reliable, declarative, set-processing.
RDBMS: physically dependent, procedural, record-processing.

Despite the fundamental mismatch between RDM and RDBMS data management strategies, the RDBMS's ability to mathematically manipulate data representations has allowed Relational systems to dominate the industry for over thirty years. Unfortunately, the mathematical muscle of the RDM currently supported by RDBMS installations is not sufficient to meet the requirements of Big Data applications.

Future Challenges (Physically dependent data management)
Big Data applications introduce issues of scalability, data diversity, and distributed data access, all of which challenge traditional physically dependent data management capabilities. The current battle seems to force a choice between the traditional RDBMS offerings represented by "SQL" and the extended capabilities of scalability, diverse data, and volume data access represented by "NoSQL".

SQL: Set-theoretic Query Language Though SQL is commonly interpreted as "Structured Query Language", it is more appropriately an acronym for "Set-theoretic Query Language". The roots of SQL are firmly planted in classical set theory, CST. SQUARE, published in 1973, was the precursor to SEQUEL, published in 1974, which in turn was the precursor to SQL. SQUARE presented a powerful collection of CST operations, the most important of which was the "Image" operation, the set-theoretic foundation for functions (i.e. mappings).
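
As a reminder of what that operation does, here is a minimal Python sketch (the relation and names below are illustrative, not taken from SQUARE): the image of a set A under a binary relation R is the set of everything related by R to some member of A.

    # Minimal sketch of the set-theoretic "image" operation (illustrative
    # names and data; not SQUARE syntax).
    def image(R, A):
        """Image of the set A under the binary relation R."""
        return {y for (x, y) in R if x in A}

    # A relation mapping employees to departments (hypothetical data).
    works_in = {("alice", "sales"), ("bob", "sales"), ("carol", "research")}

    # "Which departments do alice and carol work in?" asked as one set
    # operation rather than as record-by-record lookups.
    print(image(works_in, {"alice", "carol"}))   # {'sales', 'research'}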

NoSQL: No Set-theoretic Query Language There seems to be no generally accepted translation of the acronym NoSQL. The only commonality among the large variety of NoSQL systems seems to be a lack of mathematically reliable operations.

The distinction between SQL (implementations) and NoSQL (implementations) is largely artificial. Both approaches rely on physically dependent data management strategies, each having distinct advantages and disadvantages. By replacing physically dependent data management strategies with mathematically reliable data management strategies, the best of both offerings can be integrated into a highly productive, inherently reliable, efficient-to-use system.

SSDM: Structured-Set Data Model
The definition of a structured-set data model, SSDM, is deceptively simple:

SSDM: Any collection of data representations and operations on them that are well-defined under the axioms of extended set theory, XST.

There is no restriction on how many operations are defined, nor on what the operations do. There is no restriction on how many data representations are defined, nor on how they are defined. The only condition is that all operations and data representations be defined using extended set notation, XSN.

A structured-set is a set as defined under the axioms of extended set theory.

Conceptually an extended set is just a classical set with an extended membership providing two conditions for set membership instead of just one. The particulars are rather boring, but the utility of the extension is that it adds a set-theoretic dimension for structure. The only difference between classical sets and structured-sets is that classical sets have only one condition for membership while structured-sets require two.

The structure component of an extended set membership can be used to distinguish the container part of a data representation from the content part. Though set theory has been tried many times as a formal data model, it has always failed to provide the ability to suitably define data records as unambiguous sets. Structured-sets provide an additional component to classical set membership allowing a formally defined representation of data that uniquely distinguishes the logical data relationships (content) from the physical data representation (container).
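
As a rough illustration of the two-condition membership idea (a sketch only, not XSN and not the formal definitions of [Bla11]), an extended set can be pictured as a set of (element, scope) pairs, where the scope carries the structural condition and the element carries the content condition. The tuple and record encodings below follow the descriptions quoted from [Bla11] 5.1 and 5.2 in the references.

    # Sketch: an extended set modeled as a frozenset of (element, scope) pairs.
    ExtendedSet = frozenset

    # A classical set {a, b, c}: every member gets the same empty scope.
    classical = ExtendedSet({("a", ()), ("b", ()), ("c", ())})

    # The tuple <10, 20, 30>: natural numbers as scopes ([Bla11] 5.1).
    tuple_set = ExtendedSet({(10, 1), (20, 2), (30, 3)})

    # A record: field names as scopes, field values as elements ([Bla11] 5.2).
    record = ExtendedSet({("Smith", "name"), (42, "age"), ("NY", "city")})

    def content(xs):
        """Content part of the membership: the elements, ignoring scopes."""
        return {e for (e, s) in xs}

    def structure(xs):
        """Structure (container) part of the membership: the scopes."""
        return {s for (e, s) in xs}

    print(content(record))     # {'Smith', 42, 'NY'}   (set, order may vary)
    print(structure(record))   # {'name', 'age', 'city'}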

All Data Representations ARE[a] Structured-Sets.
Thus, All Data Can Be Managed Using Set Operations.

Since structured-sets can formally represent any and all application and storage data with the ability to distinguish data content from data container, structured-set based access strategies can manipulate data content and data containers independently to provide near-optimal access performance for each and every application.

With structured-sets the distinction between content and structure is an innate property of extended set membership. This property makes structured-sets a natural choice for modeling representations of data. Under a structured-set data model all logical and physical representations of data are structured-sets, and all manipulations of logical and physical data representations can be modeled by set operations. For presentation convenience or performance considerations, extended set operations can be defined that map the content of one structured-set to another structured-set having a totally different structure. Thus a structured-set data model is an ideal choice for modeling data-independent access systems.
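
A small, hypothetical Python sketch of such a structure-changing operation: the same content is re-scoped from a row-major arrangement to a column-major one, and only the structure component of the membership changes.

    # Sketch (hypothetical representation): scopes are (row, column) pairs.
    row_major = frozenset({
        ("Smith", ("r1", "name")), (42, ("r1", "age")),
        ("Jones", ("r2", "name")), (37, ("r2", "age")),
    })

    def transpose(xs):
        """Swap each member's (row, column) scope to (column, row)."""
        return frozenset((e, (c, r)) for (e, (r, c)) in xs)

    col_major = transpose(row_major)

    # The restructuring touches only scopes; the content is preserved exactly.
    assert {e for (e, s) in row_major} == {e for (e, s) in col_major}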

SQL & Structured-Sets
The SQL SELECT statement was originally [Cha74] introduced as a set operation. The language provided applications the means for specifying the membership of a result set as derived from a given collection of sets. There was no need for applications to know how data was actually stored.

Originally the sets were restricted to being arrays, since that was the only structure reasonably supported by classical set theory. Now that structured-sets are available for modeling any representation of data, there is no need to restrict SELECT statements to just arrays.
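
To make the point concrete (a sketch with invented data, not an XSP or SQL interface), a SELECT is simply a specification of the membership of a result set; written as a set comprehension it applies just as well to records that are not flat array rows.

    # "SELECT name FROM people WHERE age > 40" as a membership specification.
    people = [
        {"name": "Smith", "age": 42, "phones": ["555-0100", "555-0101"]},
        {"name": "Jones", "age": 37, "phones": []},
    ]

    # The nested 'phones' field causes no difficulty, because nothing here
    # depends on the records being fixed-width array rows.
    result = {p["name"] for p in people if p["age"] > 40}
    print(result)   # {'Smith'}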

STDS & iXSP: Mathematically Reliable Systems
In 1968 two papers were published suggesting the potential advantages of recognizing data as a mathematical object, managing data with set operations, and organizing data on disks using a set-theoretic data structure, STDS.[A][B]

RDBs vs. XSP
Current RDBs, based loosely on CST, in which there is no explicit structure or order, must rely on indexes to find requested data efficiently. However, this means that indexes must be planned for, designed, implemented, and maintained in order to be useful. This consumes considerable non-recurring engineering resources, as well as recurring processing and system administration overhead. In a high-rate, low-latency data processing system, there may not be time to execute the index and related statistical updates needed to use them effectively. In large database instances with hundreds of millions or billions of records, the indexes themselves can become very large and deep (in B-tree structures), often requiring secondary indexing structures to manage the primary indexes.

In addition, RDBs have no means of avoiding the transfer of significant amounts of irrelevant data across the I/O barrier. The entire table (or tables) involved in a query must be considered each time the query is executed. All columns of a table must be included in any record retrieval before performing the requested projections to obtain the desired output columns. Moreover, records are often stored in randomly accessed pages, which can result in whole records being transferred across the I/O barrier for no reason other than that they share a page with a desired record. All of these conditions result in highly inefficient use of the storage system and exacerbate storage I/O limitations.
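
A back-of-the-envelope illustration of the waste involved (the figures below are assumptions chosen only for the example, not measurements):

    # Assumed sizes: a 300-byte record of which a query needs two columns
    # totaling 30 bytes, stored in 4 KB pages (see [Lig07] on page sizes).
    record_bytes = 300
    needed_bytes = 30
    page_bytes   = 4 * 1024
    records      = 100_000_000

    scanned  = records * record_bytes          # bytes crossing the I/O barrier
    relevant = records * needed_bytes          # bytes the query actually needs
    pages    = scanned // page_bytes

    print(f"transferred: {scanned / 1e9:,.1f} GB in {pages:,} pages")
    print(f"relevant:    {relevant / 1e9:,.1f} GB "
          f"({100 * relevant / scanned:.0f}% of what was moved)")

Under these assumptions a full record-oriented scan moves roughly ten times more data across the I/O barrier than the query needs.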

In XSP, processing is done at the set level, not the record level, so there is no need for record-level indexes. Instead, set-theoretic operations are performed on sets as operands, and the resulting sets, along with their identity and relationship to their parent sets, are saved into a continually evolving set universe. Each requested set operation is first reduced algebraically to its lowest-cost equivalent, using already-computed sets where possible. This step is extremely fast because it uses set metadata and requires no I/O accesses. The resulting set equation is then executed and the results returned and saved.
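
The toy sketch below suggests the flavor of an evolving set universe (all names are hypothetical, and the algebraic reduction is shrunk here to a simple metadata lookup): every computed set is remembered together with the expression that defined it, so repeating or reusing a request costs no further scan of the data.

    # Toy "set universe": canonical expression -> materialized result set.
    universe = {}

    def evaluate(expr, compute):
        """Return the set for `expr`, reusing an already-computed result
        when the same canonical expression is in the universe."""
        if expr in universe:              # metadata lookup only, no I/O
            return universe[expr]
        result = frozenset(compute())     # the expensive part
        universe[expr] = result
        return result

    data = range(1, 1_000_001)

    first  = evaluate(("multiples", 3), lambda: (x for x in data if x % 3 == 0))
    second = evaluate(("multiples", 3), lambda: (x for x in data if x % 3 == 0))
    assert first is second                # the second request cost no scan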

Typical queries look for specific, relevant information and produce result sets that are several orders of magnitude smaller than the original. In addition, multiple queries are typically related in one or more attributes as analysis proceeds along a line of investigation, or "drill down". This results in the creation of multiple small data sets containing highly relevant data related to the recent queries. Instead of relying on indexes to improve the efficiency of finding requested records, XSP uses these small set partitions of the full data set to find the equivalent result set rapidly and efficiently, with minimum data transfer across the storage I/O boundary.
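
A minimal sketch of that drill-down reuse, with invented data: the second, narrower query is answered from the small partition produced by the first query rather than from the full data set.

    full_data = [
        {"cust": "A", "region": "east", "spend": 1500},
        {"cust": "B", "region": "east", "spend": 300},
        {"cust": "C", "region": "west", "spend": 2200},
    ]

    # Query 1: customers in the east region (one pass over the full data).
    east = [r for r in full_data if r["region"] == "east"]

    # Query 2 drills down on query 1: big spenders in the east region.
    # Only the small `east` partition is scanned, not `full_data`.
    east_big = [r for r in east if r["spend"] > 1000]

    print([r["cust"] for r in east_big])    # ['A']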

Conclusion
In 1970 Codd introduced the advantage of set processing over record processing. His influence has dominated data management systems for more than forty years. Though Codd advocated using set processing for both application data and system data, classical set theory, CST, was not suitable for faithfully modeling physical data representations.

Recent developments in extending the foundations of set theory now allow all data representations (both abstract and physical) to be faithfully modeled by set theory. It is now possible to fulfill Codd's data management objectives by developing mathematically reliable data management systems that provide all of the following:
          (1) Data Independence Objective: Separate logical/physical aspects of data management.
          (2) Communicability Objective: Make the model structurally simple.
          (3) Set-processing Objective: Provide a foundation for set-oriented processing.
          (4) Foundation Objective: Provide a sound theoretical foundation for data management.

Since any and all data representations have a mathematical identity under extended set theory, XST, all existing data representations can be mathematically identified and manipulated by set operations. This allows all existing systems to be upgraded from physically dependent systems to mathematically reliable systems.


Notes

  1. "faithful" means isomorphic representation of content, structure, and behavior.
  2. MICRO (1972-1998) used mathematically well defined operations in a time sharing environment to manage application data, storage data, and all transformations between the two.


References

  1. [Bla11] Blass, A., Childs, D L: Axioms and Models for an Extended Set Theory - 2011 ♦ This paper presents the formal foundation for supporting "structured-sets".
        5.1 TUPLES: Traditional set theory has several ways of coding ordered tuples ⟨a1, a2, ..., an⟩ as sets, none of which is really canonical [Sk57]. XST provides a simple and natural way to represent tuples, namely to use natural numbers as scopes. Thus, a tuple ⟨a1, a2, ..., an⟩ is identified with the (extended) set { a1^1, a2^2, ..., an^n }, where the superscripts are the scopes. The natural numbers used here as scopes can be represented by the traditional von Neumann coding (with scope ∅).
        5.2 GENERALIZED TUPLES: Instead of indexing the components of a tuple by (consecutive) natural numbers, one could index them by arbitrary, distinct labels. The same XST representation still works; use the labels as scopes. This provides, for example, a convenient way to deal with what are often called records in computing. Records have fields, in which data are inserted. We can represent them set-theoretically by taking the field names as scopes with the data as elements. Similarly, we can represent relations, in the sense of relational databases, by sets of generalized tuples, one for each row of the relation. The scopes for such a generalized tuple would be the attribute names, while the corresponding elements would be the values, in that row, of the attributes.
        5.3 FUNCTIONS and MULTI-FUNCTIONS (functions defined by set behavior): In general, a set f of generalized tuples, all having the same scope set D, can be regarded as describing several (possibly multi-valued) operations, called the behaviors of f.
  2. [Boy73] Boyce, R. F.; Chamberlin, D. D.; King, W. F.; Hammer, M. M.: Specifying Queries as Relational Expressions: SQUARE - IBM Technical Report RJ 1291, 1973 ♦ This paper presents a data sub-language called SQUARE, intended for use in ad hoc, interactive problem solving by non-computer specialists. SQUARE is based on the relational model of data, and is shown to be relationally complete; however, it avoids the quantifiers and bound variables required by languages based on the relational calculus. Facilities for query, insertion, deletion, and update on tabular data bases are described. A syntax is given, and suggestions are made for alternative syntaxes, including a syntax based on English key words for users with limited mathematical background.
  3. [Cer10] Cerf, V.: "It's like 1973 for Moving Data Around in the Cloud" ♦ Using a cloud computing service may sound enticing, but you better consider how that data can be moved around if you want to switch to a different provider. It's a big problem that now has the attention of Vint Cerf, who is calling for standards to define how customer data gets passed between different cloud service providers.
  4. [Cha74] Chamberlin, D. D.; Boyce, R. F.: SEQUEL: A Structured English Query Language - IBM Research Laboratory, 1974 ♦ ABSTRACT: In this paper we present the data manipulation facility for a structured English query language (SEQUEL) which can be used for accessing data in an integrated relational data base. Without resorting to the concepts of bound variables and quantifiers SEQUEL identifies a set of simple operations on tabular structures, which can be shown to be of equivalent power to the first order predicate calculus. A SEQUEL user is presented with a consistent set of keyword English templates which reflect how people use tables to obtain information. Moreover, the SEQUEL user is able to compose these basic templates in a structured manner in order to form more complex queries. SEQUEL is intended as a data base sub language for both the professional programmer and the more infrequent data base user.
  5. [Cha01] Champion, M.: XSP: An Integration Technology for Systems Development and Evolution - Software AG - 2001 ♦ The mathematics of the relational model is based on classical set theory, CST, and this is both its strength and its weakness. An "extended set theory", XST, can be used to model ordering and containment relationships that are simply too "messy" to handle in classical set theory and the formalisms (such as relational algebra) that are based on it.
  6. [Cod70] Codd, E. F.: A Relational Model of Data for Large Shared Data Banks, CACM 13, No. 6 (June) 1970 ♦ Abstract: Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation).
  7. [Cod81] Codd, E. F.: Relational Database: A Practical Foundation for Productivity, CACM 25, No. 2 (February) 1982 ♦ The 1981 ACM Turing Award Lecture: "The relational model calls not only for relational structures (which can be thought of as tables), but also for a particular kind of set processing called relational processing. Relational processing entails treating whole relations as operands."
  8. [Ell15] Ellis, T.: Extended Set Theory: A Summary - 2015 ♦ Extended Set Theory (XST) was originally developed under an ARPA contract to address the limitations of Classical Set Theory (CST). At the root of the problem is the set membership definition of CST, which defines sets and the results of set operations based on a single membership condition: content.
  9. [Fay13] Fayyad, U. M.: Big Data Everywhere, and No SQL in Sight SIGKDD Explorations, Volume 14, Issue 2 - 2013 ♦ "The term BigData is not well-defined, but is generally used to refer to the type of data that breaks the limits of traditional data storage and management state-of-the-art."
  10. [Har08] Harizopoulos, S.; Madden, S.; Abadi, D.; Stonebraker, M.: OLTP Through the Looking Glass, and What We Found There, SIGMOD'08, June 9-12, 2008 ♦ Over 90% of an application process is indexed-access overhead.
  11. [Lar09] Larsen, S. M.: The Business Value of Intelligent Data Access - March 2009 ♦ Article provides an excellent description on how difficult it is to optimize data access paths. "For a wide range of reasons, designing for and maintaining optimal data access poses a genuine challenge to even the most sophisticated enterprises." p. 2
  12. [Lig07] Lightstone, S.; Teorey, T.; Nadeau, T.: Physical Database Design, Morgan Kaufmann, 2007 ♦ A comprehensive analysis of how complicated the physical database design process can be without the guidance of a formal data access model. Without such formal support physical file data access structures typically impede system performance. [excerpts below]
      a) File data access strategies are extremely difficult to optimize. "Some computing professionals currently run their own consulting businesses doing little else than helping customers improve their table indexing design." Their efforts can improve query performance by as much as 50 times. (p. 2)
      b) Files are physical representations of data. Tables are logical representations of data. "Tables are Files"? (p. 7)
      c) An index is data organization set up to speed up the retrieval of data from tables. In database management systems, indexes can be specified by database application programmers. (p. 8)
      d) "It is important to be able to analyze the different paths for the quality of the result, in other words, the performance of the system to get you the correct result and choose the best path to get you there." (p. 31)
      e) A block (or page) has been the basic unit of I/O from disk to fast memory (RAM), typically 4 KB in size. In recent years, prefetch buffers (typically 64 KB, as in DB2) have been used to increase I/O efficiency. (p. 371)
      f) The total I/O time for a full table scan is computed simply as the I/O time for a single block, or prefetch buffer (64 KB), times the total number of those I/O transfers in the table. (p. 372)
  13. [Man14] Manoochehri, M.: Data Just Right: Introduction to Large-Scale Data & Analytics, Addison-Wesley, 2014 ♦ Large-scale data analysis is now vitally important to virtually every business. Mobile and social technologies are generating massive datasets; distributed cloud computing offers the resources to store and analyze them; and professionals have radically new technologies at their command, including NoSQL databases.
  14. [MICRO] MICRO RDBMS 1972-1998; User Manual: MICRO A Relational Database Management System 1992 ♦ MICRO supported timesharing commercial applications from 1972 to 1998. It was the first system to use set-theoretic operations to create and manage stored data. MICRO managed data with a mathematically reliable data storage and access system (STDS*) which used no index structures, required no schema design, provided non-destructive updates, and supported both structured and semi-structured data.
  15. [MSQL] SQL Server Performance Team: Great New TPC-H Results with SQL Server 2008 17 Aug. 2009 ♦ "HP also published a 300GB result on their ProLiant DL785 platform with SQL Server 2008. This publication illustrates the high performance and solid price/performance using industry standard components from HP and Microsoft." (Load time: 300GB in 13.33 hours)
  16. [Nor10] North, K.: ♦ Three articles presenting a short historical perspective on the role of set theory, mathematically sound data models, and the importance of data independence. - 2010
    PART I: Sets, Data Models and Data Independence
    PART II: Laying the Foundation: Revolution, Math for Databases and Big Data
    PART III: Information Density, Mathematical Identity, Set Stores and Big Data
  17. [Sk57] Skolem, T.: Two Remarks on Set Theory (The ordered n-tuples as sets), MATH. SCAND. 5 (1957) 40-46 ♦ Skolem concludes: "I shall not pursue these considerations here, but only emphasize that it is still a problem how the ordered n-tuple can be defined in the most suitable way."
  18. [Sto5] Stout, R.: Information Access Accelerator Information Builders Inc. - 2005 ♦ Slide presentation explaining a 40 to 1 performance improvement over commercial DBMSs by using a structured set access interface between applications and storage.
  19. [Sto7] Stonebraker, M.; Madden, S.; Abadi, D.; Harizopoulos, S.; Hachem, N.; Helland, P.: The End of an Architectural Era (It's Time for a Complete Rewrite) 33rd International Conference on Very Large Data Bases, 2007. ♦ "The DBMS vendors (and the research community) should start with a clean sheet of paper and design systems for tomorrow’s requirements, not continue to push code lines and architectures designed for yesterday’s needs."
  20. [Sto+14] Stonebraker, M., et al.: Enterprise Data Applications and the Cloud: A Difficult Road Ahead, Proc. IEEE IC2E, Boston, MA, March 2014 ♦ Paper succinctly delineates potential growing pains for future DBMS, if developers continue to rely on physically dependent data access technologies. "There is considerable interest in moving DBMS applications from inside enterprise data centers to the cloud, both to reduce cost and to increase flexibility and elasticity. In some circumstances, achieving good DBMS performance on current cloud architectures and future hardware technologies will be non-trivial. In summary, there is a difficult road ahead for enterprise database applications."
  21. [Sto14] Stonebraker, M.: Hadoop at a Crossroads? BLOG@CACM, August, 2014. ♦ Persistent use of file systems perpetuating use of physical data location access strategies will present serious challenges for future system developers. "Hiding the (physical) location of data from the DBMS is death, and the DBMS will go to great lengths to circumvent this feature."
  22. [Teo11] Teorey, T.; Lightstone, S.; Nadeau, T.; Jagadish, H. V.: Database Modeling and Design, Morgan Kaufmann, 2011, Fifth Edition ♦ Many in the industry consider this to be the best book available on classic database design and for explaining how to build database applications, complemented with objective commentary. For example, in Chapt. 8: "In short, transferring data between a database and an application program is an onerous process, because of both difficulty of programming and performance overhead."


Bibliography

  1. Feasibility of a Set-theoretic Data Structure: A General Structure Based on a Reconstituted Definition of Relation, IFIP Congress, Edinburgh, Scotland, August 1968 ♦ This antique paper presented the thesis that mathematical control over the representation, management, and access of data was critical for the functional freedom of applications and I/O performance of future systems.
  2. Description of a Set-theoretic Data Structure, AFIPS Fall Joint Computer Conference, San Francisco, CA, December 1968 ♦ Presents early development of STDS, a machine-independent set-theoretic data structure allowing rapid processing of data related by arbitrary assignment.
  3. Extended Set Theory: A General Model For Very Large, Distributed, Backend Information Systems VLDB 1977 (Invited paper) ♦ ABSTRACT Three distinct components comprise an Information System: INFORMATION MANAGEMENT, DATA MANAGEMENT, and STORAGE MANAGEMENT. Until recently, all three have been subsumed under data management. As applications become more demanding, as support criteria become more complex, and as storage capacity becomes very large, the need for functional independence of these three management areas has become more apparent. Recognition of this situation has been popularized through the phrase, "data independence", or more precisely, "data independence from information" and "data independence from storage".
         The difficulty in achieving data independence arises through the incompatibility of a complex information space being supported by a simple storage space. The popular, but limiting approach, has been to force the information space into a restrictive record space. This achieves a deceptive compatibility allowing only the appearance of data independence at the user level. This record oriented approach has become pervasive for small databases even though it constrains user applications, requires substantial storage overhead, and imposes inherent processing inefficiencies.
         As databases become very large and as distributed systems become desirable, the need for inherent (not superficial) data independence becomes crucial. This paper is intended as a tutorial; it describes conditions for data independence and summarizes the concepts of Extended Set Theory as a general model for expressing information systems embodying data independence. This generality will be demonstrated by considering some major problems pertinent to the design and support of very large, distributed, backend information systems.
         It should be emphasized that Extended Set Theory is a formalism for expressing solutions and is not a specific solution in itself. Though "redundant membership condition", "distributed membership condition", and "set-theoretic interface" may be new concepts, Extended Set Theory does not preclude any current DBMS concepts, data structures, or existing implementations. Rather, Extended Set Theory embraces them all under a unifying model.
  4. PEBBLE PILES & INDEX STRUCTURES: A Parable - 2005 ♦ Imagine trying to convince ancient sheepherders (who calculated and compared their wealth by equating piles of pebbles to the size of herds of sheep) that a piece of parchment with numbers on it would give the same results as a pile of pebbles, but that would be more accurate, easier to use, faster in processing, less resource intensive, and more portable? Now imagine trying to convince modern database researchers that the mathematical identity of records can be used to replace the physical indexing of records.
  5. 1984 VLDB Panel: Inexpensive Large Capacity Storage Will Revolutionize The Design Of Database Management Systems Proceedings of the Tenth International Conference on Very Large Data Bases. Singapore, August, 1984 ♦ As secondary storage devices increase in capacity and decrease in cost, current DBMS design philosophies become less adequate for addressing the demands to be imposed by very large database environments. Future database management systems must be designed to allow dynamic optimization of the I/O overhead, while providing more sophisticated applications involving increasingly complex data relationships.
  6. A Mathematical Foundation for Systems Development - NATO-ASI Series, Vol. F24, 1986 ♦ Paper presents a Hypermodel syntax for precision modeling of arbitrarily complex systems by providing a function space continuum with explosive resolution and extended set notation to provide generality and rigor to the concept of a Hypermodel.
  7. Managing Data Mathematically:   Data As A Mathematical Object: ♦ "Using Extended Set Theory for High Performance Database Management" Presentation given at Microsoft Research Labs. with an introduction by Phil Bernstein. (video: duration 1:10:52) - 2006
  8. Why Not Sets? - 2010 ♦ Sets are well defined collections of uniquely identifiable items. Data used by computers are well defined collections of items representing situations of interest. Computers themselves are just well defined collections of bits that change value over time. It would seem that all computer processing is highly set oriented. Why are sets not more widely used in modeling the behavior and assisting the development of computing systems?
  9. Functions as Set Behavior: A Formal Foundation Based On Extended Set Axioms - 2011 ♦ ABSTRACT: The term function suggests an action or process or behavior of something applied to something. Within the framework of extended set theory, XST, the concept of a function will be defined in terms of a liberal definition of morphism. Which in turn will be equated with the behavior of how specific sets interact with each other. It will be shown that all Classical set theory, CST, graph based function behavior can be expressed in terms of XST function non-graph based behavior; that the behavior of functions applied to themselves is supported; and that the concepts of Category theory can be subsumed under XST. A notable consequence of this approach is that the use of functions need no longer be constrained by properties of a Cartesian product.
  10. XSP TECHNOLOGY: Theory & Practice: Formal Modeling & Practical Implementation of XML & RDM Systems - 2011 ♦ INTRODUCTION XSP (extended set processing) Technology introduces: [1] a formal mathematical foundation, extended set theory, XST, for defining and mapping user application expectations into internal machine representations; and [2] practical software implementations based on XST that assist in the design, development, and use of XML and RDM systems.
  11. SET-PROCESSING AT THE I/O LEVEL: A Performance Alternative to Traditional Index Structures - 2011 ♦ ABSTRACT It is generally believed that index structures are essential for high-performance information access. This belief is false. For, though indexing is a venerable, valuable, and mathematically sound identification mechanism, its logical potential for identifying unique data items is restricted by structure-dependent implementations that are extremely inefficient, costly, functionally restrictive, information destructive, resource demanding, and, most importantly, that preclude data independence. A low-level logical data access alternative to physical indexed data access is set-processing. System I/O level set-processing minimizes the overall I/O workload by more efficiently locating relevant data to be transferred, and by greatly increasing the information transfer efficiency over that of traditional indexed record access strategies. Instead of accessing records through imposed locations, the set-processing alternative accesses records by their intrinsic mathematical identity. By optimizing I/O traffic with informationally dense data transfers, using no physical indexes of any kind, low-level set-processing has demonstrated a substantial, scalable performance improvement over location-dependent index structures.
  12. SET-STORE DATA ACCESS ARCHITECTURES: For High Performance Informationally Dense I/O Transfers - 2011 ♦ ABSTRACT For high performance analytic processing of vast amounts of data buried in secondary storage, traditional performance strategies generally advocate minimizing system I/O. This paper advocates the converse supported by the use of set-store architectures. Traditional row-store and column-store architectures rely on mechanical data models (based on an imposed physical representation of data) for accessing and manipulating system data. Set-store architectures rely on a mathematical data model (based solely on the intrinsic mathematical identity of data) to control the representation, organization, manipulation and access of data for high performance informationally dense parallel I/O transfers.
  13. iXSP Interactive Extended Set Processor: Programmer’s Manual, User’s Tutorial and Mathematical Backgrounds - v1.7 (44 pages) Draft 2013 ♦ This manual has been produced for those interested in an introduction to the Extended Set Processor from IIS and the commands that the processor supports.
  14. I/O Technology For Big Data: Massively Parallel Data Access of Big Data Environments - 2011 ♦ Traditional I/O technology is based on the storage and retrieval of records, records that are physically preserved in storage. Set processing I/O technology is based on the exchange of collections (sets) of records, records which may or may not physically exist in storage. Given the advances in hardware platforms, set processing I/O technology can offer one to three orders of magnitude better system performance than traditional I/O technology.


Copyright © 2015   INTEGRATED INFORMATION SYSTEMS