CLOUD DATA INTEGRATION
  • Data Independence -
    no shared knowledge of data organizations between processing components.


  • Data Integrity -
    constructive updating, not destructive updating.


  • I/O Performance -
    parallel I/O with informationally dense I/O buffers.


  • Cloud Computing -
    support for disparate applications sharing distributed data.


  • Fault Tolerance -
    support for disparate applications sharing distributed data.


  • Interoperability -
    support for disparate applications sharing distributed data.
                              






SOFTWARE

A large collection of XSP subroutines is available for research & experimentation.

  • iXSP -
    An interactive interface for XSP functions.


  • STDS* -
    Low level I/O routines used to support an RDBMS.


  • XSTDS* -
    Set processing I/O management of massively parallel I/O access paths.
                              








SERVICES

Much of the value of XSP comes from experience and demonstrations of practical results.

  • XSP Analysis -
    Modeling existing systems in terms of XST to determine optimal behavior given a chosen platform.


  • XSP Functions -
    Derivation of set-theoretic descriptions for new applications.


  • Data Analysis -
    On-site use of iXSP for data validation and discovery.
                              






RESEARCH

Many years of research results are available, much of which is in the public domain.

  • Cluster Computing -
    Synergistic data access for disparate applications with highly distributed data sources.


  • Category Theory -
    Modeling systems behavior in terms of function spaces.


  • Intelligent Storage -
    Set processing data access operations embedded in storage devices.
                              

XSP TECHNOLOGY
Set Processing Data Accessing Technology

A Systems Integration Technology

Possibly the greatest deficiency in software systems development technology is the lack of any mathematical guidance for integrating data across system components.

INTRODUCTION
New, more demanding software applications and new, more powerful hardware platforms are stressing the capabilities of traditional system data access strategies. Traditional system development models cannot accommodate business demands for integrated hardware-software products that deliver far greater performance and require dramatically shorter installation and data load times, all with bankable system reliability. These demands are further exacerbated by next-generation enterprise expectations for business analytics, cloud computing, fault tolerance, rapid access to staggeringly large databases, and a soaring volume of queries.

Access Operations Not Access Paths
Traditional system architectures are constrained by an element-at-a-time (von Neumann bottleneck) data access strategy. This strategy is not ideal for supporting diverse applications simultaneously accessing large quantities of distributed data, which are best served by a collection-at-a-time data access strategy. Nor can element-at-a-time strategies support the next-generation requirements for reliable, high-performance data access, whereas collection-at-a-time set processing strategies can.

  • Traditional Data Access Strategies
    • Very Poor I/O Performance - RECORD ACCESS.
    • Endangered Data Integrity - DESTRUCTIVE UPDATES.
    • Structured-Data Access Paths - PHYSICAL DEPENDENCE.
  • Set Processing Data Access Strategies
    • Highly Optimized I/O Performance - SET ACCESS.
    • Inviolate Data Integrity - CONSTRUCTIVE UPDATES.
    • Relevant-Data Access Operations - LOGICAL DEPENDENCE.
Traditional architectures use structures to find relevant data. Set processing architectures use operations to extract relevant data. The primary advantage of set processing data access operations over traditional data access structures is I/O performance. Where accessing data 100 to 1,000 times faster would provide a time-critical response of value, set processing should be considered. Side benefits of the mathematical foundations of set processing are protection of data integrity and flexibility in restructuring data for different application needs.
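The contrast between access paths and access operations, and between destructive and constructive updates, can be sketched in a few lines. This is only an illustration built on Python's ordinary sets, not on XST operations; the names `Record`, `records`, `index_by_id`, `fetch_by_path`, `restrict`, and `constructive_update` are hypothetical and do not come from any XSP software.

```python
from typing import Callable, FrozenSet, List, NamedTuple


class Record(NamedTuple):
    id: int
    region: str
    amount: int


# Hypothetical stored collection; a frozenset cannot be modified in place.
records: FrozenSet[Record] = frozenset({
    Record(1, "east", 120),
    Record(2, "west", 75),
    Record(3, "east", 40),
    Record(4, "north", 310),
})

# Access path: a pre-built structure that locates one record at a time.
index_by_id = {r.id: r for r in records}


def fetch_by_path(ids: List[int]) -> List[Record]:
    """Element-at-a-time access: one lookup per requested record."""
    return [index_by_id[i] for i in ids]


def restrict(data: FrozenSet[Record],
             wanted: Callable[[Record], bool]) -> FrozenSet[Record]:
    """Collection-at-a-time access: one request yields every relevant record."""
    return frozenset(r for r in data if wanted(r))


def constructive_update(data: FrozenSet[Record],
                        old: Record, new: Record) -> FrozenSet[Record]:
    """Build a new collection with `old` replaced by `new`; `data` is untouched."""
    return (data - {old}) | {new}


east = restrict(records, lambda r: r.region == "east")
updated = constructive_update(records, Record(2, "west", 75), Record(2, "west", 80))
```

After the update, `records` still holds the original record for id 2, so earlier states remain recoverable. The sketch shows only the shape of the two interfaces; an in-memory scan like `restrict` says nothing about the I/O performance claims made above.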

I/O Performance Potential
The I/O performance constraints imposed by the von Neumann style data access strategy have forced developers to avoid I/O at all costs. Yet simply switching from 16KB I/O buffers, with a sustained data transfer rate (DTR) of 7.6MB/s, to 32MB I/O buffers, with a sustained DTR of 305MB/s, could improve I/O throughput by a factor of 40.

Consider a record access oriented I/O strategy that relies on secondary index structures for locating records. Typically, half of an I/O buffer needs to be reserved for overflow, updating, and other system support requirements, forcing twice the number of I/O transfers. Of the data in the other half, only 50% is likely to be records of application interest, and only about 10% of those records will be relevant data, forcing 20 times the number of required I/O transfers. Thus a typical record access oriented I/O buffer may contain only 2.5% relevant data (0.5 × 0.5 × 0.1), requiring 40 I/O transfers to access a full I/O buffer of relevant data. If a 32MB I/O buffer containing 100% relevant data were to replace a 16KB I/O buffer with 2.5% relevant data, the performance improvement would be a factor of 1600 (40 × 40), or about 3.2 orders of magnitude. Experience using set processing oriented I/O strategies on commercial applications has shown that nearly 100% of an I/O buffer contains relevant data. With multiple parallel access paths (24 to 96), experiments have shown that I/O throughput can be improved to the point that system processing becomes the performance bottleneck.
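The arithmetic behind these factors can be checked directly. The sketch below uses only the figures quoted in the text; the variable names are illustrative.

```python
import math

# Sustained data transfer rates quoted in the text, in MB/s.
small_buffer_dtr = 7.6    # with 16KB I/O buffers
large_buffer_dtr = 305.0  # with 32MB I/O buffers

# Composition of a typical record access oriented buffer.
usable_fraction = 0.5       # the other half is reserved for overflow and system use
interesting_fraction = 0.5  # share of the usable half holding records of interest
relevant_fraction = 0.1     # share of those records that is relevant data

# Fraction of each transferred buffer that is relevant data (2.5%).
density = usable_fraction * interesting_fraction * relevant_fraction

# Transfers needed to accumulate one full buffer of relevant data (40).
transfers_per_relevant_buffer = 1 / density

# Gain from the larger buffer alone (about 40), then combined with a
# buffer that holds only relevant data (about 1600).
dtr_gain = large_buffer_dtr / small_buffer_dtr
combined_gain = dtr_gain * transfers_per_relevant_buffer
orders_of_magnitude = math.log10(combined_gain)

print(f"relevant data per buffer: {density:.1%}")
print(f"transfers per relevant buffer: {transfers_per_relevant_buffer:.0f}")
print(f"buffer size gain: {dtr_gain:.1f}x, combined gain: {combined_gain:.0f}x")
print(f"orders of magnitude: {orders_of_magnitude:.1f}")
```

The combined factor is the product of two independent effects: the higher transfer rate of the larger buffer and the higher share of relevant data in each transfer.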

Future Systems
To tap the performance potential latent in existing and future hardware platforms, pre-structured data access paths (mechanical data model) should be replaced with adaptive data access operations (mathematical data model). This mathematical data modeling replacement for the traditional mechanical data modeling can allow developers of future systems to approach the performance potential of any given hardware platform for any given mix of applications.

SYSTEM ARCHITECTURES
It is generally assumed that the performance constraints imposed by the von Neumann I/O bottleneck can only be eliminated by a redesign and replacement of existing hardware. However, a case can be made that the von Neumann bottleneck, as it manifests itself in today's architectures, is not a hardware design issue but a hardware usage issue. Consider the basic components of a system architecture and how these components interact in the exchange of data. An application processing component declares what data needs to be extracted from some collection of available data. The performance challenge is in evoking a data access strategy (DAS) that best fulfills the application request. Ideally, such a DAS would provide informationally dense I/O data transfers, supplying the application with just the data needed, pre-structured for best application processing. Since the structurings imposed on data for best execution are seldom the best structurings for the preservation and rapid access of repository data, any DAS that stores data in the form required by an application has to accept a performance compromise. Without the ability to separate application processing structures from storage preservation and access structures, and to manage them independently, the DAS will remain the performance bottleneck of system architectures.

System Data Models
Though there are three distinct and independent system data management components, the prevailing system data models lump all three under a single-representation database model, such as the relational data model (RDM). Failing to distinguish data access modeling from storage organization modeling and from application processing modeling ensures data representation dependence among all three, severely limiting the DAS performance options. Only by modeling the three system components independently can truly data-independent, high-performance systems be built.

  • Data Processing Models: Application programmers need to manipulate data as conveniently, efficiently, and as reliably as possible. Requisite data needs to be delivered in a form best suiting a specific application's processing need. Application data processing models are abundant and mature.
  • Data Accessing Models: In principle, data accessing is quite simple: return (to a requesting application) the relevant portion of available data in a form best suited for application processing. Modeling this data reduction and transformation process requires an ability to distinguish data content from data structuring. Since such a modeling capability has not yet been adopted by developers, the only available models are those supporting one-to-one archiving and retrieval of application-formatted data.
  • Data Storage: There is no particular requirement for the representation and organization of stored data, so long as it is persistent, can be easily located, and efficiently trundled off to a new location.
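The separation of the three models can be sketched as follows. This is a minimal illustration under assumed names (`row_store`, `column_store`, `read_row_store`, `read_column_store`, `access`, and `total_east` are all hypothetical); it shows only that an application request can stay the same while the storage layout changes, and it is not an XSP implementation.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, object]

# Data storage: two hypothetical layouts holding the same content.
row_store: List[Tuple[int, str, int]] = [
    (1, "east", 120), (2, "west", 75), (3, "east", 40),
]
column_store: Dict[str, List[object]] = {
    "id": [1, 2, 3],
    "region": ["east", "west", "east"],
    "amount": [120, 75, 40],
}


def read_row_store() -> Iterable[Row]:
    """Expose the row layout as content; the application never calls this."""
    for id_, region, amount in row_store:
        yield {"id": id_, "region": region, "amount": amount}


def read_column_store() -> Iterable[Row]:
    """Expose the column layout as the same content."""
    names = list(column_store)
    for values in zip(*(column_store[n] for n in names)):
        yield dict(zip(names, values))


def access(source: Callable[[], Iterable[Row]],
           relevant: Callable[[Row], bool],
           fields: Tuple[str, ...]) -> List[Tuple[object, ...]]:
    """Data accessing: reduce to the relevant content, then transform it
    into the form the application asked for."""
    return [tuple(row[f] for f in fields) for row in source() if relevant(row)]


def total_east(source: Callable[[], Iterable[Row]]) -> int:
    """Data processing: states what is needed, not how it is stored."""
    pairs = access(source, lambda r: r["region"] == "east", ("id", "amount"))
    return sum(amount for _, amount in pairs)


print(total_east(read_row_store), total_east(read_column_store))
```

Swapping `read_row_store` for `read_column_store` changes nothing in `total_east`, which is the sense in which the processing model is independent of the storage model here.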

Record Accessing Architectures
Traditional record-accessing architectures rely on a single data model to dictate how data is represented and organized for application processing, data accessing, and data storage. There is no discernible modeling recognition of the need to preserve data independence between basic system components. Without such a recognition it is very difficult to develop a DAS that best services the needs of both the application and the storage components.

Set-Processing Architectures
Set-processing architectures are distinguished from record-accessing architectures only by the mathematical nature of the DAS model. Processing data and storage data can be represented and organized in any way desired by developers. The only caveat is that the mathematical identity of these data representations be known to the DAS model. Since the prevailing application model is the RDM and since record arrays have a well-defined mathematical identity under XST, current application programs and existing data storage repositories could be reunited for better performance under an XSP-DAS.

Set-Store Data Access
Though record arrays are readily compatible with an XSP-DAS, they are less than ideal for supporting informationally dense I/O data transfers. Even though column-stores can provide dramatic performance improvements over row-store architectures, they can be dramatically outperformed by set-store architectures. When a set-store architecture implementation was pitted against both an IBM and an Oracle DBMS in a Rapid Information Access comparison, the set-store architecture performed quite favorably.
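A rough way to see why layout matters for informationally dense transfers is to estimate the share of transferred bytes that a query actually needs. The sketch below is a simplified model, not a measurement: it assumes full scans with no indexes, equal-width fields, and relevant rows scattered evenly so that no buffer can be skipped. The function names and the example query (2 of 10 columns, 10% of rows) are hypothetical. The set-store layout is not modeled, since the text does not describe it; 100% density is simply the upper bound any layout could reach.

```python
def row_store_density(cols_needed: int, cols_total: int,
                      row_selectivity: float) -> float:
    """Relevant share of bytes when whole rows are transferred."""
    return (cols_needed / cols_total) * row_selectivity


def column_store_density(row_selectivity: float) -> float:
    """Relevant share when only the needed columns are transferred, in full."""
    return row_selectivity


def transfers_per_relevant_buffer(density: float) -> float:
    """Buffers that must be moved to collect one buffer of relevant data."""
    return 1 / density


# Hypothetical query: 2 of 10 columns, 10% of the rows.
row_density = row_store_density(2, 10, 0.10)
column_density = column_store_density(0.10)

print(f"row-store:    {row_density:.0%} relevant")
print(f"column-store: {column_density:.0%} relevant")
```

Under these assumptions the column layout moves five times less data than the row layout for this query, yet nine tenths of what it moves is still not relevant, which is the gap the paragraph above attributes to set-store architectures.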

SSDAM: Set-Store Data Access Model
RDBMSs are proof that mathematical modeling of data has practical applications. However, the mathematical modeling employed is only intended to model abstract relationships as perceived by users and not, in any way, to acknowledge the underlying system data access mechanisms. Without a companion data access model, developers have had to rely on array storage models based on familiar row-column record access technologies. Thus every application is currently wedded to a specific store and fetch data access mechanism.

To support robust cloud computing in the future, data fetching networks need to be replaced with information accessing networks. This can only happen if individual applications are divorced from their dependence on a specifically tailored data access engine and instead share a universal engine that provides just the data required, in exactly the form required, and within the time required. Of course, no such data access engine currently exists, but if one ever does, a Set-Store Data Access Model would provide a good architectural foundation.

CURRENT RESEARCH
The formal modeling of system architectures involves two synergistic activities. First, the modeling tool, XST foundations, needs to be continually explored for discovery of mathematical truths that can be translated into operations and algorithms for more productive access and processing in system architectures. Second, practical application of XSP operations and access strategies requires development of proof of concept implementations.

Lambda Calculus & Category Theory
Though the Lambda calculus and Category theory have been the academic's faithful consorts for many years, their results have not greatly influenced the implementation of high-performance DASs. Both have fundamental deficiencies that preclude them from supporting a set-theoretic modeling of the functionality and projected performance of system architectures. Showing that the conditions and rules for defining categories can be expressed under XST allows category-derived precepts to carry over into set-theoretic system modelings. Defining functions as the behavior of sets, instead of as binary-set objects, allows f(x) both for "x" a value and for "x" a behavior, so f(f) can be defined set-theoretically. Lambda calculus also requires currying to support f(a,b,..z), but with a Skolem-satisfactory definition of n-tuples such functions can be defined directly in XST.
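The difference between currying and direct application to an n-tuple can be illustrated in ordinary code. The passage does not spell out the XST definition of an n-tuple, so the encoding below is an assumption: it models a tuple as a set whose members each carry their position as a scope, which is one common reading of XST scoped membership. The names `ScopedSet`, `n_tuple`, `at`, `curried_volume`, and `volume` are hypothetical.

```python
from typing import Any, FrozenSet, Tuple

# Assumed encoding: members are stored as (element, scope) pairs.
ScopedSet = FrozenSet[Tuple[Any, Any]]


def n_tuple(*items: Any) -> ScopedSet:
    """Build a tuple as a set: the i-th item is a member with scope i (1-based)."""
    return frozenset((item, i) for i, item in enumerate(items, start=1))


def at(t: ScopedSet, scope: Any) -> Any:
    """Return the single element carried under the given scope."""
    (element,) = [e for e, s in t if s == scope]
    return element


# Lambda calculus style: a three-argument function expressed by currying,
# that is, three nested one-argument functions.
curried_volume = lambda a: lambda b: lambda c: a * b * c


def volume(t: ScopedSet) -> int:
    """Tuple style: a single application to one 3-tuple."""
    return at(t, 1) * at(t, 2) * at(t, 3)


print(curried_volume(2)(3)(4), volume(n_tuple(2, 3, 4)))
```

With positions carried as scopes, order and repetition survive even though the container is a set: `n_tuple(1, 2)` differs from `n_tuple(2, 1)`, and `n_tuple(7, 7)` has two members. Whether this matches the formal XST definition in every detail is not established by the text above.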

XSP Data Access Engines
Current XSP software recognizes all physical data structures as having an XST identity. Thus XSP-defined operations can access already existing databases. An experimental COTS hardware platform having 6 parallel I/O ports, 10TB of storage, and costing under $20,000 is projected to load the TPC-H 1000GB database in under 15 minutes and execute the 22-query suite in about 1.5 minutes. The XSP-DAS operations used form a small system I/O kernel that can be implemented for any popular operating system and COTS platform. It is hoped that these operations find enough interest in the computer community that they eventually become integral to future system architectures.
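The projection implies a sustained load rate that can be worked out from the quoted figures. The calculation below assumes 1GB = 10^9 bytes and an even split of the load across the 6 I/O ports; neither assumption is stated in the text, and because the load time is given as "under 15 minutes" the rates are lower bounds.

```python
# Figures quoted in the text.
database_bytes = 1000 * 10**9  # TPC-H 1000GB, assuming 1GB = 10^9 bytes
load_seconds = 15 * 60         # "under 15 minutes", so an upper bound on time
io_ports = 6
query_count = 22
suite_seconds = 1.5 * 60       # "about 1.5 minutes"

# Minimum aggregate and per-port load rates, in MB/s.
aggregate_mb_s = database_bytes / load_seconds / 10**6
per_port_mb_s = aggregate_mb_s / io_ports  # assumes an even split across ports

# Average time per query in the 22-query suite.
avg_query_seconds = suite_seconds / query_count

print(f"aggregate load rate: at least {aggregate_mb_s:.0f} MB/s")
print(f"per-port load rate:  at least {per_port_mb_s:.0f} MB/s")
print(f"average query time:  about {avg_query_seconds:.1f} s")
```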


REFERENCES

Micro DBMS
Micro was one of the earliest set theoretic/relational database management systems. Its major underpinnings and algorithms were based on the set-theoretic model of D L Childs of the University of Michigan's CONCOMP (Conversational Use of Computers) Project. It was also influenced to a lesser extent by the relational model made famous by Edgar F. Codd, a research scientist at IBM. It used a natural language interface which allowed non-programmers to use the system.

XSP TECHNOLOGY Theory & Practice
Formal Modeling & Practical Implementation of XML & RDM Systems: Every technology must have a sound underlying theory to support the consistency and predictability of the methods promoted by the technology. In this context, the term theory is respected as an articulation of a body of rules governing the relationships and behavior of objects in a specific system of interest. - D L Childs

DATA REPRESENTATIONS AS MATHEMATICAL OBJECTS
Considering Content Compatibility of Relational & XML Data Representations: The theme of this paper is to treat all data representations as mathematical objects instead of as physical structures. - D L Childs

VLDB 1977 (Invited paper, abstract)
Extended Set Theory: A General Model For Very Large, Distributed, Backend Information Systems: As databases become very large and as distributed systems become desirable the need for inherent (not superficial) data independence becomes crucial. This paper is intended as a tutorial and will describe conditions for data independence and summarize the concepts of Extended Set Theory as a general model for expressing information systems embodying data independence. This generality will be demonstrated by considering some major problems pertinent to the design and support of very large, distributed, backend information systems. - D L Childs

SET PROCESSING AT THE I/O LEVEL
A Performance Alternative to Traditional Index Structures: It is generally believed that index structures are essential for high-performance information access. This belief is false. For, though indexing is a venerable, valuable, and mathematically sound identification mechanism, its logical potential for identifying unique data items is restricted by structure-dependent implementations that are extremely inefficient, costly, functionally restrictive, information destructive, resource demanding, and, most importantly, that preclude data independence. A low-level logical data access alternative to physical indexed data access is set processing. System I/O level set processing minimizes the overall I/O workload by more efficiently locating relevant data to be transferred, and by greatly increasing the information transfer efficiency over that of traditional indexed record access strategies. Instead of accessing records through imposed locations, the set processing alternative accesses records by their intrinsic mathematical identity. By optimizing I/O traffic with informationally dense data transfers, using no physical indexes of any kind, low-level set processing has demonstrated a substantial, scalable performance improvement over location-dependent index structures. - D L Childs

Introduction To A MATHEMATICAL FOUNDATION FOR SYSTEMS DEVELOPMENT
A Hypermodel Syntax for Precision Modeling of Arbitrarily Complex Systems: This paper focuses on resolving three specific system development issues. The approach is to introduce the concept of a Function Space Architecture as a new methodology to system design. The basic architectural unit of this new methodology is a Function Space which can provide as much or as little detail as a specific instance requires. Coverage will include: the Function Space as a unit of architecture for general communication and design detail; Structure Independent Architectures as an architectural design guide for reliable and productive systems; the Hypermodel to provide the Function Space continuum with explosive resolution; and Extended Set Notation to provide generality and rigor to the concept of a Hypermodel. NATO-ASI Series, Vol. F24, 1986 - D L Childs

AXIOMS FOR AN EXTENDED SET THEORY: A Formal Foundation for Unified Modeling of Mathematical Objects
This brief paper introduces extensions to ZFC axioms intended to accommodate the representation, manipulation, and behavior of a larger body of mathematical objects than are currently allowed by ZFC axioms alone. ZFC extensions support: a Skolem-suitable definition for n-tuples, infinitely nested sets, an escalating hierarchy of arbitrarily large sets, constructive definitions for categories and functors, and functions defined as the behavior of interacting sets. - D L Childs

SET-STORE DATA ACCESS ARCHITECTURES: For High Performance Informationally Dense I/O Transfers
Designing the physical data representation and organization for optimal I/O performance is generally considered to be a complex and difficult task. There are good reasons why this is so: mechanical data models. There are also equally good reasons why this need not be so: mathematical data models. Traditional row-store and column-store architectures rely on mechanical data models for accessing and manipulating data by its physical properties. Set-store architectures rely on mathematical control of the organization, manipulation and access of data to optimize I/O performance. - D L Childs

FUNCTIONS DEFINED BY SET BEHAVIOR: A Formal Foundation Based On Extended Set Axioms
ABSTRACT. The term function seems to connote a sense of action or process or behavior of something applied to something. Within the framework of extended set theory, XST, the concept of a function is defined as a behavior of sets in terms of how specific sets react subject to their interaction with other sets. In particular, f(σ): A → B asserts that the set ‘f ’ behaves as a function under set ‘σ’ in relating an individual member of the function domain, set ‘A’, to exactly one member of the function codomain, set ‘B’. It is shown that all Classical set theory, CST, graph based function behavior can be expressed in terms of XST function non-graph based behavior; that the behavior of functions applied to themselves is supported; and that the concepts of Category theory can be subsumed under XST. A notable consequence of this approach is that the mathematical properties of functions need no longer be dependent on the mathematical properties of a Cartesian product. - D L Childs

INTERPRETING EXTENDED SET THEORY IN CLASSICAL SET THEORY
ABSTRACT. We exhibit an interpretation of the Extended Set Theory proposed by Dave Childs in classical Zermelo-Fraenkel set theory with the axiom of choice and an axiom asserting the existence of arbitrarily large inaccessible cardinals. In particular, if the existence of arbitrarily large inaccessible cardinals is consistent with ZFC, then Extended Set Theory is also consistent. Mathematics Department, University of Michigan, Ann Arbor, MI 48109-1043, U.S.A. - A. Blass

TWO REMARKS ON SET THEORY (The ordered n-tuples as sets)
Skolem concludes: "I shall not pursue these considerations here, but only emphasize that it is still a problem how the ordered n-tuple can be defined in the most suitable way." MATH. SCAND, 5 (1957) 40-46 - T. Skolem

INFORMATION ACCESS ARCHITECTURES
A slide presentation on Structure-Dependent & Operation-Centric Data Access Methods.

THE INFORMATION ACCESS ACCELERATOR
Ralph Stout's de-mystifying account of the relevance of XSP Technology to real world information access needs.

MANAGING DATA MATHEMATICALLY: Data As A Mathematical Object
"Using Extended Set Theory for High Performance Database Management" Video of presentation given at Microsoft Research Labs. with an introduction by Phil Bernstein. (duration 1:10:52)  [Version with slides]

WHY SETS?
ABSTRACT: Sets play a key role in foundations of mathematics. Why? To what extent is it an accident of history? Imagine that you have a chance to talk to mathematicians from a far-away planet. Would their mathematics be set-based? What are the alternatives to the set-theoretic foundation of mathematics? Besides, set theory seems to play a significant role in computer science; is there a good justification for that? We discuss these and some related issues. - A. Blass, University of Michigan, Ann Arbor, MI; Y. Gurevich, Microsoft Research, Redmond, WA

WHY NOT SETS?
ABSTRACT: Sets are well defined collections of uniquely identifiable items. Data used by computers are well defined collections of items representing situations of interest. Computers themselves are just well defined collections of bits that change value over time. It would seem that all computer processing is highly set oriented. Why are sets not more widely used in modeling the behavior and assisting the development of computing systems? The following dialogue will attempt to amplify this question, though neither of the participants has a clue to the answer. - D L Childs

Information Access Intensive Systems
ABSTRACT: Performance demands of information access systems differ greatly from those of traditional record retrieval systems. The underlying technology supporting database management systems is antithetical to the needs and requirements of information access systems. This short paper will provide a summary of the necessary performance requirements for information access systems emphasizing why and how they differ from those of record retrieval systems. - D L Childs


Copyright © 2011   INTEGRATED INFORMATION SYSTEMS  « Last modified on 09/06/2011 »