Entries by Steve Holcombe (178)
Scientific American: Can You Spare Some DNA?
In an aside to Traces of a Distant Past, published in the July 2008 issue of Scientific American, author and senior editor Gary Stix wrote:
"No matter what assurances are given, some groups will be reluctant to yield a cheek swab or blood sample. Investigators in this field may never achieve their goal of obtaining a set of samples that fully reflects every subtle gradation of human genetic diversity."
See specifically the side comment entitled Can You Spare Some DNA? at the top of page 8, below.
I've e-mailed the following to Gary Stix.
From: "Steve Holcombe"
Sent: Sunday, August 17, 2008 12:02 PM
To: editors@sciam.com
Subject: For Gary Stix re "Can You Spare Some DNA"
Gary,
You emphasize a very important point in this sub-article.
Would you have interest in an article exploring the movement from a documents web (the current web) to that of a data web (Web 3.0, semantic web)?
With the movement toward a data web there will be greater opportunities for 'data ownership' as defined by the actual information producers. The emergence of a data web should provide opportunities for ameliorating resistance to the sharing of genetic bio-material by empowering those who provide their genetic heritage with more direct, technological oversight and control over how the derived information is used, who uses it, when they use it, and so on.
I'm not saying that all American indigenous tribes would jump on the bandwagon in providing their genetic material and information. That is, most people put their money in banks, but there will always be a few who only put their money under their mattress, right? But there are technological means arising within the context of a data web that are specifically designed to address the personal and societal fear factors that you point out so well.
Hope to hear back from you.
Best regards,
Steve
It will be interesting to see if I receive a response from Gary Stix.
The explosive demand for providing global data access and integrated business solutions
The following quoted text (indented at the bullet) is taken from the statement of prior art in US Patent 6,988,109 entitled System, method, software architecture, and business model for an intelligent object based information technology platform and filed in 2001 by IO Informatics, Inc.
- As demand for Information Technology (IT) software and hardware to provide global data access and integrated business solutions has exploded, significant challenges have become evident. A central problem is the access, integration, and utilization of large amounts of new and valuable information generated in each of the major industries. Lack of unified, global, real-time data access and analysis is detrimental to crucial business processes, which include new product discovery, product development, decision-making, product testing and validation, and product time-to-market.
With the completion of the sequence of the human genome and the continued effort in understanding protein expression in the life sciences, a wealth of new genes is being discovered that will have potential as targets for therapeutic intervention. As a result of this new information, however, Biotech and Pharmaceutical companies are drowning in a flood of data. In the Life Sciences alone, approximately 1 Terabyte of data is generated per company per day, of which the vast majority currently goes unutilized for several reasons.
First, data are contained in diversified system environments using different formats and heterogeneous databases, and have been analyzed using different applications. These applications may each apply different processing to those data. Competitive software, based on proprietary platforms for network and applications analysis, has utilized data platform technologies such as SQL with open database connectivity (ODBC), component object model (COM), Object Linking and Embedding (OLE) and/or proprietary applications for analysis, as evidenced in patents from such companies as Sybase, Kodak, IBM, and Cellomics in U.S. Pat. Nos. 6,161,148, 6,132,969, 5,989,835, 5,784,294, for data management and analysis, each of which patents is hereby incorporated by reference. Because of this diversity, despite the fact that the seamless integration of public, legacy and new data is crucial to efficient drug discovery and life science research, current data mining tools cannot handle all data simultaneously. There is a significant lack of data handling methods that can utilize these data in a secure, manageable way. The shortcomings of these technologies are evident within heterogeneous software and hardware environments with global data resources. Despite the fact that the seamless integration of public, legacy and new data is crucial to efficient research (particularly in the life sciences), product discovery (such as, for example, drug or treatment regime discovery) and distribution, current data mining tools cannot handle or validate all diverse data simultaneously.
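As an aside, the "diversified system environments" problem above is easy to make concrete. Here is a minimal Python sketch, entirely my own illustration rather than anything from the patent, in which two hypothetical lab systems export the same kind of assay results in different formats and under different field names, and a thin normalization layer maps both into one common record shape so that a single query can run across them. All file names and field names are invented.

    # Two hypothetical export formats for the same assay data, normalized
    # into one common record shape. Everything here is invented for
    # illustration; no real system or schema is implied.
    import csv
    import json

    def from_csv(path):
        """System A exports CSV with columns: id, gene_symbol, expr."""
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                yield {"sample_id": row["id"],
                       "gene": row["gene_symbol"],
                       "expression_level": float(row["expr"])}

    def from_json(path):
        """System B exports JSON records with keys: sampleId, gene, level."""
        with open(path) as f:
            for rec in json.load(f):
                yield {"sample_id": rec["sampleId"],
                       "gene": rec["gene"],
                       "expression_level": float(rec["level"])}

    def unified_query(sources, gene):
        """One query across all sources at once: the 'seamless
        integration' the patent says existing tools could not provide."""
        return [r for source in sources for r in source if r["gene"] == gene]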
Second, with the expansion of high numbers of dense data in a global environment, user queries often require costly massive parallel or other supercomputer-oriented processing in the form of mainframe computers and/or cluster servers with various types of network integration software pieced together for translation and access functionality as evidenced by such companies as NetGenics, IBM and ChannelPoint in U.S. Pat. Nos. 6,125,383[,] 6,078,924, 6,141,660, 6,148,298, each of which patents are herein incorporated by reference--(e.g. Java, CORBA, "wrapping", XML) and networked supercomputing hardware as evidenced by such companies as IBM, Compaq and others in patents such as for example U.S. Pat. Nos. 6,041,398, 5,842,031, each of which is hereby incorporated by reference. Even with these expensive software and hardware infrastructures, significant time-delays in result generation remain the norm.
Third, in part due to the flood of data and for other reasons as well, there is a significant redundancy within the data, making queries more time consuming and less efficient in their results.
Fourth, an additional consideration, which is prohibitive to change towards a more homogeneous infrastructure, is cost. The cost to bring legacy systems up to date, to retool a company's Intranet-based software systems, to carry out analysis with existing tools, or even to add new applications can be very high. Conventional practices require retooling and/or translating at application and hardware layers, as evidenced by such companies as Unisys and IBM in U.S. Pat. Nos. 6,038,393, 5,634,015.
Because of the constraints outlined above, it is nearly impossible to extract useful, relevant information from the entirety of the data within reasonable computing time and effort. For this reason, the development of an architecture to overcome these obstacles is needed.
These are not the only limitations. With the advent of distinct differentiations in the fields of genomics, proteomics, and bioinformatics, and the need for informed decision making in the life sciences, the state of object data is crucial for its overall validation and weight in complex, multi-disciplinary queries. This is even more important due to inter-dependencies of a variety of data at different states. Furthermore, because biological data describe a "snapshot" of complex processes at a defined state of the organism, data obtained at any time refer to this unique phase of metabolism. Thus, to allow meaningful comparison, only data in similar states can be utilized. Therefore, there is a growing need for an object data state processing engine, which allows one to continuously monitor, govern, validate and update the data state based on any activities of intelligent molecular objects in real-time.
Data translation processes between different data types are time-consuming and require provision of information on data structure and dependencies, in spite of advances in information technology. These processes, although available and used, have a number of shortcomings. Data contained in diversified system environments may use different formats, heterogeneous databases and different applications, each of which may apply different processing to those data. Because of that, despite the fact that the seamless integration of public, legacy and new data is crucial to efficient drug discovery and life science research, several different applications and/or components have to be designed in order to translate each of those data sets correctly. These require significant effort and resources in both software development and data processing. With the advent of distinct differentiations in the fields of genomics, proteomics, and bioinformatics, and the need for informed decision making in the life sciences, access to all data is crucial for overall validation and weight in complex, multi-disciplinary queries. This is even more important due to inter-dependencies of a variety of data at different states. The current individual data translation approach does not support these needs. Because biological data describe a "snapshot" of complex processes at a defined state of the organism, data obtained at any time refer to this unique phase of metabolism. Thus, to allow meaningful comparison, only data in similar states can be utilized. The latter requires real-time processing and automated, instant translation of data from different sources. Therefore, there is a growing need for an object data translation engine, which allows for bi-directional translation of multidimensional data from various sources into intelligent molecular objects in real-time.
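The "intelligent molecular object" and data-state ideas above can be sketched in a few lines. The following Python is my own hedged illustration of the concept, with invented class and field names: each raw record is wrapped in an object that carries state metadata, translation runs in both directions, and only objects in the same state are admitted to a comparison.

    # A common object wrapper carrying data-state metadata, so that only
    # data in comparable states are combined. Class and field names are
    # illustrative, not the patent's.
    from dataclasses import dataclass, field
    import time

    @dataclass
    class MolecularObject:
        payload: dict                       # the translated measurement data
        source: str                         # originating system
        state: str = "raw"                  # e.g. raw, validated, normalized
        captured_at: float = field(default_factory=time.time)

    def translate_in(record, source):
        """Forward translation: external record -> common object."""
        return MolecularObject(payload=dict(record), source=source)

    def translate_out(obj):
        """Reverse translation: common object -> external record."""
        return dict(obj.payload, _source=obj.source, _state=obj.state)

    def comparable(objects, state="validated"):
        """Per the text, only data in similar states can be utilized."""
        return [o for o in objects if o.state == state]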
The flood of new and legacy data results in a significant redundancy within the data, making queries more time consuming and less efficient in their results. There is a lack of defined sets of user interaction and environment definition protocols, which are needed to provide means for intelligent data mining and optimization in result validation towards real solutions and answers. An additional consideration, which is prohibitive to change towards a more homogeneous infrastructure, is the absence of object representation definition protocols to prepare and present data objects for interaction within heterogeneous environments. Lastly, data currently are interacted with and presented in diverse user interfaces with dedicated, unique features and protocols preventing universal, unified user access. Thus, a homogeneous, unified presentation such as a web-enabled graphical user interface which integrates components from diverse applications and laboratory systems environments is highly desirable, but currently non-existent for objects in real-time.
Because of these constraints, it is nearly impossible to extract useful, relevant information from the entirety of the data within reasonable computing time and effort. For this reason, the development of an architecture and unifying user interface to overcome these obstacles is needed.
The drive towards peer-to-peer mass communications
The following quoted text (indented at the bullet) is taken from the statement of prior art in US Patent 5,706,434 entitled Integrated request-response system and method generating responses to request objects formatted according to various communication protocols and filed in 1995 by Electric Classifieds, Inc.
- Open Systems Interconnection (OSI) Communications Model
As will be appreciated by those skilled in the art, communication networks and their operations can be described according to the Open Systems Interconnection (OSI) model. This model includes seven layers: an application, presentation, session, transport, network, link, and physical layer. The OSI model was developed by the International Organization for Standardization (ISO) and is described in "The Basics Book of OSI and Network Management" by Motorola Codex from Addison-Wesley Publishing Company, Inc., 1993 (First Printing September 1992).
Each layer of the OSI model performs a specific data communications task, a service to and for the layer that precedes it (e.g., the network layer provides a service for the transport layer). The process can be likened to placing a letter in a series of envelopes before it's sent through the postal system. Each succeeding envelope adds another layer of processing or overhead information necessary to process the transaction. Together, all the envelopes help make sure the letter gets to the right address and that the message received is identical to the message sent. Once the entire package is received at its destination, the envelopes are opened one by one until the letter itself emerges exactly as written.
In a data communication transaction, however, each end user is unaware of the envelopes, which perform their functions transparently. For example, an automatic bank teller transaction can be tracked through the multilayer OSI system. One multiple layer system (Open System A) provides an application layer that is an interface to a person attempting a transaction, while the other multiple layer system (Open System B) provides an application layer that interfaces with applications software in a bank's host computer. The corresponding layers in Open Systems A and B are called peer layers and communicate through peer protocols. These peer protocols provide communication support for a user's application, performing transaction related tasks such as debiting an account, dispensing currency, or crediting an account.
Actual data flow between the two open systems (Open System A and Open System B), however, is from top to bottom in one open system (Open System A, the source), across the communications line, and then from bottom to top in the other open system (Open System B, the destination). Each time that user application data passes downward from one layer to the next layer in the same system more processing information is added. When that information is removed and processed by the peer layer in the other system, it causes various tasks (error correction, flow control, etc.) to be performed.
The ISO has specifically defined all seven layers, which are summarized below in the order in which the data actually flows as it leaves the source:
Layer 7, the application layer, provides for a user application (such as getting money from an automatic bank teller machine) to interface with the OSI application layer. That OSI application layer has a corresponding peer layer in the other open system, the bank's host computer.
Layer 6, the presentation layer, makes sure the user information (a request for $50 in cash to be debited from your checking account) is in a format (i.e., syntax or sequence of ones and zeros) the destination open system can understand.
Layer 5, the session layer, provides synchronization control of data between the open systems (i.e., makes sure the bit configurations that pass through layer 5 at the source are the same as those that pass through layer 5 at the destination).
Layer 4, the transport layer [also known as the Transmission Control Protocol (TCP) layer], ensures that a reliable end-to-end connection has been established between the two open systems (e.g., layer 4 at the destination "confirms the request for a connection," so to speak, that it has received from layer 4 at the source).
Layer 3, the network layer [also known as the Internet Protocol (IP) layer], provides routing and relaying of data through the network (among other things, at layer 3 on the outbound side an "address" gets slapped on the "envelope" which is then read by layer 3 at the destination).
Layer 2, the data link layer, includes flow control of data as messages pass down through this layer in one open system and up through the peer layer in the other open system.
Layer 1, the physical interface layer, includes the ways in which data communications equipment is connected mechanically and electrically, and the means by which the data moves across those physical connections from layer 1 at the source to layer 1 at the destination.
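For readers who like running code, the envelope analogy reduces to a tiny exercise. This Python toy, my own and not the patent's, wraps a message in one "envelope" per layer on the way down and unwraps them in reverse on the way up; the header contents are placeholders, not real protocol fields.

    # Layered encapsulation: wrap on the way down at the source, unwrap
    # in reverse at the destination, and the letter emerges as written.
    LAYERS = ["application", "presentation", "session",
              "transport", "network", "data link", "physical"]

    def send(message):
        packet = message
        for layer in LAYERS:                  # top to bottom at the source
            packet = {"header": layer, "body": packet}
        return packet

    def receive(packet):
        for layer in reversed(LAYERS):        # bottom to top at the destination
            assert packet["header"] == layer  # each peer layer strips its envelope
            packet = packet["body"]
        return packet

    assert receive(send("debit my account $50")) == "debit my account $50"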
The particular focus of the following discussion is on media access control (MAC) for communication networks which is performed in the OSI network and data link layers. It will be appreciated by those skilled in the art that various applications and components operating in the other OSI layers may be interchangeably used with the particular MAC described below so long as these applications and components adhere to the OSI design structures. For example, many different OSI physical layer components (e.g., a parallel bus, a serial bus, or a time-slotted channel) can be used in conjunction with the same MAC so long as each of the OSI physical layer components passes the particular information required by OSI design parameters to the OSI data link layer.
Standard Data Communication Network Protocols
Standard data communication network protocols, such as that described above, offer new abilities and corresponding technological problems. Standard network protocol suites such as the Transmission Control Protocol/Internet Protocol suite (TCP/IP) allow for the easy creation of transport-layer protocols. These transport protocols and accompanying data languages allow for a diverse set of client applications which communicate using them. This ability to support a diverse client base is a boon to the commercial and mainstream potential of data communications networks because diversity is a necessary condition for a product in the marketplace. But diversity, helpful in some cases, can also be a problem. The main problem with the diverse client base is that not every client knows every version of every protocol or every dialect of every data language. Thus, coordination between a server and these clients can be a difficult problem. Electronic mail (email) on the Internet, for example, uses the Simple Mail Transport Protocol (SMTP) and adheres to the Request for Comments 822 (RFC822) message standard.
Common email clients for the Internet are Eudora, Elm, PINE, PRODIGY Mail, and AMERICA ONLINE Mail. The protocols and languages used are generally the same across these email clients. However, each client has different properties and capabilities which make the general process of creating and serving objects via email difficult. For example, email clients differ in how wide each line of a message may be (email messages used by the PRODIGY email client are 60 characters wide, whereas those used by the AMERICA ONLINE email client are 49). They also differ in terms of what extensions they support--some email clients support the Multipurpose Internet Mail Extensions (MIME), but most do not. Similarly, the World Wide Web (WWW), an internet-distributed hypermedia (text, image, sound, video) network, primarily uses the Hypertext Transfer Protocol (HTTP) and the Hypertext Markup Language (HTML) for its communication.
The WWW has the same problem as email with regard to clients. There exist different versions of HTTP and different dialects of HTML, and most clients do not know all of them. For example, Netscape Communications Corporation has put forth its own extensions to HTML for features such as background images behind text and centered text. These features are not official industry standards but are becoming just as good in the sense that they are de facto standards. Another problem of the easy creation of new transport-layer protocols is that new transport-layer protocols and data languages (either standard or custom) continue materializing. These protocols and languages either solve new communications problems (such as HTTP and HTML enhancing the Internet with hypermedia), or better solve older ones (e.g., Sun Microsystems' new HOT JAVA extensions to HTML and the JAVA programming language promise to be the next best feature for the WWW).
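The line-width example is a nicely concrete case of this dialect problem. A server holding one canonical message body has to re-wrap it per client; a minimal Python sketch of that, using the widths quoted above, might look like this (the client table and the fallback width of 72 are my own assumptions):

    # Re-wrap one canonical message body for a specific email client.
    import textwrap

    CLIENT_LINE_WIDTH = {"prodigy": 60, "america online": 49}

    def render_for(client, body):
        width = CLIENT_LINE_WIDTH.get(client, 72)  # generic fallback width
        return textwrap.fill(body, width=width)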
The Importance of an Integrated System
With the diversity of data communication protocols and languages, there is a tendency for engineers to try to build a system for each of the many protocols and languages, rather than to build a single system to address all of them. This is because in a certain sense it is easier to write an individual system with a more limited domain than an integrated system with a more expanded domain; one does not have to expend the effort necessary to abstract out the common features across a diverse set. However, it is very expensive to maintain many individual systems and usually much cheaper to maintain a single integrated system. Also, it is much more difficult to maintain feature parity and share data between many individual systems as opposed to within an integrated system. An additional benefit of an integrated system would be that, because the step has already been taken to abstract out common features of various protocols and languages, it is much easier to adapt to newer ones, which will most likely be abstractly similar enough to the older ones to allow for easy integration.
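In code, the integrated-system argument is essentially an abstraction layer. Here is a hedged Python sketch of the idea, with class names of my own invention: the common content is modeled once, and each protocol or language plugs in as a renderer, so supporting a new dialect means writing one new renderer rather than a whole new system.

    # One integrated core, with protocol-specific renderers at the edge.
    from abc import ABC, abstractmethod

    class Renderer(ABC):
        @abstractmethod
        def render(self, title: str, body: str) -> str: ...

    class HtmlRenderer(Renderer):
        def render(self, title, body):
            return f"<html><h1>{title}</h1><p>{body}</p></html>"

    class PlainTextRenderer(Renderer):     # e.g. for an RFC822 email body
        def render(self, title, body):
            return f"{title}\n{'=' * len(title)}\n{body}"

    def serve(content, renderer):
        """The integrated server core stays the same for every protocol."""
        return renderer.render(**content)

    page = serve({"title": "Hello", "body": "Same content, two dialects."},
                 HtmlRenderer())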
Two-Way Communication and the Drive Towards Peer-To-Peer Mass Communications
New communication models emerging from usage of data communications networks such as the Internet are causing changes in the relationship between data producers and data consumers. It used to be that with the one-to-many, one-way model of television and radio, content was produced by a small elite of entertainer-business persons and was consumed by the masses. The two-way nature of data communications networks, however, allows for the more intimate participation of the consumer, even to the point of blurring the distinction between consumer and producer: because of the ability to participate, the old consumer can now be both a consumer and a producer on networks such as the Internet. Relationships on data communications networks are much flatter and more egalitarian in the sense that each participant is a peer (both a producer and a consumer). One important outgrowth of this elevated status of the individual on a data communications network is that the individual now demands more personalized attention. For example, if a peer provides another with knowledge about him or herself (e.g., geographic location, gender, age, interests), then he or she expects that responses should take this knowledge into account. A peer can no longer be treated like a demographic average as in the world of television and radio.
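A trivial Python sketch, mine and deliberately simplistic, shows what taking such disclosed knowledge into account looks like; the profile fields mirror the examples in the text:

    # Personalize a response from whatever a peer has chosen to disclose.
    def personalized_response(profile):
        lines = [f"Hello, {profile.get('name', 'friend')}!"]
        if "location" in profile:
            lines.append(f"Here is what's new near {profile['location']}.")
        if profile.get("interests"):
            lines.append("Picked for you: " + ", ".join(profile["interests"]))
        return "\n".join(lines)

    print(personalized_response(
        {"name": "Alice", "location": "Tulsa", "interests": ["genealogy"]}))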
The Technological Problems in Need of Solution
Given these new data communications networks and their technological and social implications, it is evident that new systems should provide solutions, for example, to the following problem: How does one design an integrated system that uses multiple protocols and data languages and serves data in a way that takes advantage of knowledge about clients and users?
Description of the Prior Art
The ability to create and serve objects has been around ever since the birth of the client/server model. Also, the notion of providing services to general users through data communications networks has been around at least since the early days of VideoTex. However, this technology was limited and was more focused on building specialized VideoTex terminals that fit some particular mold required by particular servers (see, for example, U.S. Pat. No. 4,805,119) rather than being focused on making servers be flexible in handling a variety of client types. This is probably due to the fact that work on VideoTex was done by individuals with the television frame of mind--content would be created in a central way and then broadcast to the masses. The cultural importance of peer-to-peer communications was not fully recognized at that time.
Integrated services platforms have been developed for telephone networks (see, for example, U.S. Pat. No. 5,193,110) but still only focus on telephone calls (essentially a single protocol as opposed to many). In recent years, a number of online service providers--the Prodigy Service Company, America Online, CompuServe, and others--have developed their own means to create and serve objects in a similar vein. Technologically, these companies are not all that different from the older VideoTex companies. They require users to obtain custom software clients which speak custom protocols in order to interact with their servers, even if their software can be used on different personal computers to make their services personal computer-independent (see, for example, U.S. Pat. No. 5,347,632). Because of these custom clients and protocols, these services are mutually incompatible.
What is not addressed by services like these is the growing usage of standard communication protocols and languages (like SMTP, HTTP, and HTML) in providing services to standard clients. Ultimately, this usage of proprietary clients and protocols leads to self-destruction in a marketplace which demands standardization, decentralized control, and diversity. With regard to the personalization of service, attention has lately been paid to the need to support customized content to clients, such as customized television commercials (see, for example, U.S. Pat. No. 5,319,455). But one can see limitations in the philosophy of this work in that clients are still seen as "viewers" in a one-way communications model, rather than as participants in a two-way model. Also, technologies have been developed to abstract data and present it in dynamic ways based upon user parameters (see, for example, U.S. Pat. Nos. 5,165,030 and 4,969,093). However, this effort has not been focused on protocol-independence and language-independence of this abstracted data. [emphasis added]
I am filing this entry in the Reference Library to this blog.
An overview of the control of system resources in computer systems arranged in a client/server configuration
The following quoted text (beginning at the bullet) is taken from the statement of prior art in US Patent 5,689,708 entitled Client/server computer systems having control of client-based application programs, and application-program control means therefor and filed in 1995 by ShowCase Corporation. Some of the quoted paragraphs have been re-positioned for easier readability.
- "There are several broad types of computer systems. In a mainframe computer system, a single central processor complex running under an operating system executes application programs, stores databases, enforces security, and allocates all resources such as processor time, memory usage, and subsystem processing. Users interact with the system via "dumb" terminals, which essentially only display data and receive keystrokes. Peer-to-peer networks of systems such as personal computers are essentially standalone computers each running similar operating system programs, which can share application programs and data among each other according to a defined protocol. Client/server networks have a central server computer coupled via a communications medium to a number of client computers, usually smaller personal computers running under a conventional operating system. In the earliest client/server network model, the server was only a "file server" which could download data to the clients and upload data from them; application programs were executed entirely in the client computers. That is, the server's function was to store large amounts of data, which could then be shared among a number of smaller clients.
FIG. 1 shows a network 100 of computers 110-140 configured in a [typical] client/server configuration.
Server computer 110 may be any type of system, from a relatively small personal computer (PC) to a large mainframe. In the particular implementation discussed below, server 110 is a mid-range computer, specifically an IBM AS/400 data-processing system. ("IBM", "AS/400," "OS/400," and "400" are registered trademarks of IBM Corp.) Very broadly, the AS/400 has one or more processors 111, memory 112, input/output devices 113, and workstation controllers (WSCs) 114, coupled together by one or more busses 115. WSCs 114 are physically a type of I/O device which interacts with multiple terminals over communication facilities 150. A number of conventional facilities, such as APPC (Advanced Program-to-Program Communications) routers and LU6.2 (Logical Unit version 6.2), are available to handle the necessary communications protocols.
Additional devices, represented by block 120, may also be coupled to facilities 150 for interaction with the client terminals and with server 110. As mentioned previously, block 120 may represent one or more complete computer systems functioning as multiple servers in network 100. Communications 150 may assume any of a number of conventional forms, such as cables 151-152, switched telephone lines, wireless devices, and many others.
Client terminals 130, 140 are commonly personal computers (PCs) coupled to facilities 150 by cable 151 to form a local area network (LAN) with server 110. Other arrangements, however, may also be employed in any conventional manner. A typical PC 130 contains a processor 131, memory 132, I/O devices 133, and a port 134 coupled to cable 151. An internal bus 135 interconnects these components with a display 136 for presenting data to a user and a keyboard, mouse, and/or other input devices 137 for receiving data and commands from the user.
Most present client/server networks implement an "application server" model in which some or all application programs are split into two portions. A server portion executes within the server computer, while a separate client portion executes within each client computer from which a user can invoke the application. The two portions employ cooperative processing to pass data between the server and client computers; typically, most of the data is stored in the server. The first major application of this model was the processing of client queries against a central database, so that this model is also sometimes known as a "database server" network. Newer applications sometimes employ the terms "groupware" and "transaction processing" (TP). Advances in technology additionally allow multiple servers in the same network, so that a user at a client computer can choose to sign on to a number of different servers. A third client/server type of network is beginning to emerge; in the "distributed object" model, encapsulated objects containing both data and executable code may roam about the network, run on different platforms, and manage themselves. In this model, clients and servers are not fixed in particular computers: a given computer may be simultaneously a server for one application and a client for another, and a given application may run one computer as its server, and later run another computer on the network as a server.
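The split described above is easy to diagram in code. In this Python sketch, my own schematic rather than anything from the patent, the server portion runs the query near the data and the client portion builds requests and presents results; in a real network the call between the two portions would travel over the communications medium.

    # The two cooperating portions of one "application server" application.
    DATABASE = [{"account": "alice", "balance": 120},
                {"account": "bob", "balance": 75}]    # stand-in central data

    def server_portion(request):
        """Runs on the server: executes the query close to the data."""
        return [row for row in DATABASE
                if row["account"] == request["account"]]

    def client_portion(account):
        """Runs on the client: builds the request, presents the reply."""
        reply = server_portion({"account": account})  # would be a network call
        for row in reply:
            print(f"{row['account']}: {row['balance']}")

    client_portion("alice")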
A network operating system (NOS) mediates communications between servers and clients in a network. An NOS contains a server control module executing within the server computer, and a client control module executing within each client computer. These control modules cooperate with each other to transfer data (and sometimes code or other objects) over the network's communications medium between the servers and particular clients. They provide interfaces to the operating systems and application-program portions running in the client and server.
A client/server network contains system resources which can be shared by some or all of the other computers in the network. One or more server processors, for example, are scheduled among the tasks of different users. Memory pools of various sizes are allocated to tasks being executed and awaiting execution. Printers may be physically connected to the network as a print server, accessible via a print spooler to other computers on the network; storage devices may be similarly connected as separate file servers. Other capabilities are also considered to be system resources. For example, database applications generally have both an interactive and a batch mode for processing queries from a client. The interactive-mode resource uses large amounts of processor time, and is frequently restricted to short and/or time-critical queries; the batch-mode resource batches multiple queries together for processing at times of low processor load. Even the ability to execute a given application program can be considered a resource of the system.
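The interactive-versus-batch distinction can likewise be sketched in a few lines of Python (my illustration; the cost threshold is an arbitrary stand-in): cheap queries are answered immediately, while expensive ones are queued for a pass at times of low processor load.

    # Route queries by estimated cost: answer now, or defer to a batch pass.
    from collections import deque

    BATCH_QUEUE = deque()
    INTERACTIVE_COST_LIMIT = 10        # arbitrary units, for illustration

    def run(query):
        return f"results of {query!r}"

    def submit(query, estimated_cost):
        if estimated_cost <= INTERACTIVE_COST_LIMIT:
            return run(query)          # interactive mode: answer immediately
        BATCH_QUEUE.append(query)      # batch mode: defer to off-peak
        return None

    def drain_batch():
        """Called at times of low processor load."""
        while BATCH_QUEUE:
            run(BATCH_QUEUE.popleft())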
Each user, sitting at a client computer in a c/s network, sees the server as a virtual part of his own system. For example, the client portion of a database application, being the same in each client computer or workstation, allows any user to choose the processor-intensive interactive mode. System printers usually appear as possible choices on the normal "Print" menu of a word-processing application program, alongside choices for local printers available only from the user's own computer.
While system resources appear to be at the total disposal of each user, in fact they are shared among all clients on the network and among all applications being executed by all users. Unlike the more abstract programming objects which can be multiplied forever, system resources are physical and finite, and must be divided among contending users.
In a mainframe type of computer organization, conventional central-processor-based operating systems schedule system resources, place restrictions upon particular users at particular times, block certain users from running certain applications or from running them in certain ways, and so forth. Servers in c/s networks can place restrictions upon the resources themselves, and upon which users can access certain resources, based upon the identity of the user. Some application programs can specify certain resources they can access, on an individual basis. That is, resource management in a c/s network is conventionally done by the server system (i.e., its operating-system program), or by restrictions which are hard-coded into each individual application program at the client level or specified by an initialization (.INI) file.
However, no facilities exist for specifying that user Alice, executing a particular program, is restricted to (for instance) running database queries only in batch mode, to avoid hogging the system with network traffic and processor time; but user Bob may run small queries in interactive mode, because he accesses only small amounts of data, or needs results quickly. The problem is that both users upload and run their queries from exactly the same database application program. Those in the art might respond by denying system permission at the server level to user Alice for the interactive mode, by restricting the number of database records in a query, or by killing a query after a certain amount of time has been spent processing it. But Alice might also run another application program from her terminal for which she needs interactive mode for large queries. Or she may occasionally run smaller queries which can be serviced in interactive mode without significantly delaying other users; many database programs have facilities for estimating the resources required to fulfill a requested query. That is, the nature of present c/s networks makes it difficult to achieve a sufficient granularity of control in allowing or denying particular system resources to particular users for particular application programs, or in response to certain factors such as the size of a query. Dynamic control of such resources is also precluded."
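What the patent is reaching for, in other words, is a permission check keyed on the combination of user, application, mode, and estimated cost. A hedged Python sketch of such a check, entirely my own illustration with invented rules, makes the Alice-and-Bob example concrete:

    # Allow or deny a resource per (user, application, mode), with a cost cap.
    RULES = [
        # (user, application, mode)  -> max estimated cost (None = unlimited)
        (("alice", "db_query", "interactive"), 5),     # small queries only
        (("alice", "db_query", "batch"),       None),  # anything goes in batch
        (("bob",   "db_query", "interactive"), None),  # Bob may go interactive
    ]

    def allowed(user, application, mode, estimated_cost):
        for key, limit in RULES:
            if key == (user, application, mode):
                return limit is None or estimated_cost <= limit
        return False                   # default deny

    assert allowed("alice", "db_query", "interactive", 3)       # small: OK
    assert not allowed("alice", "db_query", "interactive", 50)  # large: denied
    assert allowed("alice", "db_query", "batch", 50)            # batch: OK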
I am filing this entry in the Reference Library to this blog.