CSDocs Architecture Sketch

InstanceId Position Statement:

DmaId for InstanceId Values

$$Author: Admin $
$$Date: 06-08-29 16:21 $
$$Revision: 17 $

The CSDocs proposal includes creation of a new general property that can be used on any persistent element, the InstanceId.  This is a structural solution, accomplished entirely within the model, for identifying elements of independently-persistent objects when it is valuable to do so, as in CSDocs compounding relationships.

This position statement is part of the CSDocs Architecture Sketch.  It elaborates on proposed rules for InstanceId values and how the property is introduced.  This is the basis for further e-mail discussion and arrival at a consensus position in the CSDocs Foundation proposal.

The statement is in two parts:

  1. Having InstanceId be an Optionally-Supported Property of dmaClass_DMA Class

    That is discussed in the companion page that provides the overall position statement on InstanceIds.

  2. Having InstanceId values be DmaId values.

    That is discussed on this page, along with background on using DmaId values in legacy situations.

This is InstanceId Position Statement 0.2 created on 2003-03-03 14:30 -0700 (pdt)


Content

Position: DmaId for InstanceId Values

Why DmaId?

Standard DmaId Structure and Usage

DCE UUID Version 1 Format

Pseudo-Node Identifications

Legacy Considerations

When InstanceId is Relevant

Alternatives to the CSDocs InstanceId in Identifying Specific CSRelationship Elements

Mapping Locally-Unique Ids to DmaIds

References and Resources on DmaId

Normative Specifications

Informative References


Position: DmaId for InstanceId Values

  1. The InstanceId property has values of type DmaId. 
  2. If InstanceId is supported on a class of DMA object, it is required to be read-only and system-derived.  The presence of a value is optional.
  3. An InstanceId property has a value if and only if the DMA object supporting the property corresponds to a persistent element of a DMA System.
  4. Any time that an object-valued property that corresponds to a dependently-persistent element is given a new value (via a Put..., Insert, or Replace method), and that value is made persistent, any InstanceId property for that new persistent element will have a never-used DmaId value.

This position paper proposes using DmaId values, which are globally unique and completely unambiguous, so that there is no need to worry about degrees of uniqueness, especially in queries, cross-repository operations, and the like.

Why DmaId?

DmaId values are DCE Universally Unique IDentifiers (UUIDs) [DCE RPC].  DmaIds are 128-bit binary values.  DmaIds that are generated correctly on computers with network cards are guaranteed to be different from all other DmaIds until at least 3400 AD. 

DmaId values can be generated without consultation with a centralized authority.   The uniqueness of IEEE 802 network identifications is sufficient to allow each network node to independently produce DmaId values that will be unique and unambiguous.   These DmaId values can be conveyed anywhere and never be confused with any other DCE UUID.

The technique for generating DCE UUIDs already provides for clock corrections.  The UUID generator will recover after system failures and crashes where state and possible history are lost.  There is no need to invent another technique to prevent accidental duplications of locally-unique labels or identifiers.  UUIDs are already available, can be generated very quickly, are both safe and unique, and can be used for a very long time (at least another millennium).

The DmaId is also compatible with, and fully interoperable with, the Microsoft GUID as it is widely used and supported on the Microsoft Windows platform and in the Component Object Model [COM Spec].

DCE UUIDs are easily compared, and different UUIDs are usually recognized as different with very little effort.  There need be no accommodation of differing sizes, and no comparisons based on anything but binary values.  There are also standard 16-bit hash-value functions that work well with UUIDs to provide rapid lookup into sets of UUIDs.
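
A minimal sketch of comparing ids by binary value, written in Python.  The two sample values are illustrative only and are not DmaIds defined by DMA or CSDocs.  Python's built-in set hashing stands in here for the standard 16-bit hash functions mentioned above, which are not reproduced.

```python
import uuid

# Illustrative values only; they differ in a single bit of the first field.
a = uuid.UUID("6ba7b810-9dad-11d1-80b4-00c04fd430c8")
b = uuid.UUID("6ba7b811-9dad-11d1-80b4-00c04fd430c8")

# Every UUID has the same fixed size: 16 octets (128 bits).
assert len(a.bytes) == len(b.bytes) == 16


def same_id(x, y):
    """Two ids are the same exactly when their 16 octets are equal."""
    return x.bytes == y.bytes


# A hashed set of the binary values gives rapid membership lookup.
known_ids = {a.bytes}
```

Because every value has the same size, no normalization step is needed before comparison or lookup.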

Generally, a fixed 128-bit binary value is simple to use routinely as an InstanceId for any number of purposes.   Indexing is simplified by having the same kind of easily compared, easily-stored value be used everywhere that an InstanceId is useful.

Use of text strings, small integers, and other varieties of values is useful on a local, application or implementation-specific basis.  However, there is no reliable interoperable case that will work for different document spaces that use special-case solutions for their different local situations.  The use of DmaId for InstanceId provides for predictable methodology across a wide variety of document collections.

Another difficulty with local solutions is that over time, as more functionality is introduced and more use of InstanceId is required, one is led slowly but surely to re-create more and more of the features of DmaId without gaining DmaId compatibility.

Finally, it is difficult to specify an optional, interoperable facility that relies on local options for achieving degrees of uniqueness.  The concern is not only about added complexity of the specification: there is also concern that the solution will not scale to meet the requirements of advanced features, forcing replacement of the InstanceId implementation or leading to introduction of further special cases.

It is proposed to accept the use of a general InstanceId solution, even for the specific case of the CSDocs Foundation, so that there is no doubt about the ability to extend the use of InstanceId to additional usages and requirements beyond the foundation.

Standard DmaId Structure and Usage

The DCE UUID used for DmaId is exactly that specified for [DCE RPC] and used by Microsoft for COM [COM Spec].  This is the same as the GUID format generated by Microsoft utilities and development tools [VS60].  This tutorial description is intended to build familiarity with UUIDs and the options for applying them in legacy situations.

There is a standard format for presentation of DCE UUID values as sequences of hexadecimal digits separated by "-" characters.  Typical sequences and their interpretations as structures of unsigned binary integers are illustrated in the Figure.

Figure.  DCE UUID Structure

DCE UUID Version 1 Format

At the top of the figure, UUIDs are shown in their standard text representation.  This sequence of 36 characters is the standard form used for DCE UUIDs everywhere.  The '-' characters are a required part of the format.  The other 32 characters specify hexadecimal numbers, using '0' .. '9' and 'a' .. 'f' (or 'A' .. 'F') to represent 4-bit binary fields (decimal 0 to 15 and binary 0000 to 1111).
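
The text format described above can be checked mechanically.  The following Python sketch validates the 36-character form and returns the five hexadecimal groups as unsigned integers.  The function name and the sample value are assumptions made for illustration; the sample is not one of the UUIDs shown in the figure.

```python
import re

# 8-4-4-4-12 hexadecimal digits; the '-' separators are a required part.
UUID_TEXT = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def parse_uuid_text(text):
    """Return the five hex groups of a UUID string as unsigned integers."""
    if UUID_TEXT.fullmatch(text) is None:
        raise ValueError("not a 36-character DCE UUID string: %r" % (text,))
    return tuple(int(group, 16) for group in text.split("-"))


# Illustrative sample value (an assumption, not taken from the figure).
fields = parse_uuid_text("6ba7b810-9dad-11d1-80b4-00c04fd430c8")
```

Upper-case and lower-case hexadecimal digits are accepted alike, matching the description above.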

All of the illustrated UUIDs are in DCE version 1 format.  This format is distinguished by the following characteristics:

The parts between the '-' characters separate different components of the UUID.  The components represent the following information:

UUID Time Stamp: time-low, time-mid, time-hi

The time stamp values correspond to the number of 100-nanosecond intervals that have occurred since the start of the Gregorian calendar on October 15, 1582 A.D.

The 60 bits are expressed in three parts: time-low (32 bits), time-mid (16 bits), and time-hi (12 bits).

The integer value of the time stamp is

(time-hi × 2^16 + time-mid) × 2^32 + time-low

It is not expected that time stamps be accurate to the nearest 100 nanoseconds.  It is expected that time stamps generated with the same node and clock-seq values be monotonically increasing.  There must be sufficient accuracy of time-keeping that duplication of identical time stamp and clock-seq combinations is statistically impossible for a given node identification.
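
As a hedged illustration of the time stamp arithmetic, the Python sketch below combines the three parts into the 60-bit count and converts that count to a calendar date.  The function names are assumptions, and the sample field values are illustrative rather than taken from the figure.  Sub-microsecond precision is discarded in the conversion.

```python
from datetime import datetime, timedelta

# Start of the Gregorian calendar, the origin of the UUID time stamp.
GREGORIAN_EPOCH = datetime(1582, 10, 15)


def uuid_timestamp(time_low, time_mid, time_hi):
    """Combine the parts: (time-hi * 2^16 + time-mid) * 2^32 + time-low."""
    return (((time_hi << 16) + time_mid) << 32) + time_low


def timestamp_to_datetime(timestamp):
    """Convert a count of 100-nanosecond intervals to a calendar date."""
    # Ten 100-nanosecond intervals make one microsecond.
    return GREGORIAN_EPOCH + timedelta(microseconds=timestamp // 10)


# Illustrative field values: time-low, time-mid, and the 12-bit time-hi.
stamp = uuid_timestamp(0x6BA7B810, 0x9DAD, 0x1D1)
when = timestamp_to_datetime(stamp)
```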

time-low

The first group, consisting of 8 hexadecimal digits, specifies the value of a 32-bit unsigned binary integer, time-low.  As a 100-nanosecond interval counter, time-low takes  429.4967296 seconds (about 7.16 minutes) before it repeats.

time-mid

The second group, consisting of 4 hexadecimal digits, specifies the value of a 16-bit unsigned binary integer, time-mid.  The time-mid value advances each time time-low changes to 0.  So time-mid changes every 429.4967296 seconds.  It will take about

28,147,497.67 seconds, or about

7,818.75 hours, or about

325.78 days

for time-mid to repeat.

time-hi

The third group, consisting of 4 hexadecimal digits, specifies the value of a 16-bit unsigned binary integer.  The high-order nibble of 4 bits is used for a version field and the remaining 12 bits constitute the time-hi value.   Since time-hi changes every 325.78 days, it will take

1,334,399.89 days, or over

3,653 years (4096 clicks of time-hi)

for time-hi, and hence a value of the entire time stamp, to repeat.
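
The repeat periods quoted for time-low, time-mid, and time-hi can be re-derived with a few lines of Python.  This is only a check of the arithmetic; the variable names are illustrative, and a year is assumed to be 365.25 days.

```python
TICKS_PER_SECOND = 10_000_000   # one tick is 100 nanoseconds
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365.25          # assumption used for the year figure

# time-low is 32 bits wide, so it repeats after 2^32 ticks.
time_low_period_s = 2**32 / TICKS_PER_SECOND

# time-mid adds 16 bits, so it repeats after 2^48 ticks.
time_mid_period_s = 2**48 / TICKS_PER_SECOND
time_mid_period_days = time_mid_period_s / SECONDS_PER_DAY

# time-hi adds 12 bits, so the full time stamp repeats after 2^60 ticks.
time_hi_period_days = 2**60 / TICKS_PER_SECOND / SECONDS_PER_DAY
time_hi_period_years = time_hi_period_days / DAYS_PER_YEAR
```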

Clock-Seq Values

The fourth field of the UUID string, consisting of 4 hexadecimal digits, specifies the value of a 16-bit unsigned binary integer.  For a DCE UUID, the high-order 2 bits are used for a constant variant-identification field and the remaining 14 bits constitute the clock-seq value. 

The clock-seq value is required to start as a random number generated in such a way that there is a negligible chance that the same clock-seq value would be employed with the same time stamp if the node identification were moved to (or from) another computer.

[DCE RPC] specifies the conditions for generating clock-seq values and altering them any time there is risk of duplicate time stamps being produced without detection.  This can occur, for example, when the local system clock is adjusted or the system is restarted in a way where it is not certain that the last-generated UUID has been remembered.  The [UUIDs-GUIDs] working paper provides a sample implementation that illustrates how these qualities can be achieved.

In the figure, the fourth field is also shown separated into two 8-bit binary fields, one containing the variant and clock-seq-hi, the other containing clock-seq-low.  In that storage arrangement, which is used in the interchange of UUIDs in binary data structures (including DmaIds), it is the case that

clock-seq = clock-seq-hi × 256 + clock-seq-low

or, put slightly differently,

clock-seq-low = clock-seq mod 256;

clock-seq-hi = (clock-seq - clock-seq-low) ÷ 256
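
The same relationships can be sketched in Python.  The function names are assumptions, and the sample field value 0x80b4 is illustrative rather than taken from the figure.

```python
def split_clock_seq_field(field):
    """Split the 16-bit fourth field into (variant bits, clock-seq)."""
    variant = field >> 14        # high-order 2 bits
    clock_seq = field & 0x3FFF   # remaining 14 bits
    return variant, clock_seq


def clock_seq_octets(clock_seq):
    """Return (clock-seq-hi, clock-seq-low) for a 14-bit clock-seq."""
    low = clock_seq % 256
    hi = (clock_seq - low) // 256
    return hi, low


# Illustrative value of the fourth field.
variant, clock_seq = split_clock_seq_field(0x80B4)
hi, low = clock_seq_octets(clock_seq)
```

In the binary storage arrangement, the first of the two octets also carries the variant bits above clock-seq-hi.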

Node Identification

The final part of the UUID sequence consists of a 6-octet node identification sequence.  Each pair of hexadecimal digits specifies the value of a single 8-bit binary value (octet), and the six octets are presented in the same order used in the interchange of IEEE 802.3 network node identifications. 

In the pure use of DCE GUIDs, a node identification will always have the first octet hold a value less than 128: the high-order bit of the value is always  0.

In the figure, the first example is the UUID used to identify the well-known DMA 1.0 dmaClass_DMA class.  It conforms exactly in structure to an official DCE UUID.  The value was generated using a node on a system that, on its own, will never generate that particular UUID again.

Pseudo-Node Identifications

When there is no node identification available, or it is undesirable to use the true node identification, different procedures are required to produce UUIDs that cannot conflict with UUIDs generated on systems having legitimate IEEE 802.3 network node identifications.

Some approaches extend the basic DCE UUID format by using the high-order bit of the first node octet to indicate when a pseudo-node identification is present.
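
A minimal Python sketch of testing that bit follows.  It assumes the convention just described, which is not a standard; the function names and both sample node values are illustrative.

```python
def node_octets(node_hex):
    """Split the 12-hex-digit node field into its six octets, in order."""
    if len(node_hex) != 12:
        raise ValueError("node field must be 12 hexadecimal digits")
    return bytes.fromhex(node_hex)


def is_pseudo_node(node_hex):
    """True when the high-order bit of the first node octet is set."""
    return bool(node_octets(node_hex)[0] & 0x80)


# Illustrative node fields: the first octet is below 128 in one case
# and has its high-order bit set in the other.
plain = is_pseudo_node("00c04fd430c8")
pseudo = is_pseudo_node("80c04fd430c8")
```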

In the figure, the second and third sample UUIDs are produced using such a technique.  The UUIDGEN utility of [VS60] was used on a Windows 98 system with no network card.  The "Day-1 pseudonode" example was obtained on the first day.  After shutting down the computer and restarting on the next day, the "Day-2 pseudonode" example was obtained.

There is no standard for the creation of pseudo-node identifications, so interoperability among computer systems is not assured.

A working paper by Paul Leach and Rich Salz [UUIDs-GUIDs] describes ways to extend the [DCE RPC] UUID specification to allow pseudo-node identification in DCE Version 1 UUIDs.  Additional versions are also proposed.  None of these have been adopted as standard extensions to the DCE UUID.

UUIDs that are intended to be communicated to other computer systems should not be generated using anything but the "pure" DCE UUID format. 

Legacy Considerations

It has been suggested that, for a legacy repository that does not already support InstanceId, the requirement to support DmaId for InstanceId is a barrier to adoption of CSDocs.

When InstanceId is Relevant

Staying with the CSDocs Foundation and ignoring other potential applications of InstanceId, it would appear that having InstanceId be a DmaId cannot become a barrier until all of the following conditions hold:

  1. The legacy system has found a way of expressing the DMA Relationship model and realizing CSRelationship objects to at least bridge CSRoot and CSComponent DocVersions.
  2. The legacy system is prepared to support identification of specific elements within a CSRoot and/or CSComponent as part of the CSRelationship implementation.
  3. The legacy system could operate with a simpler or already-available facility if globally-unique identifiability of elements were not required.
  4. The Ids the legacy system employs, or can employ, are ones that the legacy system can ensure are not duplicated, even in the face of failures and loss of system state.

Alternatives to the CSDocs InstanceId in Identifying Specific CSRelationship Elements

The CSDocs Foundation provisions for InstanceId are for interoperability in the identification of specific elements (2, above).

It is legitimate for a DMA DocSpace to support subclasses of CSRelationship that do not support the additional properties of CSRelationship for identifying specific elements.  That is, none of the dmaProp_HeadRenditionId, dmaProp_HeadContentElementId, dmaProp_TailRenditionId, and dmaProp_TailContentElementId properties would be supported on the CSRelationship subclass.

There is no impact on CSDocs-unaware requesters.  There is some impact on CSDocs-aware requesters, in that there will be CSRelationship objects for which the principle, if any, for locating specific elements is not known.  This is not a new condition to deal with.  No CSDocs implementation is required to use a known method for identifying specific elements.

When a CSDocs implementation does employ the CSDocs provisions for identification of specific elements on a CSRelationship object, there is more available to a CSDocs-aware requester, including the prospect of being able to create or alter a CSRelationship in an interoperable way.

Mapping Locally-Unique Ids to DmaIds

Assuming that conditions (1-4), above, are satisfied by a legacy system, it is possible to map the local identifier to a globally-unique DmaId.  This assumes that there is an unambiguous mapping between the locally-unique identification and a 32-bit binary value.

To ensure that the legacy system will never exhaust the capacity of a single 32-bit value, it may be necessary to partition the set of local identifications and map each set to a different locally-unique 32-bit binary value.    How this partitioning is accomplished depends on the characteristics of the legacy system.  The only requirement to have a DmaId mapping is that there be a means for associating a single DmaId (a BaseId) with each partition, and then mapping the locally-unique identifiers ever used in that partition to unique 32-bit values.

For CSDocs, the legacy implementation need only add a DMA implementation for producing the correct DmaId value of an InstanceId property.  It should be able to recognize the correct partition and produce the same 32-bit binary value without error.  There is no requirement, within the CSDocs Foundation, to be able to directly access some element, given the value of its InstanceId.  It is only necessary to deliver the value of the InstanceId once the element is accessed.
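
One way to picture the mapping is sketched below in Python.  It is a sketch under stated assumptions, not a definitive procedure: it assumes the 32-bit value is carried in the time-low field, with every other field copied from the partition's BaseId.  The function names and the BaseId value are hypothetical placeholders; a real BaseId would have to come from a legitimately reserved block of UUIDs.

```python
import uuid

# Hypothetical BaseId for one partition (placeholder value only).
BASE_ID = uuid.UUID("00000000-9dad-11d1-80b4-00c04fd430c8")


def local_to_dmaid(base_id, local_value):
    """Place a locally-unique 32-bit value in time-low of the BaseId."""
    if not 0 <= local_value < 2**32:
        raise ValueError("local value does not fit in 32 bits")
    # Keep every field of the BaseId except time-low.
    rest = base_id.fields[1:]
    return uuid.UUID(fields=(local_value,) + rest)


def dmaid_to_local(base_id, dma_id):
    """Recover the 32-bit value, checking the partition first."""
    if dma_id.fields[1:] != base_id.fields[1:]:
        raise ValueError("DmaId does not belong to this partition")
    return dma_id.time_low


instance_id = local_to_dmaid(BASE_ID, 1)
```

The same local value always yields the same DmaId, which is all that is needed to deliver a stable InstanceId value once an element has been accessed.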

Reserving UUID Blocks

Here is a trivial way to obtain a set of DmaId values that can be mapped to and from 32-bit unsigned integers:

Mapping the UUID time-low Field

Once the UUID of an unused block is found in this form, the mapping between the UUID and some locally-unique identifier is usually straightforward:

References and Resources on DmaId

These are references on UUIDs, on how they can be generated, and on provisions for generating or reserving blocks of UUIDs that can be used to fabricate UUIDs from locally-unique IDs that are easily mapped to dense integers.

Normative Specifications

Definitive sources on the format, generation, and usage of UUIDs and GUIDs.

[ISO 11578]
ISO (International Organization for Standardization) ISO/IEC 11578:1996.   Information Technology - Open System Interconnection - Remote Procedure Call (RPC).   570pp.  JTC 1.  ICS 35.100.70.  This ISO standard covers the complete RPC system.  It is an outgrowth of the OSF DCE specification and covers UUIDs and GUIDs.  The document is only available in hard copy through a national standards body, such as the American National Standards Institute (ANSI).  Printed copies of this specification cost over $250 in the United States.
[DCE RPC]
DCE 1.1: Remote Procedure Call.  Open Group Technical Standard.  Document Number C706.   The Open Group (Reading, UK: August, 1997).   737pp.  Electronic edition, $58.00.  HTML edition on-line.  UUIDs are specified in an Appendix of the on-line edition.  Registration is required to access the on-line material.   C706 was formerly C309 [CAE RPC].

Informative References

[CAE RPC]
CAE Specification, X/Open DCE: Remote Procedure Call.  X/Open Company Limited (Reading, Berkshire, UK: 1994).   X/Open Document Number C309.  ISBN 1-85912-041-5.  pp. 585-592 describe UUIDs.  According to [COM Spec], COM GUIDs are identifiers that are interoperable with the UUIDs of p.586, and not related to the GUIDs mentioned as a specific scheme on p.587 of [CAE RPC].
[COM Spec]
Microsoft Corporation and Digital Equipment Corporation.  The Component Object Model Specification. Microsoft Corporation (Redmond, WA: 1992-1995).  Version 0.9, October 24, 1995.  This specification is available on the Microsoft Developer Network (MSDN) Library CD-ROM distribution, as late as April, 1999.  The specification can also be found at Microsoft's on-line pages.   There is an on-line (HTML) version and two downloadable versions as Microsoft Word 6.0 Master Documents with individual chapters.  A Word 6.0 version is required in order to have the footnotes and bibliography.  Chapter 3.2 Globally-Unique Identifiers, provides the description of GUID usage in COM.  [CAE RPC] is the referenced source.
[Inside OLE]
Brockschmidt, Kraig.  Inside OLE.  ed.2. Microsoft Press (Redmond, WA: 1995).  ISBN 1-55615-843-2.  pbk with two disks.  Chapter 2, Objects and Interfaces, Object Identity section, has a treatment of Globally Unique Identifiers.  This book is available on the Microsoft Developer Network (MSDN) Library CD-ROMs, as late as April, 1999.  The Globally Unique Identifiers section of the book can also be found on MSDN Online at http://msdn.microsoft.com/LIBRARY/BOOKS/TECHNLANG/INOLE/D1/S10E8.HTM.  It is necessary to register, although access to MSDN Online is free.  Brockschmidt discusses creation of blocks of GUIDs.
[OSF DCE]
Miller, Steven.  DEC/HP Network Computing Architecture Remote Procedure Call Run Time Extensions version OSF TX1.0.11. Open Software Foundation (Cambridge, MA: July 23, 1992).  This is the reference that is commonly given for the definition of the UUID and of the GUID as a particular adaptation of it.  The current specification from The Open Group is [DCE RPC].
[UUIDs-GUIDs]
Leach, Paul J., Salz, Rich.  UUIDs and GUIDs.  Internet Draft <draft-leach-uuids-guids-01.txt>.    Microsoft, Certco. February 4, 1998.  Expired August 4, 1998.  This is an expired draft document.  It has no standing as an IETF specification and may not be employed as such.  To determine if this draft has been updated or replaced, consult the IETF Internet-Draft Directory.   This draft is also referenced by The Open Group at http://www.opengroup.org/dce/info.  This draft provides implementations of common UUID creation, comparison, and conversion functions.  The DCE version 1 UUID is the same form used for Microsoft COM GUIDs in [COM Spec].  Leach and Salz introduce a method for producing DCE version 1 UUIDs using a cryptographic-quality random number for the Node identification when there is no IEEE 802 node identification available or it is undesirable to use it.
[VS60]
Microsoft Corporation.  Visual Studio 6.0.  The Microsoft Win32 development for Intel platforms provides two utilities, UUIDGEN and GUIDGEN.  UUIDGEN is a console application that uses the operating system to derive UUIDs.  GUIDGEN is a VS60 tool that uses the same UUID generator to offer GUIDs in a variety of formats used in programming and creating installation scripts.  When UUIDGEN produces a pseudo-node form of UUID on a machine with no detected network node, there is a warning that the resulting UUID is not reliable beyond the single machine and should not be interchanged with other systems.  GUIDGEN makes no such announcement.


updated 2003-03-03 14:30 -0700 (pdt) by Dennis E. Hamilton
created 1999-07-11-22:06 -0700 (pdt) by Dennis E. Hamilton