- 1 Proposal for path-based MRML3: Summary
- 2 Concepts
- 3 Interpreting paths
- 4 Parsing MRML3
- 5 Writing and modifying MRML3
- 6 Higher level semantics
- 7 Optimizations
Proposal for path-based MRML3: Summary
The requirements for MRML have grown beyond the simple data description language used in previous versions of Slicer. With an eye towards current and future needs, MRML3 must:
- separate data description from visual appearance and from semantic information,
- allow incorporation of data from potentially many different sources,
- allow support for multiple hierarchies of structures,
- support use of modern XML parsing techniques.
Previous MRML versions have not adequately addressed these areas. In particular, existing XML file formats such as MRML2 have had significant difficulty sharing data elements between files because of their strict hierarchical and lexical data structure.
The proposed design solves many of these problems through several mechanisms, including hierarchical path-based access to nodes; referencing, aliasing and extending remote nodes; automatic parsing of node information; metadata description of nodes; and a variety of other nice properties.
The central idea of this MRML3 proposal is the path construct. Paths build upon existing XML data naming mechanisms. XML elements are named using an "id" attribute: a node named with an "id" can be referred to using URL fragment syntax. IDs must be unique within an XML file.
Paths provide local naming and finer granularity on top of IDs. An ID-labelled element represents the root of a tree. Children of the ID-labelled element may have a path attribute that represents a locally unique name for the element. For example, here's a simplified version of a path-based element structure:
<MRML> <Node id="myid"> <Node path="child1" /> <Node path="child2" /> </Node> </MRML>
This fragment describes two child elements of a top-level element.
The top level element can be referenced externally using the following syntax: http://www.example.org/file.mrml#myid
The first child element can be referred to using a slash-separated path: http://www.example.org/file.mrml#myid/child1
It's also possible to have path-named elements at top level:
<MRML> <Node path="child1" /> <Node path="child2" /> </MRML>
In this case, the first child is named as follows: http://www.example.org/file.mrml#/child1
Note that the "/" after the fragment separator "#" is important, since otherwise "child1" would be considered a fragment.
A child name must be unique only within its parent context. For example:
<MRML>
  <Node path="child1">
    <Node path="grandchild" />
  </Node>
  <Node path="child2">
    <Node path="grandchild" />
  </Node>
</MRML>
This simplified fragment is valid, since every element has a path name unique in its parent.
To be accessed externally using URI syntax, an element and all of its ancestors (up to the nearest XML fragment or top level) must have a path attribute. The one exception to this rule occurs if the element is referenced through another element (see references, below).
A MRMLPath parser may choose to give elements without explicit paths temporary path names so that their parent nodes can refer to them by name. These temporary path names are transient and cannot be relied upon to have a constant value in between parsing operations.
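The lookup described above can be sketched in a few lines. This is only an illustration, not a proposed API: the function names (find_by_id, descend, resolve_fragment) are hypothetical, and the sketch uses Python's standard xml.etree.ElementTree module on a sample document that combines the ID-rooted and top-level examples.

```python
import xml.etree.ElementTree as ET

# Sample document: an ID-labelled tree plus a path-named tree at top level.
DOC = """
<MRML>
  <Node id="myid">
    <Node path="child1" />
    <Node path="child2" />
  </Node>
  <Node path="child1">
    <Node path="grandchild" />
  </Node>
</MRML>
"""

def find_by_id(root, ident):
    """Return the element whose "id" attribute equals ident, or None."""
    for elem in root.iter():
        if elem.get("id") == ident:
            return elem
    return None

def descend(start, path):
    """Follow slash-separated path names downward from start."""
    current = start
    for name in filter(None, path.split("/")):
        matches = [c for c in current if c.get("path") == name]
        if not matches:
            return None
        current = matches[0]
    return current

def resolve_fragment(root, fragment):
    """Resolve the text after "#", e.g. "myid/child1" or "/child1"."""
    if fragment.startswith("/"):
        # "#/..." starts at the top-level MRML element.
        return descend(root, fragment)
    ident, _, rest = fragment.partition("/")
    start = find_by_id(root, ident)
    return None if start is None else descend(start, rest)

root = ET.fromstring(DOC)
print(resolve_fragment(root, "myid/child1").attrib)
print(resolve_fragment(root, "/child1/grandchild").attrib)
```

Note that "myid/child1" and "/child1" name different elements here, which is why the leading "/" matters.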
Local path URIs are interpreted in the following way:
- URIs without a "#" fragment identifier are assumed to be paths.
- A path without a "#" that begins with a "/" is an absolute path that begins at the nearest fragment.
- A path without a "#" that does not begin with a "/" is relative to the containing element.
- A URI that begins with a "#" fragment identifier specifies a fragment which marks the root of the path.
- A URI that begins with a "#" fragment identifier immediately followed by a "/" begins at the top-level MRML element.
- A URI that does not begin with a "#" fragment identifier but that contains one is assumed to have the form of a conventional URI with a fragment and path specification as described above, where the string to the left of the "#" is the relative or absolute URI describing the location of the MRML resource.
- The path name "." is reserved. It refers to the current node.
- The path name ".." is reserved. At this time, its semantics are not defined.
- Path names that begin with "__" should not be generated by an application; they are reserved for system-generated names.
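The rules in this list can be expressed as a small classifier. The sketch below is one possible reading of them, not a specification: the ParsedURI type, its field names, and the four root labels ("relative", "nearest-fragment", "top-level", "id") are all invented for illustration. The "." path name is dropped because it refers to the current node; ".." is passed through untouched since its semantics are undefined.

```python
from typing import List, NamedTuple, Optional

class ParsedURI(NamedTuple):
    resource: Optional[str]      # text left of "#", or None for a local URI
    root: str                    # "relative", "nearest-fragment", "top-level" or "id"
    fragment_id: Optional[str]   # the ID that roots the path, when root == "id"
    names: List[str]             # remaining path names, in order

def _names(path):
    # Drop empty segments and "." (which refers to the current node).
    return [p for p in path.split("/") if p not in ("", ".")]

def parse_mrml_uri(uri):
    resource, sep, tail = uri.partition("#")
    if not sep:
        # No "#": the whole string is a path.
        root = "nearest-fragment" if uri.startswith("/") else "relative"
        return ParsedURI(None, root, None, _names(uri))
    resource = resource or None
    if tail.startswith("/"):
        # "#/" begins at the top-level MRML element.
        return ParsedURI(resource, "top-level", None, _names(tail))
    fragment_id, _, rest = tail.partition("/")
    return ParsedURI(resource, "id", fragment_id, _names(rest))

for example in ("child1/grandchild", "/child1", "#myid/child1", "#/child1",
                "http://www.example.org/file.mrml#myid/child1"):
    print(example, "->", parse_mrml_uri(example))
```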
An element may include a "ref" attribute, in which case the value of the attribute is the path of another element. This element type is called a reference element, and acts in some ways like a symbolic link in UNIX: the referred-to element appears within the context of the local element. A reference element can have a path attribute, which gives it a new name within its local parent.
A MRMLPath parsing API treats reference elements in several ways. First, and most commonly, it can offer a "portal" view of the reference, where path references step through the reference into the referenced context. Children of the referenced element appear to be attached under the local path name. For example:
<Node path="parent1"> <Node path="child1" /> </Node> <Node path="parent2" ref="parent1" />
In this case, both of these paths are valid, and point to the element labeled "child1": parent1/child1 and parent2/child1
When a remote element is referenced in this way, paths and other values in the element are evaluated in the lexical environment where the remote element was defined. In other words, an element is evaluated in exactly the same way regardless of where it is referenced from (either from its local lexical environment or from a remote reference). Specifically, reference elements are not evaluated by lexically substituting the definition of the remote node into the environment where they are referenced.
A MRMLPath API should also allow reference elements to be distinguished from the remote node itself, much as UNIX contains system calls to check the status of either a symbolic link itself or the file it points to.
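A minimal sketch of both views, assuming references stay within one document. The names walk, deref and build_parent_map are hypothetical. The "ref" value is resolved relative to the parent of the element that carries it, and the follow_last flag plays the role of the stat/lstat distinction. Reference cycles are not detected in this sketch.

```python
import xml.etree.ElementTree as ET

DOC = """
<MRML>
  <Node path="parent1">
    <Node path="child1" />
  </Node>
  <Node path="parent2" ref="parent1" />
</MRML>
"""

def build_parent_map(root):
    """Map every element to its parent (ElementTree keeps no parent links)."""
    return {child: parent for parent in root.iter() for child in parent}

def child_named(elem, name):
    for child in elem:
        if child.get("path") == name:
            return child
    return None

def deref(elem, parents):
    """Follow "ref" attributes until a non-reference element is reached."""
    while elem.get("ref") is not None:
        # The ref is evaluated where the reference element was defined.
        elem = walk(parents[elem], elem.get("ref"), parents)
    return elem

def walk(start, path, parents, follow_last=True):
    """Portal view: step through references while descending the path."""
    current = start
    for name in filter(None, path.split("/")):
        current = child_named(deref(current, parents), name)
        if current is None:
            raise LookupError(path)
    return deref(current, parents) if follow_last else current

root = ET.fromstring(DOC)
parents = build_parent_map(root)
same = walk(root, "parent1/child1", parents) is walk(root, "parent2/child1", parents)
print(same)
```

Calling walk(root, "parent2", parents, follow_last=False) returns the reference element itself rather than the element it points to.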
Finally, reference elements can act as prototypes based on other elements. A reference element may have children, in which case the reference element's children either augment or override the remote element's children (depending on whether the path name does not or does exist in the remote element). For example:
<Node path="proto">
  <Node path="child1">proto's Child 1</Node>
  <Node path="child2">proto's Child 2</Node>
</Node>
<Node path="overrider" ref="proto">
  <Node path="child2">overrider's Child 2</Node>
</Node>
In this case, both of the following paths reference valid elements: overrider/child1 and overrider/child2
The text content of the element referenced by the first path contains "proto's Child 1", while the text content of the element referenced by the second path is "overrider's Child 2".
If an API client asked for the composite child elements of a referenced element in order, the children of the reference element itself should be listed first, then the children of the remote element next. References to references should be treated in the same way.
There is currently no way to undefine or "whiteout" a child reference in a remote element using a reference element.
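The override and augment behaviour, together with the listing order described above, can be sketched as follows. The function name composite_children is hypothetical, and the sample assumes that each child is named with a "path" attribute.

```python
import xml.etree.ElementTree as ET

DOC = """
<MRML>
  <Node path="proto">
    <Node path="child1">proto's Child 1</Node>
    <Node path="child2">proto's Child 2</Node>
  </Node>
  <Node path="overrider" ref="proto">
    <Node path="child2">overrider's Child 2</Node>
  </Node>
</MRML>
"""

def composite_children(ref_elem, remote_elem):
    """Children of the reference element first, then the remote
    children whose path name is not overridden locally."""
    own = list(ref_elem)
    taken = {c.get("path") for c in own if c.get("path") is not None}
    inherited = [c for c in remote_elem if c.get("path") not in taken]
    return own + inherited

root = ET.fromstring(DOC)
proto, overrider = root[0], root[1]
for child in composite_children(overrider, proto):
    print(child.get("path"), "->", child.text)
```

Because there is no "whiteout" mechanism, every remote child that is not overridden always appears in the composite list.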
Composite types and structural composition
Complex structures can be created by using child paths much like members of a data structure in a programming language. In general, MRML3 elements have XML element names based on their type, and paths based on their role. For example:
<mrml:RGBColor path="white">
  <mrml:Double path="r" value="1.0" />
  <mrml:Double path="g" value="1.0" />
  <mrml:Double path="b" value="1.0" />
</mrml:RGBColor>
In this case, the given RGBColor has three named children (r, g, b), each of Double type. The interpretation of child elements is left to the API or the application.
References can be used to refer to other elements. For example:
<mrml:RGBColor path="also_white">
  <mrml:Double path="r" value="1.0" />
  <mrml:Double path="g" ref="r" />
  <mrml:Double path="b" ref="r" />
</mrml:RGBColor>
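As an illustration of how an application might interpret such a composite, the sketch below reads the three components and follows sibling references. It handles only single-name references to siblings, which is all this example needs. The function names and the namespace URI bound to the mrml prefix are placeholders, since the proposal does not define them.

```python
import xml.etree.ElementTree as ET

# The namespace URI is a placeholder; the proposal does not specify one.
DOC = """
<mrml:RGBColor xmlns:mrml="http://example.org/mrml3" path="also_white">
  <mrml:Double path="r" value="1.0" />
  <mrml:Double path="g" ref="r" />
  <mrml:Double path="b" ref="r" />
</mrml:RGBColor>
"""

def member(color, name):
    """Return the child of color whose path name is name."""
    for child in color:
        if child.get("path") == name:
            return child
    raise LookupError(name)

def component(color, name):
    """Read one colour component, following sibling references."""
    elem = member(color, name)
    while elem.get("ref") is not None:
        # A ref is relative to the parent context, i.e. the RGBColor itself.
        elem = member(color, elem.get("ref"))
    return float(elem.get("value"))

def read_rgb(color):
    return tuple(component(color, n) for n in ("r", "g", "b"))

print(read_rgb(ET.fromstring(DOC)))
```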
More complex type example
Here's an example of a Material element description, based on the Material specification used in VRML:
<mrml:Material path="material_example">
  <mrml:RGBColor path="ambientColor">
    <mrml:Double path="r" value="0.1" />
    <mrml:Double path="g" value="0.1" />
    <mrml:Double path="b" value="0.1" />
  </mrml:RGBColor>
  <mrml:RGBColor path="diffuseColor">
    <mrml:Double path="r" value="1.0" />
    <mrml:Double path="g" value="0.0" />
    <mrml:Double path="b" value="0.0" />
  </mrml:RGBColor>
  <mrml:RGBColor path="specularColor">
    <mrml:Double path="r" value="0.8" />
    <mrml:Double path="g" value="0.8" />
    <mrml:Double path="b" value="0.8" />
  </mrml:RGBColor>
  <mrml:RGBColor path="emissiveColor">
    <mrml:Double path="r" value="0.0" />
    <mrml:Double path="g" value="0.0" />
    <mrml:Double path="b" value="0.0" />
  </mrml:RGBColor>
  <mrml:Double path="shininess" value="2.0" />
  <mrml:Double path="transparency" value="0.5" />
</mrml:Material>
Relative paths are interpreted at the location where they appear in the hierarchy. This statement is somewhat difficult to interpret with respect to element attributes. A path name value in an element attribute is interpreted with respect to the node's parent context, consistent with the interpretation of "path" and "ref".
<Node path="parent">
  <Node path="mypath" attr="some_other_path" child="mypath/child">
    <Node path="child" />
  </Node>
  <Node path="some_other_path" />
</Node>
In the "mypath" element, the "attr" and "child" attributes are written to refer to the appropriate nodes. The "child" attribute needs to refer to the Node's own path name ("mypath") to get to its child element. This may seem confusing; just remember, all attributes are treated the same as "path" and "ref".
To avoid confusion, avoid naming child paths the same as attributes that also point to paths.
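The rule can be checked with a short sketch. It assumes that "some_other_path" is a sibling of "mypath" (which is what the "attr" value implies) and resolves every attribute value starting from the parent of the element that carries it. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """
<Node path="parent">
  <Node path="mypath" attr="some_other_path" child="mypath/child">
    <Node path="child" />
  </Node>
  <Node path="some_other_path" />
</Node>
"""

def resolve_in_parent_context(parent, path):
    """Resolve a relative path starting from the parent of the
    element whose attribute holds the path."""
    current = parent
    for name in path.split("/"):
        current = next((c for c in current if c.get("path") == name), None)
        if current is None:
            return None
    return current

parent = ET.fromstring(DOC)
mypath = resolve_in_parent_context(parent, "mypath")
print(resolve_in_parent_context(parent, mypath.get("attr")).get("path"))
print(resolve_in_parent_context(parent, mypath.get("child")).get("path"))
```

A bare "child" value would not resolve from the parent context, which is why the attribute has to spell out "mypath/child".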
Low-level MRML3 parsing involves no knowledge of underlying content semantics: it simply constructs an element structure based on the "id" and "path" attributes and the element type, creates special nodes for reference elements, and assembles attributes for later interpretation.
The MRML3 API also needs to be able to handle reference elements. In efficient implementations, dereferencing of the reference can be delayed until access. When that occurs, the parser must retrieve the remote element. It must also maintain the context of that element (for instance, the name of the resource and fragment in which the remote element is contained) in order to correctly interpret path references made inside the element. The API must then composite the remote element's contents with any overriding or augmenting children of the reference element.
The API may provide temporary paths for any unnamed children of an element. These names need not survive multiple parsing instances. However, these temporary paths should be identical when accessed through the original element or through any reference elements that reference it.
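One way to satisfy this is to assign temporary names once, on the original elements, and keep them in a side table so they are never written back. The sketch below does that. The "__anonN" naming scheme and the function name are invented for illustration; only the "__" prefix comes from the reserved-name rule.

```python
import xml.etree.ElementTree as ET

DOC = """
<MRML>
  <Node path="a">
    <Node />
    <Node path="x" />
    <Node />
  </Node>
</MRML>
"""

def assign_temporary_paths(root):
    """Return {element: temporary name} for children that have
    neither a "path" nor an "id" attribute."""
    temporary = {}
    for parent in root.iter():
        used = {c.get("path") for c in parent if c.get("path") is not None}
        counter = 0
        for child in parent:
            if child.get("path") is None and child.get("id") is None:
                name = f"__anon{counter}"
                while name in used:  # never collide with an explicit name
                    counter += 1
                    name = f"__anon{counter}"
                temporary[child] = name
                used.add(name)
                counter += 1
    return temporary

root = ET.fromstring(DOC)
names = assign_temporary_paths(root)
print(sorted(names.values()))
```

Because the table is keyed by the original element, a lookup through a reference element that resolves to the same element sees the same temporary name.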
MRML3 elements contain either only child nodes or only text content: they do not contain a mix of the two. A MRML3 parsing implementation may give a special name to a text element's content. Since text content is untyped, and attribute content is untyped, the attribute "value" could be considered a synonym for text content by convention.
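A small helper can enforce the no-mixed-content rule and treat "value" as a synonym for text content. This is a sketch of the convention only. The function name is hypothetical, and preferring the attribute when both forms are present is an arbitrary choice made here.

```python
import xml.etree.ElementTree as ET

def element_value(elem):
    """Return the untyped value of elem: its "value" attribute if
    present, otherwise its text content, otherwise None."""
    text = (elem.text or "").strip()
    stray = any((child.tail or "").strip() for child in elem)
    if len(elem) and (text or stray):
        raise ValueError("MRML3 elements may not mix child nodes and text")
    if elem.get("value") is not None:
        return elem.get("value")
    return text or None

print(element_value(ET.fromstring('<Double value="1.0" />')))
print(element_value(ET.fromstring('<Double>1.0</Double>')))
```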
Writing and modifying MRML3
Writing single MRML3 resources is a straightforward analog to parsing: whole files or fragments can be created, modified and composed at write time.
Modifying existing resources can present a challenge, since a writer cannot blindly assume that an element was defined in a resource that is writable. In particular, references complicate writing.
To help minimize problems in the most common cases, a MRML3 API should include a "writable" or "modifiable" flag for each element. Elements accessed from read-only resources can be marked as read-only. An application has the option of using an API "copy" or "harden_reference" call to make a local, referenceless copy of a remote resource. This copy can be written to a resource under the application's control.
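The flag and the copy operation might look like the following sketch. The Node class, its fields, and set_value are hypothetical stand-ins for whatever the real API provides. harden_reference is taken from the name suggested above, and its behaviour here (merge the remote children, let local children override, drop the reference) is one plausible reading.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Node:
    path: str
    value: Optional[str] = None
    children: Dict[str, "Node"] = field(default_factory=dict)
    target: Optional["Node"] = None  # set only on reference elements
    writable: bool = True

def set_value(node, value):
    """Refuse to modify elements that came from read-only resources."""
    if not node.writable:
        raise PermissionError(f"element '{node.path}' is read-only")
    node.value = value

def harden_reference(node):
    """Return a writable, reference-free deep copy of node."""
    merged = {name: harden_reference(c) for name, c in node.children.items()}
    value = node.value
    if node.target is not None:
        base = harden_reference(node.target)
        for name, child in base.children.items():
            merged.setdefault(name, child)  # local children win
        if value is None:
            value = base.value
    return Node(node.path, value, merged, None, True)

remote = Node("proto", writable=False, children={
    "a": Node("a", "1", writable=False),
    "b": Node("b", "2", writable=False),
})
local = Node("local", target=remote, children={"b": Node("b", "9")})
copy = harden_reference(local)
set_value(copy.children["a"], "5")  # allowed: the copy is local
print(copy.children["a"].value, remote.children["a"].value)
```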
In general, MRML3 modifying applications should respect the type of elements already in the tree. For example, let's assume that MRML3's RGBColor element can use color components specified as Float, Double, and Integer. In one MRML resource, an RGBColor instance is specified with Doubles. A modifying application should avoid rewriting the fields as Integers, even though it might be allowed.
Higher level semantics
MRML3 type definition and interpretation uses low level parsing operations. Types consist of simple types (Double, Integer, and so on), composite types (Collections, Arrays, etc), and more complex structures (Material, Actor, Geometry, ...).
Composite elements are free to accept children of a variety of types; the type of elements is provided to simplify parsing, not to establish a fixed type hierarchy.
A special type, "Untyped", is reserved for elements where the type of the element is unknown and may be subject to multiple interpretations. By definition, all element attributes have type "Untyped".
Many applications make use of key-value pairs. One example is the set of fields contained in medical data files such as DICOM. By convention, properties for an element are stored in a path called "properties" or ".properties".
Individual properties have the following structure:
<mrml:Property path="property_path">
  <mrml:Untyped path="name" />
  <mrml:Untyped path="value" />
  <mrml:Validity path="validity" />
  <mrml:URI path="origin" />
</mrml:Property>
The "path" attribute allows convenient access to a particular property in a collection of properties. The name of the property is a distinct field to allow URIs and other complex names to be used to describe a property. The value can be any value or element; however, simple types are preferred.
The optional validity field helps the reader interpret the property. It can have the following values:
- cached: the value is copied exactly from the object described by the parent node, and is cached here for convenience.
- derived: the value is a quantity derived completely from values in the object described by the parent node; it can be reconstructed using a deterministic algorithm.
- override: the property exactly corresponds to a value in the object described by the parent node. This value should override the value in that object.
- augment: the property provides new information about the object described by the parent node; it cannot be derived from the original data.
The "origin" field can be used to describe the origin of data in a property if it is not completely contained in the described data.
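To make the four validity values concrete, the sketch below combines an object's own fields with a list of properties. The proposal does not say how a reader should combine them, so the policy here is an assumption: "override" and "augment" always apply, while "cached" and "derived" values are used only when the object does not already supply the field. The names Validity and apply_properties, and the sample field names, are illustrative.

```python
from enum import Enum

class Validity(Enum):
    CACHED = "cached"
    DERIVED = "derived"
    OVERRIDE = "override"
    AUGMENT = "augment"

def apply_properties(native, properties):
    """Combine an object's own fields with (name, value, validity) properties."""
    result = dict(native)
    for name, value, validity in properties:
        if validity in (Validity.OVERRIDE, Validity.AUGMENT):
            # Information that the described object does not carry itself.
            result[name] = value
        else:
            # Cached or derived: the object remains the authority.
            result.setdefault(name, value)
    return result

native = {"PatientName": "A", "Rows": "512"}
properties = [
    ("PatientName", "B", Validity.OVERRIDE),
    ("Rows", "256", Validity.CACHED),
    ("Reviewer", "X", Validity.AUGMENT),
    ("PixelCount", "262144", Validity.DERIVED),
]
print(apply_properties(native, properties))
```

Under this reading a stale cached value ("Rows" above) never hides what the described object says.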
Each element can be described using metadata. Metadata is stored by convention at a "metadata" or ".metadata" path. Metadata can either be a list of properties as described above, or a specialized RDF metadata object (not yet specified).
Separating data from appearance
MRML2 implementations did not separate data from its appearance, nor did they distinguish sources of data from instances of that data.
MRML3 nodes should be defined in the following categories:
- Data access: location of data sources in remote locations
- Data sets: named data accessed using data access methods
- Data objects: image, geometry, transforms, fiducials, property instances created from data sets or defined in-line
- Styles and appearance: characteristics chosen by the user or the program that govern the appearance of the object
- Actors: composition of data objects and appearance
- Stages: composition of actors and device-independent view specifications
- Semantics: elements to describe the semantic interpretation of objects and relate them to each other and to external entities such as ontological specifications.
MRML3 can be optimized in several ways. Here are a few ideas:
- XML resources can be preparsed to do XML entity replacement and fragment discovery.
- Fragments can be preparsed to find all paths.
- Fragments can be stored in a more compact form in a database.
- Paths can be indexed and preparsed.
- Parsing of references can be delayed until access, or alternatively, cached.
- Local caches of remote resources can be pre-fetched or stored.
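As an example of one of these ideas, delaying the parsing of a reference until access and caching the result takes only a few lines. The LazyReference class and the loader callback are hypothetical. The loader stands in for whatever fetches and parses the remote resource.

```python
class LazyReference:
    """Dereference a remote element on first access, then reuse the result."""

    def __init__(self, uri, loader):
        self.uri = uri
        self._loader = loader   # callable taking a URI, returning the element
        self._loaded = False
        self._target = None

    def get(self):
        if not self._loaded:
            self._target = self._loader(self.uri)
            self._loaded = True
        return self._target

calls = []

def fake_loader(uri):
    # Stand-in for fetching and parsing a remote MRML resource.
    calls.append(uri)
    return {"uri": uri}

ref = LazyReference("file.mrml#myid/child1", fake_loader)
first = ref.get()
second = ref.get()
print(len(calls), first is second)
```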