The MappingMaster Domain-Specific Language (MappingMaster DSL): A Dedicated Language for Converting Thesauri to Ontologies (Part 2)

(Continued)

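5. Processing Cell Content

The default behavior is to use the contents of the referenced cell directly. However, this default can be overridden with an optional value specification clause.

This clause is usually indicated by the '=' character immediately after the encoding specification keyword, followed by a parenthesis-enclosed, comma-separated list of value specifications, which are appended to one another. These value specifications can be cell references, quoted values, regular expressions containing capturing groups, or inbuilt text processing functions.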


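5.1 Basic Cell Content Processing

   For example, an expression that extends a reference to specify that the entity created from cell A5 uses rdfs:label name encoding, and that the name is the cell value preceded by the string "Sale:", can be written as follows: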

   Class:@A5(rdfs:label=("Sale:",@A5))

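   Value specification references are not restricted to the referenced cell itself and may indicate arbitrary cells. More than one encoding can also be specified for a particular reference; for example, separate identifier and label values can be generated for a particular entity using the contents of different cells.

   For example, we can extend the example above to assign the contents of cell B5 as the rdf:ID of the generated class, as follows: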

   Class:@A5(rdf:ID=@B5  rdfs:label=("Sale:",@A5))


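The language includes several inbuilt text processing methods that can be used in value specifications. The methods currently supported are mm:replace, mm:replaceAll, mm:replaceFirst, mm:prepend, mm:append, mm:toLowerCase, mm:toUpperCase, mm:trim, mm:reverse, mm:printf, and mm:decimalFormat. These methods take zero or more arguments and return a value. Supplied arguments may be any combination of quoted strings and references.


An expression that converts the contents of cell A5 to upper case before label assignment can be written: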

Class:@A5(mm:toUpperCase(@A5))


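Value processing functions can also be used outside of value specification clauses, but only if no such clause is used in the reference, and only a single function can be used.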


5.2 decimalFormat and printf


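decimalFormat and printf support formatting of textual and numerical content. Their behavior follows the standard Java specifications for the DecimalFormat class and the String.format method.

For example: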

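  Individual: Fred Facts: hasSalary @A1(mm:decimalFormat("###,###.00", @A1))
  Class: @A1(mm:printf("A_%s", @A1))

5.3 Replacing Characters

The mm:replace and mm:replaceAll functions follow the corresponding methods of the standard Java String class.
For example, to remove all non-alphanumeric characters from a cell, the mm:replaceAll function can be used as follows: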

Individual:@A5

Facts:hasItems @B5(mm:replaceAll("[^a-zA-Z0-9]",""))


5.4 Prepending and Appending

  Class: @A5(rdfs:label=mm:prepend("Sale:"))
  Individual: @A2(mm:append("_MM"))

5.5 Literals

MappingMaster currently supports the following datatypes:

xsd:string, xsd:boolean, xsd:byte, xsd:short, xsd:int, xsd:long, xsd:float, xsd:double, xsd:integer, xsd:decimal, xsd:dateTime, xsd:date, xsd:time, xsd:Duration, rdf:PlainLiteral, rdf:XMLLiteral


5.6 IRIs

MappingMaster has several directives to customize the IRI creation process:

mm:iri, mm:camelCaseEncode, mm:snakeCaseEncode, mm:uuidEncode, mm:hashEncode


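5.7 Missing Value Handling

To deal with missing cell values, default values can also be specified in references. A default value clause assigns these values. The clause is indicated by the keywords mm:DefaultLocationValue, mm:DefaultLiteral, mm:DefaultLabel, and mm:DefaultID, followed by an assignment to a string. For example, the following expression uses this clause to indicate that the value "Unknown" should be used as the label of the created class if cell A5 is empty:

Class:@A5(rdfs:label mm:DefaultLabel="Unknown")

Additional behaviors are also supported for dealing with missing cell values. The default behavior is to skip an entire expression if it contains any reference to an empty cell. Four keywords are provided to modify this behavior: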

mm:ErrorIfEmptyLocation

mm:SkipIfEmptyLocation

mm:WarningIfEmptyLocation

mm:ProcessIfEmptyLocation

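The last keyword allows processing of spreadsheets that may contain a large number of missing values. It indicates that the language processor should, if possible, conservatively drop only the sub-expression containing the empty reference rather than the entire expression. For example, the following expression declares an individual from cell A5 of a spreadsheet and associates the property hasAge with it using the value in cell A6: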

Individual:@A5

Facts:hasAge @A6(mm:ProcessIfEmptyLocation)


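Here, with the default skip behavior, a missing value in cell A5 causes the entire expression to be skipped. With the process directive, however, only the sub-expression containing cell A6 is dropped when that cell is empty. So, if cell A5 contains a value and cell A6 is empty, the resulting expression still declares an individual.


In a similar way, more fine-grained empty value handling is supported, specifying different behaviors for mm:Literal, rdf:ID, and rdfs:label values. The label directives are mm:ErrorIfEmptyLabel, mm:SkipIfEmptyLabel, mm:WarningIfEmptyLabel, and mm:ProcessIfEmptyLabel; rdf:ID and mm:Literal have equivalent keywords:

mm:ErrorIfEmptyID, mm:SkipIfEmptyID, mm:WarningIfEmptyID, and mm:ProcessIfEmptyID; and mm:ErrorIfEmptyLiteral, mm:SkipIfEmptyLiteral, mm:WarningIfEmptyLiteral, and mm:ProcessIfEmptyLiteral.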


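5.8 Location Shifting

One additional option is provided for dealing with empty cell values. It targets the common case in many spreadsheets where a particular cell is given a value and all the empty cells below it are implied to share that value. In this case, when these empty cells are processed, their location must be shifted to that of the cell containing the value. For example, the following expression uses the mm:ShiftUp keyword to indicate that if cell A5 does not contain a value for the name of the declared class, the row number must be shifted upwards until a value is found: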

Class:@A5(mm:ShiftUp)

If no value is found, normal empty value handling is applied. Similar directives are mm:ShiftDown, mm:ShiftLeft, and mm:ShiftRight.


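5.9 Iterating Over a Range of Cells in a Reference

Clearly, most mappings will not reference just a single cell but will instead iterate over a range of rows or columns in a spreadsheet. The wildcard character '*' can be used in a reference to refer to the current column or row in an iteration. MappingMaster provides a graphical interface for specifying these ranges.

Example references using this wildcard notation include: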

@A3

@A*

@**

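For example, an expression that iterates over the grid D4 to G6 to create an individual of class Sale for each cell can be written:

Individual:@**

Types:Sale

This expression can be extended to assign property values to these individuals:

Individual:@**

Types:Sale

Facts:hasAmount @**,

           hasProduct @B*,

           hasState    @*2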




Appendix: (English original)


MappingMaster uses a domain specific language (DSL) to define mappings from spreadsheet content to OWL ontologies. This language is based on the Manchester OWL Syntax, which is itself a DSL for describing OWL ontologies.

An introduction to the Manchester Syntax can be found here. A set of example Manchester Syntax expressions can be found in the Quick Reference section of that document.

The Manchester Syntax supports the declarative specification of OWL axioms.

For example, a Manchester Syntax declaration of an OWL named class Gum that is a subclass of a named class called Product can be written using a class declaration clause as:

  Class: Gum SubClassOf: Product 

The MappingMaster DSL extends the Manchester Syntax to support references to spreadsheet content in these declarations. MappingMaster introduces a new reference clause for referring to spreadsheet content. In this DSL, any clause in a Manchester Syntax expression that indicates an OWL named class, OWL property, OWL individual, data type, or a literal can be substituted with this reference clause. Any declarations containing such references are preprocessed and the relevant spreadsheet content specified by these references is imported. As each declaration is processed, the appropriate spreadsheet content is retrieved for each reference. This content can then be used in four main ways:

  • It can be used to directly name OWL entities that are created on demand.
  • It can be used to annotate OWL entities that are created on demand.
  • The content may reference existing OWL entities, either directly as a URI or through an annotation property.
  • Finally, the content may be used as a literal.

Using one of these approaches, each reference within an expression is thus resolved during preprocessing to a named OWL entity, a data type, or a literal. The resulting expression can then be executed by a standard Manchester Syntax processor.

References

References in the MappingMaster DSL are prefixed by the character @, which is generally followed by an Excel-style cell reference. In the standard Excel cell notation, cells extend from A1 in the top left corner of a sheet to successively higher columns and rows, with letters referring to columns and numbers referring to rows.

Basic References Use

For example, a reference to cell A5 in a spreadsheet is written as follows:

  @A5 

The above cell specification indicates that the reference is relative, meaning that if a formula containing the reference is copied to another cell then the row and column components of the reference are updated appropriately.

Sheets can also be specified by enclosing their name in single quotes and using the "!" character separator between the sheet name and the cell specification:

  @'A sheet'!A3 
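
As a rough illustration of this notation (not MappingMaster's actual parser), the following Python sketch splits a reference into its optional sheet name, column letters, and row number. The names REFERENCE, parse_reference, and column_index are hypothetical and introduced only for this example.

```python
import re

# Hypothetical pattern for references such as @A5 or @'A sheet'!A3.
REFERENCE = re.compile(
    r"^@(?:'(?P<sheet>[^']+)'!)?(?P<column>[A-Z]+)(?P<row>[0-9]+)$"
)


def parse_reference(text):
    """Return (sheet, column, row); sheet is None when no sheet is named."""
    match = REFERENCE.match(text)
    if match is None:
        raise ValueError(f"not a cell reference: {text!r}")
    return match.group("sheet"), match.group("column"), int(match.group("row"))


def column_index(letters):
    """Convert column letters to a 1-based index (A -> 1, Z -> 26, AA -> 27)."""
    index = 0
    for letter in letters:
        index = index * 26 + (ord(letter) - ord("A") + 1)
    return index


print(parse_reference("@A5"))
print(parse_reference("@'A sheet'!A3"))
```
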

For example, in the following spreadsheet rows 4 to 6 of column B contain product categories; columns D to G of row 2 contain state identifiers, and the grid range D4 to G6 contains sales amounts.

These references can then be used in MappingMaster's DSL to define OWL constructs using spreadsheet content.

For example, a MappingMaster expression to declare that a class FlavouredGum is a subclass of the class named by the contents of cell B4 can be written:

  Class: FlavouredGum SubClassOf: @B4 

When processed, this expression will create an OWL named class using the contents of cell B4 ("Gum") as the class name and declare FlavouredGum to be its subclass. If the class Gum already exists, the subclass relationship will simply be established.

That is, references can be used both to define new OWL entities and to refer to existing entities.

A similar expression to declare that the class SalesItem is equivalent to the class named by the contents of cell B4 can be written:

  Class: SalesItem 
  EquivalentTo: @B4 

The Manchester Syntax also supports an individual declaration clause for declaring individuals; property values can be associated with the declared individuals using a facts subclause, which contains a list of property value declarations.

For example, an expression to specify that an individual created from the contents of cell D2 ("CA") has a value of "California" for a data property value hasStateName can be written:

  Individual: @D2 
  Facts: hasStateName "California" 

Here, an individual CA will be created if necessary and associated with the data property hasStateName, which will be given the string value "California".

Using the standard Manchester Syntax, annotation properties can also be associated with declared entities.

For example, an existing string data type annotation property called hasSource can be used to associate the California individual declared above with the source document as follows:

  Individual: @D2 
  Facts: hasStateName "California" 
  Annotations: hasSource "DMV Spreadsheet 12/12/2010" 

Classes or properties can be annotated in the same way. For example, a class can be annotated with the hasSource annotation property as follows:

  Class: @D2 
  Annotations: hasSource "DMV Spreadsheet 12/12/2010" 

The Manchester Syntax also supports the use of OWL class expressions. In general, a class expression may occur anywhere a named class can occur.

For example, an expression to define a necessary condition of a class Sale, using the contents of cell D4 as the filler of an owl:HasValue axiom with the property hasAmount, can be written:

  Class: Sale 
  SubClassOf: (hasAmount value @D4) 

In general, OWL entities named explicitly in a MappingMaster expression (as opposed to resolved through a reference) must already exist in the target ontology. In these examples, the classes Sale, SalesItem, and FlavouredGum, and the properties hasAmount, hasStateName, and hasSource must already exist.

Specifying the Type of a Reference

In the expression

   Class: @A5 
   SubClassOf: Drug 

reference @A5 clearly refers to an OWL class. However, the reference type cannot always be inferred unambiguously.

For example, in the expression

    Class: Sale 
    SubClassOf: (@A3 value @D4) 

the reference @A3 could refer to an object, data, or annotation property, and reference @D4 could be either an OWL individual or a literal.

To deal with this situation, MappingMaster supports explicit entity type specification. Specifically, a reference may optionally be followed by a parenthesis-enclosed entity type specification to explicitly declare the type of the referenced entity. This specification can indicate that the entity is a named OWL class, an OWL object, data, or annotation property, an OWL named individual, or a data type. The MappingMaster keywords to specify the types are the standard Manchester Syntax keywords Class, ObjectProperty, DataProperty, AnnotationProperty, and Individual, plus any XSD type name (e.g., xsd:int).

Using this specification, the previous drug declaration, for example, can be written:

  Class: @A5(Class) 
  SubClassOf: Drug 

A declaration of an individual from cell B5 with an associated property value from cell C5 that is of type float can be specified as follows:

  Individual: @B5 
  Facts: hasSalary @C5(xsd:float) 

If the hasSalary data property is already declared to be of type xsd:float then the explicit type qualification is not needed. A global default type can also be specified for literals in the case where the type of the associated data property is either unknown or unspecified or if no explicit type is provided in the reference.

References to OWL properties and individuals can be qualified in the same way.

Reference Resolution

References may specify OWL entities (i.e., classes, properties, individuals, or datatypes) or literals. When a reference specifies an OWL entity, the reference value may resolve to an existing OWL entity or may be used to name an OWL entity that is created on demand.

Basic Reference Resolution

A variety of name resolution strategies are supported when creating or referencing OWL entities. The three primary strategies are to:

  • Use rdf:IDs to create or resolve OWL entities.
  • Use rdfs:label annotations to create or resolve OWL entities.
  • Create OWL entities based on the location of a cell, ignoring the resolved reference value.

With rdf:ID encoding, an OWL entity generated from a reference is assigned its rdf:ID directly from the resolved reference value. Obviously, this content must represent a valid identifier (spaces are not allowed in rdf:IDs, for example).

Using rdfs:label encoding, an OWL entity resolved from a reference is given an automatically generated URI and its rdfs:label annotation value is set to the resolved reference value.

With location encoding, an OWL entity generated from a reference is also given an automatically generated URI, but in this case the resolved reference value is unused.

The default naming encoding uses the rdfs:label annotation property. The default may also be changed globally.

A name encoding clause is provided to explicitly specify a desired encoding for a particular reference. As with entity type specifications, this clause is enclosed by parentheses after the cell reference. The keywords to specify the three types of encoding are mm:Location, rdf:ID, and rdfs:label.

Using this clause, a specification of rdf:ID encoding for the previous drug example can be written:

  Class: @B4(rdf:ID) 
  SubClassOf: Drug 

As mentioned, MappingMaster also supports entity creation where cell values are ignored. In this case, the keyword mm:Location can be used in parentheses following a reference.

For example, an expression to create an individual for cell D4 while ignoring the contents of the cell can be written:

  Individual: @D4(mm:Location) 

By default, OWL entity names are resolved or generated using the namespace of the currently active ontology. The language includes mm:prefix and mm:namespace clauses to override this default behavior.

For example, an expression to indicate that an individual created or resolved from the contents of cell A2 (assuming rdfs:label resolution) should use the namespace identified by the prefix "clinical", can be written:

  Individual: @A2(mm:prefix="clinical") 

Similarly, an expression to indicate that it must use the namespace "http://clinical.stanford.edu/Clinical.owl#" can be written:

  Individual: @A2(mm:namespace="http://clinical.stanford.edu/Clinical.owl#") 

Explicit namespace or prefix qualification in a reference allows disambiguation of duplicate labels in an ontology.

Reference Resolution Using Annotation Values

To support direct references to annotation values in expressions, MappingMaster's DSL adopts the Manchester Syntax mechanism of enclosing these references in single quotes.

For example, if the OWL class Product has an rdfs:label annotation value 'A sellable product', it can be referred to as follows:

  Class: @B4 
  SubClassOf: 'A sellable product' 

A sellable product will be resolved through an annotation value to the class Product when this expression is processed.

Reference Resolution Configuration Options

Document the following options:

mm:defaultPrefix, mm:defaultNamespace, mm:defaultLanguage, mm:ResolveIfOWLEntityExists, mm:SkipIfOWLEntityExists, mm:WarningIfOWLEntityExists, mm:ErrorIfOWLEntityExists, mm:CreateIfOWLEntityDoesNotExist, mm:SkipIfOWLEntityDoesNotExist, mm:WarningIfOWLEntityDoesNotExist, mm:ErrorIfOWLEntityDoesNotExist, mm:ProcessIfEmptyLabel, mm:ErrorIfEmptyLabel, mm:WarningIfEmptyLabel, mm:SkipIfEmptyLabel

Processing Cell Content

The default behavior is to directly use the contents of the referenced cell. However, this default can be overridden using an optional value specification clause.

This clause is usually indicated by the '=' character immediately after the encoding specification keyword and is followed by a parenthesis-enclosed, comma-separated list of value specifications, which are appended to each other. These value specifications can be cell references, quoted values, regular expressions containing capturing groups, or inbuilt text processing functions.

Basic Cell Content Processing

For example, an expression that extends a reference to specify that the entity created from cell A5 is to use rdfs:label name encoding and that the name is to be the value of the cell preceded by the string "Sale:" can be written as follows:

  Class: @A5(rdfs:label=("Sale:", @A5)) 

Value specification references are not restricted to the referenced cell itself and may indicate arbitrary cells. More than one encoding can also be specified for a particular reference so, for example, separate identifier and label annotation values can be generated for a particular entity using the contents of different cells.

For example, we can extend the example above to assign the rdf:ID of generated classes to cell B5 as follows:

  Class: @A5(rdf:ID=@B5 rdfs:label=("Sale:", @A5)) 

If the assignment list includes only a single value, the opening and closing parentheses can be omitted, as with the rdf:ID assignment here:

  Class: @A5(rdf:ID=@B5 rdfs:label=("Sale:", @A5)) 

The language includes several inbuilt text processing methods that can be used in value specifications. The methods currently supported are mm:replace, mm:replaceAll, mm:replaceFirst, mm:prepend, mm:append, mm:toLowerCase, mm:toUpperCase, mm:trim, mm:reverse, mm:printf, and mm:decimalFormat. These methods take zero or more arguments and return a value. Supplied arguments may be any combination of quoted strings and references.

An expression to convert the contents of cell A5 to upper case before label assignment can be written:

  Class: @A5(mm:toUpperCase(@A5)) 

A method can also have an explicit first argument omitted if the argument refers to the current location value. The previous expression can thus also be written:

  Class: @A5(mm:toUpperCase) 

Value processing functions can also be used outside of value specification clauses, but only if no such clause is used in the reference, and only a single function can be used.

decimalFormat and printf

decimalFormat and printf support formatting of textual and numerical content. Their behavior follows the standard Java specifications for the DecimalFormat class and the String.format method.

mm:decimalFormat can be used as follows:

  Individual: Fred Facts: hasSalary @A1(mm:decimalFormat("###,###.00", @A1))

When the value of cell A1 is "23000.2" this will render:

   Individual: Fred Facts: hasSalary "23,000.20"

Here is an example of mm:printf:

   Class: @A1(mm:printf("A_%s", @A1))

When the value of cell A1 is "Car" this will render:

   Class: A_Car

Any parameter can be replaced with a reference clause. These functions will work with explicit rdf:ID and rdfs:label assignment too.

Note that if only one parameter is supplied the second is assumed to be the enclosing reference location.

So

   Individual: Fred Facts: hasSalary @A1(mm:decimalFormat("###,###.00"))

is equivalent to:

   Individual: Fred Facts: hasSalary @A1(mm:decimalFormat("###,###.00", @A1))

And

   Class: @A1(mm:printf("A_%s"))

is equivalent to:

   Class: @A1(mm:printf("A_%s", @A1))

Which is also equivalent to:

   Class: @A1(rdf:ID=mm:printf("A_%s", @A1))
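
The renderings in this section can be approximated outside Java. The sketch below is a Python analogue, not the Java DecimalFormat or String.format implementation: it assumes a numeric cell value of at least 1 (Java's "###,###.00" pattern prints no leading zero for smaller values, while this sketch does) and it uses Python's own rounding. The function names are hypothetical.

```python
def decimal_format_grouped(cell_value):
    """Approximate the pattern "###,###.00": thousands separators, two decimals."""
    return format(float(cell_value), ",.2f")


def printf_like(template, cell_value):
    """Approximate String.format for a template with a single %s placeholder."""
    return template % (cell_value,)


print(decimal_format_grouped("23000.2"))
print(printf_like("A_%s", "Car"))
```

With the cell values used in the examples, "23000.2" and "Car", the two calls reproduce the renderings "23,000.20" and "A_Car" shown earlier in this section.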

Replacing Characters

The mm:replace and mm:replaceAll functions follow the behavior of the corresponding methods in the standard Java String class.

For example, to remove all non-alphanumeric characters from a cell before assignment, the mm:replaceAll function can be used as follows:

  Individual: @A5 
  Facts: hasItems @B5(mm:replaceAll("[^a-zA-Z0-9]","")) 

Similarly, the mm:replace method can be used to replace commas with periods when processing literals:

  Individual: @A2 
  Facts: hasSalary @A3(xsd:float mm:replace(",", ".")) 
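
The effect of these two functions on a cell value can be sketched in Python, whose re.sub and str.replace behave like Java's String.replaceAll and String.replace for the patterns used here (replacement strings that refer to captured groups are written differently in the two languages). The helper names and the sample cell values are assumptions made for illustration.

```python
import re


def replace_all(cell_value, pattern, replacement):
    """Regex-based replacement, in the manner of Java's String.replaceAll."""
    return re.sub(pattern, replacement, cell_value)


def replace(cell_value, target, replacement):
    """Literal replacement of every occurrence, like Java's String.replace."""
    return cell_value.replace(target, replacement)


# Hypothetical cell contents.
print(replace_all("Gum #4 (mint)", "[^a-zA-Z0-9]", ""))
print(float(replace("1234,5", ",", ".")))
```
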

Prepending and Appending

The mm:prepend method can be used as follows to simplify the above example:

  Class: @A5(rdfs:label=mm:prepend("Sale:")) 

The expression can be further simplified by omitting the explicit rdfs:label qualification if it is the default:

  Class: @A5(mm:prepend("Sale:")) 

The append method works similarly.

For example, assuming default rdfs:label encoding, the string "_MM" can be appended to a generated label as follows using the mm:append function:

  Individual: @A2(mm:append("_MM")) 
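
The equivalence between a value specification list and the prepend and append functions can be sketched in Python as plain string concatenation. The function names and the sample cell values "Gum" and "Fred" are hypothetical, and the sketch says nothing about how MappingMaster implements these functions internally.

```python
def value_specification(*parts):
    """Append the value specifications to each other, in order."""
    return "".join(parts)


def prepend(cell_value, prefix):
    return prefix + cell_value


def append(cell_value, suffix):
    return cell_value + suffix


# ("Sale:", @A5) and mm:prepend("Sale:") yield the same label for a cell value.
print(value_specification("Sale:", "Gum"), prepend("Gum", "Sale:"))
print(append("Fred", "_MM"))
```
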

Extracting Values Using Regular Expressions

A similar approach can be used to selectively extract values from referenced cells. A regular expression groups clause is provided and can be used in any position in a value specification clause. This clause is contained in a quoted string enclosed by square brackets. For example, if cell A5 in a spreadsheet contains the string "Pfizer:Zyvox" but only the text following the ':' character is to be used in the label encoding, an appropriate capture expression could be written as:

  Class: @A5(rdfs:label=[":(\S+)"]) 

Note that parentheses around the sub-expressions in a regular expression clause specify capture groups and indicate that the matched strings are to be extracted. In some cases, more than one group may be matched for a cell value, in which case the matched strings are extracted in the order that they are matched and are appended to each other.

Capturing groups can also be used to generate literals. For example, if cell A2 in a spreadsheet has a person's forename, middle initial, and surname separated by a single space, three capturing expressions can be used to selectively extract each name portion and separately assign them to different properties as follows:

  Individual: @A2 
  Types: Person 
  Facts: hasForename @A2(["(\S+)"]), 
         hasInitial @A2(["\S+\s(\S+)"]), 
         hasSurname @A2(["\S+\s\S+\s(\S+)"]) 

A similar example to separately extract two space-separated integers from a cell can be written as:

  Individual: @A2 
  Types: Person 
  Facts: hasMin @A2(xsd:int ["(\d+)\s+"]), 
         hasMax @A2(xsd:int ["\s+(\d+)"]) 

If the hasMin and hasMax properties are of type xsd:int then the explicit qualification is not required here.

Capturing expressions can also be invoked via the mm:capturing function:

  Individual: @A2 
  Types: Person 
  Facts: hasForename @A2(mm:capturing("(\S+)"))

The syntax of capturing expressions follows that supported by the Java Pattern class.
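
The capture expressions in this section can be tried out with Python's re module, whose syntax agrees with the Java Pattern class for these patterns. The sketch assumes that a pattern is searched for anywhere in the cell value and that all matched groups are appended in order; the source does not say whether MappingMaster searches or requires a full match, so that choice, the capture helper, and the sample values "John Q Public" and "10 20" are assumptions.

```python
import re


def capture(pattern, cell_value):
    """Return the captured groups of the first match, appended to each other."""
    match = re.search(pattern, cell_value)
    if match is None:
        return ""
    return "".join(group for group in match.groups() if group is not None)


print(capture(r":(\S+)", "Pfizer:Zyvox"))

name = "John Q Public"
print(capture(r"(\S+)", name))
print(capture(r"\S+\s(\S+)", name))
print(capture(r"\S+\s\S+\s(\S+)", name))

print(int(capture(r"(\d+)\s+", "10 20")), int(capture(r"\s+(\d+)", "10 20")))
```
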

Literals

MappingMaster currently supports the following datatypes:

xsd:string, xsd:boolean, xsd:byte, xsd:short, xsd:int, xsd:long, xsd:float, xsd:double, xsd:integer, xsd:decimal, xsd:dateTime, xsd:date, xsd:time, xsd:Duration, rdf:PlainLiteral, rdf:XMLLiteral

IRIs

MappingMaster has several directives to customize the IRI creation process.

Directive            Explanation
mm:iri               Use the resolved reference value to generate an IRI. An error will be thrown if the generated value does not represent a valid IRI.
mm:camelCaseEncode
mm:snakeCaseEncode
mm:uuidEncode
mm:hashEncode

Missing Value Handling

To deal with missing cell values, default values can also be specified in references. A default value clause is provided to assign these values. This clause is indicated by the keywords mm:DefaultLocationValue, mm:DefaultLiteral, mm:DefaultLabel, and mm:DefaultID, followed by an assignment to a string. For example, the following expression uses this clause to indicate that the value "Unknown" should be used as the created class label if cell A5 is empty:

  Class: @A5(rdfs:label mm:DefaultLabel="Unknown") 

Additional behaviors are also supported to deal with missing cell values. The default behavior is to skip an entire expression if it contains any references with empty cells. Four keywords are supplied to modify this behavior. These keywords indicate that:

  • An error should be thrown if a cell value is missing and the mapping process should be stopped (mm:ErrorIfEmptyLocation)
  • Expressions containing references with empty cells should be skipped (mm:SkipIfEmptyLocation)
  • Expressions containing references with empty cells should generate a warning in addition to being skipped (mm:WarningIfEmptyLocation)
  • Expressions containing such empty cells should be processed (mm:ProcessIfEmptyLocation).

The last option allows processing of spreadsheets that may contain a large number of missing values. The option indicates that the language processor should, if possible, conservatively drop the sub-expression containing the empty reference rather than dropping the entire expression.

Consider, for example, the following expression declaring an individual from cell A5 of a spreadsheet and associating a property hasAge with it using the value in cell A6:

  Individual: @A5 
  Facts: hasAge @A6(mm:ProcessIfEmptyLocation) 

Here, using the default skip behavior, a missing value in cell A5 will cause the expression to be skipped. However, the process directive for the hasAge property value in cell A6 will instead drop only the sub-expression containing it if that cell is empty. So, if cell A5 contains a value and cell A6 is empty, the resulting expression will still declare an individual.
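
The four behaviors can be summarized in a small Python sketch. It is only a model of the rules described in this section, not MappingMaster's implementation: render_individual is a hypothetical function that takes the individual's cell value and a list of (property, cell value, directive) triples, and returns the rendered expression, or None when the whole expression is skipped.

```python
import warnings


def render_individual(name, facts):
    """Render an Individual expression, applying empty-location directives."""
    if not name:
        return None  # default skip behavior: the whole expression is dropped
    kept = []
    for prop, value, directive in facts:
        if value:
            kept.append(f'{prop} "{value}"')
        elif directive == "mm:ProcessIfEmptyLocation":
            continue  # drop only the sub-expression with the empty cell
        elif directive == "mm:ErrorIfEmptyLocation":
            raise ValueError(f"empty cell for property {prop}")
        elif directive == "mm:WarningIfEmptyLocation":
            warnings.warn(f"empty cell for property {prop}; expression skipped")
            return None
        else:  # mm:SkipIfEmptyLocation
            return None
    expression = f"Individual: {name}"
    if kept:
        expression += " Facts: " + ", ".join(kept)
    return expression


# A5 holds "Fred" and A6 is empty: the individual is still declared.
print(render_individual("Fred", [("hasAge", "", "mm:ProcessIfEmptyLocation")]))
```
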

Using a similar approach, more fine-grained empty value handling is also supported to specify different empty value handling behaviors for mm:Literal, rdf:ID, and rdfs:label values. Here, the label directives are mm:ErrorIfEmptyLabel, mm:SkipIfEmptyLabel, mm:WarningIfEmptyLabel, and mm:ProcessIfEmptyLabel, with equivalent keywords for RDF identifier and literal handling. These are mm:ErrorIfEmptyID, mm:SkipIfEmptyID, mm:WarningIfEmptyID, and mm:ProcessIfEmptyID, and mm:ErrorIfEmptyLiteral, mm:SkipIfEmptyLiteral, mm:WarningIfEmptyLiteral, and mm:ProcessIfEmptyLiteral.

Location Shifting

One additional option is provided to deal with empty cell values. This option is targeted at the common case in many spreadsheets where a particular cell is supplied with a value and all empty cells below it are implied to have the same value. In this case, when these empty cells are being processed, their location must be shifted to the location above them that contains a value. For example, the following expression uses the mm:ShiftUp keyword to indicate that if cell A5 does not contain a value for the name of the declared class then the row number must be shifted upwards until a value is found:

  Class: @A5(mm:ShiftUp) 

If no value is found, normal empty value handling processing is applied. Similar directives provide for shifting down (mm:ShiftDown), and to allow shifting to the left (mm:ShiftLeft), or to the right (mm:ShiftRight).
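
The upward shift can be sketched in Python as a search through the rows above the referenced cell. The shift_up function and the dictionary that maps (column, row) pairs to cell values are hypothetical stand-ins for a spreadsheet; returning None models the case where no value is found and normal empty value handling takes over.

```python
def shift_up(cells, column, row):
    """Return the nearest non-empty value at or above (column, row), else None."""
    for current_row in range(row, 0, -1):
        value = cells.get((column, current_row), "")
        if value != "":
            return value
    return None


# Column A: a value in row 3, with rows 4 and 5 left empty below it.
cells = {("A", 3): "Gum", ("A", 4): "", ("A", 6): "Candy"}
print(shift_up(cells, "A", 5))
```
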

Iterating Over a Range of Cells in a Reference

Obviously, most mappings will not just reference individual cells but will instead iterate over a range of columns or rows in a spreadsheet. The wildcard character '*' can then be used in references to refer to the current column and/or row in an iteration. MappingMaster provides a graphical interface to specify these ranges. (They will soon be supported in the DSL.)

Example references using this wildcard notation include:

  • @A3
  • @A*
  • @**

For example, an expression that iterates over the grid D4 to G6 to create an individual of class Sale for each cell can be written:

  Individual: @** 
  Types: Sale

This expression can be extended to assign property values to these individuals:

  Individual: @** 
  Types: Sale 
  Facts: hasAmount @**, 
         hasProduct @B*, 
         hasState @*2 
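
How the wildcards resolve during an iteration can be sketched in Python. The expand and iterate_grid functions are hypothetical, the row-by-row iteration order is an assumption, and the sketch handles single-letter columns only; it simply substitutes the current column and row for each '*' in a reference.

```python
import re

# A reference is "@" followed by a column (letters or *) and a row (digits or *).
WILDCARD_REFERENCE = re.compile(r"^@([A-Z]+|\*)([0-9]+|\*)$")


def expand(reference, column, row):
    """Resolve the wildcards in a reference against the current column and row."""
    match = WILDCARD_REFERENCE.match(reference)
    if match is None:
        raise ValueError(f"not a cell reference: {reference!r}")
    ref_column, ref_row = match.groups()
    resolved_column = column if ref_column == "*" else ref_column
    resolved_row = str(row) if ref_row == "*" else ref_row
    return "@" + resolved_column + resolved_row


def iterate_grid(columns, rows):
    """Yield the resolved (@**, @B*, @*2) references for every cell in the grid."""
    for row in rows:
        for column in columns:
            yield tuple(expand(ref, column, row) for ref in ("@**", "@B*", "@*2"))


# Grid D4 to G6: columns D to G, rows 4 to 6.
for resolved in iterate_grid("DEFG", range(4, 7)):
    print(resolved)
```

For the cell D4, the three references resolve to @D4 (the amount), @B4 (the product in column B of the same row), and @D2 (the state in row 2 of the same column).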

Manchester Syntax Coverage

The DSL does not support the entire Manchester Syntax. The following clauses are not currently supported:

  • OWL object property declarations
  • OWL data property declarations
  • OWL annotation property declarations
  • OWL datatype declarations
  • OWL literal type qualification
  • OWL disjoint classes
  • OWL equivalent and disjoint properties
  • OWL negative property assertions
  • OWL has key

Configuration Options

A set of global defaults can be specified for reference directives. The language has a number of clauses to specify these defaults.

The following examples illustrate the use of these clauses together with the current defaults.

  • mm:DefaultReferenceType Current default is Class. Other possible values include NamedIndividual, ObjectProperty, DataProperty, AnnotationProperty, and any XSD datatype.
  • mm:DefaultPropertyType Current default is ObjectProperty. Other possible values are DataProperty and AnnotationProperty.
  • mm:DefaultPropertyValueType Current default is xsd:string. If a (data or annotation) property value is expected, xsd:string is used.
  • mm:DefaultDataPropertyValueType Current default is xsd:string. Other possible values include any XSD datatype.
  • mm:DefaultValueEncoding Current default is rdf:ID. Other possible values are rdfs:Label, mm:Literal, and rdfs:Location.
  • mm:DefaultIRIEncoding Current default is mm:CamelCaseEncoding. Other possible values are mm:NoEncode, mm:NoSnakeCaseEncode, mm:UUIDEncode, and mm:HashEncode.
  • mm:DefaultShiftSetting Current default is mm:NoShift. Other possible values are mm:ShiftUp, mm:ShiftDown, mm:ShiftLeft, and mm:ShiftRight.
  • mm:DefaultEmptyLocationSetting Current default is mm:WarningIfEmptyLocation.
  • mm:DefaultEmptyLiteralSetting Current default is mm:WarningIfEmptyLiteral.
  • mm:DefaultEmptyRDFIDSetting Current default is mm:WarningIfEmptyRDFID.
  • mm:DefaultEmptyRDFSLabelSetting Current default is mm:WarningIfEmptyRDFSLabel.
  • mm:DefaultIfOWLEntityExistsSetting Current default is mm:ResolveIfOWLEntityExists.
  • mm:DefaultIfOWLEntityDoesNotExistSetting Current default is mm:CreateIfOWLEntityDoesNotExist.
  • mm:DefaultLocationValue Current default is "".
  • mm:DefaultLiteralValue Current default is "".
  • mm:DefaultRDFID Current default is "".
  • mm:DefaultRDFSLabel Current default is "".
  • mm:DefaultLanguage Current default is "".
  • mm:DefaultPrefix Current default is "".
  • mm:DefaultNamespace Current default is "".

Summary

The MappingMaster DSL allows OWL axioms and entities to be created from spreadsheet content. The use of the Manchester syntax allows these OWL entities to be related to each other in complex ways.

Declaratively specifying mappings in this way has several advantages. Writing these mappings does not require any programming or scripting expertise. The mappings can be shared easily using the MappingMaster GUI, which can save and load them. The mappings can also easily be executed repeatedly on different spreadsheets with the same structure.



