
IMS Query Services White Paper

Version 1.0

Copyright © 2005 IMS Global Learning Consortium, Inc. All Rights Reserved.
The IMS Logo is a registered trademark of IMS/GLC
Document Name: IMS Query Services White Paper
Revision: 06 June 2005


Date Issued:
06 June 2005
Latest version:
http://www.imsglobal.org/query/imsQueryServices.html
Register comments or implementations:
http://www.imsglobal.org/developers/ims/imsforum/categories.cfm?catid=17
IMS Global Learning Consortium has made no inquiry into whether or not the implementation of third party material included in this white paper would infringe upon the intellectual property rights of any party.


Recipients of this document are requested to submit, with their comments, notification of any relevant patent claims or other intellectual property rights of which they may be aware that might be infringed by any implementation of the white paper set forth in this document, and to provide supporting documentation.

THIS WHITE PAPER IS BEING OFFERED WITHOUT ANY WARRANTY WHATSOEVER, AND IN PARTICULAR, ANY WARRANTY OF NON-INFRINGEMENT IS EXPRESSLY DISCLAIMED. ANY USE OF THIS DOCUMENT SHALL BE MADE ENTIRELY AT THE IMPLEMENTER'S OWN RISK, AND NEITHER THE CONSORTIUM, NOR ANY OF ITS MEMBERS OR SUBMITTERS, SHALL HAVE ANY LIABILITY WHATSOEVER TO ANY IMPLEMENTER OR THIRD PARTY FOR ANY DAMAGES OF ANY NATURE WHATSOEVER, DIRECTLY OR INDIRECTLY, ARISING FROM THE USE OF THIS WHITE PAPER.

Table of Contents


1. Introduction
     1.1 Overview
     1.2 Scope and Context
     1.3 Why a Query Service?
     1.4 What is a Query Service?
     1.5 Single Target Query vs. Federated Query of Multiple Collections
     1.6 Terms and Definitions
     1.7 References

2. Composition of Query
     2.1 Query Language
           2.1.1 Query Lexicon
           2.1.2 Query Syntax
           2.1.3 Query Semantics
     2.2 Query Binding
     2.3 Abstract Versus Concrete Query Languages

3. Composition of Query Services
     3.1 Query Services Interface Specification
     3.2 Configuration
     3.3 Query Services Binding
           3.3.1 Concrete Bindings
           3.3.2 Abstract Bindings
     3.4 Security
     3.5 Diagnostics and Error Handling

4. Query Results

5. Description of Services
     5.1 Service Self-Description

6. Achieving Interoperability
     6.1 What is it Necessary to Agree About?
     6.2 Potential Conflicts and Their Resolutions
     6.3 Federated Query Issues

7. Query Specifications
     7.1 Description
     7.2 SQL
     7.3 CQL
           7.3.1 Query Syntax
           7.3.2 Query Binding
     7.4 OQL
     7.5 XQuery
           7.5.1 Implications of Using XQuery
     7.6 Z39.50 Type-1 (RPN)
           7.6.1 Scope Operators
           7.6.2 Attribute Sets
     7.7 Google

8. Query Service Specifications
     8.1 SPARQL
     8.2 SQI
     8.3 OKI Digital Repository OSID
     8.4 IMS Digital Repositories Interoperability
     8.5 ECL
           8.5.1 ECL Query Functionality
     8.6 SRW (Search-Retrieve Web Service)
     8.7 SRU (Search Retrieve URL Service)
     8.8 CGM
     8.9 Google
     8.10 ZOOM

9. Conclusions
     9.1 Query Language
     9.2 Query Service
     9.3 Self-Description

Appendix A - Use Cases
     A.1 - IMS Enterprise Service Use Cases
     A.2 - Learning Design Use Cases
     A.3 - Finding Learner-Preference-Appropriate Resources
     A.4 - Seeking Modality-Appropriate Content for Lesson Preparation
     A.5 - Timetable and Resource Availability
     A.6 - Interoperability Requirements of an Educational Broker
     A.7 - ARIADNE Federated Search Layer
     A.8 - ARIADNE Federated Search Layer
     A.9 - EdNA Federated Search for Web Resources

About This Document
     List of Contributors

Revision History

1. Introduction

1.1 Overview

This document is a white paper published by the IMS Global Learning Consortium that explores the concepts associated with queries, and is being written for the primary purpose of informing the development of IMS specifications that use queries as part of their service definition.

1.2 Scope and Context

The IMS Global Learning Consortium has established a Special Interest Group (SIG) for Query Services. This work is currently driven by unmet requirements for query support identified by the IMS Enterprise [IMS-ESWS], IMS Learning Design [IMS-LD], and IMS Digital Repositories Interoperability [IMS-DRI] specifications.

This document provides an outline of the context for the IMS Query SIG, some basic terms and definitions, and points to the potential scope of work of the SIG.

1.3 Why a Query Service?

Internet search engines, such as Google [GOOGLE], permit the easy retrieval of documents (e.g., html pages, rtf, or pdf documents) available on the Internet on the basis of the information contained in the documents themselves. Some resources cannot be directly searched by Internet search engines because they are neither digital (e.g., physical books) nor searchable using a text-based search engine (e.g., movies or multimedia documents). Making those resources searchable necessitates their representation by precise intentional descriptions, sometimes referred to as descriptive meta-data. Agreements on the descriptions are generally represented in structured form as an information model and expressed as schemas. Each instance of such meta-data is the description of an individual resource (e.g., the meta-data used to describe a movie stored in a digital library, the bibliographic notice describing a book stored in a library).

A prominent Metadata for Education FAQ [CETIS] notes that "search engines are effective precisely because they use metadata such as the number of appearances of a search term in a document and the number of links to a document to determine what results to display and in what order to display them. This kind of metadata does not, however, meet all the requirements of someone looking for resources in an educational or training context. For example, search engines are not effective for finding resources at a particular grade level, with a particular pedagogical approach, of a particular size, associated with a particular competency, designed for a particular community, etc. Search engines currently do not have the means to generate this type of metadata. Furthermore, different organizations may describe the same resource in different ways. Search engines do not have access to these different descriptions and, in the case of educational resources that are not on the public Web, may not have access to the resources at all."

This is one example of a Query Service, for the discovery of learning resources. Other types of Query Service include the provision of a query capability for an item bank of assessment items, or of a query service for locating persons based on criteria such as course membership.

Usually, four steps are necessary to obtain a resource:

  1. Querying and evaluating - selecting a resource that satisfies user needs on the basis of a query;
  2. Returning the results of a query;
  3. Resolving the location of the selected resource (on the basis of a resource identifier found in the descriptive meta-data); and
  4. Consuming the resource from its location.

This paper examines the elements involved when carrying out the first and second of these steps; i.e., when a Query Agent (the source of a Query) sends a query to a Query Service (the target of the Query) and, eventually, gets results.

While a specific implementation of a service may incorporate methods for resolving and consuming resources, as well as for executing queries and returning results, the definition of a Query Service within this paper only includes the first two steps of this process. This scope is defined in Section 1.4.

The IMS requirements that prompted this activity indicate that there is information other than content resource meta-data that needs to be queried, for example, information about people, groups and their memberships. This information can also be represented in meta-data schema. This type of information will generally not be available on the public Web, and the preliminary use cases indicate a very precise form of query is required in order to represent this information. In many instances the result of the query will be the goal of the transaction.

Some IMS specifications have moved to a behavioral model, with defined service interfaces. Some consequences of this change in approach include: there is generally no longer a single meta-data schema used to describe all the information that is required to complete a single query; and, it is increasingly apparent that there should be no expectation that the data that is exposed by the query service is held in the same schema in the application that is providing the service.

1.4 What is a Query Service?

For the purposes of this paper a Query is a set of criteria that can be used to discover resources and a Query Service is a specification for a service endpoint that, at a minimum, accepts a query and returns a set of results.

A Query Service is distinct from a Resolver Service (which takes an identifier or a set of meta-data and returns a locator for the resource) and a Read Service (which takes an identifier and returns the resource identified).
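
The distinction between the three services can be sketched as interfaces. The following Python sketch is illustrative only: the names QueryService, ResolverService, ReadService, and InMemoryQueryService, and the shape of the criteria and records, are assumptions made for this example and are not defined by any IMS specification. For brevity the resolver takes only an identifier, not a set of meta-data.

```python
from typing import Protocol


class QueryService(Protocol):
    """Accepts a query (a set of criteria) and returns a set of results."""

    def query(self, criteria: dict[str, str]) -> list[dict[str, str]]: ...


class ResolverService(Protocol):
    """Takes an identifier and returns a locator for the resource."""

    def resolve(self, identifier: str) -> str: ...


class ReadService(Protocol):
    """Takes an identifier and returns the resource itself."""

    def read(self, identifier: str) -> bytes: ...


class InMemoryQueryService:
    """Hypothetical Query Service over a list of meta-data records."""

    def __init__(self, records: list[dict[str, str]]) -> None:
        self._records = records

    def query(self, criteria: dict[str, str]) -> list[dict[str, str]]:
        # A record matches when every criterion equals the record's value.
        return [
            r for r in self._records
            if all(r.get(k) == v for k, v in criteria.items())
        ]


records = [
    {"id": "res-1", "title": "Algebra Basics", "level": "secondary"},
    {"id": "res-2", "title": "Cell Biology", "level": "tertiary"},
]
service: QueryService = InMemoryQueryService(records)
matches = service.query({"level": "secondary"})
```

Note that the Query Service returns descriptions of resources only; turning the returned identifier into a locator, or into the resource itself, would be the job of the other two services.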

A Query Service is different to a Question and Answer Service (virtual reference service), where there is an expectation of authority in the response to a question, usually also including workflow management inferences. Query Services will almost certainly, but not necessarily, be component parts in any implementations that provide and manage Question and Answer Services.

1.5 Single Target Query vs. Federated Query of Multiple Collections

The focus of this white paper is precise and complex queries directed at a single target dataset (a single collection). There are currently a number of activities in both the learning and information services communities that address federated query across multiple collections in a distributed environment. These federated query services need to address important issues that may be additional to, or have different priorities from, those of a single target query. Federated Query is addressed in this paper only to the extent that federated query issues might affect the manner in which query is implemented.

1.6 Terms and Definitions

Concrete Query Language
A Query Language that permits queries that directly refer to elements of the queried data repository schema.
Criteria
An expression structured using Query Language against which each resource in the Search Scope of the Query Service is evaluated.
Query
A combination of a Query Scope, Criteria, Result Record Definition, Sort Specification, and Query Modifier as required by a Query Service.
Query Agent
An agent (software component or service) that initiates a Query using a Query Service.
Query Binding
A machine-readable description of the manner in which the syntax and semantics of the query are expressed, for example as a String, or in an XML structure, or in a binary form.
Query Binding Instance
A representation of a Query using a specific Query Binding.
Query Lexicon
The set of fundamental units (also called lexemes) of a Query Language.
Query Language
The combination of Query Syntax, Query Semantics, and Query Lexicon.
Query Modifier
An instruction that governs the execution of the Query but is not part of the Search Scope, Result Record Definition, Criteria, or Sort Specification. An example of a Query Modifier is a parameter that controls how much time the Query can be allowed to take before returning a result set.
Query Scope
A set of resources to be searched, such as a specific named collection.
Query Semantics
Determines what syntactically correct queries mean by defining agreements about any meaning or interpretation associated with the query.
Query Service
A service that, at a minimum, accepts a Query and returns a Result Set.
Query Syntax
A set of operators, together with rules that describe how those operators can be combined, and the 'entities' that operators defined by the Query Syntax can operate upon.
Question and Answer Service
A 'virtual help desk' or 'virtual reference service' that provides answers to questions, either by human agents or by automation.
Read Service
A service that takes an identifier and returns the resource identified.
Resolver Service
A service that takes an identifier or a set of meta-data and returns a locator for the resource.
Result, Result Set
The output from a Query Service in response to a Query, optionally augmented with other information describing the results of the Query as a whole, such as meta-data describing how the Query was processed.
Result Record
A description of a resource returned in a Result Set. It consists of a set of properties and possibly other descriptive information, and may be expressed according to a definition provided in the Result Record Definition.
Result Record Definition
The specification of the structure of a Result Record.
Service Binding
The binding used to support the implementation of the Service Interface; for example, the SOAP binding of SRW.
Service Configuration
The definitions of the parameters that are used in a Query Service; for example, the Result Record Definition, Sort Specification, specification of number of records to return, and Query Modifier.
Service Interface
The definition of the behaviors (operations) supported by a Query Service.
Service Registry
A service that manages and provides Service Self-Descriptions for a number of different services.
Service Self-Description
A mechanism for exposing the specification of a service; for example, to enable a Query Agent to be dynamically configured according to the specification of a Query Service.
Sort Specification
A specification of an ordering on the Result Records in the Result Set.
Search Scope
See Query Scope.
Term
A part of a Criteria expression, typically a single pattern, word, phrase, or operation.
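
To show how the parts of a Query defined above fit together, the following Python sketch groups them in a single structure. It is a minimal illustration: the class name, field names, and example values are assumptions made here and do not come from any Query Service specification.

```python
from dataclasses import dataclass, field


@dataclass
class Query:
    """Hypothetical container for the parts of a Query named above."""

    scope: str                     # Query Scope, e.g. a named collection
    criteria: str                  # Criteria, expressed in a Query Language
    result_record_definition: str  # name of the requested record structure
    sort_specification: list[str] = field(default_factory=list)
    modifiers: dict[str, object] = field(default_factory=dict)  # Query Modifiers


example = Query(
    scope="maths-collection",
    criteria='author = "smith"',
    result_record_definition="brief",
    sort_specification=["title"],
    modifiers={"timeout_seconds": 30},  # limit on how long the Query may run
)
```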

1.7 References

[ANSI]
American National Standards Institute (1992), "ANSI X3.135-1992, Database Language SQL", http://www.ansi.org/
[CETIS]
CETIS Metadata for Education FAQ, http://www.cetis.ac.uk/metadatafaq/FrontPage
[CORNELL]
A Distributed Digital Library Of Mathematical Monographs, http://www.library.cornell.edu/mathbooks/workdocs.htm
[DC]
Dublin Core meta-data initiative, http://www.dublincore.org
[DMS05]
E. Duval, D. Massart, B. Simon, S. Ternier, and F. Van Assche (2005), Simple Query Interface (SQI) for learning repositories; Version 1.0 beta; Public Draft, http://www.prolearn-project.org/lori
[ECL]
EduSource Communication Layer, http://ecl.iat.sfu.ca
[FORTH]
The RDF Query Language (RQL), obtained from http://139.91.183.30:9090/RDF/RQL/
[GOOGLE]
Google, http://www.google.com
[GOOGLEa]
Advanced Google Search Operators, http://www.google.com.au/help/operators.html
[GOOGLEb]
Google Web APIs, http://www.google.com/apis/
[IETFa]
The TLS Protocol, RFC 2246, http://www.ietf.org/rfc/rfc2246.txt
[IETFb]
RFC 2617 - HTTP Authentication: Basic and Digest Access Authentication, http://www.faqs.org/rfcs/rfc2617.html
[IETFc]
RFC 1050 - RPC: Remote Procedure Call Protocol specification, http://www.faqs.org/rfcs/rfc1050.html
[IMS-ESWS]
IMS Enterprise Services Specification, http://www.imsglobal.org/es/
[IMS-LD]
IMS Learning Design Specification, http://www.imsglobal.org/learningdesign/
[IMS-QTI]
IMS Question and Test Interoperability Specification, http://www.imsglobal.org/question/
[IMS-DRI]
IMS Digital Repositories Interoperability Specification, http://www.imsglobal.org/digitalrepositories/
[INDEXDATA]
Chapter 8, Supporting Tools, http://www.indexdata.dk/yaz/doc/tools.tkl
[ODMG]
Object Data Management Group, http://www.odmg.org/
[ODP]
ODP - Open Directory Project, http://dmoz.org/
[OKI]
Open Knowledge Initiative, http://www.okiproject.org/
[JAVA]
Java Programming Language, http://java.sun.com
[SAML]
Security Assertion Markup Language, http://www.oasis-open.org/committees/security/
[SAMLa]
SAML Version 2.0 Metadata, http://docs.oasis-open.org/security/saml/v2.0/saml-metadata-2.0-os.pdf
[W3Ca]
Web Services Description Language Version 2.0 Part 0 Primer, http://www.w3.org/TR/2004/WD-wsdl20-primer-20041221/
[W3Cb]
SOAP Version 1.2 Part 0 Primer, http://www.w3.org/TR/2003/REC-soap12-part0-20030624/
[W3Cc]
SPARQL Query Language For RDF, http://www.w3.org/TR/rdf-sparql-query/
[W3Cd]
SPARQL Overview, http://www.w3.org/2005/Talks/12May-SPARQL/all.html
[W3Ce]
RDF, http://www.w3.org/RDF/
[W3Cf]
SPARQL Protocol for RDF, http://www.w3.org/TR/2005/WD-rdf-sparql-protocol-20050114/
[WSPF]
Web Services Policy Framework, http://www-128.ibm.com/developerworks/library/specification/ws-polfram/
[ZING]
Z39.50 International: Next Generation, http://www.loc.gov/z3950/agency/zing/zing-home.html
[ZINGa]
Zeerex: The Explainable Explain Function, http://explain.z3950.org/
[ZINGb]
CQL - Common Query Language, http://www.loc.gov/z3950/agency/zing/cql/
[ZINGc]
CQL Context Set, http://www.loc.gov/z3950/agency/zing/cql/context-sets/cql.html
[ZINGd]
ZOOM: The Z39.50 Object-Orientation Model, http://zoom.z3950.org/
[ZINGe]
SRW Service Definition, http://www.loc.gov/z3950/agency/srw/service.html

2. Composition of Query

For a Query to take place, regardless of the application (bibliographic, directory, learning objects, or assessment item banks), three key agreements need to be in place between the Query Agent (the agent making the Query) and the Query Service (the agent handling the Query):

Informally, a Query is a question. Given a set of data instances, answering such a question consists of selecting a subset of the data instances that fulfill the query criteria. Queries can be seen as open sentences (i.e., logical predicates) that can be turned into meaningful statements when their variables are replaced by concrete or abstract references to the schema being queried.

A data instance is part of the Query Result when the information it contains turns the query predicate into a true statement.

Additionally, sometimes answering the question also consists of modifying the format of the selected subset of data (e.g., keeping only some of its elements and presenting them in a certain way) and/or deriving information from it (e.g., compute an average), or including data from a related set of data.
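
The three activities described above (selecting instances that satisfy the predicate, reshaping them, and deriving information from them) can be sketched in a few lines of Python. The data and the helper name by_author are invented for this illustration.

```python
from statistics import mean

# Hypothetical data instances.
books = [
    {"title": "Sets", "author": "smith", "pages": 200},
    {"title": "Groups", "author": "jones", "pages": 320},
    {"title": "Rings", "author": "smith", "pages": 280},
]


def by_author(name):
    """Return the query predicate 'author = name' as a function."""
    return lambda instance: instance["author"] == name


# Selection: keep the instances that make the predicate true.
selected = [b for b in books if by_author("smith")(b)]

# Projection: keep only some elements of each selected instance.
titles = [b["title"] for b in selected]

# Derivation: compute a value from the selected subset.
average_pages = mean(b["pages"] for b in selected)
```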

Queries are expressed in a query language. Depending on the expressive power of its Query Language, a query might specify its result format and/or derive information. In order to be capable of being processed, queries need to be in a machine-readable format defined by a binding.

The rest of this section is structured as follows: Section 2.1 reviews the main characteristics of query languages; Section 2.2 discusses the use of bindings; while Section 2.3 explores the differences between concrete and abstract Query Languages.

2.1 Query Language

Query Languages are characterized by their lexicon, syntax, and semantics.

2.1.1 Query Lexicon

The Query Lexicon is the set of fundamental units (also called lexemes) of a language. Each of these units belongs to a particular syntactic category and has a particular meaning (or semantic value). Among other syntactic categories, a Query Lexicon has references to elements of one or more data models expressed as either a concrete or abstract schema. As an example, the SQL query language permits constructing queries that refer to elements (e.g., tables, columns) of schemas in the relational model.

2.1.2 Query Syntax

The syntax defines the correct form for legal queries (i.e., how the fundamental units belonging to the different syntactic categories can be combined together to build valid queries).

The Query Syntax typically consists of a set of operators, together with rules that describe how those operators can be combined, and the 'entities' that operators defined by the Query Syntax can operate upon. For example:

Operators:

Entities:

2.1.3 Query Semantics

Query Semantics determines what syntactically correct queries mean by defining agreements about any meaning or interpretation associated with the query. Semantics may be defined within a query specification or be defined by 'out of band' community agreements such as profiles. The role of semantic agreements in interoperability terms is to ensure that a Query has retrieved valid result sets.

For example, the query "author=Scott" will not return a valid result set if the meaning of "author" has been defined in this context as actually containing a numerical reference to an index, in which case the query should be "author=1456767". In most cases, however, implicit agreement may be sufficient, especially with abstract query languages, to obtain sufficiently valid results.

2.2 Query Binding

A Query Binding is a machine-readable description of the manner in which the syntax and semantics of the query are expressed; for example as a String, or in an XML structure, or in a binary form. This is entirely separate to the manner in which the query is eventually transported using a Query Service, and instead refers to the manner in which the query itself is allowed to be constructed according to the rules of the query specification.

For example, the CQL (Common Query Language) specification has two main supported bindings: one a simple string representation, the other an XML model (XCQL).

These 'bindings' of the Query are independent of the binding of the Query Service, which may have its own further rules about the binding of the Query.

For example, in the SRW specification, XCQL is permitted only for providing echoed requests, while the string encoding is the only one supported for making search requests.

A Query Service may transform the query binding into some other query binding to support a particular Query Service binding; this is generally out of the scope of a Query Language specification.

For example, while an SQL query is typically encoded as a String by the author of the Query, it may be transformed to a binary representation by the database connectivity layer (ODBC or JDBC) transparently without user intervention.

A Query Binding Instance is an instance of a query binding; that is, it is the actual representation of a query in the binding language. For example, a VSQL query expressed in XML according to the VSQL XSD binding is a Query Binding Instance.

In addition, it can be necessary to choose an encoding for the query (e.g., UTF-8).
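
As a rough illustration of one query expressed in two bindings, the Python sketch below renders the same search clause as a string and as an XML structure, and then applies a character encoding. The XML element names (clause, index, relation, term) are hypothetical and are not taken from the XCQL schema or any other specification.

```python
import xml.etree.ElementTree as ET


def to_string_binding(index: str, relation: str, term: str) -> str:
    """Express a single search clause as a plain string."""
    return f'{index} {relation} "{term}"'


def to_xml_binding(index: str, relation: str, term: str) -> str:
    """Express the same clause as XML (hypothetical element names)."""
    clause = ET.Element("clause")
    ET.SubElement(clause, "index").text = index
    ET.SubElement(clause, "relation").text = relation
    ET.SubElement(clause, "term").text = term
    return ET.tostring(clause, encoding="unicode")


# Two Query Binding Instances of the same query.
string_instance = to_string_binding("author", "=", "smith")
xml_instance = to_xml_binding("author", "=", "smith")

# Choosing a character encoding is a separate step from choosing a binding.
encoded = string_instance.encode("utf-8")
```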

2.3 Abstract Versus Concrete Query Languages

A Query Language is said to be concrete when it permits queries that directly refer to elements of the queried data repository schema. That is, the query lexicon matches the units of the data storage method, such as the tables and columns of a relational database, or the elements of an XML schema.

A Query Language is said to be abstract when its queries refer to an abstract schema (i.e., a schema that does not correspond to the actual representation of the data in the repository). Queries written in an Abstract Query Language cannot be processed directly by a data repository. They first need to be translated into one of the concrete query languages supported by the query engine of the repository. The points of conduct and choreography of these translations will need to be addressed in the implementation of query services and directly relate to how applications consume and expose such services. However, the conduct of translations is not part of the service and is outside the scope of this paper.

Concrete Query Languages are defined so that the meaning of a query can only be interpreted in an unambiguous way. The situation is different with an abstract language with which the semantics of a query not only depends on the language specification but also on the way the query is defined in relation to the data being queried.

When forming a query using a Concrete Query Language, the user needs to know the structure of the data being searched. SQL and XQuery are examples of concrete syntaxes. In the case of SQL, a query of the form "select * from books where author = 'smith'" implies that there is a table called books which contains a column called author. The following example in XQuery implies that the XML document being searched has elements called "bib" containing elements called "book", which in turn have sub-elements "author" and "title":

<results>
  {
    for $b in doc("http://bstore1.example.com/bib.xml")/bib/book,
        $t in $b/title,
        $a in $b/author
    return
        <result>
            { $t }    
            { $a }
        </result>
  }
</results>

Typically, Queries in a Concrete Query Language also define the structure of the returned results as a subset of the structure in the data being searched.

An Abstract Query Language, on the other hand, does not require specific knowledge of the structure of the data being searched. So a CQL query of the form "author = smith" does not imply that there is any element in the data being searched explicitly called "author". An Abstract Query Language does, however, require the query engine to (intelligently) decide how to perform the search.

Typically this involves translating the Query into an equivalent query in a Concrete Query Language. So if a SQL database contained two tables called English_Books and French_Books, the abstract query "author = smith" may translate to the concrete SQL query "select * from English_Books, French_Books where English_Books.writer = 'smith' or French_Books.auteur = 'smith'".
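
The translation step can be sketched in Python using the standard sqlite3 module. This is only one possible approach under stated assumptions: the mapping table INDEX_MAP, the translate function, and the title column are invented for the example. The sketch also generates one select per table combined with UNION ALL, rather than the single select over both tables quoted in the text, so that each matching row is returned once.

```python
import sqlite3

# Hypothetical mapping from the abstract index "author" to concrete columns.
INDEX_MAP = {
    "author": [("English_Books", "writer"), ("French_Books", "auteur")],
}


def translate(index: str, term: str) -> tuple[str, list[str]]:
    """Translate the abstract query 'index = term' into parameterized SQL."""
    targets = INDEX_MAP[index]
    # Table and column names come from the trusted mapping, never from the query.
    selects = [
        f"SELECT title FROM {table} WHERE {column} = ?"
        for table, column in targets
    ]
    return " UNION ALL ".join(selects), [term] * len(targets)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE English_Books (title TEXT, writer TEXT)")
conn.execute("CREATE TABLE French_Books (title TEXT, auteur TEXT)")
conn.execute("INSERT INTO English_Books VALUES ('Sets', 'smith')")
conn.execute("INSERT INTO French_Books VALUES ('Ensembles', 'smith')")
conn.execute("INSERT INTO French_Books VALUES ('Groupes', 'dupont')")

sql, params = translate("author", "smith")
titles = sorted(row[0] for row in conn.execute(sql, params))
```

The Query Agent never sees the table or column names; only the Query Service needs to know how the abstract index maps onto its storage.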

Typically, queries in Abstract Query Languages do not define the structure of the returned results.

3. Composition of Query Services

Query Services consist of the set of services involved when carrying out a Query. Once specified, Query Services need to be bound to a technology and deployed.

Executing a Query requires a number of agreements to be in place:

A means of discovering services is also crucial if query services are to be deployed in a distributed environment.

3.1 Query Services Interface Specification

Query Services can be seen as layers of abstraction on top of descriptive data repositories. The simplest possible Query Service takes a Query as an argument and returns a result; for example, SearchRetrieve (SRW), SELECT (SQL), Query (Google).

A relational database that accepts SQL queries and returns result sets is an example of such a very simple service.

A query service might support other types of operations, for example:

3.2 Configuration

Additional Query Services functions can be used to:

3.3 Query Services Binding

In order to be used, Query Services need to be bound to a technology. These bindings can be either concrete or abstract.

3.3.1 Concrete Bindings

Concrete bindings are bindings associated with a technology that can be directly used to implement the service behavior. An example is a Java binding; i.e., a Java API for a specification whose semantics can be directly implemented in the Java [JAVA] programming language.

3.3.2 Abstract Bindings

Abstract bindings are bindings expressed in an abstract language, for example WSDL [W3Ca], that in turn need to be bound to an implementation-specific technology, for example SOAP [W3Cb].

3.4 Security

In the context of Query, security can refer to:

The mechanisms by which these security requirements are met are part of a Query Service rather than the Query. Security can be configured and managed at three levels:

For example, Transport Layer Security [IETFa] can be used to encrypt HTTP traffic; the key exchange required to initiate secure communication can also be used to establish trust between the Query Agent and the Query Service (that is, to verify their identity).

HTTP Basic Authentication and HTTP Digest Authentication [IETFb] are also examples of transport-level authentication mechanisms for identifying the user; whereas WS-Security Token Profiles, and the Security Assertion Markup Language [SAML] are examples of message-level authentication, independent of the transport itself (HTTP) but still part of the Service Binding.

Typically, security mechanisms tend to be binding-dependent, and are usually either specified as part of the Query Service binding, or left unspecified, to be determined as appropriate by the implementation environment.

However, some Query Service specifications, such as SQI, do include security-like operations within their Interface definition; in the case of SQI this is the createSession (username, password) operation, which aims to create an authentication context with the Query Service by passing it a plaintext username and password tuple.

A limitation of such an approach is that it may either preclude the use of, or circumvent, other security mechanisms at the endpoint or transport level, such as the use of digital certificates, or MD5 challenge-response (which avoids sending passwords over the connection).
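
To illustrate why a challenge-response exchange avoids sending the password over the connection, the following Python sketch shows the general idea. It is not the HTTP Digest mechanism of [IETFb] and does not use MD5: it substitutes HMAC with SHA-256 from the Python standard library, and the function names are hypothetical. A production service would also avoid holding plaintext passwords.

```python
import hashlib
import hmac
import secrets


def issue_challenge() -> bytes:
    """Service side: generate a random, single-use challenge (nonce)."""
    return secrets.token_bytes(16)


def respond(password: str, challenge: bytes) -> str:
    """Agent side: prove knowledge of the password without sending it."""
    return hmac.new(password.encode("utf-8"), challenge, hashlib.sha256).hexdigest()


def verify(stored_password: str, challenge: bytes, response: str) -> bool:
    """Service side: recompute the expected response, compare in constant time."""
    expected = respond(stored_password, challenge)
    return hmac.compare_digest(expected, response)


challenge = issue_challenge()
good = verify("s3cret", challenge, respond("s3cret", challenge))
bad = verify("s3cret", challenge, respond("guess", challenge))
```

Only the challenge and the keyed digest cross the connection, which is the property that an interface-level operation taking a plaintext username and password cannot offer.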

3.5 Diagnostics and Error Handling

One of the features of a Query Service is the manner in which it handles situations where there are errors and unexpected outcomes, and how the service enables clients to ascertain possible causes via the provision of diagnostic information.

Most Query Services provide error handling capabilities, with varying degrees of specificity.

In Z39.50, diagnostics are used to indicate a number of situations in which it has not been possible to process either the session negotiation or various aspects of the query submission or result presentation. Any Z39.50 operation can produce a diagnostic response, specifically:

Init Response: This can return initialization diagnostics, for example user authentication errors and service unavailable errors.

Search Responses: Possible causes for diagnostic errors during this phase include invalid or unsupported combinations of access points and/or search operators, invalid database names specified in the search, etc.

Result Record Presentation: This provides a special case, with two different kinds of diagnostics:

Diagnostics are also helpful because they are 'code based' and can therefore be used with internationalized descriptions, thus helping interoperability from an internationalization perspective. Diagnostics can also contain a string based "addinfo" (additional information) field that can contain further information about the root cause of the diagnostic (Reason text).

Frequently, the additional information actually contains more useful data than the diagnostic code itself.
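
The combination of a language-independent code and a free-text "addinfo" field can be sketched as follows. The codes and message texts below are hypothetical and are not taken from the Z39.50 or SRW diagnostic sets; the class and method names are likewise assumptions for this example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical diagnostic codes and localized descriptions.
MESSAGES = {
    "en": {10: "Unsupported index", 20: "Unknown database"},
    "fr": {10: "Index non pris en charge", 20: "Base de données inconnue"},
}


@dataclass
class Diagnostic:
    code: int                      # stable, language-independent code
    addinfo: Optional[str] = None  # free-text detail about the root cause

    def describe(self, language: str = "en") -> str:
        """Render the code in the requested language, appending addinfo."""
        catalog = MESSAGES.get(language, MESSAGES["en"])
        text = catalog.get(self.code, f"Diagnostic {self.code}")
        return f"{text}: {self.addinfo}" if self.addinfo else text


diag = Diagnostic(code=10, addinfo="dc.creator")
english = diag.describe("en")
french = diag.describe("fr")
```

Because the client receives the code rather than a fixed message, it can choose the language of the description itself, while the addinfo value identifies which part of the query caused the problem.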

Diagnostic sets are extensible in the same way attribute sets are, both globally in the standards environment, and locally, by private implementer-specific extensions.

SRW makes much the same use of diagnostics, except during the service negotiation stage (init) as this is not a service provided by the specification, with authentication being left to other protocol layer services.

As can be seen from the description specific to Z39.50 and SRW, diagnostics are an essential interoperability mechanism for error handling in cross searching systems.

SQI also provides a set of standard exception codes for diagnostic purposes, and these can be bound to a Java Exception subclass, or returned in some other format such as a SOAP message.

4. Query Results

Although most Concrete Query Languages specify the result format of a Query in the Query itself, whether a Query Language is concrete or abstract and whether it specifies a result format are two orthogonal dimensions.

Results Format    Query Languages
                  Abstract      Concrete
Predefined        QEL           SQL, XQuery
Dynamic           CQL, VSQL

Figure 4.1   Abstractness/concreteness and the fact of specifying or not the result format are two different dimensions of query languages.

Result Record definitions can be bound to a Query Language or can be defined independently. When bound to a Query Language, the Query defines the format and semantics of the results record. For example:

These languages are usable in a broader sense as they don't limit themselves to the scenario in which a query always yields a number of result records. As an example, the following XQuery statement returns the sum of two numbers instead of a list of result records.

let $five := 5 return $five + 4

Query Languages that come in an abstract syntax often do not specify the meta-data scheme or its binding:

5. Description of Services

In order to execute a Query effectively, a querying application may need to determine the characteristics of a Query Service by interrogating some sort of remote agency.

One of the main reasons for enabling run-time discovery of service capabilities for a known Query Service is to allow the Query Agent to adapt itself to changes made to the Query Service, such as indexes being renamed, parameters becoming deprecated, or additional result record formats being supported.

Another use for service description is to support manually configuring Query Agent applications prior to their deployment, effectively converting the remote description of a Query Service into a local configuration format understood by the Query Agent.

A further benefit to service description is to enable dynamic binding to the Query Service endpoint, such that the location (e.g., the URL) of the Query Service does not need to be configured into the Query Agent but can be discovered dynamically, so that services can be clustered on multiple machines or moved without breaking clients that depend upon them.

A description mechanism may be implemented as either a single self-describing method provided by a Query Service, or it may be a centralized service supporting descriptions of multiple Query Services. The former may be termed a Service Self-Description; the latter is typically one of the capabilities of a Service Registry, and is not described further in this paper.
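
The run-time adaptation described above can be sketched as a check of the Query Agent's configuration against a retrieved description. The description structure, its field names, and the endpoint URL below are hypothetical simplifications, not a format defined by any of the specifications discussed in this paper.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceDescription:
    """Hypothetical, simplified description of a Query Service."""
    endpoint: str
    indexes: set = field(default_factory=set)
    record_formats: set = field(default_factory=set)


def check_configuration(desc, wanted_indexes, wanted_format):
    """Return a list of mismatches between the agent's needs and the service."""
    problems = []
    for index in sorted(set(wanted_indexes) - desc.indexes):
        problems.append(f"index not offered: {index}")
    if wanted_format not in desc.record_formats:
        problems.append(f"record format not offered: {wanted_format}")
    return problems


# In practice the description would be fetched from the service itself;
# here it is hard-coded, and the endpoint URL is a placeholder.
description = ServiceDescription(
    endpoint="http://example.org/query",
    indexes={"dc.title", "dc.creator"},
    record_formats={"dc", "lom"},
)
print(check_configuration(description, ["dc.title", "dc.subject"], "lom"))
```

An agent that runs such a check before each session can detect a renamed index or a withdrawn record format before it sends a Query that would fail.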

5.1 Service Self-Description

A Service Self-Description is typically implemented as a meta-data record of some type that can be retrieved from the Query Service by the Query Agent. This meta-data can describe various properties of the Query Service. One of the more well-known examples of a Query Service Self-Description meta-data format is Zeerex [ZINGa]; however, there are also some more general meta-data formats for describing services:

The mechanism for obtaining this meta-data differs according to the type of service and its hosting environment; for example, a WSDL record for a service may be obtained by invoking a "wsdl" operation on the service endpoint, or by invoking the service without any parameters. Similarly, a Zeerex record may be obtained from an SRW service by sending the Explain() request to the service, or from an SRU service by invoking it without any parameters.

At a lower level, the "Portmapper" capability described in the IETF RPC specification [IETFc] provides service meta-data at a network-protocol level, indicating the ports used to provide different kinds of network services such as FTP, HTTP, SSH, and so forth.

6. Achieving Interoperability

6.1 What is it Necessary to Agree About?

In order for a source to be able to send a Query to a target and get a useful result, it is necessary that they agree on:

In some applications of Query, this is definitely already the case, such as bibliographic Query, where:

To enable Query interoperability, parties need to agree on one or more supported Query Languages and Query Service definitions derived from these parts.

For other applications, the picture is less complete. For example, when working on the IMS Enterprise Web Services [IMS-ESWS] specification, it is clear that some parts are missing:

The same picture is true to a lesser or greater extent for other e-learning applications of query, such as querying assessment item banks, encoding query service access from within learning objects, and querying learning object repositories.

6.2 Potential Conflicts and Their Resolutions

There are three levels at which it is possible to agree on some of the dimensions of a Query: the Query Language, the Query Services, and external agreements. Potential conflicts arise when more than one of these levels can be used to select a value for one of these dimensions.

Result Record Definition is a good example of a dimension that can potentially be determined at all of these levels. One result format may be defined at the Query Language level, another selected at the service level, and a third fixed as part of a more general agreement. When such conflicts are detected, they can be resolved either by the user (i.e., by raising a fault) or by giving different priorities to the different levels: the Query Language takes priority over the Query Service, which in turn takes priority over general agreements.
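
The two resolution strategies can be sketched as follows. The level names, the function, and the fault class are hypothetical; only the priority ordering is taken from the text.

```python
# Levels in decreasing priority, following the ordering given in the text.
PRIORITY = ("query_language", "service", "agreement")


class ConflictFault(Exception):
    """Raised when conflicting values are left for the user to resolve."""


def resolve(values, strict=False):
    """Pick one value for a dimension (e.g., the result record definition).

    `values` maps a level name to the value selected at that level.
    With strict=True, differing values raise a fault instead.
    """
    chosen = {lvl: values[lvl] for lvl in PRIORITY if values.get(lvl) is not None}
    if not chosen:
        return None
    if strict and len(set(chosen.values())) > 1:
        raise ConflictFault(f"conflicting values: {chosen}")
    # dicts preserve insertion order, so the first entry has top priority
    return next(iter(chosen.values()))


print(resolve({"service": "lom", "agreement": "dc"}))
print(resolve({"query_language": "rdf", "service": "lom"}))
```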

6.3 Federated Query Issues

A Federated Search service acts as a middle layer between a client issuing a search and a number of repositories that execute the Query. This section elaborates on two possible architectures and relates them to Query Services.

In the first architecture, the client software is responsible for distributing the Query to the different repositories and collecting the results. The software that runs on the desktop of an end user maps the search to one or more Query Languages and sends these queries to several Query Services. One of the main advantages here is computational scalability: as each client distributes its own queries and collects the results, the load is spread evenly over all clients. On the other hand, the client must be able to communicate with different kinds of Query Services, potentially offering differing Query Languages. As a consequence of this architecture, the services offered by the repositories must be reachable by the client. This approach works well with synchronous Query Services; in the asynchronous scenario, the client must be reachable by the server, which is often not the case.

In the second architecture, a federated search layer that resides on a server hides from the client the complexity of mapping queries and distributing requests. In this scenario, an end user is usually presented with a light-weight application (e.g., a web page) while the server handles communication with the back-end Query Services on the client's behalf. It is the responsibility of the server to collect and merge the results. In theory, communication with back-end repositories is possible either synchronously or asynchronously; in practice, asynchronous communication tends to be more scalable and more robust against errors.
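
The distribution and merging step common to both architectures can be sketched as follows. The function, the record shape, and the stub repositories are hypothetical; real Query Services would be reached over a network binding rather than called as local functions.

```python
from concurrent.futures import ThreadPoolExecutor


def federated_search(query, services, timeout=5.0):
    """Send `query` to every service concurrently and merge the results.

    `services` maps a name to a callable taking the query and returning
    a list of (identifier, title) records. A failing service is skipped
    so that one broken repository does not break the whole search.
    """
    merged, seen = [], set()
    with ThreadPoolExecutor(max_workers=max(1, len(services))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in services.items()}
        for name, future in futures.items():
            try:
                records = future.result(timeout=timeout)
            except Exception:
                continue  # skip unreachable or faulty repositories
            for identifier, title in records:
                if identifier not in seen:  # drop duplicates across repositories
                    seen.add(identifier)
                    merged.append({"id": identifier, "title": title, "source": name})
    return merged


# Stub repositories standing in for remote Query Services.
def repo_a(query):
    return [("urn:x:1", "Algebra basics"), ("urn:x:2", "Linear algebra")]


def repo_b(query):
    return [("urn:x:2", "Linear algebra"), ("urn:x:3", "Algebra II")]


def repo_down(query):
    raise ConnectionError("repository unreachable")


results = federated_search("algebra", {"a": repo_a, "b": repo_b, "down": repo_down})
print([r["id"] for r in results])
```

The same function could run on the end user's desktop or inside a server-side federated search layer; the architectures differ in where this code is deployed, not in what it has to do.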

Besides federated search, other interoperability strategies are possible:

7. Query Specifications

7.1 Description

This section describes how existing specifications map to the model presented earlier in this paper.

7.2 SQL

Structured Query Language [ANSI] is used to execute queries against relational databases. SQL is more suited to local enquiries than distributed queries. The idea of encoding, transporting, and then running an arbitrary SQL query is unlikely to be a good one, or to be acceptable to most database administrators. SQL also requires knowledge of the underlying relational database schema (table and column names).

SQL describes:

7.3 CQL

Common Query Language is used to define a Query independently of the storage method used by the target collection. CQL has the same advantages of abstraction that RPN has, while being readable, writable and fairly intuitive to understand. It is not tied to a Query Service or a specific encoding:

"CQL, the 'Common Query Language', is a formal language for representing queries to information retrieval systems such as web indexes, bibliographic catalogs, and museum collection information. The CQL design objective is that queries be human readable and human writable, and that the language be intuitive while maintaining the expressiveness of more complex languages - CQL's goal is to combine simplicity and intuitiveness of expression with the richness of Z39.50's type-1 query" [ZINGb].

7.3.1 Query Syntax

The CQL model for queries defines contexts which can be applied to all elements of the syntax (i.e., indexes, relations, relation modifiers, and Boolean modifiers). The default context set (or query semantics) is the CQL context set which is reserved for features which are broadly applicable across multiple domains or protocols. There are other defined context sets [ZINGc]. CQL is an example of an abstracted syntax (see 2.3).

A search clause is constructed from an index, a relation, and a term. The term is mandatory; the index and relation are optional, but if either is present, both must be specified. If the context of the index is not specified, the default is decided by the server, whereas if the context is not supplied for the relation, the default is CQL. Search clauses can be linked by the Boolean operators and, or, not, and prox. For example:

dc.title = "help" and bath.author = "jones"
dc.title any "cat"
bath.author cql.exact "smith, j"

All parts of CQL are case insensitive apart from user supplied search terms, which may or may not be case sensitive.
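
The structure of a single search clause can be sketched with a small parser. This is a deliberately limited illustration, not a CQL implementation: it assumes whitespace between index, relation, and term, it does not handle Boolean operators or modifiers, and the function name is hypothetical.

```python
import re

# Optional "index relation" pair followed by a mandatory term.
CLAUSE = re.compile(
    r'^\s*(?:(?P<index>[\w.]+)\s+'
    r'(?P<relation><=|>=|<>|=|<|>|[\w.]+)\s+)?'
    r'(?P<term>"[^"]*"|\S+)\s*$'
)


def parse_search_clause(text):
    match = CLAUSE.match(text)
    if match is None:
        raise ValueError(f"not a simple search clause: {text!r}")
    index, relation, term = match.group("index", "relation", "term")
    return {
        # Index and relation are case insensitive, so normalize them;
        # the user-supplied term is left exactly as written.
        "index": index.lower() if index else None,
        "relation": relation.lower() if relation else None,
        "term": term.strip('"'),
    }


print(parse_search_clause('dc.title any "cat"'))
print(parse_search_clause('Bath.Author CQL.Exact "smith, j"'))
print(parse_search_clause('"cat"'))
```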

7.3.2 Query Binding

CQL can be bound as XML or as a simple string.

7.4 OQL

Object Query Language [ODMG] was completed in 2000 and is published as part of the Object Data Standard Version 3. It was originally developed for use with CORBA. OQL is more suited to local enquiries, and the same issues as with SQL apply.

7.5 XQuery

XQuery shifts the same issues of concrete data representations from relational database table/column names into XML elements. While it is good for interrogating known documents in a specific schema, it suffers from the same lack of abstraction. There is no guarantee that the storage schema of a repository offering a Query Service is the same as the data interoperability schema.

XML is an extremely versatile mark-up language, capable of labeling the information content of diverse data sources including structured and semi-structured documents, relational databases, and object repositories. The W3C XQuery language uses the structure of XML intelligently to express queries across all these kinds of data, whether physically stored in XML or viewed as XML via middleware. Because Query Languages have traditionally been designed for specific kinds of data, most existing proposals for XML Query Languages are robust for particular types of data sources but weak for other types. The W3C XQuery Language is designed to be broadly applicable across all types of XML data sources, including both databases and documents. XQuery is designed to be a small, easily implemented language in which queries are concise and easily understood. XQuery is also based on a strong formal model (see XML Query 1.0 Formal Semantics).

XQuery provides several principal forms of expression, including the following:

The XML representation of an XQuery is defined in the W3C document called XQuery 1.0 and XPath 2.0 Functions and Operators. It was created by mapping the productions of the XQuery abstract syntax directly into XML productions. The result is not particularly convenient for humans to read and write, but it is easy for programs to parse, and because XQuery is represented in XML, standard XML tools can be used to create, interpret, or modify queries.

7.5.1 Implications of Using XQuery

XQuery comes with a rich and flexible syntax and a rich mechanism for specifying results. However, as XQuery belongs to the 'concrete query language' category, the following characteristics make its use problematic in distributed infrastructures.

Firstly, queries formed using XQuery make direct reference to the underlying XML schema of the queried documents. This makes direct use of XQuery problematic in distributed scenarios where potentially different schemas are used to capture semantically identical information. However, when XQuery is coupled with a commonly accepted schema for a particular domain, the limitation can be overcome with mappings to local schemas. This would effectively move XQuery into the abstract syntax category.

Secondly, an XQuery processor is required on the data source side. XQuery processors are currently not commonly available, and they have to be added on top of other mechanisms that provide XML data from an internal (typically relational) storage mechanism. This can be overcome by restricting the otherwise highly flexible XQuery syntax to specified query patterns that can be easily parsed and transformed into the queries used internally.

Query patterns combined with a commonly accepted schema make XQuery a strong candidate even for situations where XQuery is not supported by the back-end data sources and where those sources use different schemas to represent data.

7.6 Z39.50 Type-1 (RPN)

A Z39.50 Type-1 (RPN) Query uses abstract Query Semantics and does not rely on knowledge of the underlying data structure. However, it is hard to transport natively outside of Z39.50; it needs to be rendered in PQF or other interpretations of the ASN.1.

Z39.50 Type-1 (RPN) queries are the most commonly transported query model in use by the Z39.50 protocol. The Type-1 Query is defined in terms of ASN.1 and is usually encoded as a BER octet stream. The ASN.1 defines a tree structure optionally composed of Boolean set operators, with leaf nodes that specify a search term along with a list of optional access qualifiers which modify the behavior of the target system in searching for that term. It is not common to see Type-1 queries in their native binary form, but they can frequently be seen as CQL/PQF representations of the underlying abstract tree. Many tools currently exist to map from CQL/PQF/XCQL into the basic Type-1 Query tree.

7.6.1 Scope Operators

Unlike SQL, the Z39.50 Type-1 Query pre-defines the possible access points available to a search client. This has the advantage that Query users can access a named access point without the need to express and understand a database access path. For example, in a relational database, it may be necessary to join through several tables in order to search for a document and one of its contributing authors (Document->DocumentPerson->Person). SQL gives a great deal of flexibility in this regard, as the scope can be reused to provide additional Query restrictions. Type-1 queries lack this scope reuse, but gain in the simplicity of an exposed abstract model of access points.

7.6.2 Attribute Sets

Z39.50 relies upon large pre-negotiated collections of abstract access points called use attributes. These specifications define, for a given domain, what the appropriate access points, as deemed by the community, should be. The access points are fully separated from the semantics of each implementation, being defined as sequentially numbered concepts. For example, in the bib-1 attribute set, attribute 4 is allocated to the concept Title; this mapping is consistent even if the underlying system uses document name, description, "TitleOfCourse", etc. Different kinds of title (for example, a personal title such as Mr. or Mrs.) would be identified by a different (non-bib) attribute set.

Attribute sets specify not only access points but also other matching options such as relation (the operator, for example =, >, <, >=, <=), structure, truncation, completeness, and position.
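
The tree structure of a Type-1 Query and its rendering in PQF can be sketched as follows. The classes and the function are hypothetical, and the rendering covers only Boolean operators and attribute-qualified terms. Use attribute 4 (Title) is taken from the text above; the value 1003 is assumed here to denote Author in bib-1.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    value: str
    # attribute type -> value; in bib-1, type 1 is the Use attribute
    attributes: dict = field(default_factory=dict)


@dataclass
class Operator:
    name: str   # "and", "or" or "not"
    left: object
    right: object


def to_pqf(node):
    """Render the tree in prefix notation (operator first, then operands)."""
    if isinstance(node, Operator):
        return f"@{node.name} {to_pqf(node.left)} {to_pqf(node.right)}"
    attrs = "".join(f"@attr {t}={v} " for t, v in sorted(node.attributes.items()))
    return f'{attrs}"{node.value}"'


# Title contains "help" AND author is "jones" (1003 assumed to be Author).
query = Operator("and", Term("help", {1: 4}), Term("jones", {1: 1003}))
print(to_pqf(query))
```

Note that the query names only numbered concepts; how "Title" maps onto columns or elements in the target system is left entirely to the target.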

7.7 Google

Google [GOOGLE] supports a simple, yet sophisticated, Query structure, although most of the sophistication in terms of collections and vocabulary support is hidden behind a very simple user interface, and is only really evident when using one of its Query Services such as the Google Web Service.

Google syntax uses a list of terms, with optional special modifiers, with an implicit "AND" between each of the terms unless the operator "OR" is used. Quotation marks enable multiple-word terms to be provided and enable phrase-matching, for example "Web Services" as distinct from Web [and] Services.

The special modifiers on terms can act as operators, and include "-" to exclude a term (i.e., NOT), "~" to search synonyms, ".." to search number ranges, "host:" to restrict the search to a specific location indicated by the term, and "link:" to restrict the search to items that are linked to the location indicated by the term.

Others include "cache:", "related:", "info:", and so on. Google publishes a complete list of the modifiers and operators supported in Google queries [GOOGLEa].

So, despite the user interface, relatively complex queries are possible, for example:

"formal system" OR ~semantics -encoding 1996..2000

This searches for either the phrase "formal system" or the term "semantics" (or any synonyms thereof) but not the term "encoding", and any numbers from 1996 to 2000 inclusive.

When people talk about the "simplicity" of Google this is typically in reference to the user experience, rather than the Google Query itself.

Most terms are evaluated against the full-text index of the collection. Google provides the following collections for search using its Query Syntax:

Some collections have additional syntax capabilities; for example Google Scholar supports the "author:" term modifier.

Queries are encoded as a String, and in the case of the URL-based Query Service, this is URL encoded, and looks like:

http://www.google.com/search?q=%22formal+system%22+OR+semantics++-encoding+1996..2000
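
The URL encoding step can be sketched with a standard library routine. The example only shows how a Query string is encoded into, and recovered from, the "q" parameter; the exact escaping produced may differ in detail from the URL shown above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

query = '"formal system" OR ~semantics -encoding 1996..2000'

# Encode the Query as the value of the "q" parameter.
url = "http://www.google.com/search?" + urlencode({"q": query})
print(url)

# Decoding the parameter gives back the original Query string.
decoded = parse_qs(urlsplit(url).query)["q"][0]
print(decoded)
```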

The result record meta-data format provides a range of meta-data, including the resource location, title, abstract, snippet (a piece of contextual text from the resource), and any catalog information if the resource is classified in the ODP directory [ODP]. The meta-data format is fixed, and is not dependent on the query itself.

Google's Query model, while specific to its own collections, is nevertheless an abstract Query Language, and could be adapted to any number of other collections and storage models.

8. Query Service Specifications

8.1 SPARQL

The Resource Description Framework (RDF) is one of the key pillars of the Semantic Web [W3Ce]. RDF is an extensible means of representing information about (learning) resources. One of RDF's design assumptions is that resources are identified by a Uniform Resource Identifier (URI), allowing various users and agents to make assertions about uniquely identified things. RDF is designed for representing meta-data about all kinds of digital and non-digital artifacts, making it a powerful means of integration over disparate sources of information. The graph-based structures of RDF can be serialized in XML. XHTML 2.0 is currently under development, which will support a seamless integration of RDF-based meta-tagging in HTML.

The W3C has designed SPARQL as a Query Language for RDF. SPARQL is designed to meet the following requirements [W3Cd]:

The development was aligned towards the following design goals [W3Cf]:

SPARQL has three defined result formats: a simple Yes or No answer (Ask), an SQL-like tabular variable binding (Select), and an RDF Graph (Construct, Describe).

The working group has also specified a protocol for exchanging SPARQL queries and results, called SPARQL Protocol for RDF [W3Cf]. A WSDL description, as well as an HTTP and a WSDL binding are available. The SPARQL Abstract Protocol is made up of two operations: Query and GetGraph. Conceptually, the arguments to the Query operation are threefold: (1) the query; (2) an entity responsible for handling the query; and (3) the target or targets against which the query is to be executed. The optional GetServiceDescription operation returns an RDF graph that describes the protocol and query language options for a particular query target.

8.2 SQI

Simple Query Interface [DMS05] is an API that defines a Query Service independently of the Query construct or target data structure.

SQI describes:

SQI is agnostic with regard to the Query itself.

The Simple Query Interface (SQI) [DMS05] is an Application Program Interface (API) for querying heterogeneous repositories of learning object meta-data. SQI is developed on behalf of the CEN/ISSS Workshop on Learning Technology. Its main characteristics are:

Considering two repositories sharing at least a common Query Language and a common meta-data format, the following steps are necessary to enable one repository (referred to as the source of the query) to Query the other (referred to as the target of the query) using SQI:

Figure 8.1   Class Diagram of the Simple Query Interface.

The API itself is depicted in the class diagram of Figure 8.1. It consists of fourteen methods that can be grouped into four categories: session management, query management, synchronous query management, and asynchronous query management.

Strictly speaking, the session management methods are not part of the SQI specification itself and can potentially be replaced by any other session management mechanism that is considered more appropriate. The current methods make it possible to open a session with the target repository, either anonymously (createAnonymousSession) or not (createSession), and to close it (destroySession).

The query management methods permit the configuration of query parameters such as the query language (setQueryLanguage), the format of the results (setResultsFormat), the maximum number of results returned (setMaxQueryResults), and the duration of a query (setMaxDuration).

In a synchronous query, Query results are returned as the result of a query call (synchronousQuery). Additional methods make it possible to choose the number of results returned by a call (setResultsSetSize), to obtain the total number of results of a query (getTotalResultsCount), and to request additional query results when the total number of results is bigger than the number of results returned by the first call (getAdditionalQueryResults).
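
The synchronous call sequence can be sketched against a toy in-memory target. The method names follow the text; the parameter lists, the return types, and the matching logic are assumptions made for illustration and are not taken from the SQI specification.

```python
class InMemoryTarget:
    """Toy SQI-style target; method names follow the text, signatures are assumed."""

    def __init__(self, records):
        self._records = list(records)
        self._sessions = {}

    def createAnonymousSession(self):
        session_id = f"s{len(self._sessions) + 1}"
        self._sessions[session_id] = {"size": 10, "hits": []}
        return session_id

    def setResultsSetSize(self, session_id, size):
        self._sessions[session_id]["size"] = size

    def synchronousQuery(self, session_id, query, start=1):
        state = self._sessions[session_id]
        # Naive matching: case-insensitive substring test on each record.
        state["hits"] = [r for r in self._records if query.lower() in r.lower()]
        return self.getAdditionalQueryResults(session_id, start)

    def getTotalResultsCount(self, session_id):
        return len(self._sessions[session_id]["hits"])

    def getAdditionalQueryResults(self, session_id, start):
        state = self._sessions[session_id]
        return state["hits"][start - 1:start - 1 + state["size"]]

    def destroySession(self, session_id):
        del self._sessions[session_id]


target = InMemoryTarget(["Algebra I", "Algebra II", "Biology", "Linear algebra"])
session = target.createAnonymousSession()
target.setResultsSetSize(session, 2)           # two results per call
results = target.synchronousQuery(session, "algebra")
total = target.getTotalResultsCount(session)
while len(results) < total:                    # page through the remainder
    results += target.getAdditionalQueryResults(session, len(results) + 1)
target.destroySession(session)
print(total, results)
```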

In an asynchronous query, Query results are sent by the target to the source of the query by calling a listener implemented by the source (queryResultsListener). This implies that the source has to indicate the location of the listener to the target (setSourceLocation) before sending an asynchronous query (asynchronousQuery).
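
The asynchronous exchange can be sketched as follows. The method names follow the text, but the signatures are assumed, and a local callable stands in for the network location of the listener; a real deployment would require the source to be reachable by the target.

```python
import threading


class Source:
    """Query source exposing a listener for results."""

    def __init__(self):
        self.received = {}
        self.done = threading.Event()

    def queryResultsListener(self, query_id, results):
        self.received[query_id] = results
        self.done.set()


class Target:
    def __init__(self, records):
        self._records = records
        self._listener = None

    def setSourceLocation(self, listener):
        # A real service would store a network address; a callable stands in here.
        self._listener = listener

    def asynchronousQuery(self, query, query_id):
        if self._listener is None:
            raise RuntimeError("setSourceLocation must be called first")

        def work():
            hits = [r for r in self._records if query.lower() in r.lower()]
            self._listener(query_id, hits)

        threading.Thread(target=work).start()  # returns immediately


source = Source()
target = Target(["Algebra I", "Biology"])
target.setSourceLocation(source.queryResultsListener)
target.asynchronousQuery("algebra", query_id="q1")
source.done.wait(timeout=5)  # wait for the callback in this demonstration
print(source.received)
```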

The fault mechanism provided by SQI is intentionally unsophisticated. It aims at simplicity rather than richness in order to offer the greatest opportunity for consumption by a variety of applications. When a failure occurs, each SQI method is able to report it by throwing a fault (SQIFault) that specifies a predefined error code and a free-text message.

8.3 OKI Digital Repository OSID

The OKI Repository OSID [OKI] is designed to be a general abstraction that the user-facing application uses to interact with many of the other Query technologies listed. Having the application call through an abstraction, instead of directly invoking a specific mechanism, allows the application to work with a wide range of Query technologies and repositories. As well as Query, the Repository OSID provides the abstraction for other functions, such as storing and retrieving objects and viewing and updating meta-data. The service does not specify the transport, and therefore allows consistent treatment of both local and remote resources. It allows for multiple search and data formats. The concept of Search Type is used to coordinate these separate agreements of query syntax and semantics.

8.4 IMS Digital Repositories Interoperability

The IMS Digital Repositories Interoperability [IMS DRI] specification defines a reference model for pairs of services exposed by repositories including Search and Retrieve. The DRI specification makes a number of suggestions for query including use of XQuery over SOAP and Z39.50. SRW/SRU was under development at the time the DRI Specification was prepared, and was considered to be a future candidate specification for Query.

A number of projects have implemented query services based on the DRI reference model including ECL and IKI DR. These projects needed to define many implementation details which were left undefined in the DRI Specification. Implementations of DRI may not be interoperable.

8.5 ECL

The EduSource Communication Layer (ECL) [ECL] is an interoperability platform for connecting learning services into the network with the following goals:

The ECL consists of the ECL Protocol, the ECL Connector (a connecting middleware), the ECL Registry for discovery of services, the ECL Gateway (a middleware framework for building bridges between the ECL network and non-ECL services), the ECL certification authority, and the ECL attribute authority, which manages ECL user security profiles.

The ECL Protocol is the common language for all ECL network members connected to a heterogeneous network. The ECL Protocol defines each request and response as an XML schema with the following structure:

ECL Message Header
     Message type
     Communication Id
     Protocol
     Version
     Sender endpoint
     Receiver service urn
     Error message
     Status
ECL Message Body
     Payload

The message type identifies the type of the service. Currently, the ECL defines four requests corresponding to the IMS DRI functions (search, gather, submit, and request) and their corresponding responses. The receiver service URN is the pointer to the handler that ECL will use to process the payload. The handlers are implemented by the service providers and are dynamically loaded into the ECL environment. Other parameters in the header support message processing. The payload is defined for each message type independently in the form of an XML schema. It should be noted that ECL can be used for any type of message-processing Web service. New service types and corresponding payload definitions can be added to the protocol with all the benefits of the ECL security infrastructure and without changing the infrastructure itself.

To connect to the ECL, service providers have to implement a service handler with a single method, processEclMessage, that receives an ECL Protocol message serialized as an XML string.
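
The header/body structure and the single-method handler can be sketched as follows. Only the method name processEclMessage and the list of header fields come from the text; the XML element names, the payload layout, and the handler's behavior are invented for illustration, and the real ECL schema may differ.

```python
import xml.etree.ElementTree as ET

# Element names below are invented; the real ECL schema may differ.
HEADER_FIELDS = ("messageType", "communicationId", "protocol", "version",
                 "senderEndpoint", "receiverServiceUrn", "errorMessage", "status")


def build_message(header, payload):
    """Serialize a header dict and a payload element as an XML string."""
    root = ET.Element("eclMessage")
    head = ET.SubElement(root, "header")
    for name in HEADER_FIELDS:
        ET.SubElement(head, name).text = header.get(name, "")
    ET.SubElement(root, "body").append(payload)
    return ET.tostring(root, encoding="unicode")


class SearchHandler:
    """Service handler with the single entry point named in the text."""

    def processEclMessage(self, message_xml):
        root = ET.fromstring(message_xml)
        message_type = root.findtext("header/messageType")
        payload = ET.Element("payload")
        if message_type != "search":
            return build_message(
                {"messageType": "error",
                 "errorMessage": f"unsupported type: {message_type}"}, payload)
        # Echo the search text back; a real handler would run the query.
        terms = root.findtext("body/payload/text", default="")
        ET.SubElement(payload, "echo").text = terms
        return build_message({"messageType": "searchResponse", "status": "ok"},
                             payload)


request_payload = ET.Element("payload")
ET.SubElement(request_payload, "text").text = "algebra"
request = build_message({"messageType": "search", "version": "1.0"}, request_payload)
response = SearchHandler().processEclMessage(request)
print(response)
```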

ECL Connector is a component that implements the ECL protocol and standard IMS DRI services. The connector hides the complexity and provides a standard API for new and existing repositories to provide IMS DRI services and client applications to connect and use the services on the ECL network. ECL Connector requires institution repositories to implement connector handlers only for those services they want to expose to others. This is far simpler than implementing and deploying every service in each institution. The ECL Connector also facilitates version synchronization during the protocol evolution.

The ECL Registry is a registry of available ECL services. The ECL Registry is currently implemented in UDDI (using JUDDI as an implementation of standard UDDI). ECL services are typically registered during the ECL Connector configuration process. Each record in the ECL Registry describes a service from the end-user perspective and represents a resource collection rather than a repository. The records are classified using one of the provided subject taxonomies. The ECL Connector provides a programming API to search the registry. Tool developers can use this API to develop end-user tools that enable end users to select the repositories they want to query. Once the user selects the repositories, the connector directs queries to the selected repositories.

The ECL Connector is also used to connect end-user tools to the ECL network. As mentioned above, two APIs are provided: (1) to access ECL services and process the responses from the network, and (2) to search the ECL Registry. The ECL Connector supports multithreaded federated search and provides multi-user support for deployment into multi-user environments (such as Learning Management Systems).

One of the goals of ECL is to connect to repositories and networks that use other protocols. The ECL Gateway is a middleware framework that makes it easy to build bridges to other protocols. Bridges to several other protocols and networks have been developed, such as OAI, SRW/SRU (RDN network in the UK), EdNA (Australia), SMETE (USA), SQI (Ariadne, Europe), and LionShare (a Gnutella-based P2P network). All bridges for connected networks are registered in the ECL Registry like any other ECL repository and are available to ECL users. In addition, the code for mapping between protocols is included in the next version of the ECL Connector: when a request to query a bridged service is identified, the code is loaded and the request is mapped directly at the client instead of at the bridge (the bridge is still used when a new bridged service is queried from clients running an older version of ECL).

ECL Security adds a layer on top of the ECL Protocol that supports both security and privacy for ECL users. ECL Security uses the latest WS-Security standards and enables developers to deploy services that require authentication and authorization decisions. Three main security profiles are supported: free access (standard ECL), repository-managed security through a user name and password mechanism, and a federated security profile compatible with the Shibboleth approach. All these profiles can be signed and encrypted. The records in the ECL Registry keep information about the supported security profiles for each service.

8.5.1 ECL Query Functionality

The ECL Query functionality follows the IMS DRI recommendation and continues to support XQuery as one of its primary Query Languages. Unfortunately, not many repositories support XQuery, and furthermore, building XQuery statements is complex and not user-friendly. To deal with these problems, ECL also supports a Boolean tree query structure that is easy to construct and can be mapped to XQuery as well as to an SQL query statement. The structure has two parts: a text query, which is considered to be a basic search query, and an advanced XML element query. Repositories that do not support complex XML queries can apply the text query and use the ECL XML query filter to filter out unwanted results.
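
The mapping from a Boolean tree to an SQL statement can be sketched as follows. The tree representation, the column names, and the whitelists are hypothetical and do not reproduce the actual ECL structure; the sketch only shows why such a tree is straightforward to translate.

```python
import sqlite3

# Hypothetical whitelists limiting what may appear in the generated SQL text.
ALLOWED_COLUMNS = {"title", "language"}
ALLOWED_OPERATORS = {"=", "LIKE"}


def to_sql(node, params):
    """Translate an and/or tree into a parameterized SQL condition.

    A node is either ("and" | "or", [children]) or (column, operator, value).
    Values are collected in `params` rather than pasted into the SQL text.
    """
    if node[0] in ("and", "or"):
        parts = [to_sql(child, params) for child in node[1]]
        return "(" + f" {node[0].upper()} ".join(parts) + ")"
    column, operator, value = node
    if column not in ALLOWED_COLUMNS or operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported condition: {node!r}")
    params.append(value)
    return f"{column} {operator} ?"


tree = ("and", [("title", "LIKE", "%algebra%"),
                ("or", [("language", "=", "en"), ("language", "=", "fr")])])
params = []
where = to_sql(tree, params)

# Run the generated condition against a small in-memory table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE records (title TEXT, language TEXT)")
db.executemany("INSERT INTO records VALUES (?, ?)",
               [("Linear algebra", "en"), ("Algebra", "de"), ("Biology", "fr")])
rows = db.execute(f"SELECT title FROM records WHERE {where}", params).fetchall()
print(where, rows)
```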

ECL also treats XQuery in a specific way. First, the ECL uses IEEE LOM as a reference model for the meta-data but treats the IEEE LOM model as an abstract model. This allows XQuery queries to be mapped to the underlying repository data representation using the semantics of the IEEE LOM model.

Second, the ECL defines 'query patterns' as predefined Query schemas expressed in a valid XQuery format. The Query pattern defines the structure of the Query and the results format. In systems that do not support XQuery, the Query pattern can be mapped into another format, such as an and/or tree representing the Query. A system with XQuery support can execute the pattern in the regular way. Patterns also make it possible to specify and use other query formats instead of XQuery. For example, most repositories in the ECL network use the and/or tree query pattern. Information about the patterns supported by a repository is stored in the service description in the ECL Registry and can be used by an ECL client tool to filter out repositories that do not support the patterns required by the tool.

ECL is a stateless protocol with optional resumption functionality for returning Query results in batches. The ECL client can specify the number of results to be returned as a parameter to the search request. If the number of results is larger than specified, the ECL provider includes a resumption token in the results message. Similarly, the ECL provider can be configured to limit the number of results sent in one batch. To get the next batch of results, the ECL client sends an identical Query with the same Query ID and includes the resumption token from the previous batch. The ECL Connector also includes a caching capability for Query results, which can be optionally enabled.
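
The resumption mechanism can be sketched as follows. The class, the message fields, and the choice of encoding the next offset in the token are assumptions; ECL does not necessarily represent its resumption tokens this way.

```python
class Provider:
    """Stateless provider: the token encodes the offset of the next batch."""

    def __init__(self, records, max_batch=100):
        self._records = records
        self._max_batch = max_batch  # provider-side limit per batch

    def search(self, query, query_id, batch_size, resumption_token=None):
        hits = [r for r in self._records if query.lower() in r.lower()]
        start = int(resumption_token) if resumption_token else 0
        end = start + min(batch_size, self._max_batch)
        response = {"queryId": query_id, "results": hits[start:end]}
        if end < len(hits):
            response["resumptionToken"] = str(end)  # more results remain
        return response


def fetch_all(provider, query, query_id, batch_size):
    """Repeat the identical Query, passing back each token, until none is returned."""
    results, token = [], None
    while True:
        response = provider.search(query, query_id, batch_size, token)
        results.extend(response["results"])
        token = response.get("resumptionToken")
        if token is None:
            return results


provider = Provider([f"Algebra {i}" for i in range(1, 6)] + ["Biology"], max_batch=2)
first = provider.search("algebra", "q1", batch_size=3)
everything = fetch_all(provider, "algebra", "q1", batch_size=3)
print(first, everything)
```

Because the provider keeps no state between calls, all the information needed to continue is carried by the identical Query and the token.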

The ECL Connector provides a simple API and several utilities for preparing queries and processing results. On the client side, these include building complex queries using the setBasicSearch() (for text search) and setAdvancedSearch() (for additional conditions on XML elements) functions, extracting and transforming queries into different formats (XQuery, and/or tree, hashmap), meta-data cross-walking (between IEEE LOM, DC, IMS LOM 1.2, and CanCore), filtering dead links for the resources, etc.

8.6 SRW (Search-Retrieve Web Service)

SRW defines three operations in the Service Definition:

[ZINGe]

The SRW Query Service defines an encoding mechanism based on HTTP and XML for transmitting queries and retrieving the results of these queries. The XML structure submitted as the query and the XML structure contained in the response are both defined using a WSDL definition for the web services protocol specific to SRW.

The underlying Query can be expressed in either CQL or XCQL. The response to a Query can either include result records or simply be a pointer to a named 'result set' which allows the retrieval of result records in subsequent operations.

The format of the result records is not tightly defined within the SRW definitions and therefore there is scope for an SRW service to support many different result record formats.

8.7 SRU (Search Retrieve URL Service)

SRU is very similar in operation to SRW. Its primary difference is the encoding of the Query. In SRU the Query is encoded in a URL using name-value pair parameters, and the response is returned as an XML structure.

The underlying Query should be expressed as CQL. The response to a Query can either include result records or simply be a pointer to a named result set which allows the retrieval of result records in subsequent operations.
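The URL encoding of an SRU Query can be sketched as follows. This Python example is a minimal illustration: the parameter names (`operation`, `version`, `query`, `startRecord`, `maximumRecords`) follow the SRU searchRetrieve request as commonly documented and should be checked against the version of SRU in use, while the base URL, the helper names `build_sru_url` and `read_sru_params`, and the CQL string are assumptions made for the example.

```python
from typing import Dict, List
from urllib.parse import parse_qs, urlencode, urlsplit


def build_sru_url(base_url: str, cql_query: str,
                  start_record: int = 1, maximum_records: int = 10) -> str:
    """Encode a CQL query as name-value pairs in an SRU-style URL."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",  # assumed protocol version for this sketch
        "query": cql_query,
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    # urlencode percent-escapes quotes, spaces, and '=' inside the CQL text.
    return f"{base_url}?{urlencode(params)}"


def read_sru_params(url: str) -> Dict[str, List[str]]:
    """Decode the name-value pairs again, as a service might on receipt."""
    return parse_qs(urlsplit(url).query)


# Placeholder endpoint; not a real service.
example_url = build_sru_url("http://example.org/sru",
                            'dc.title = "learning design"',
                            maximum_records=5)
```

The response to such a request would be an XML document, which is not modelled here.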

The format of the result records is not tightly defined within the SRU definitions and therefore there is scope for an SRU service to support many different result record formats.

8.8 CGM

CGM is a protocol for distributed search. The acronym stands for "Cornell, Goettingen, Michigan" after the partners in the joint NSF project that developed the protocol, initially for searches across mathematical monographs [CORNELL].

CGM conflates the Query Service and Query, and isn't expressive enough to do arbitrary queries. On the other hand, it is not dependent on the data structure.

8.9 Google

Google [GOOGLE] supports several service bindings for Query operations. The most well known is its URL-based service which returns results in HTML. However, Google also provides a Web Services binding.

Google provides the following operations for its web service, which mirror the main functions also available through its website:

The Query operation accepts the query as a String, using the same format as the website interface (see section above), plus the start index of the results, and optional filtering parameters (for example, by language or for adult content).

The response message from the Query operation provides the total results (either estimated or accurate, as indicated with a Boolean), the collection of Result records, the start and end index values of the results, and the time taken. Uniquely, Google also returns a "search tip" that can be provided to users to help them improve their searching.

There is no means to vary the result record or the query structure; they must always conform to the Google record structure and the Google Query specification, respectively (see Section 7.7).

The service binding provided by Google is to SOAP (using HTTP), and the semantics of the service are described using a WSDL definition and a set of XML schemas [GOOGLEb].

8.10 ZOOM

ZOOM [ZINGd] is another output of the Z39.50 Next Generation [ZING] project, like SRW and SRU. Unlike SRW and SRU, ZOOM is intended to provide a binding of a Query Service to various programming languages. ZOOM supports C, C++, Java, Python, and many other languages.

The ZOOM API model itself follows the general Z39.50 model, with operations for connecting, querying, scanning, and for working with records and result sets. The level at which these operations are described is very similar to SQI.

The core ZOOM model does not explicitly support asynchronous operation, although documentation exists in the Perl implementation of ZOOM that describes how this functionality is achieved.

9. Conclusions

Executing a Query requires that a set of agreements have been made between the Query Agent and the target Query Service. This paper has examined a number of the technologies and standards that can form part of this set of agreements. The main task from an IMS perspective and interoperability viewpoint is to identify the best fit of existing specifications for the task, identify gaps or poor fits that need to be addressed, and to identify where common Grammars can be agreed in each e-learning application of Query (e.g., Enterprise, Learning Design, Digital Repositories).

In general, while it would seem relatively simple to map from one Query Service to another (for example, from SQI to SRW, or from ECL to ZOOM), the same cannot be said for Query Languages. Specifically, the issues of mapping across syntaxes, and across data models, require a more in-depth understanding of both the target data source and the structure of the query syntax and lexicon.

This would indicate that the most important task for interoperability in Query is to limit the choice of query languages.

9.1 Query Language

As we have seen, Query Languages come in two 'flavours': concrete and abstract. For interoperability purposes, Abstract Query Languages have definite advantages, in that they free the Query Agent from requiring prior knowledge of the implementation of the target collection. They also enable the standardization of semantics, in the form of agreed data models and profiles, such as Dublin Core [DC].

That said, there are some advantages that can be gained from Concrete Query Languages for leveraging the capabilities of a particular implementation technology; for example, inference and reification in RDF, or structural relationships in XML.

For this reason, while supporting a standard Abstract Query Language is preferred, a Query Service specification should support a choice of Query Languages, for situations where a specialized Query technology is needed.

For supporting an Abstract Query Language, the most compelling choice is the Common Query Language [ZINGb], which satisfies all the requirements for an Abstract Query Language, has strong support and tool availability, and is extensible with any number of possible "context sets" for different data models.

Under some circumstances, for example when integrating existing heterogeneous Query Services, it can be acceptable to focus first on connecting these services and only then to unify their query languages. Although this solution initially provides only partial interoperability, it lowers the barriers to entry in a federation, and the results it provides may encourage progressive work toward a better solution. The same approach can be adopted when adding support for a common query language such as CQL, by first supporting subsets of it (e.g., a keyword search) before supporting the full-fledged language.

9.2 Query Service

For Query Services, for the most part the technologies available have strong similarities, and differ most widely not in their capability so much as their affordances; that is, they make some tasks easier to accomplish than others. This is mainly in the domain of non-functional requirements such as stateful and asynchronous behavior.

For example, SQI explicitly supports state management at the Query Service end of a transaction, although this can be easily implemented using SRW, which supports state management in the Query Agent.

Another consideration is the level of binding for a Query Service. Most of the technologies surveyed may be bound to an API (e.g., Java or .Net) or to a messaging protocol, such as SOAP via WSDL.

This enables a Query Service to operate at two levels; for example, SQI can be bound as an API, which is then implemented as an SRW SOAP-based service. Alternatively, the OKI DR OSID can be bound as an API and implemented as an ECL service.

Combining an API binding with an adapter for a protocol would seem to be a valid strategy for maintaining flexibility with regard to supporting Query Services.
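The strategy of combining an API binding with a protocol adapter can be sketched as follows. This Python example is a hypothetical illustration and does not reproduce SQI, SRW, the OKI DR OSID, or ECL: the interface `QueryService`, the classes `LocalCollectionService` and `ProtocolAdapter`, the request fields, and the function `count_hits` are all names invented for the example.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List


class QueryService(ABC):
    """API-level binding: the only interface the Query Agent depends on."""

    @abstractmethod
    def search(self, keyword: str) -> List[str]:
        ...


class LocalCollectionService(QueryService):
    """Implementation that searches an in-process collection of titles."""

    def __init__(self, titles: List[str]) -> None:
        self._titles = list(titles)

    def search(self, keyword: str) -> List[str]:
        return [t for t in self._titles if keyword.lower() in t.lower()]


class ProtocolAdapter(QueryService):
    """Adapter that turns API calls into requests for a messaging protocol.

    The `send` callable stands in for the protocol layer (for example a
    SOAP client); its request format here is an assumption.
    """

    def __init__(self, send: Callable[[Dict[str, str]], List[str]]) -> None:
        self._send = send

    def search(self, keyword: str) -> List[str]:
        request = {"operation": "search", "query": keyword}
        return self._send(request)


def count_hits(service: QueryService, keyword: str) -> int:
    """Query Agent code: unaware of which implementation it is using."""
    return len(service.search(keyword))
```

Because the Query Agent is written against the API only, the protocol behind `ProtocolAdapter` can be replaced without changing the agent.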

9.3 Self-Description

Self-description of Query Services is an important capability for two reasons. Firstly, it enables developers and deployers of Query Agents to determine the configuration settings for utilizing a Query Service. Secondly, it can be used to enable the dynamic reconfiguration of a Query Agent at runtime.

The self-description of the Query Service may be delivered externally to the Query Service itself, perhaps using a Service Registry or separate "Explain" service. Alternatively the self-description may be deployed as part of the Query Service.

In either case, a Query Service should support self-description, whether deployed separately or not; and the self-description of the service should occur at several levels, enabling description of the supported Query Languages, data models and profiles, and also references to the Service Binding description (e.g., WSDL).
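The levels of self-description listed above can be pictured as a small data structure that a Query Agent inspects before sending a Query. The following Python sketch is an assumption-laden illustration, not a proposed format: the class `ServiceDescription`, its field names, and the helper `usable_services` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ServiceDescription:
    """Hypothetical self-description of one Query Service."""
    query_languages: List[str] = field(default_factory=list)
    data_models: List[str] = field(default_factory=list)
    # Reference to the Service Binding description, e.g. a WSDL location.
    binding_description: str = ""


def can_use(description: ServiceDescription,
            language: str, data_model: str) -> bool:
    """True if the service supports both the language and the data model."""
    return (language in description.query_languages
            and data_model in description.data_models)


def usable_services(descriptions: Dict[str, ServiceDescription],
                    language: str, data_model: str) -> List[str]:
    """Names of services a Query Agent could target, in sorted order."""
    return sorted(name for name, description in descriptions.items()
                  if can_use(description, language, data_model))
```

The same structure could be obtained from a Service Registry or from an "Explain" operation on the service itself; the agent-side check would not change.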

Appendix A - Use Cases

A.1 - IMS Enterprise Service Use Cases

Query a Person, Group, or Membership using Creation Date - The learning system wishes to synchronize with the human resource management system (HRM) or student information system (SIS). It needs to acquire any newly created employee/student data from the HRM/SIS.

Query a Person, Group, or Membership using Update Date - The learning system wishes to synchronize with the HRM / SIS system. It needs to acquire any recently updated employee/student data from the HRM / SIS.

Query a Person, Group, or Membership using Deletion Date - The learning system wishes to synchronize with the HRM/SIS system. It needs to acquire any recently deleted employee/student data from the HRM/SIS.

General Group Queries: A tutor wishes to get the current class list with name and brief details of each member of the class.

A.2 - Learning Design Use Cases

Access to a generic search service - A unit of learning (or sequence of resources) contains an instruction to make available to the learner a search service. The service could be specified to become available at a particular point in the delivery of the unit of learning, or it could be specified to be always present as a supporting service.

Access to a pre-constrained search service - This use case differs from the case above, in that the search parameters have been pre-constrained by the designer of the learning design or sequence. For example, the search service may be required to constrain results to those within specific subject taxonomies, or to specific types of media. As a sub-case, it may be possible for the user interface to present the search fields with the predefined parameters visible and changeable by the learner. In the default case, however, the fields with predefined parameters are hidden from the learner.

Using a search service to dynamically deliver appropriate content - Instead of exposing the user interface of a search service to the learner, this use case involves the dynamic presentation of a resource within a sequence using a search service to discover the most appropriate resource. Here, the search parameters are predefined within the unit of learning or sequence, and the first search result returned by the service is presented to the learner.

Using a search service to deliver a choice of content - The search service returns a set of search results based on pre-defined parameters within the unit of learning or sequence, and the set of results is presented to the learner. There is no search interface presented to the learner, only the results of the search.

Sub-Cases: Location of service - The search service could be defined in advance within the learning design or sequence, for example using a URI to identify a specific repository. Alternatively, the service could be described in generic terms, and the LMS or VLE then provides an appropriate search UI based on its configuration.
These sub-cases apply to all of the above Learning Design Use Cases.

A.3 - Finding Learner-Preference-Appropriate Resources

Submitted by:
Anastasia Cheetham, a.cheetham@utoronto.ca
Adaptive Technology Resource Centre
April 12, 2005
ATRC-003

Players

Assumptions

Description

The learner is accessing learning content through the LMS. The LMS has determined, by examining the meta-data associated with a learning object and comparing it to the learner's preferences, that the learning object does not meet the learner's requirements. The LMS searches the available content repositories for appropriate content that does, indeed, meet the learner's requirements. Upon finding an appropriate resource, that resource is delivered to the learner.

A.4 - Seeking Modality-Appropriate Content for Lesson Preparation

Submitted by:
Anastasia Cheetham, a.cheetham@utoronto.ca
Adaptive Technology Resource Centre
April 12, 2005
ATRC-004

Players

Assumptions

Description

An educator is aggregating various learning objects into a lesson plan. One of the learning objects is visually intensive and would be inappropriate for users requiring text-only resources. The educator searches the available repositories for learning objects that are equivalent to the visually intensive resource, but that provide alternatives to the visual content. The search tool processes the available ACCMD to locate matches. The results include a learning object that the educator feels would be appropriate to assist learners in meeting the desired learning objective, but that uses text predominantly. The educator adds this new resource to her lesson plan as an equivalent for the visually intensive resource.

Variations: The same scenario arises with resources that would be inappropriate for users who are unable to deal with other modalities: auditory, text or tactile.

A.5 - Timetable and Resource Availability

Submitted By:
Dr. Wolfgang Greller, Head of Learning Environments, UHI Millennium Institute, Scotland, UK;
wolfgang.greller@lews.uhi.ac.uk
Head of Learning Environments, UHI Millennium Institute
Lews Castle College
Stornoway
Isle of Lewis HS2 0XR, Scotland
Tel: +44 (0)1851 770-421
Fax: +44 (0)1851 770-001


Players

Learning Design, learning system, SIS - timetabling system managing availability of resources/rooms

Assumptions

A resource required by a learning session has scheduled availability managed by an institution.

Description

(1) Institution uses a blended learning approach which combines videoconferencing (vc) studios and online delivery. (2) A course uses a remotely controlled utility for learning. (3) A course uses an offline resource, e.g. a horse for a veterinary science dentistry session.

The learning unit uses a timetabled resource and requires knowledge of its availability or when it will next become available. The learning system queries the timetabling application.

Transactions (optional) - If the result of a query is that a resource is currently unavailable, a proximity search from that point forward should show when a resource will become available next.

Exceptions

Zero entries might look as if the resource were available, when in fact the system does not know about it.
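The forward proximity search and the exception described above can be sketched together. The following Python example is a minimal illustration under stated assumptions: the timetable is modelled as a dictionary of bookings per resource, and the names `next_available`, `UnknownResource`, and `Booking` are hypothetical. A resource that is known but has no bookings is treated as available, whereas a resource absent from the timetable raises an error rather than appearing to be free.

```python
from datetime import datetime
from typing import Dict, List, Tuple

# A booking is (start, end); the resource is busy from start up to,
# but not including, end.
Booking = Tuple[datetime, datetime]


class UnknownResource(Exception):
    """Raised when the timetabling system has no knowledge of a resource."""


def next_available(timetable: Dict[str, List[Booking]],
                   resource: str, at: datetime) -> datetime:
    """Return the earliest time at or after `at` when the resource is free."""
    if resource not in timetable:
        # Distinguish "not known" from "known, with zero bookings".
        raise UnknownResource(resource)
    candidate = at
    for start, end in sorted(timetable[resource]):
        if start <= candidate < end:
            # Busy at the candidate time: move forward to the booking's end.
            candidate = end
        elif start > candidate:
            # The next booking begins later, so the resource is free now.
            break
    return candidate
```

This sketch reports only when the resource next becomes free; it does not check that the free period is long enough for a given learning session.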

A.6 - Interoperability Requirements of an Educational Broker

Submitted by:
Dr. Bernd Simon  Bernd.Simon@wu-wien.ac.at
Homepage: http://nm.wu-wien.ac.at/people/simon.html
Department of Information Systems & New Media
Vienna University of Economics and Business Administration
Augasse 2-6, A-1090 Vienna, Austria  http://nm.wu-wien.ac.at
Tel: (+43-1) 31 336 x4328,      Fax: (+43-1) 31 336 x904328

The Vienna University of Economics runs a European portal called EducaNext.org for the purpose of providing a learning resource brokerage service to academia and research. Since 2003 the portal has collected over 700 high-quality learning resources from its more than 1700 subscribers.

To increase the number of learning objects available in a worldwide academic network of exchange portals called GLOBE, the portal provides a query interface. To lower the barrier to integration, the network uses a quite simple semantic model (also referred to as an ontology) as a common denominator, basically consisting of a few meta-data attributes such as title and description. As a result, the federated search is based solely on keywords forwarded to the various nodes of the GLOBE Network, including EducaNext. The nodes take the keywords received via the query interface as