Semantic Web Technologies

Semantic Web Technology extends ds9 technology with Ontology Management, Tagging and Semantic Search capabilities built on RDF/SPARQL-based linked data resources.

Interaction of Semantics and Text

ds9 stores RDF-based Ontologies in a Triple Store, against which it executes SPARQL queries. Special RDF filter types add semantics to documents through Tagging and Classification based on entities recognized in the documents' natural language content. We provide filter type templates that explain how to build filter chains that automatically build Ontologies, add thesaurus capabilities to a taxonomy, or simply use an Ontology to add context and structure to a document. Tagged concepts or their concept classes can be used as facets for faceted search in the ds9 Viewers and ds9 Semantic Search. RDF-based thesauri can be used for search term expansion in our viewers.

Tagging - Semantic Annotation

Using an RDF-based thesaurus, ds9 Semantic Web Technology semantically annotates the content of retrieved documents.

The RDF Tagger, a ds9 filter type, processes a set of input documents and adds customized annotations to every chunk of text that matches a concept, or any synonym of that concept, provided in a thesaurus. Subsequent process steps can use the annotated concepts to classify documents by the concepts that occur in them, to build document metadata, or to store normalized document information.
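The matching step described above can be illustrated with a minimal Python sketch. It is not the RDF Tagger's implementation: the thesaurus is a plain dictionary with invented sample data, and the names `THESAURUS`, `build_pattern` and `tag` are hypothetical. It only shows how text chunks that match a concept label or one of its synonyms can be annotated with the concept.

```python
import re

# Hypothetical thesaurus: preferred concept label -> synonyms (illustrative data only).
THESAURUS = {
    "automobile": ["car", "motor vehicle"],
    "bicycle": ["bike"],
}

def build_pattern(thesaurus):
    """Compile one regex that matches every label and synonym, longest term first."""
    terms = {}
    for concept, synonyms in thesaurus.items():
        for term in [concept, *synonyms]:
            terms[term.lower()] = concept
    alternatives = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return pattern, terms

def tag(text, thesaurus):
    """Return annotations as (start, end, matched text, concept) tuples."""
    pattern, terms = build_pattern(thesaurus)
    return [
        (m.start(), m.end(), m.group(0), terms[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]

annotations = tag("She sold her car and bought a bike.", THESAURUS)
```

Each annotation keeps the character offsets of the match, so a later step could count concept occurrences per document or write the normalized concept into the document metadata.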

Concept Classification

As the screenshot of the RDF Tagger filter type shows, the tagger supports hierarchical concepts. Its parameters require the root element of the concept hierarchy, the concept class, and the relation that expresses the is-a relationship.
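To illustrate how a root element and an is-a relation define a concept hierarchy, here is a small Python sketch. The triples, the `ex:` identifiers and the function `path_to_root` are hypothetical, and using `skos:broader` as the is-a relation is an assumption; in ds9 the relation is a parameter of the filter type. The sketch also assumes each concept has at most one parent.

```python
# Hypothetical triples (subject, predicate, object).
# "skos:broader" is assumed here as the is-a relation.
IS_A = "skos:broader"
TRIPLES = [
    ("ex:Sedan", IS_A, "ex:Car"),
    ("ex:Car", IS_A, "ex:Vehicle"),
    ("ex:Bicycle", IS_A, "ex:Vehicle"),
]

def path_to_root(concept, triples, relation, root):
    """Follow the is-a relation upward from a concept until the root is reached.

    Returns the list of concepts from the start concept to the root,
    or None if the concept is not part of the hierarchy under that root.
    """
    parents = {s: o for s, p, o in triples if p == relation}
    path = [concept]
    while path[-1] != root:
        parent = parents.get(path[-1])
        if parent is None or parent in path:  # outside the hierarchy, or a cycle
            return None
        path.append(parent)
    return path

sedan_path = path_to_root("ex:Sedan", TRIPLES, IS_A, "ex:Vehicle")
```

A document tagged with `ex:Sedan` could then be classified under every concept on the returned path, which is what makes drilling down along the hierarchy possible.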

Entity Recognition

Depending on the ontology used for tagging, the matched lexical terms and all their defined synonyms can represent semantic entities that are recognized in the natural language text of the document. These entities can be extracted from the document by subsequent filter steps for further analysis or statistical assessment.

Multiple types of entities can be recognized by running several RDF Taggers in sequence. The RDF relations that define the synonym-of relationship can be chosen freely, so there are no constraints on which RDF-based ontologies can be used.
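The idea of chaining several taggers, each with its own ontology and its own synonym relation, can be sketched as follows. This is an illustration under assumptions, not ds9 code: the triples, the `ex:synonym` predicate, and the helpers `make_tagger` and `run_chain` are invented for the example.

```python
import re

# Hypothetical ontologies; each one uses a different predicate for labels and synonyms.
LOCATIONS = [
    ("ex:NewYork", "skos:prefLabel", "New York"),
    ("ex:NewYork", "skos:altLabel", "NYC"),
]
INDUSTRIES = [
    ("ex:Automotive", "ex:synonym", "automotive"),
    ("ex:Automotive", "ex:synonym", "car industry"),
]

def make_tagger(entity_type, triples, label_relations):
    """Build a tagger for one entity type from the chosen label/synonym relations."""
    lookup = {o.lower(): s for s, p, o in triples if p in label_relations}
    terms = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )

    def tagger(text, annotations):
        for m in pattern.finditer(text):
            annotations.append((entity_type, lookup[m.group(0).lower()], m.group(0)))
        return annotations

    return tagger

def run_chain(text, taggers):
    """Apply the taggers one after another, accumulating their annotations."""
    annotations = []
    for tagger in taggers:
        annotations = tagger(text, annotations)
    return annotations

taggers = [
    make_tagger("location", LOCATIONS, {"skos:prefLabel", "skos:altLabel"}),
    make_tagger("industry", INDUSTRIES, {"ex:synonym"}),
]
entities = run_chain("The automotive sector in NYC is growing.", taggers)
```

Because the relation names are passed in as a parameter, the same tagger logic works with any ontology, whichever predicate it uses to express synonymy.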

Ontology Management

ds9 filter types can not only read from the triple store but also write to RDF containers, and can thus manage RDF ontology data.

By extracting concept data and meta information from websites or databases, e.g. special purpose Wikis, ds9 can automatically create ontologies or enrich existing ones, much as DBpedia is built and maintained by extracting concept information from Wikipedia.
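As a rough sketch of the enrichment step, the following Python example adds triples for extracted records to an existing set of triples. Everything here is an assumption made for illustration: the record layout, the base URI `http://example.org/concept/`, the choice of `rdfs:label` and `skos:altLabel`, and the function `enrich`. A real RDF container would be written through the ds9 filter types.

```python
def enrich(ontology, extracted_records, base="http://example.org/concept/"):
    """Add triples for extracted records to a set of triples.

    Returns the number of triples that were actually new; triples that
    already exist are ignored because the ontology is modeled as a set.
    """
    before = len(ontology)
    for record in extracted_records:
        subject = base + record["title"].replace(" ", "_")
        ontology.add((subject, "rdfs:label", record["title"]))
        for synonym in record.get("synonyms", []):
            ontology.add((subject, "skos:altLabel", synonym))
    return len(ontology) - before

# Existing ontology with one concept (hypothetical data).
ontology = {("http://example.org/concept/Triple_Store", "rdfs:label", "Triple Store")}

# Records as they might come out of a wiki extraction step (hypothetical data).
records = [
    {"title": "Triple Store", "synonyms": ["RDF store"]},
    {"title": "Thesaurus"},
]
added = enrich(ontology, records)
```

Running the same extraction twice adds nothing the second time, so the enrichment can be repeated whenever the source wiki or database changes.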

Semantic Search

Semantic Search in ds9 implements query side semantics as well as content side semantics to improve the user's search experience and optimize search results:
  • At search time query terms are expanded with synonyms defined in RDF ontologies
  • Annotations in the connected SEARCHCORPUS® are used to build facets for a faceted search
Extracted concept classes can be used by the Semantic Search to classify search results and to build search facets based on identified concept classes allowing the user to drill down along the concept hierarchy.

Semantic Search can also use Recognized Entities, e.g. location, industry, ..., to build facets for a faceted search. When the user selects multiple facets, the search results can either be accumulated or be restricted further.
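The difference between accumulating and restricting results can be made concrete with a short Python sketch. The documents, facet values and the functions `facet_counts` and `select` are hypothetical, and the sketch assumes that accumulating means matching any selected facet value while restricting means matching all of them.

```python
from collections import Counter

# Hypothetical annotations per document: facet name -> recognized entity values.
DOCS = {
    "doc1": {"location": {"Berlin"}, "industry": {"Automotive"}},
    "doc2": {"location": {"Berlin"}, "industry": {"Finance"}},
    "doc3": {"location": {"Hamburg"}, "industry": {"Automotive"}},
}

def facet_counts(docs, facet):
    """Count how many documents carry each value of the given facet."""
    return Counter(value for ann in docs.values() for value in ann.get(facet, ()))

def select(docs, selections, mode="restrict"):
    """Filter documents by a list of (facet, value) selections.

    mode="restrict":   a document must match every selection.
    mode="accumulate": a document must match at least one selection.
    """
    combine = all if mode == "restrict" else any
    return sorted(
        doc_id
        for doc_id, ann in docs.items()
        if combine(value in ann.get(facet, ()) for facet, value in selections)
    )

chosen = [("location", "Berlin"), ("industry", "Automotive")]
restricted = select(DOCS, chosen, mode="restrict")
accumulated = select(DOCS, chosen, mode="accumulate")
```

With the sample data, restricting leaves only the document that is tagged with both values, while accumulating returns every document tagged with either one.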

Query Term Expansion

By defining RDF ontologies for Query Term Expansion, the user can have search terms expanded automatically with their synonyms. While the user is typing a search term, the Query Term Expansion queries the RDF files for matching concepts. Once the user selects a concept, this concept is marked for Query Term Expansion.

When the user triggers "expand query", all marked concepts are automatically expanded and converted into a valid search query. The user can edit this query manually before submitting it.
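The two stages described above, suggesting concepts while the user types and expanding the marked concepts into a query, can be sketched in Python. The thesaurus data and the functions `suggest`, `quote` and `expand_query` are hypothetical, and the generated boolean syntax with `AND`, `OR` and quoted phrases is an assumption; the actual query syntax depends on the search backend.

```python
# Hypothetical thesaurus: concept label -> synonyms (illustrative data only).
THESAURUS = {
    "automobile": ["car", "motor vehicle"],
    "bicycle": ["bike"],
}

def suggest(prefix, thesaurus):
    """Return concepts whose label or any synonym starts with the typed prefix."""
    prefix = prefix.lower()
    return sorted(
        concept
        for concept, synonyms in thesaurus.items()
        if any(term.lower().startswith(prefix) for term in [concept, *synonyms])
    )

def quote(term):
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term

def expand_query(terms, marked, thesaurus):
    """Replace every marked concept by an OR group of the concept and its synonyms."""
    parts = []
    for term in terms:
        if term in marked:
            alternatives = [term, *thesaurus[term]]
            parts.append("(" + " OR ".join(quote(t) for t in alternatives) + ")")
        else:
            parts.append(quote(term))
    return " AND ".join(parts)

suggestions = suggest("ca", THESAURUS)
query = expand_query(["automobile", "insurance"], {"automobile"}, THESAURUS)
```

The result is an ordinary query string, which matches the workflow in the text: the user can still edit it by hand before submitting the search.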