SWIG-UK Special Event: Graham Klyne – Building a Semantic Web accessible image publication repository

Graham begins by offering a little background information on why they want to be able to publish images using SemWeb technologies.

Previous approaches involved general-purpose image databases built on conventional relational technology. These were useful and worked, but the effort died due to licensing restrictions on the data.

As Semantic Web technologies emerged, a semantic image database was created using native RDF storage. This had some success, but they had to do all the heavy lifting themselves, and the system was fragile due to tightly coupled components. Graham commented that this touched on Ian Davis's talk: a platform would be very useful, but building one is difficult.

Their current approach is based on a Data Web philosophy: linking available web data rather than creating new application stores. They built on Southampton University's EPrints, and use Jena and Joseki to provide a SPARQL endpoint for the metadata.

What does semantic web accessible mean?

Image metadata is accessible and queryable from multiple sources using SPARQL, and the images themselves should be accessible using simple HTTP requests.
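A minimal sketch of what this accessibility implies: under the SPARQL protocol, a query is just an HTTP GET with the query text as a parameter, so any HTTP client can reach the metadata. The endpoint URL and the use of `dc:title` here are illustrative assumptions, not details of Graham's actual setup.

```python
# Build the HTTP request a client would send to a SPARQL endpoint.
# ENDPOINT is a hypothetical Joseki endpoint, invented for illustration.
from urllib.parse import urlencode

ENDPOINT = "http://images.example.org/sparql"  # hypothetical

query = """PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?image ?title
WHERE { ?image dc:title ?title . }
LIMIT 10"""

# SPARQL protocol: a plain GET with the query as a URL parameter.
request_url = ENDPOINT + "?" + urlencode({"query": query})
print(request_url)
```

The `?image` URIs that come back are then fetched with ordinary HTTP GETs, which is the second half of "Semantic Web accessible".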

EPrints is an "OAI repository" which uses common metadata formats (e.g. Dublin Core).
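For context, the kind of record an OAI repository exposes is simple Dublin Core wrapped in XML, which maps naturally onto RDF property–value pairs. The record content below is invented for illustration; only the namespaces are the real OAI/DC ones.

```python
# Parse a minimal oai_dc record (as harvested via OAI-PMH) into
# (property, value) pairs that could become RDF triples.
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"

record = """<oai_dc:dc
    xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Fly embryo, stage 5</dc:title>
  <dc:creator>Example Lab</dc:creator>
  <dc:identifier>http://repository.example.org/image/42</dc:identifier>
</oai_dc:dc>"""

root = ET.fromstring(record)
# Flatten the record: each child element is one Dublin Core property.
fields = [(el.tag.replace(DC, "dc:"), el.text) for el in root]
print(fields)
```

Because every OAI repository speaks this common baseline, a harvester can pull metadata from many repositories without per-site code.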

Graham goes on to talk about global vs local access when retrieving metadata from multiple repositories – they want to move away from a single, globally coordinated index towards local, uncoordinated ones.

The problem with this is that people will use different schemas, so connecting data together becomes an issue over time. They are developing strategies to address this, and perhaps a common schema will eventually emerge. Currently there are two strategies for metadata conversion:

  • Metadata rewriting – making an extra copy of the data in the target schema.
  • Query rewriting – instead of changing or copying the data, they rewrite the query.
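The query-rewriting strategy can be illustrated with a toy example: an incoming query phrased in a common vocabulary (Dublin Core) is rewritten to use the repository's local predicates before execution, so no data is copied. The local vocabulary and the string-substitution approach are both invented for illustration; a real rewriter would work on the parsed query algebra, not on raw strings.

```python
# Toy query rewriter: map common-schema predicate URIs to local ones.
# The "repository.example.org" vocabulary is hypothetical.
DC_TO_LOCAL = {
    "http://purl.org/dc/elements/1.1/title":
        "http://repository.example.org/schema#caption",
    "http://purl.org/dc/elements/1.1/creator":
        "http://repository.example.org/schema#author",
}

def rewrite_query(sparql: str, mapping: dict) -> str:
    """Substitute predicate URIs so the query matches the local data.
    (Naive string replacement; fine for full URIs in angle brackets.)"""
    for common, local in mapping.items():
        sparql = sparql.replace(common, local)
    return sparql

incoming = ("SELECT ?img WHERE { ?img "
            "<http://purl.org/dc/elements/1.1/title> ?t }")
rewritten = rewrite_query(incoming, DC_TO_LOCAL)
print(rewritten)
```

Metadata rewriting is the mirror image: apply the same mapping once to the stored triples instead of to every query, at the cost of keeping a second copy in sync.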

Graham believes a combination of the two is required. He goes on to give an overview of their implementation: collect data using OAI, then serve it using Joseki. They have had to modify the EPrints software, as well as the database, to accommodate domain metadata. What they have is not a generic solution, but they built a working platform (not a Semantic Web platform) within six weeks, a marked contrast with their previous experiences.

They are currently looking at implementing user interfaces, so they are evaluating tools that support faceting. They have done some experimentation with JSPace and have implemented a faceted browser that uses a Joseki endpoint. They are also trying to use mspace, from Southampton, but haven't been able to get hold of the software yet. Graham goes on to show some screenshots of their user interface.

Lessons learned :

  • Available tools do support Semantic Web accessibility; Joseki has been key to their progress.
  • Creating effective user interfaces is not easy.
  • Importance of loose coupling.

Wish list:

  • SPARQL Update support in Joseki, which would facilitate metadata updates from external hosts.
  • A more generic web data publishing tool (e.g. METS).
  • Query distribution.
  • Merging data from multiple uncoordinated sources (FlyWeb).
  • Improving the user interface will be an ongoing task.

My Thoughts

What strikes me is that their requirements aren't a million miles away from some of the basic services that the Platform provides. I'd be curious to see what they might achieve with a Platform store that combines metadata and content, or whether the same problem could be solved differently on the Platform.

Questions: Can you give us some indication of the user tasks you're trying to support? Graham describes the process scientists currently go through. From the lengthy description he provides, the interaction designer in me wonders whether taking a UCD approach would be a better way to evolve an interface that supports them.