I’ve gone Mac

It’s been a busy month for a number of different reasons – mostly I’m still trying to come to terms with the death of my father. I’m not entirely sure if burying myself in work is the best way of dealing with it, but so far it seems to be working. Everyone @ Talis has been really supportive, and the current R&D project I’m working on with a small team has helped me to totally immerse myself in a single problem, and that’s made it easier to deal with things … plus what we’re working on is very innovative, so it feels really rewarding at the moment.

Anyway, as the heading of this post suggests I’ve gone Mac! and I love it!! When I returned to work this year I had a shiny new 17″ MacBook Pro waiting for me. I have never used a Mac before; I’ve always been firmly entrenched in the PC world, and for most of my development needs I would often run flavours of Linux inside VMs. The problem with this, though, is that Windows sucks as a host, and there’s only so many VM crashes I can put up with. Many of my colleagues chose to go down the route of wiping Windows off their laptops and installing Ubuntu. I seriously considered doing this, but was convinced, primarily by Rob and Chris and by pairing with them or watching them do development work on their MacBook Pros, that Macs are a great alternative.

I spend a lot of time inside a terminal window, and on a Mac you have a fully featured bash shell, which makes a huge difference in terms of productivity. On Windows, to get anywhere close I had to run Cygwin or work in a Linux VM … anyone who thinks that the Windows Command Shell is comparable needs to seriously seek help!

I spent a fair bit of time getting development tools installed and getting used to how different Mac OS X is to Windows or anything else I have used. So far Leopard has been a pleasure to use; there’s been the odd quirk now and again, but nothing worth mentioning. Rob published a wonderful list of tools he installed on his Mac, which I basically used as a checklist to get up and running. To his list I’d like to add the following:

CCMenu 1.0
Displays the project status of CruiseControl continuous integration servers as an item in the Mac OS X menu bar.

Lab Tick
Have you ever been annoyed by the fact that you could not turn on your PowerBook or MacBook Pro’s keyboard illumination in daylight? If so, here’s your solution. Lab Tick gives you total control over the backlit keyboard.

iComic Life
Only really started using this recently, but it’s a wonderful tool for quickly storyboarding scenarios as comic strips. If you do choose to use this you might also want to download this set of stock images produced by Sun’s User Experience Team.

BatchResize’em all 1.1
A great little tool for quickly resizing a batch of images.

Dock DR
Wonderful little utility for customising your dock on Leopard.

There’s lots more which I’ll post up from time to time. If there’s one thing I do miss, though, it’s Windows Live Writer, which was a wonderful tool for offline blog editing and sadly isn’t available on the Mac. Instead I’m using Ecto, which is good but nowhere near as simple to use or as nice as Live Writer was. Sad isn’t it? That’s honestly the only thing I miss … after spending the last few weeks developing on my Mac I don’t think I will ever go back to a Windows based machine.

MARC, RDF and FRBR

My colleague Rob has spent a couple of weeks putting together some of his thoughts on how we can structure bibliographic data semantically using RDF. The paper was written as a submission for the Linked Data on the Web 2008 workshop @ WWW2008. For me, what’s great about this paper is that it provides an insight into the excellent work Rob has been doing in trying to find relationships in MARC data. His work allowed us to do some pretty amazing things with bibliographic data last year, and we are now, as an organisation, building upon that in some of our commercial applications. At its core, though, the issues that the paper covers are not specific to MARC data: in order to create linked data we need to start using Links and not just Literals. Whilst Rob was writing this paper, I was busy trying to take some data we had been given and import it into our platform. With Rob’s help, by applying the techniques that he describes in the paper and thinking about the granularity of the data itself and the linkages I could create by generating URIs to represent identities and concepts, rather than simply storing those elements of the data as Literals, I found that I could exploit relationships in the data that I would not have been able to otherwise.
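
To make the Literals-versus-Links point a little more concrete, here’s a minimal sketch of the idea in Python using rdflib. To be clear, the vocabulary and URIs are my own invented examples, not the modelling Rob describes in the paper.

```python
# Minimal sketch of "Links, not just Literals", using rdflib.
# The namespaces, URIs and property choices here are illustrative assumptions,
# not the actual modelling described in Rob's paper.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, FOAF

EX = Namespace("http://example.org/")
g = Graph()

book = URIRef("http://example.org/books/moby-dick")

# Literal-only approach: the author is just a string, so two records that
# mention "Herman Melville" share nothing machine-readable.
g.add((book, EX.creatorName, Literal("Herman Melville")))

# Linked approach: mint a URI for the author identity and describe it once;
# any other record can now point at the same resource.
author = URIRef("http://example.org/people/herman-melville")
g.add((book, EX.creator, author))
g.add((author, RDF.type, FOAF.Person))
g.add((author, FOAF.name, Literal("Herman Melville")))

# A query over the link can now find everything by the same author,
# regardless of how their name happens to be spelled in each record.
for work in g.subjects(EX.creator, author):
    print(work)
```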

If anyone has any thoughts or feedback on the paper please let us know.

Talis and Creative Commons launch new Open Data licence

Yesterday we, at Talis, announced some wonderful news – Talis has been working in partnership with the Science Commons project of Creative Commons and we are all pleased to announce the release of the new Open Data Commons Public Domain Dedication and Licence.

As an organisation Talis have been interested in the licensing issues surrounding Open Data for quite some time now; we’ve been talking about Open Data at conferences and also writing about many of these issues. In 2006 we began this process by launching our own attempt at an Open Data licence called the Talis Community Licence, which helped to shape some of our initial thoughts. Earlier this year we even convened a special workshop on Open Data at the World Wide Web conference in Banff, which helped us to understand the direction we wanted to move in and who we needed to work with to make this a reality.

This new licence represents a real milestone for us. For the Semantic Web to succeed there needs to be more data coming online, marked up for linking and sharing in this web of data. Hopefully the licence can serve as a tool that enables more of us to share and contribute data.

Talis gets some nice Semantic Web coverage

Last week Talis had some interesting coverage on Read/Write Web, a popular semantic web blog. It started off with the article ‘Ten Semantic Web Apps to watch’, by Richard MacManus. I enjoyed reading Richard’s article and it was interesting to see who else he thought was worth watching in this space. Richard seemed to capture the essence of some of what we are trying to achieve with our platform quite well …

They are a bit different from the other 9 companies profiled here, as Talis has released a platform and not [just] a single product. The Talis platform is kind of a mix between Web 2.0 and the Semantic Web, in that it enables developers to create apps that allow for sharing, remixing and re-using data. Talis believes that Open Data is a crucial component of the Web, yet there is also a need to license data in order to ensure its openness. Talis has developed its own content license, called the Talis Community License, and recently they funded some legal work around the Open Data Commons License

That’s exactly right: by building applications on our platform, the data that these applications rely on is stored in a way that allows it to be easily re-used and re-mixed. What’s more, the platform hides away some of the underlying complexities inherent in Semantic Web technologies by presenting developers with an easy-to-use, RESTful API that allows you to store, query and manage heterogeneous data. Whilst the platform will continue to grow and evolve, it’s already matured to the point where we are building and deploying commercial applications on it, e.g. Talis Engage. For me personally the last twelve months have been very exciting and challenging as we’ve seen the Platform mature to the point where we can do the very things we’ve been talking about for ages.

Soon after Richard’s article was posted, my colleague Paul Miller was interviewed by Read/Write Web’s Marshall Kirkpatrick. Paul provided more of an insight into the platform and Talis and offered some of his own views on the future of the Semantic Web; the interview is well worth reading.

And then yesterday Andreas Blumauer, of the Semantic Web Company, posted this nice little piece up on his blog.

Talis is a “domain-agnostic” technology platform which supports developers to build applications on the principles of “mass collaboration”. It is a new breed of a distributed programmatic interface heavily deploying all opportunities the Web of Data may offer …. Talis tries to establish a new way of organizing information flows throughout the Web of Data. Since it relies on open standard protocols like RESTful Web Services a lot of applications will use Talis technologies. Talis as a company has a well founded background since it has been provided services for governmental organizations or libraries for the last 30 years. Some of the people working at Talis rank among the best semantic web thinkers.

… Is it wrong to admit that reading that gave me a nice warm fuzzy feeling … ?

Benefits of Open Sourcing Code

Open Source Developers @ Google Speaker Series: Ben Collins-Sussman and Brian Fitzpatrick

What’s In It for Me? How Your Company Can Benefit from Open Sourcing Code

As the open source community continues to clamor for more companies to open source their code, more and more executives are asking themselves just what open source can do for their company. There are a number of ways for a company to open source an internal project: from tossing code over the wall on the one hand to running a fully open development project on the other to any combination of the two.

This talk will discuss the costs and benefits associated with each method as well as how to successfully launch your new open source project.

I really enjoyed this tech talk about the benefits of open sourcing code. Ben and Brian briefly summarise what motivates people working on projects to make them open source:

  • a desire to create better software,
  • or to create a relationship with your users,
  • or in some cases it’s simply good PR,
  • or perhaps it’s simply goodwill on the part of some techies,
  • or it can be a way to get free labour,
  • or it can be a way to change or subvert an entire industry ( take over the world ).

They also provide a useful set of criteria with which to measure the health of an open source project, which comes down to measuring the health of its community:

  • Lots of usage ( not users! )
  • A number of active developers
  • Constant improvements and releases
  • No community == dead software

I liked their descriptions of the various different approaches organisations take when open sourcing software. These range from the Fake Approach, where organisations rather cynically decide to Open Source their code but do so without using a licence approved by the Open Source Initiative; this is little more than a PR exercise, in real terms means the code isn’t really open source, and can alienate both users and developers.

The second approach is to Throw Code Over The Wall. This basically means you remove any names from the code files, add the appropriate licences, tar the whole thing up, post it and then simply walk away. This generates PR and is relatively effortless, but it still doesn’t create a community, nor does it really attract real techies. You often find that organisations that no longer wish to continue maintaining some piece of software use this approach.

Then there is Develop Internally, Post Externally. You have a public code repository, where you develop in house but allow the external world to see what you’re doing. This allows occasional volunteers to submit patches, but really there’s no incentive for outsiders to get involved because you’re not really giving anyone outside the organisation a sense of ownership … this can lead to mistrust, and creates barriers.

Next you have the Open Monarchy, where there are public discussions and a public repository, but committers are mostly employees and occasionally individuals outside the organisation. However, in this approach one organisation or one individual rules the project and makes all the key decisions. This approach has the benefit that it will garner more credibility from the technical community, and you probably will find more volunteers stepping forward to participate in the open discussions you’re having and to sometimes contribute. But the reality is that this is virtually the same as the previous approach, except that the discussions are taking place in public and thus people can participate, even though the corporate agenda always wins.

Finally there is Consensus Based Development, in which almost everything is public and all decisions are based on a consensus of the committers – the project is its own organisation and exists independently of any single company. In order to join the community you have to earn your access; in other words, you have to earn commit privileges. The advantage of this approach is that you build a long-term, sustainable community with a passionate following of committed developers, which invariably results in better software.

I found the talk to be extremely informative, and it certainly made me re-think what my definition of open source actually is. In fact this is all particularly relevant to me at the moment, given that our development group at Talis is beginning to Open Source some of our software, and I’m wondering what the best way of doing this might actually be.

This is an excellent video to watch and it will challenge your definition of what constitutes an Open Source project.

SWIG-UK Special Event: Alberto Reggiori & Andrea Marchesini – The BBC Content Aggregator for the Memoryshare Service

Alberto and Andrea presented some of the work they have been doing at Asemantics. Most notably they showed how they are using Semantic Web technologies in developing a new generation of feed aggregators for the BBC’s Memoryshare service, which is described as an archive of memories and events from around 1900 to today.

One of the key messages that Alberto tried to convey was around the adoption of RDF and the difficulties of trying to use it to solve the various problems that are faced by the SemWeb community. In his opinion RDF is …

  • Complex, because it tries to solve too many problems at once
  • Search is hard
  • Granularity management, Read/Write is hard
  • We currently have a poor software tool chain

He argues that the solution is to combine existing Web 2.0 technologies with RDF, and actually hide RDF, and instead present data in formats that are more widely accepted and entrenched, because customers don’t get The Semantic Web or RDF. I think Alberto got the biggest laugh of the day when he likened the adoption of RDF to the Resurrection and summarily pronounced on one slide that “RDF is Dead” only to have it resurrected three days later!

One of the things that Alberto and Andrea presented was some current work they are doing on specifying and developing SPARQL to Objects (S2O), a SPARQL extension that maps RDF Graphs to JSON Objects. Whilst the output format seems pretty friendly, I’m not convinced I like how it binds the semantics of the output to the semantics of the query – but I guess there are some advantages to this approach.
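
To illustrate the general idea, here’s a rough sketch of my own that groups flat SPARQL SELECT bindings into one JSON object per subject. This is emphatically not the S2O specification, and the endpoint and query are made up.

```python
# A rough illustration of the general idea behind mapping SPARQL results to
# JSON objects; this is NOT the S2O specification, just my own sketch of
# grouping flat SELECT bindings into one JSON object per subject.
import json
from collections import defaultdict
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical endpoint and query; substitute your own.
sparql = SPARQLWrapper("http://example.org/sparql")
sparql.setQuery("""
    SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 100
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

objects = defaultdict(dict)
for row in results["results"]["bindings"]:
    subject = row["s"]["value"]
    predicate = row["p"]["value"]
    # Use the local name of the predicate as the JSON key.
    key = predicate.rsplit("/", 1)[-1].rsplit("#", 1)[-1]
    objects[subject][key] = row["o"]["value"]

print(json.dumps(objects, indent=2))
```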

I enjoyed Alberto’s talk and was able to spend a bit of time chatting to him during one of the breaks; he’s a passionate researcher with some interesting ideas. He recently did a podcast with my colleague Paul Miller as part of our Talking with Talis series, which you can listen to here.

SWIG-UK Special Event: Leigh Dodds, Facet – Building Web Pages with SPARQL

You can view the slides from Leigh’s presentation here.

Leigh is CTO at Ingenta. They have built a web framework, called Facet, for building web applications on top of RDF. In their opinion there was no good system for integrating RDF repositories with an existing web framework in Java. Although the framework does have some limitations, it seems to me that it is quite simple and perhaps even elegant.

It appears that by embracing some limitations in RDF modelling, Leigh has succeeded in building a framework that, on the face of it, provides a fairly flexible means of building web pages from an RDF repository, and because of the way it’s designed and built it lends itself to being integrated very easily into existing templating environments ( JSP, Velocity etc. ).

Leigh was asked several questions by the audience, and his answers provided further insight.

Question: how do you use this for searching when you get a list of results back?

Answer: Not using this for searching.

Which to me makes perfect sense: each of the configured queries returns a sub-graph, or lens, that is effectively a view of the data which you can pass to a templating engine for rendering.
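
As a rough illustration of that pattern (not Facet itself, which is Java, and using invented data, query and template), a configured CONSTRUCT query pulls back a small sub-graph which is then handed to an ordinary templating engine:

```python
# Illustrative sketch of the "configured query returns a lens, template renders
# it" pattern; not Facet itself (which is Java). The data, query and template
# are invented for the example.
from rdflib import Graph, Literal, Namespace, URIRef
from jinja2 import Template

EX = Namespace("http://example.org/")
g = Graph()
person = URIRef("http://example.org/people/herman-melville")
g.add((person, EX.name, Literal("Herman Melville")))
g.add((person, EX.born, Literal("1819")))

# The "lens": a configured CONSTRUCT query that extracts just the view of the
# data we want to render for a person page.
lens = g.query("""
    CONSTRUCT { ?s ?p ?o }
    WHERE { ?s ?p ?o . FILTER(?s = <http://example.org/people/herman-melville>) }
""")

# Flatten the lens into a simple dict that the template can consume.
context = {str(p).rsplit("/", 1)[-1]: str(o) for _, p, o in lens}

page = Template("<h1>{{ name }}</h1><p>Born: {{ born }}</p>").render(**context)
print(page)
```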

Question: Is the schema annotation mechanism for a known data set rather than in general?

Answer: Yes, it’s application specific and configurable at the application level.

Again, I thought Leigh had made this clear during the presentation, so the answer should have been obvious. Whilst some might consider this to be a limitation, I wouldn’t necessarily view it as such.

Question: have you considered how your framework might work with Rich Clients, Ajax etc?

Answer: That’s why they support JSON output. They’re only doing basic AJAX lookups at the moment.

This is one of the features of the framework that does pique my interest: as we move more and more towards building richer client interfaces on the web, there is an expectation that web frameworks and web services should support outputting data in JSON. At the moment our Platform doesn’t formally support this, but it is something we are definitely intending to do.

It makes sense to provide data back to the client in the format they need it, rather than a fixed format that the application then has to process and convert. I’ve seen the problem when building desktop widgets: whilst XML is great and portable, most widget frameworks are based on ECMAScript and understand JSON natively, so wouldn’t it be nicer if web services returned JSON?
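
Just to illustrate the point (a generic toy sketch, nothing to do with how our Platform or Facet actually implement this), a web service can honour the client’s Accept header and serve the same data as either XML or JSON:

```python
# Generic sketch of serving the same resource as XML or JSON depending on the
# client's Accept header; a toy Flask app, not how any particular platform
# actually implements this.
from flask import Flask, jsonify, request, Response

app = Flask(__name__)

DATA = {"title": "Moby Dick", "author": "Herman Melville"}

@app.route("/books/1")
def book():
    best = request.accept_mimetypes.best_match(
        ["application/json", "application/xml"])
    if best == "application/json":
        # Widget frameworks and browser scripts can consume this directly.
        return jsonify(DATA)
    xml = ("<book><title>{title}</title>"
           "<author>{author}</author></book>").format(**DATA)
    return Response(xml, mimetype="application/xml")

if __name__ == "__main__":
    app.run()
```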

Anyway ldodds++ 🙂

Question: Will you open source it?

Answer: Hopefully it will be; it needs to be disentangled first, but they wanted to share the ideas here today so people can get a sense of the value.

I’m hoping that they do, I’d like to have a play around with the framework and possibly even extend it.

All in all I was actually pretty impressed with Leigh’s talk, I’ll be keeping an eye out for Facet.

SWIG-UK Special Event: Graham Klyne – Building a Semantic Web accessible image publication repository

Graham begins by offering a little background information on why they want to be able to publish images using SemWeb technologies.

Previous approaches involved general purpose image databases based on conventional relational technology, which was useful and worked but died due to licensing restrictions on the data.

With semantic web technologies emerging, a Semantic Image database was created using native RDF storage. There was some success, but they had to build all the heavy lifting themselves, and the system was fragile due to tightly coupled components. Graham commented on how this touched on Ian Davis’s talk: how useful a platform might be, and how difficult it is to build one.

Their current approach is based on the Data Web philosophy: the idea of linking available web data rather than creating new application stores. They based this on Southampton University’s EPrints, and they use Jena and Joseki to provide a SPARQL endpoint for the metadata.

What does semantic web accessible mean?

Image metadata is accessible and queryable from multiple sources using SPARQL, and images should be accessible using simple HTTP requests.
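
In practical terms I imagine that looks something like the following sketch: query the metadata over SPARQL, then fetch the image itself with a plain HTTP GET. The endpoint URL and vocabulary here are invented for illustration, not Graham’s actual setup.

```python
# A hypothetical sketch of what "Semantic Web accessible" means in practice:
# query image metadata over SPARQL, then fetch the image itself with a plain
# HTTP GET. The endpoint URL and vocabulary are invented for illustration.
import requests
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://example.org/joseki/sparql")
sparql.setQuery("""
    PREFIX dc: <http://purl.org/dc/elements/1.1/>
    SELECT ?image ?title
    WHERE { ?image dc:title ?title }
    LIMIT 5
""")
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    image_uri = row["image"]["value"]
    print(row["title"]["value"], image_uri)
    # The image itself is just a web resource, retrievable with plain HTTP.
    img = requests.get(image_uri)
    img.raise_for_status()
```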

EPrints is an “OAI repository” which uses common metadata ( e.g. Dublin Core ).

Graham goes on to talk about Global vs Local access when accessing metadata from multiple repositories – they want to be able to move away from a single, globally coordinated index to more local and uncoordinated ones.

The problem with this is that people will use different schemas, so over time connecting data together becomes an issue or concern. So they are looking at developing strategies to address this, and over time perhaps a common schema might emerge. Currently there are two strategies for metadata conversion:

  • Meta Data Re-writing – which involves making an extra copy of the data.
  • Query Re-writing – instead of changing or copying data they change the query.

Graham believes a combination of the two is required. He goes on to give an overview of their implementation – collect data using OAI and then use Joseki. They have had to modify the EPrints software as well as modify the database to accommodate domain metadata. What they have, though, is not a generic solution, but they have managed to create a platform ( not a semantic web platform ) within six weeks, which is quite different to their previous experiences.
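
To give a flavour of what the query re-writing strategy might look like, here is a very rough sketch of my own; the mappings are invented, and a real implementation would operate on the parsed query rather than the raw string.

```python
# A very rough sketch of the "query re-writing" strategy mentioned above:
# instead of copying and converting metadata, rewrite the query so that terms
# from a common schema are replaced with the local schema each repository uses.
# The mappings and query are invented for illustration.
COMMON_TO_LOCAL = {
    "http://purl.org/dc/terms/title": "http://example.org/eprints/title",
    "http://purl.org/dc/terms/creator": "http://example.org/eprints/author",
}

def rewrite_query(query: str, mapping: dict) -> str:
    """Replace common-schema predicate URIs with repository-local ones."""
    for common, local in mapping.items():
        query = query.replace(common, local)
    return query

generic_query = """
SELECT ?img ?title WHERE {
  ?img <http://purl.org/dc/terms/title> ?title .
}
"""
print(rewrite_query(generic_query, COMMON_TO_LOCAL))
```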

They are currently looking at implementing user interfaces, so they are evaluating tools that can do faceting; they have done some experimentation with JSPace and have implemented a faceted browser that uses a Joseki endpoint. They are also looking at trying to use mSpace, from Southampton, but haven’t been able to get hold of the software yet. Graham goes on to show some screenshots of their user interface.

Lessons learned:

  • Available tools do support Semantic Web accessibility, Joseki has been key to their progress.
  • Creating effective user interfaces is not easy.
  • Importance of loose coupling.

Wish list:

  • SPARQL Update support in Joseki, which will facilitate metadata updates from external hosts.
  • A more generic web data publishing tool ( e.g. METS ).
  • Query distribution.
  • Merging data from multiple uncoordinated sources ( FlyWeb ).
  • Improving the user interface will be an ongoing task.

My Thoughts

What strikes me is that their requirements aren’t a million miles away from some of the basic services that the Platform provides. I’d be curious to see what they might be able to achieve if they had a Platform store which combines metadata and content together, or whether the same problem could be solved differently on the platform.

Question: Can you give us some indication of the user tasks you’re trying to support? Graham describes the process scientists currently go through. From the lengthy description he provides, the Interaction Designer in me wonders whether taking a UCD approach would be a better way of evolving an interface that would support them.

SWIG-UK Special Event: Ian Davis on the Talis Platform

You can view the slides for Ian’s presentation here.

Ian begins by describing the platform as a multi-tenant database with a REST based API. There are pools of content and metadata called Stores, which you can add content to and search and retrieve data and binaries from.

We want to bring the platform to as many developers as possible.

We use REST but also adopt existing protocols such as RSS; this is so that we can re-use data formats and protocols where they exist, and create and document them where they don’t. Any data stored in the platform is still your data.

Ian describes the API next; he talks about how you can use the API to do the following (a rough sketch of a couple of these calls follows the list):

  • Add Content to a store using POST ( http://api.talis.com/stores/mystore/items )
  • Search Content in a store using GET ( http://api.talis.com/stores/mystore/items )
  • Add Metadata by POSTing RDF/XML to add RDF in bulk ( http://api.talis.com/stores/mystore/meta ); you can also POST Change Sets, which are lists of reified triples with a common subject.
  • Search Metadata using SPARQL ( http://api.talis.com/stores/mystore/services/sparql? ); this is limited to searching the metabox for a given store. Each store also has a multisparql service to search multiple graphs.
  • Augmentation ( http://api.talis.com/stores/mystore/services/augment ) – supply an RSS feed and augment it with additional triples. In other words, take a search from one store and chain it with augmentation from another.
  • Faceting ( http://api.talis.com/stores/mystore/services/facet ) – uses indexed metadata to build facets for search terms.
  • OAI ( http://api.talis.com/stores/mystore/services/oai-pmh ) – the standard archiving and harvesting protocol.
  • Snapshots – you can programmatically request a snapshot of your store. This produces a tar file accessible by HTTP, which contains all items from the content box, all the RDF, etc.
  • Security – coarse-grained capability model; uses authentication via HTTP digest, with URI-based identities.
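
To make the API feel a bit more tangible, here is a rough, unofficial sketch of what a couple of these calls might look like from Python using the requests library. The store name, credentials, payloads and response handling are assumptions for illustration only, not official sample code.

```python
# A rough, hypothetical sketch of calling a couple of the Platform endpoints
# described above, using Python's requests library. The store name ("mystore"),
# credentials and exact payload/response formats are placeholder assumptions.
import requests
from requests.auth import HTTPDigestAuth

BASE = "http://api.talis.com/stores/mystore"
auth = HTTPDigestAuth("username", "password")  # HTTP digest, as noted above

# Add an item of content to the store's content box.
with open("report.pdf", "rb") as f:
    resp = requests.post(f"{BASE}/items", data=f,
                         headers={"Content-Type": "application/pdf"},
                         auth=auth)
resp.raise_for_status()

# Add some metadata by POSTing RDF/XML to the metabox.
rdf_xml = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.com/report/1">
    <dc:title>Quarterly Report</dc:title>
  </rdf:Description>
</rdf:RDF>"""
resp = requests.post(f"{BASE}/meta", data=rdf_xml,
                     headers={"Content-Type": "application/rdf+xml"},
                     auth=auth)
resp.raise_for_status()

# Query the metabox with SPARQL.
query = ("SELECT ?s ?title WHERE "
         "{ ?s <http://purl.org/dc/elements/1.1/title> ?title } LIMIT 10")
resp = requests.get(f"{BASE}/services/sparql",
                    params={"query": query},
                    headers={"Accept": "application/sparql-results+xml"})
print(resp.text)
```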

Ian then goes onto talk about some of our future plans:

  • Relevance ranking for RDF – use relations between resources to influence ranking, as well as discover resources based on text search of their associated resources.
  • Personalisation and recommendation services – resources that are similar to X tend to have y, trails and suggestions based on usage.

Ian describes the architecture of the platform and some of the technologies that it is built upon, for example Jena. Ian also talks about our goals in terms of scaling and resilience, and our aim for zero downtime.

Ian goes on to describe Marvin, which is a development project we are working on to deal with parallel data processing, the idea being that all content submitted to the platform is processed in parallel.

Ian also talks about Majat, which is another development research project that looks at distributed storage and search.

Ian then gives some examples of how the platform is currently being used by showing some of the applications we have built.

  • Talis Engage – a community information application that uses SKOS, SIOC and FOAF
  • Talis Prism – Library catalogue search
  • Project Zephyr – Academic resource/reading list management. Ian also demoed our relationship browser, which is embedded in Zephyr and allows users to explore data in the platform.

Question and Answers

Question: What SemWeb capabilities are customers warming to? It’s still early days.

Question: Are you doing reasoning in the platform? Not yet.

Question: How much risk is involved in exposing SPARQL Service? Some risk, someone could write a horrible SPARQL query.

Question: Would you consider releasing this as a product and not a service? No, we are offering the platform as SaaS.

Question: Can you categorise the kinds of apps this is best suited for? Any applications that are information rich.

Semantic Web Interest Group – Special UK Event

I was fortunate enough to attend Friday’s SWIG-UK Special Event hosted at HP Research Labs in Bristol. It was a wonderful day full of some very interesting talks from a pretty diverse range of speakers talking about how they are using SemWeb technologies to solve problems. Naturally we Talisians were there talking about The Talis Platform, what it is and showing some of the commercial applications we have built upon it. The work we are doing at Talis and the progress we have made in the development of our platform was received very well, which I have to say was a great feeling.

The day was also about meeting and making contacts amongst the SWIG community, and from that point of view the day was a great success: I got the chance to meet some very interesting individuals who are working on some amazing projects. I got the distinct impression that there was certainly a great deal of potential in the idea of letting some of these individuals try out their ideas on our Platform, and that’s something that I am really excited about.

I have made notes on a number of the presentations from Friday which I will post up over the next couple of days.