Tim Berners-Lee: The next Web of open, linked data

20 years ago, Tim Berners-Lee invented the World Wide Web. For his next project, he’s building a web for open, linked data that could do for numbers what the Web did for words, pictures, video: unlock our data and reframe the way we use it together.

This is an inspiring talk by Tim that goes to the heart of the work that we are doing at Talis with our Platform and the new generation of products we are building on the platform, such as Talis Aspire and Talis Prism.

    "Data is relationships!"
    
    "The really important thing about data is that the more things
    that you have to connect together the more powerful it is."

A wonderfully simple and succinct way of describing the importance of Linked Data. It’s a great talk and well worth watching.

2009 an update …

During a conversation with a certain cat-loving friend of mine earlier in the week, it was suggested to me that it’s been a while since I’ve blogged anything other than short pieces highlighting bits of news or content out there on the web. She’s absolutely right. So here’s an update on 2009 so far, and what the next few months hold …

The last three or four months have been particularly busy for me. I took a long holiday before Christmas in order to give myself time to reflect on everything that happened in 2008. As far as years go it pretty much sucked! I had a lot of personal stuff to deal with, most of it around coping, or failing to cope, with the deaths of a number of people who were close to me – including my father. I’m pretty good at burying myself in work as a way of not having to deal with other things. Unfortunately that only works for so long; in fact I’m surprised I actually got to the end of the year before finally accepting that things were broken inside. I’m lucky though: I have a lot of people around me who keep an eye on me, and care enough to give me a kick when I need it – and they did. I was convinced to switch off from work and everything related to work and focus on dealing with the things that I knew I needed to. My hiatus over Christmas was spent with my family, trying to understand everything that had happened last year, which inevitably meant finally accepting that I needed to grieve.

I sometimes hold it half a sin
To put in words the grief I feel;
For words, like Nature, half reveal
And half conceal the Soul within.

     from In Memoriam, Alfred Lord Tennyson

So I wanted to firmly place the events of 2008 in the past and move forwards again. 2008 was painful and difficult, yet I also enjoyed a number of personal and professional successes. In 2009 I want to build on those successes, and leave the past firmly where it is.

In terms of my personal life there are already some big changes I’m making, but I’ll leave discussing that for another day. Suffice to say that I think I’m happier now than I have been in years :).

Professionally I was appointed Head of Development for our Xiphos Division at Talis. I’m still trying to settle into the role, which brings its own challenges 🙂. However, leading up to Christmas our division had successfully begun piloting a new product called Aspire at Plymouth University, and since then it has also been deployed as part of a wider pilot at Sussex University. Functionally, Aspire is a resource lists product that helps lecturers and students make best use of the educational material for their courses. Technically, Aspire is a Linked Data application built directly on top of our Talis Platform, a platform that provides the infrastructure for building Semantic Web applications. I’m loving the work; it’s technically very challenging and there are so many different things that need to be considered. Lots of people and organisations are talking about the semantic web, but only a relatively small number of organisations are actually building real world products and solutions using these technologies – products and solutions that are actually targeted at end users. For me this is primarily why the work is both exciting and hugely rewarding.

Building Aspire is forcing us to innovate and explore ideas and possibilities that we might not have otherwise considered. A case in point is the way in which we have embedded RDFa into our list page, and our editing tool manipulates this model directly within the HTML DOM, which simplifies the process. This is discussed in a W3C Case Study, and was commented upon by Ivan Herman last month. Much of the work we are doing at the moment is around adding more features to Aspire during our beta phase. Whilst part of this will be around specific features aimed at users, we are also looking at linking to other data sets and exploring what else we can do within this ecosystem of rich semantic data.
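To give a feel for what "the model lives in the HTML" means, here is a minimal sketch in Python. It is not Aspire's code or markup: the list items, the example.org URIs and the use of dc:title / dc:creator are all assumptions made up for illustration, and the extractor only understands the about and property attributes rather than the full RDFa processing rules.

```python
from html.parser import HTMLParser

# Hypothetical markup, NOT Aspire's real list page: two list items
# annotated with RDFa-style "about" and "property" attributes.
LIST_PAGE = """
<ul>
  <li about="http://example.org/items/1">
    <span property="dc:title">Weaving the Web</span>
    <span property="dc:creator">Tim Berners-Lee</span>
  </li>
  <li about="http://example.org/items/2">
    <span property="dc:title">Example Reading</span>
  </li>
</ul>
"""


class RdfaLiteralExtractor(HTMLParser):
    """Collects (subject, predicate, literal) triples.

    Simplification: only "about" and "property" are handled, and every
    start tag is assumed to have a matching end tag.
    """

    def __init__(self):
        super().__init__()
        self.triples = []
        self._depth = 0
        self._subjects = []   # stack of (depth, subject URI)
        self._pending = None  # (subject, predicate) waiting for its text

    def handle_starttag(self, tag, attrs):
        self._depth += 1
        attrs = dict(attrs)
        if "about" in attrs:
            self._subjects.append((self._depth, attrs["about"]))
        if "property" in attrs and self._subjects:
            self._pending = (self._subjects[-1][1], attrs["property"])

    def handle_data(self, data):
        text = data.strip()
        if self._pending and text:
            subject, predicate = self._pending
            self.triples.append((subject, predicate, text))
            self._pending = None

    def handle_endtag(self, tag):
        self._pending = None
        # Leaving the element that introduced the current subject.
        if self._subjects and self._subjects[-1][0] == self._depth:
            self._subjects.pop()
        self._depth -= 1


def extract_triples(html):
    """Return the triples found in an HTML string."""
    parser = RdfaLiteralExtractor()
    parser.feed(html)
    return parser.triples


if __name__ == "__main__":
    for triple in extract_triples(LIST_PAGE):
        print(triple)
```

The point of the sketch is simply that the page a user sees and the data a machine reads are the same document, so an editor that changes the DOM is changing the data at the same time.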

Finally, I mentioned on several occasions last year the work I was doing in my own time on building a tool to help visualise and explore the socio-semantic web. That work got shelved towards the end of 2008, largely because I couldn’t focus on it with everything else that was going on. However, a new year and a new beginning mean that the project now has a new lease of life … and it finally has a name: Omnius. It’s named after a thinking machine from the Legends of Dune series, which I was re-reading around the time I was thinking about a name for this project. I had actually wanted to call it Erasmus but that name had already been taken on Google Code (and erasmus_browser sounded sucky!). I’ve created a home for this project on Google Code, and I’ll be adding more information very soon. However, please remember I am only working on this during my spare time; for me it’s both a hobby and an interesting technical (and UX) diversion … it hasn’t yet turned into an obsession, so the rate at which I’ll be adding to it is limited by the time I’m able to devote to it 🙂

… so watch this space.

Linked Data and Scientific Publishing

Tim Berners-Lee gave a talk at TED2009 on Linked Data. The slides for the talk can be found here. TED have not made this talk available for viewing yet. However part of the focus of the talk was around linking raw scientific data with existing linked data sets already out there. I’ll post a link to the talk when and if it becomes available.

What is good to see though is how closely this aligns to the work that our Xiphos division is doing and the ideas that we are experimenting with.

Is LinkedData really more important than the Large Hadron Collider?

I’ve just read Daniel’s recent post entitled Linked Data is more important than the Large Hadron Collider. Like Daniel I am also a passionate advocate of Linked Data and am currently working on deploying a number of real world Linked Data applications along with my colleagues at Talis. Sadly though I have to confess that I found myself cringing whilst reading his piece.

Like many other scientific endeavors the Large Hadron Collider project attempts to provide scientists with huge quantities of data that might help them answer questions about the origin of the universe. As a project in its own right it is massive, combining the efforts of thousands of scientists from around the world.

To dismiss it, as Daniel has done, because it’s “too expensive”, or because “it wont find the cure to cancer, or HIV”, or question its relevance because “we’re still going to be here whether or not the Large Hadron Collider was successful”, is bad enough but to then use those rather specious arguments as a prop to advocate Linked Data is absolutely ridiculous.

Worse is that it overlooks the rather obvious rebuttal, which is that Linked Data won’t cure cancer, it won’t cure HIV, and we’ll all still be here whether we have Linked Data or not :-). Even more importantly though … should anyone in our community, and by that I mean the Linked Data community, really be questioning the value of any project whose sole purpose is to generate data? To then say this …

Just imagine a world where you can easily browse through the history of the atom, and then delve into the science found on the atom, and then go deeper into the subatomic level, and then browse back out into the historic realm, finding out about experiments that happened and whether it had any impact on society.

… completely misses the following point: the data to do this exists, not because of you and me, Daniel, but because since man appeared on this planet his thirst for knowledge is what has driven him forward to the point where people like you and I can sit here and say … “if you format your data like this, and give everything a dereferenceable URI – that’ll be really useful!”. I’m serious … Linked Data is not a radical technology change, nor is the Semantic Web. Both represent a paradigm shift, a new understanding, a new way of doing things, but the fact is that the technology has been around for ages. We are only now understanding the importance of being more open, of having common vocabularies to describe things, and of linking concepts together in this web of data.

The absolute last thing we want to do is to start saying to scientists, no matter how obscure their field of research is, or how relevant we (personally) consider that research to be, that it’s somehow less important than what we, as a community, are doing … because it absolutely isn’t. Are you really sure you want to be asking people to believe that answers about the origin of the universe and our existence in it are less important than an “interesting browsing experience”?

Curating the Dark Data in the long tail of science


ABSTRACT

There is a wealth of scientific data that is almost impossible to see. This is science’s dark data. Much of this data resides in the long tail of science or “small” data collection efforts. Instrumentation has made it possible to develop large collections of relatively homogeneous data, be it from space sensors or high throughput gene sequencers. The monolithic collections are easy to find and search. Dark data on the other hand may constitute the larger mass of scientific information. The collections that make up the dark data of science are much smaller but also much more numerous, being generated by thousands of scientists, on a much broader number of scientific questions, and in a complex array of formats. Unfortunately, it is also more prone to be overlooked and lost over time. Using new technology, the economics of the internet, and change in the sociology of science it is possible to make greater use of this data than was possible in the past. Data curators are the people who develop and use these technologies and procedures to make this data more useful, insuring a more efficient return on investment in the enterprise of science.

This is a really interesting tech talk given by P. Bryan Heidorn from the National Science Foundation Division of Biological Infrastructure and Associate Professor, University of Illinois.

I found the talk particularly useful. I had never come across the term Digital Curation before, and was surprised to learn that it is defined as:

Digital curation is the acquisition, management, appraisal, and serving
of data to maximise its usefulness.

Curation embraces and goes beyond that of enhanced present day
re-use, and of archival responsibility, to embrace stewardship that adds
value through the provision of context and linkage: placing emphasis
on publishing data in ways that ease re-use and promoting accountability
and integration. (Rusbridge et al., 2005)

What surprises me is that the goals of these curators are not too dissimilar to the goals of those of us working in the Linked Open Data movement, and I’m wondering whether these two communities should work more closely together … very interesting indeed.

Bibliographic Ontology 1.0 released

After months of development the first version of the Bibliographic Ontology was published today. This is an incredibly important milestone for the project: it has been discussed, developed and evolved over a number of months in order to make sure that the ontology is expressive enough to handle all kinds of scenarios for all kinds of bibliographic projects. It’s been particularly relevant to us and some of the work we are trying, which I’ll be commenting on over the next couple of weeks.

BBC Opening Up

The BBC is opening up and making its data accessible to development teams outside the beeb – they are also following the Linked Data approach …

We have been following the Linked Data approach – namely thinking of URIs as more than just locations for documents. Instead using them to identify anything, from a particular person to a particular programme. These resources in-turn have representations, which can be machine-processable (through the use of RDF, Microformats, RDFa, etc.), and these representations can hold links towards further web resources, allowing agents to jump from one dataset to another.

They have designed and published a simple but versatile ontology for describing Programme data which can be accessed here.
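
The idea in the BBC quote above, that URIs identify things and that their representations link onwards to other datasets, can be sketched in a few lines of Python. Everything here is an assumption for illustration: the URIs, the WEB dictionary standing in for real HTTP lookups, and the link names are hypothetical and have nothing to do with the BBC's actual data or ontology.

```python
from collections import deque

# Hypothetical stand-in for the web: each URI maps to the machine-readable
# description an agent would receive on dereferencing it.
WEB = {
    "http://example.org/programmes/p1": {
        "type": "Programme",
        "label": "Example Programme",
        "links": {"presenter": "http://example.org/people/alice"},
    },
    "http://example.org/people/alice": {
        "type": "Person",
        "label": "Alice Example",
        # A link that leaves the first dataset and points into another.
        "links": {"sameAs": "http://other.example.net/artists/42"},
    },
    "http://other.example.net/artists/42": {
        "type": "Artist",
        "label": "Alice Example (artist record)",
        "links": {},
    },
}


def dereference(uri):
    """Stand-in for an HTTP GET that returns a parsed description, or None."""
    return WEB.get(uri)


def follow_links(start_uri, max_resources=10):
    """Walk outgoing links breadth-first and return the URIs visited, in order."""
    visited = []
    queue = deque([start_uri])
    while queue and len(visited) < max_resources:
        uri = queue.popleft()
        if uri in visited:
            continue
        description = dereference(uri)
        if description is None:
            continue  # nothing is published at this URI
        visited.append(uri)
        queue.extend(description["links"].values())
    return visited


if __name__ == "__main__":
    for uri in follow_links("http://example.org/programmes/p1"):
        print(uri, "->", dereference(uri)["label"])
```

Starting from a programme, the agent ends up at a record held in a different dataset without knowing in advance that the dataset existed, which is the "jump from one dataset to another" the quote describes.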