Pezholio

21 Oct, 2009

Adventures in SPARQL

Posted by: Pez In: Uncategorized

Unless you’ve been living under a rock for the past month or so (or actually have a life), you’ll have noticed that the Cabinet Office has started to release a big load of open data to a select few developers to have a play with.

Much of this data is still in very ‘flat’ formats (such as Excel docs and the like), but some is being released as RDF, the preferred data format of the semantic web.

The most interesting of these datasets has been the education one, giving me (and other developers) the ability to get information about schools in an open format. I’ve been playing with this data for a bit, and now (I think) I might have something useful.

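The main way of interacting with it is a query language called SPARQL. It is very similar to SQL, but it is designed with the web of data in mind: rather than querying tables, you match patterns of (subject, predicate, object) triples. An example query for the education dataset is as follows: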


prefix sch-ont: <http://education.data.gov.uk/ontology/school#>
SELECT ?name WHERE {
  ?school a sch-ont:School ;
          sch-ont:establishmentName ?name ;
          sch-ont:districtAdministrative <http://education.data.gov.uk/placeholder-id/administrativeDistrict/Lichfield> .
}
ORDER BY ?name

This gets the names of all the educational establishments (schools, nurseries etc.) in Lichfield District, in alphabetical order. You can give it a try yourself on the public-facing SPARQL endpoint.
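
You can also run the query from a script instead of the endpoint's web form. The SPARQL Protocol lets you send a query as the `query` parameter of an HTTP GET request. Below is a minimal Python sketch. It assumes the endpoint accepts GET and can return the standard SPARQL JSON results format. `ENDPOINT` is a hypothetical placeholder, not the real endpoint address, and `build_query_url`/`run_query` are illustrative helper names.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical placeholder: substitute the real SPARQL endpoint address.
ENDPOINT = "http://example.org/sparql"

LICHFIELD_QUERY = """
prefix sch-ont: <http://education.data.gov.uk/ontology/school#>
SELECT ?name WHERE {
  ?school a sch-ont:School ;
          sch-ont:establishmentName ?name ;
          sch-ont:districtAdministrative <http://education.data.gov.uk/placeholder-id/administrativeDistrict/Lichfield> .
}
ORDER BY ?name
"""


def build_query_url(endpoint, query):
    """Encode the query as the 'query' parameter of a GET request URL."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_query(endpoint, query):
    """Send the query and decode the SPARQL JSON results (needs network access)."""
    request = urllib.request.Request(
        build_query_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

`urlencode` takes care of escaping the angle brackets, `#` and newlines in the query. Calling `run_query(ENDPOINT, LICHFIELD_QUERY)` against a real endpoint should give back a dictionary of results.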

What I wanted to do, though, was get all the primary or secondary schools within a 2-mile radius of a postcode. This was a little harder, but perfectly doable.

As well as general details (such as address and school name) and more detailed information (such as capacity and inspection details), the dataset also has the easting and northing of each school: its Ordnance Survey National Grid coordinates, in metres. So, to get all primary schools within 2 miles of a point with an easting of 411021 and a northing of 307291, I used the following query:


prefix sch-ont: <http://education.data.gov.uk/ontology/school#>
SELECT ?name ?address1 ?address2 ?postcode ?town ?easting ?northing ?reference WHERE {
  ?school a sch-ont:School ;
          sch-ont:establishmentName ?name ;
          sch-ont:easting ?easting ;
          sch-ont:northing ?northing ;
          sch-ont:uniqueReferenceNumber ?reference ;
          sch-ont:phaseOfEducation <http://education.data.gov.uk/ontology/school#PhaseOfEducation_Primary> .
  FILTER (?easting < 414239 && ?easting > 411021 &&
          ?northing < 310509 && ?northing > 307291)
}

This gets the name, easting, northing and reference number of every primary school in a box with 2-mile (3,218-ish metre) sides, which was meant to have the original easting and northing as its midpoint.
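
To centre a box on a point, you have to subtract the half-width from the easting and northing as well as add it. Below is a small Python sketch that computes those bounds and writes the matching FILTER clause. The helper names `centred_box` and `box_filter` are hypothetical. It uses 3218 m (2 international miles is 3218.688 m) as the half-width. That gives a square 4 miles across that contains every point within 2 miles, plus some extra ground in the corners.

```python
METRES_PER_MILE = 1609.344  # international mile


def centred_box(easting, northing, half_width):
    """Return (min_e, max_e, min_n, max_n) for a square centred on the point."""
    return (easting - half_width, easting + half_width,
            northing - half_width, northing + half_width)


def box_filter(easting, northing, half_width):
    """Build a SPARQL FILTER restricting ?easting/?northing to the centred box."""
    min_e, max_e, min_n, max_n = centred_box(easting, northing, half_width)
    return (f"FILTER (?easting > {min_e} && ?easting < {max_e} && "
            f"?northing > {min_n} && ?northing < {max_n})")


print(centred_box(411021, 307291, 3218))  # (407803, 414239, 304073, 310509)
print(box_filter(411021, 307291, 3218))
```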

I’ve given it a test here, so please feel free to try it out. (A bit of naughtiness is required: I use the Google Maps API to get the latitude and longitude of the postcode, which then have to be converted to an easting and northing, but it’s the only way at the moment!)
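
The query needs an easting and northing, but a postcode lookup gives a latitude and longitude. One way to bridge the two is the Ordnance Survey's Transverse Mercator formulae for the National Grid. Below is a Python sketch of that projection. It assumes the input latitude/longitude is already on the OSGB36 datum. Google's coordinates are WGS84, and skipping the datum shift between the two leaves a position error of very roughly 100 metres, which hardly matters for a 2-mile search. The function names are hypothetical.

```python
import math

# Airy 1830 ellipsoid and National Grid projection constants (OSGB36 datum).
A, B = 6377563.396, 6356256.909                  # semi-major/semi-minor axes (m)
F0 = 0.9996012717                                 # scale factor on central meridian
LAT0, LON0 = math.radians(49), math.radians(-2)   # true origin: 49N, 2W
E0, N0 = 400000.0, -100000.0                      # false easting/northing (m)
E2 = (A * A - B * B) / (A * A)                    # eccentricity squared
N_RATIO = (A - B) / (A + B)


def meridional_arc(lat):
    """Scaled distance (m) along the central meridian from LAT0 to lat (radians)."""
    n = N_RATIO
    dl, sl = lat - LAT0, lat + LAT0
    return B * F0 * (
        (1 + n + 1.25 * n**2 + 1.25 * n**3) * dl
        - (3 * n + 3 * n**2 + 21 / 8 * n**3) * math.sin(dl) * math.cos(sl)
        + (15 / 8 * n**2 + 15 / 8 * n**3) * math.sin(2 * dl) * math.cos(2 * sl)
        - 35 / 24 * n**3 * math.sin(3 * dl) * math.cos(3 * sl)
    )


def osgb36_to_grid(lat_deg, lon_deg):
    """Project OSGB36 latitude/longitude in degrees to (easting, northing) in metres."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    s, c, t = math.sin(lat), math.cos(lat), math.tan(lat)
    nu = A * F0 / math.sqrt(1 - E2 * s**2)            # transverse radius of curvature
    rho = A * F0 * (1 - E2) / (1 - E2 * s**2) ** 1.5  # meridional radius of curvature
    eta2 = nu / rho - 1

    i = meridional_arc(lat) + N0
    ii = nu / 2 * s * c
    iii = nu / 24 * s * c**3 * (5 - t**2 + 9 * eta2)
    iiia = nu / 720 * s * c**5 * (61 - 58 * t**2 + t**4)
    iv = nu * c
    v = nu / 6 * c**3 * (nu / rho - t**2)
    vi = nu / 120 * c**5 * (5 - 18 * t**2 + t**4 + 14 * eta2 - 58 * t**2 * eta2)

    d = lon - LON0  # longitude difference from the central meridian (radians)
    northing = i + ii * d**2 + iii * d**4 + iiia * d**6
    easting = E0 + iv * d + v * d**3 + vi * d**5
    return easting, northing
```

For sub-metre accuracy from WGS84 input, you would first apply a datum transformation (for example a Helmert transformation) before projecting.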

Once you submit the form you’ll get a list of all the primary and secondary schools in that area, as well as a link to more info on the Edubase site.
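
To build a list like this, the endpoint's answer has to be turned into rows. Below is a Python sketch of one way to do that. It walks the standard SPARQL JSON results format, where each solution maps variable names to objects with a `value` key. A variable that a solution leaves unbound is simply missing; for example, the query above selects `?postcode` but never binds it. The helper name and the sample school are made up for illustration.

```python
import json


def bindings_to_rows(results):
    """Flatten SPARQL JSON results into dicts of variable -> string value.

    Variables that a solution leaves unbound come back as None.
    """
    variables = results["head"]["vars"]
    return [
        {var: binding[var]["value"] if var in binding else None for var in variables}
        for binding in results["results"]["bindings"]
    ]


# Illustrative response shape only; the school below is invented.
sample = json.loads("""
{
  "head": {"vars": ["name", "postcode", "reference"]},
  "results": {"bindings": [
    {"name": {"type": "literal", "value": "Example Primary School"},
     "reference": {"type": "literal", "value": "123456"}}
  ]}
}
""")

for row in bindings_to_rows(sample):
    print(row["name"], row["reference"], row["postcode"])
```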

I’d be interested to know your thoughts, so please feel free to comment! :)

There are also a few more example queries on the Talis blog, so if you’re that way inclined I heartily recommend having a play.

7 Responses to "Adventures in SPARQL"

1 | Andrew Beeken

October 21st, 2009 at 9:46 pm

That’s pretty funky there, Mr Harrison! P’rhaps a demo of the technique on Friday ;)

2 | Matthew

October 22nd, 2009 at 9:47 am

Your SPARQL box query has the example point you give as the bottom-left corner, not the midpoint. I don’t know SPARQL, but is there not a way of doing a bit of Pythagoras:
(411021 - ?easting)^2 + (307291 - ?northing)^2 < 3218^2
as that would then really be "schools within 2 miles" rather than a box.
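
The difference between the two tests shows up near the corners. A school can sit inside a square of half-width 3218 m yet be more than 2 miles from the point. Below is a Python sketch comparing the two, using a hypothetical school 3,000 m east and 3,000 m north of the point. It also builds the equivalent SPARQL FILTER. SPARQL has no `^` operator, so the squares are written as products, and the limit becomes 3218 × 3218 = 10355524. The helper names are illustrative only.

```python
RADIUS = 3218  # metres, roughly 2 miles


def in_box(e, n, e0, n0, r=RADIUS):
    """True if (e, n) lies in the square of half-width r centred on (e0, n0)."""
    return abs(e - e0) < r and abs(n - n0) < r


def in_circle(e, n, e0, n0, r=RADIUS):
    """Pythagoras test, comparing squared distances so no square root is needed."""
    return (e0 - e) ** 2 + (n0 - n) ** 2 < r ** 2


def circle_filter(e0, n0, r=RADIUS):
    """SPARQL FILTER for the circle test, with squares written as products."""
    return (f"FILTER ((({e0} - ?easting) * ({e0} - ?easting)) + "
            f"(({n0} - ?northing) * ({n0} - ?northing)) < {r * r})")


centre = (411021, 307291)
corner_school = (411021 + 3000, 307291 + 3000)  # about 4,243 m away diagonally
print(in_box(*corner_school, *centre), in_circle(*corner_school, *centre))  # True False
print(circle_filter(*centre))
```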

3 | Pez

October 22nd, 2009 at 10:13 am

Ah nice! I did knock the query up in a bit of a hurry, so probably messed it up!

I like the Pythagoras method, although I did think of a potential problem for small villages and the like, which might not have schools (particularly secondary schools) within two miles.

What I really need to do is use a formula to calculate the distance, then order by that distance and limit the results. I’ve done something similar with lat longs and the haversine formula in SQL, so I’m sure it’s possible with SPARQL. Need to get my thinking cap on!
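
For reference, here is the haversine formula mentioned above, as a Python sketch. It treats the Earth as a sphere with an assumed mean radius of 6,371 km and returns the great-circle distance in metres between two latitude/longitude points given in degrees.

```python
import math

EARTH_RADIUS_M = 6371000  # assumed mean Earth radius; the Earth is treated as a sphere


def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/long points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))
```

Standard SPARQL has no trigonometric or square-root functions, so this formula can't be written directly in a query. Plain Pythagoras on eastings and northings is the practical alternative. Over a few miles on the National Grid, it gives distances very close to the true ones.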

5 | Pez

October 22nd, 2009 at 5:03 pm

Based on these comments (and a bit of help from the government open data mailing list), I’ve made a few changes to the query. The query below now gets the 5 closest primary schools, regardless of distance.


prefix sch-ont: <http://education.data.gov.uk/ontology/school#>
SELECT ?name ?address1 ?address2 ?postcode ?town ?easting ?northing ?reference WHERE {
  ?school a sch-ont:School ;
          sch-ont:establishmentName ?name ;
          sch-ont:easting ?easting ;
          sch-ont:northing ?northing ;
          sch-ont:uniqueReferenceNumber ?reference ;
          sch-ont:phaseOfEducation <http://education.data.gov.uk/ontology/school#PhaseOfEducation_Primary> .
}
ORDER BY ASC(
  ((411021 - ?easting) * (411021 - ?easting))
  + ((307291 - ?northing) * (307291 - ?northing))
)
LIMIT 5
OFFSET 0
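
This `ORDER BY ... LIMIT 5` pattern is a brute-force nearest-neighbour search. The squared distance is computed for every primary school, the results are sorted, and the first five are kept. Sorting by squared distance gives the same order as sorting by distance, which is why no square root is needed. The same logic in Python, handy for checking results locally, might look like the sketch below. The school records and the `nearest` helper are hypothetical.

```python
import heapq


def nearest(schools, e0, n0, k=5):
    """Return the k schools closest to (e0, n0), ordered by squared distance."""
    return heapq.nsmallest(
        k, schools,
        key=lambda s: (e0 - s["easting"]) ** 2 + (n0 - s["northing"]) ** 2,
    )


# Hypothetical schools, for illustration only.
schools = [
    {"name": "A", "easting": 411500, "northing": 307300},
    {"name": "B", "easting": 420000, "northing": 300000},
    {"name": "C", "easting": 411000, "northing": 307200},
]
print([s["name"] for s in nearest(schools, 411021, 307291, k=2)])  # ['C', 'A']
```

As in the query, changing the `OFFSET` (here, slicing further into the sorted list) would page through the next-nearest schools.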

6 | Nodalities » Blog Archive » data.gov.uk and the Talis Platform

November 17th, 2009 at 6:19 pm

[...] traffic measurements can be queried in interesting ways. It’s been exciting to see people begin to pick up the technology and create reporting tools to explore the data, but also fantastic to be able to easily view data [...]

7 | Pezholio » Blog Archive » Adventures in SPARQL Part 2 – Now with added KML!

November 23rd, 2009 at 5:18 pm

[...] been a few weeks now since I posted my first foray into the Edubase dataset, and since then, there’s been a few changes to the dataset, so I thought I’d give it [...]
