Open data and the news

By Leslie Young

It’s generally accepted among journalists that in Canada, it’s very hard to get an interesting news story out of the data that governments make available for download on their various “Open Data” sites.

Quite simply, it’s usually very dry material.

Locations of public drinking fountains and city trees might be nice for app developers, but they rarely make for a story that has any impact or tells us something we’d like to know.

So it’s always refreshing to see a fun data set on an open data government site. I recently came across sales data for B.C. Liquor Stores on the Data BC site. Even better, it was broken down by region and drink type. Immediately, I thought that this would make a fun map that people might like to play with.

I downloaded the data set and went looking for population data and geography.

Luckily, this was B.C., so the excellent B.C. Stats site became my source. In my experience, this website is the best and easiest-to-navigate source for basic demographic and geographic data in the country. It is much, much easier to find regional populations and boundary shapefiles for B.C. than for Ontario, for example.

On the B.C. Stats site, I was able to find population estimates for those aged 19 and up (I’m assuming everyone’s obeying the law here) as well as the geography files. So I put together a map showing average litres of alcohol purchased per person. Fun!
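For anyone curious about the arithmetic behind a map like this, here is a minimal sketch of the per-person calculation: add up the litres sold in each region, then divide by that region's population aged 19 and up. The column names (`region`, `drink_type`, `litres`, `population_19_plus`) and the sample numbers are made up for illustration; the actual Data BC and B.C. Stats files are laid out differently.

```python
import csv
import io

# Hypothetical sample rows. Real column names and values will differ.
SALES_CSV = """region,drink_type,litres
Region A,Beer,1200
Region A,Wine,300
Region B,Beer,500
"""

POPULATION_CSV = """region,population_19_plus
Region A,100
Region B,50
"""


def litres_per_person(sales_text, population_text):
    """Return {region: total litres sold / population aged 19 and up}."""
    # Sum litres across all drink types for each region.
    totals = {}
    for row in csv.DictReader(io.StringIO(sales_text)):
        region = row["region"]
        totals[region] = totals.get(region, 0.0) + float(row["litres"])

    # Divide each regional total by its adult population.
    rates = {}
    for row in csv.DictReader(io.StringIO(population_text)):
        region = row["region"]
        population = int(row["population_19_plus"])
        if region in totals and population > 0:
            rates[region] = totals[region] / population
    return rates


rates = litres_per_person(SALES_CSV, POPULATION_CSV)
for region, rate in sorted(rates.items()):
    print(f"{region}: {rate:.1f} litres per person")
```

Regions that appear in only one of the two files are silently skipped here, so it is worth checking region names for spelling differences before trusting the result.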

This is the second map I’ve done using B.C.’s open data. (The first is here.) I really hope other governments take note of the work Data BC is doing, and that Data BC continues to add interesting datasets to its collection.

Tags: maps open data

Open data: Transport Canada vehicle recalls

By Leslie Young

There’s a new addition to the federal government’s data.gc.ca portal: vehicle recalls from Transport Canada.

This is a dataset I filed an Access to Information request for a while ago (see the interactive app I built with it here). After the story was published, I contacted the Treasury Board’s data.gc.ca people to let them know that this data should be publicly available for download. They replied, saying that they agreed.

And now it’s up. There are two sets: one for all recalls, and one for those within the last sixty days. The former is updated monthly, the latter daily.
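One way to keep an app current with two files like these is to treat the monthly file as the base and let the daily file overwrite or add to it. The sketch below assumes each recall has a unique identifier; the `recall_number` field and the sample rows are hypothetical and not taken from Transport Canada's actual file layout.

```python
def merge_recalls(full_rows, recent_rows, key="recall_number"):
    """Combine a full snapshot with a newer partial file.

    Rows in recent_rows replace rows in full_rows that share the same key,
    and rows with new keys are added.
    """
    merged = {row[key]: row for row in full_rows}
    for row in recent_rows:
        merged[row[key]] = row
    return sorted(merged.values(), key=lambda row: row[key])


# Hypothetical sample data standing in for the monthly and daily downloads.
full = [
    {"recall_number": "2012001", "make": "Example Motors", "units": "100"},
    {"recall_number": "2012002", "make": "Sample Auto", "units": "250"},
]
recent = [
    {"recall_number": "2012002", "make": "Sample Auto", "units": "300"},
    {"recall_number": "2012003", "make": "Example Motors", "units": "40"},
]

combined = merge_recalls(full, recent)
for row in combined:
    print(row["recall_number"], row["make"], row["units"])
```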

We had already uploaded the dataset to our own Open Data page, but it’s good to have updated versions coming out.

It’s certainly useful stuff and just the kind of thing that belongs on an open data site.

Tell the government about your apps!

By Leslie Young

I blogged recently about an online application I built using Transport Canada’s Road Safety Recalls Database. The underlying data for my story is already available online, but was not easily available for download.

I called their media department to try to obtain a copy of the database that was powering their already public recall search tool, but was told that this was impossible. I ended up filing an ATIP request for the information. This is really overkill when the information is already public, just not in a convenient format… but it seemed the only way to get a copy of the database. 

After we released our story on vehicle recalls, I received a tweet suggesting I submit my app to the federal government’s open data portal. They have a form that lets you tell them how you used data.gc.ca data. Here is what I wrote:

Application Name : Interactive: Vehicle recalls

URL : http://www.globalnews.ca/Pages/storyFullWidth.aspx?id=6442664891

Description : This is an interactive web application built using Transport Canada’s Road Safety Recalls Database. It allows users to look up and compare recall information on different vehicles. This data was not on data.gc.ca. Although a searchable version of the database is on Transport Canada’s website, I was required to file an ATIP request for a copy of the data. I would argue that this data should be made available on data.gc.ca, as it would allow this kind of application to be constantly updated with the latest recall information, rather than just be a snapshot. There are many similar databases on GoC websites, which are public and searchable, but do not allow users to download the raw data. These should be added to the open data site.

The next day, I received this response:

Good afternoon,

Thank you for contacting the Government of Canada’s Open Data Portal.

We would like to thank you for bringing this application to our attention. We agree that this data, as well as its application, should be made available on the Open Data Portal.

After receiving your email this morning, we contacted Transport Canada to suggest that they provide the raw data on the portal. We anticipate that this dataset will be available on the Government of Canada’s Open Data Portal in the future.

Thank you,

Open Data

Government of Canada

Based on this experience, I would suggest that people write to the Open Data site to share their apps, and even to suggest new datasets. It seems they read their email, and maybe we’ll start seeing useful data appearing on the site.

In the meantime, you can download my copy of the vehicle recalls data here.

Freedom of Information and the PDF

By Leslie Young

PDFs are a fact of life in data journalism. Most of the time, when you request “an electronic file” or “an electronic document” in your Freedom of Information request, the end result is a PDF.

While it’s a step up from simply getting a stack of papers as your response, a PDF response is annoying in a number of ways. It’s hard to work with. Unlike a spreadsheet format (Excel, .CSV, etc.), a PDF can’t be analyzed directly. You often can’t copy and paste or export the data – sometimes Acrobat won’t even recognize the document as text!

Some departments do respond with an Excel file. Some are even nice enough, when I ask, to email me the Excel version of the PDF response that they had previously sent me. They don’t have to do this, and I really appreciate it.

But it seems like the default electronic file format is the PDF, which means that I will spend hours trying to force the information into a friendlier format. It doesn’t stop me from doing the story, it just makes it more difficult.

So why do many departments seem to favour the PDF? I decided to ask the Treasury Board Secretariat, the body charged with administering the federal government’s Access to Information legislation.

Here’s what they said.

Question:

I would like to know why, when a requester requests an electronic document, the response is usually provided as a PDF.

Why do departments seem to prefer releasing information as PDFs instead of a more open electronic file format, such as an Excel spreadsheet? This is particularly relevant in the case of a request for information from a database, which since it’s a table filled with numbers, would be more useful to a journalist in an Excel or other format.

Answer:

Our government is committed to openness and transparency which is why we are pursuing the Open Government initiative that will continue to make government data freely available, and currently requires all completed ATI summaries to be posted online within 30 calendar days of being readied.  Current Access to Information regulations direct departments to provide information in the format requested wherever possible, and our government continues to update and add to the already hundreds of thousands of data sets and the amount of information available to Canadians online in various formats.  Where alternative formats are not available or suitable, the government will respond with a pdf version in order to ensure that requests for information are still carried out effectively.

So it seems all you need to do is ask – very specifically. It’s a valuable lesson. Next time, I will make sure to ask for a .CSV, and see what happens.

B.C. classroom sizes

By Leslie Young

We recently published a map of the classroom sizes at every public school in B.C.

To me, the interesting thing wasn’t just the map itself, which would be useful information for a local parent, but the fact that all the information for this map is open data.

I follow DataBC, the provincial government’s open data office, on Twitter. I spotted a tweet saying that they had just added several education datasets to their site.

The timing is interesting - the information was posted as the provincial government is in the middle of a dispute with the teachers’ union over various issues, including class sizes.

With a bit of digging through the DataBC catalogue, I found classroom sizes for 2010 as well as a geography file for all the school locations.

Bingo - the ingredients for an easy Fusion Tables map.
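The recipe is a simple join: match each school's class size to its location, then hand the combined table to the mapping tool. Here is a minimal sketch of that step, assuming both files share a school identifier and that the tool accepts one table with latitude and longitude columns. The `school_id` field, the other column names and the sample rows are all hypothetical; the real DataBC files are laid out differently, and the geography may arrive in another format entirely.

```python
import csv
import io

# Hypothetical sample rows. Real column names and values will differ.
CLASS_SIZES_CSV = """school_id,school_name,avg_class_size
101,Example Elementary,24.5
102,Sample Secondary,27.0
103,Unmapped School,22.0
"""

LOCATIONS_CSV = """school_id,latitude,longitude
101,49.25,-123.10
102,48.43,-123.37
"""

FIELDS = ["school_name", "avg_class_size", "latitude", "longitude"]


def join_for_map(sizes_text, locations_text):
    """Attach coordinates to each class-size row, matching on school_id."""
    locations = {
        row["school_id"]: row
        for row in csv.DictReader(io.StringIO(locations_text))
    }
    joined, unmatched = [], []
    for row in csv.DictReader(io.StringIO(sizes_text)):
        location = locations.get(row["school_id"])
        if location is None:
            # Keep track of schools that cannot be placed on the map.
            unmatched.append(row["school_name"])
            continue
        joined.append({
            "school_name": row["school_name"],
            "avg_class_size": row["avg_class_size"],
            "latitude": location["latitude"],
            "longitude": location["longitude"],
        })
    return joined, unmatched


def to_csv(rows):
    """Write the joined rows as CSV text, ready to upload to a mapping tool."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


rows, unmatched = join_for_map(CLASS_SIZES_CSV, LOCATIONS_CSV)
map_csv = to_csv(rows)
print(map_csv)
print("No location found for:", unmatched)
```

Listing the unmatched schools matters more than it looks: a school missing from the map is easy to overlook and hard to explain to a reader later.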

This is the first time I’ve created a map using only open data. Most of the time, government datasets have no obvious news application. The geographic files are generally good, but the information released tends to be bland.

I’m hopeful that this will change.

Tags: maps open data BC