SemStats 2013 Call for Challenge

Authors: Franck Cotton¹; Richard Cyganiak²; Armin Haller³; Alistair Hamilton⁴; Raphaël Troncy⁵

¹INSEE, France
²DERI, Ireland
³ANU, Australia
⁴ABS, Australia
⁵EURECOM, France

Document ID: http://semstats.org/2013/call-for-challenge

Published: 2013-04-28

Modified: 2013-07-03

License: CC BY 4.0

Event: 1st International Workshop on Semantic Statistics co-located with 12th International Semantic Web Conference (ISWC 2013)
Location: Sydney, Australia
Date: October 11, 2013

Abstract

A challenge is organized in the context of the SemStats 2013 workshop. Participants are invited to apply statistical techniques and semantic web technologies to census data describing the Australian and French populations by geographic zone, age group, sex and current activity status.

The deadline for participants to submit their short papers is September 6. For any questions on the challenge, please contact semstats2013@easychair.org.

Note: All challenge participants are strongly encouraged to send their contact information to semstats2013@easychair.org so that they can be kept informed of any changes to the data provided.

Introduction

A challenge is organized in the context of the SemStats 2013 workshop (see SemStats 2013 Call for Papers). Participants are invited to apply statistical techniques and semantic web technologies to a collection of datasets provided by the organizers.

The data provided are census data from Australia and France, describing the population by geographic zone, age group, sex and current activity status.

The datasets and complete data structure definitions are delivered in Data Cube format.
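
As a rough illustration of what this structure implies, each observation in a cube carries one value per dimension (here geographic zone, age group, sex and current activity status) plus a measure (a population count). The sketch below models that shape in plain Python rather than RDF. The class name, field names, zone codes and counts are all made up for illustration and are not taken from the challenge files.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One cell of the cube: a value for each dimension plus a measure."""
    area: str        # geographic zone (hypothetical code)
    age_group: str
    sex: str
    activity: str    # current activity status
    population: int  # the measure: a count of persons


# Made-up observations, for illustration only.
observations = [
    Observation("ZONE-A", "15-24", "F", "employed", 120),
    Observation("ZONE-A", "15-24", "M", "employed", 130),
    Observation("ZONE-A", "15-24", "F", "unemployed", 20),
    Observation("ZONE-B", "15-24", "F", "employed", 80),
]


def total_by(obs, dimension):
    """Sum the measure over all observations, grouped by one dimension."""
    totals = defaultdict(int)
    for o in obs:
        totals[getattr(o, dimension)] += o.population
    return dict(totals)


by_area = total_by(observations, "area")
by_activity = total_by(observations, "activity")
```

Grouping by any single dimension in this way collapses the others, which is the basic operation behind most tabulations of such data.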

Important dates:

Paper submission deadline: September 6
Short list announcement: September 23

What is Expected?

Participants will submit a short paper describing how the datasets were used and the main results or findings obtained. This paper should conform to the rules applicable to all papers submitted for the SemStats 2013 workshop (see SemStats 2013 Call for Papers). Please select the "Challenge papers" category (in the "Title, Abstract and Other Information" section) when submitting your paper.

Links to other material (visualizations, numeric results, software packages, etc.) can also be provided to illustrate or complement the paper.

Entries should focus on demonstrating new possibilities enabled by semantic web models and technologies. This can be done, for example, by linking two or more of the challenge datasets together (e.g. for international comparisons) and/or by linking one or more challenge datasets to other data sources.

While it is permissible to link challenge datasets to data previously analysed by the author, entries will only qualify for the short list if at least one dataset from the challenge makes a central contribution to the demonstration.
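
One way to picture the kind of international comparison mentioned above is to map each country's local activity-status codes onto a shared set of categories before computing a comparable indicator. The sketch below is only an illustration under stated assumptions: the local codes, the mappings and the counts are hypothetical and do not reflect the actual code lists in the Australian or French datasets.

```python
# Hypothetical local activity-status codes mapped to shared categories.
# These are NOT the real code lists from the challenge datasets.
AU_TO_COMMON = {"AU-1": "employed", "AU-2": "unemployed", "AU-3": "inactive"}
FR_TO_COMMON = {"FR-A": "employed", "FR-B": "unemployed", "FR-C": "inactive"}

# Made-up population counts per local code, for illustration only.
au_counts = {"AU-1": 600, "AU-2": 40, "AU-3": 360}
fr_counts = {"FR-A": 540, "FR-B": 60, "FR-C": 400}


def harmonize(counts, mapping):
    """Re-key counts from local codes to the shared categories."""
    out = {}
    for code, n in counts.items():
        common = mapping[code]
        out[common] = out.get(common, 0) + n
    return out


def unemployment_rate(harmonized):
    """Unemployed persons as a share of the labour force."""
    labour_force = harmonized["employed"] + harmonized["unemployed"]
    return harmonized["unemployed"] / labour_force


au = harmonize(au_counts, AU_TO_COMMON)
fr = harmonize(fr_counts, FR_TO_COMMON)
rates = {"AU": unemployment_rate(au), "FR": unemployment_rate(fr)}
```

In a semantic web setting the mapping tables would more naturally be expressed as links between concepts in the two code lists; the dictionaries here simply stand in for such links.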

Evaluation process

All entries will be evaluated by the same panel comprising professional statisticians as well as experts in semantic web technologies. The evaluation criteria are detailed below.

An initial assessment based on the short paper submissions will decide which entries are shortlisted for presentation during the workshop. A second assessment, based on the presentation made during the workshop, will select the challenge winner. Two or more entries may be named joint winners of the prize if their scores are tied.

The following criteria will be applied, both for the paper selection and during the workshop presentation.

Demonstration of practical added value from applying semantic technologies to the use of statistics: Entries should provide a compelling practical illustration of the value that can be realized from applying semantic technologies to statistics. It is sufficient for an entry to demonstrate the potential of the approach clearly and convincingly; the specific statistical conclusions drawn from the work need not deliver new and profound insights or correlations. This criterion will have a 50% weight in the evaluation.
Originality/Innovation: Entries should aim to provide fresh insight into how semantic technologies can be applied to data and into the value they add. Entries that simply apply well-established techniques or demonstrate commonly reached conclusions, only with the challenge data in place of some other data, will not score highly on this criterion. The criterion is not intended to discourage the use of well-established techniques; the aim, however, should be to demonstrate some new and specific insight by applying them. This criterion will have a 25% weight in the evaluation.
Presentation of results: Results should be presented in a professional and engaging manner. Innovative approaches to presenting or illustrating results are welcomed, as long as they engage and educate the audience. The presentation should be fit for the audience, which will include professional statisticians in addition to experts in semantic technologies.

Data

Australia

The 2011 Census data contributed to the challenge were extracted in SDMX-ML format using TableBuilder, a publicly accessible application provided by the Australian Bureau of Statistics (ABS). The content was then transformed, using an experimental process, into the RDF Data Cube Vocabulary format.

Three datasets are provided. Each gives population counts by sex, age (grouped) and labour force status (grouped) at a different level of geography:

State/Territories [9 regions]
Statistical Area Level 4 (SA4) [106 regions]
Statistical Area Level 3 (SA3) [351 regions]

Australia’s statistical geography is explained fully in the Australian Statistical Geography Standard (ASGS).
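
Because the three levels nest, counts at a finer level can in principle be rolled up to a coarser one. The sketch below assumes the ASGS coding convention in which a five-digit SA3 code begins with its parent three-digit SA4 code, which in turn begins with the one-digit State/Territory code. The codes and counts shown are made up for illustration, and whether the challenge files identify regions by these codes is an assumption to check against the documentation.

```python
from collections import defaultdict

# Made-up population counts keyed by illustrative five-digit SA3-style codes.
sa3_counts = {"10101": 500, "10102": 300, "10201": 200, "20101": 400}


def roll_up(counts, prefix_len):
    """Aggregate counts to the parent regions identified by a code prefix."""
    totals = defaultdict(int)
    for code, n in counts.items():
        totals[code[:prefix_len]] += n
    return dict(totals)


sa4_counts = roll_up(sa3_counts, 3)    # SA3 -> SA4 (three-digit prefix)
state_counts = roll_up(sa3_counts, 1)  # SA3 -> State/Territory (one digit)
```

Comparing such roll-ups with the figures published at the coarser level can serve as a simple consistency check when working with several of the datasets together.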

Files containing the data structure definition and the code lists are provided together with the data files in this archive. All files use Turtle notation. Detailed documentation is available here.

France

In connection with the SemStats 2013 challenge, the French National Statistical Institute (Insee) is publishing in RDF a selection of statistical results from the Population Census. These results are population estimates as of January 1, 2010, broken down by fine-grained geographic area, age, sex and activity status. The data will be structured according to the Data Cube vocabulary and made available, together with their data structure definitions and code lists, on Insee’s web site as 7z-compressed Turtle files.

The data and associated documentation are available from this page.