SemStats 2017 Call for Challenge

5th International Workshop on Semantic Statistics co-located with the 16th International Semantic Web Conference (ISWC 2017)
Vienna, Austria

Important dates

  • Submission deadline: September 15th, 2017, 23:59 Hawaii time
  • Notifications to authors: September 22nd, 2017, 23:59 Hawaii time

Challenge data

The SemStats 2017 data challenge offers four tracks:

  • Results of the French Census
  • The French business register (Sirene)
  • International classifications
  • Open track

These options are detailed below.

Census track

Since 2004, the French population census has been based on annual surveys, with total national coverage achieved in five-year cycles. The goals of this new method are to produce results more regularly and to distribute the workload over time better than with the previous exhaustive census procedure (see details on Insee's web site). This new methodology also implies that, although new population figures are produced every year, valid temporal comparisons can only be made between results separated by 5 years.

The legal populations are amongst the most important data published from the census. Insee has been publishing the legal populations as RDF for five years now, so that a full 5-year cycle is available. All the data are available as Turtle on Insee's RDF site (reference years 2010, 2011, 2012, 2013, 2014).

For this challenge, submissions are expected to demonstrate original and statistically meaningful uses of the legal population RDF data.

Sirene track

Sirene is the official database of French enterprises (legal units) and establishments (local units). Created and managed by Insee since 1973, this database now includes 10 million units and is continuously updated. Since the beginning of 2017, Sirene has been available as open data.

The register can be downloaded by following the links on the Sirene web site. For example, the complete file dated July 1st, 2017 can be downloaded directly. Detailed documentation and a FAQ are also available.

This challenge consists of proposing an RDF model for Sirene data. The reuse of existing vocabularies is encouraged. The submissions will be evaluated on the relevance of the proposed model and on the clarity of its documentation.
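As a starting point only, the sketch below shows one way a single register entry could be expressed in Turtle by reusing existing vocabularies. Everything specific in it is an assumption made for illustration: the `ex:` base URI, the dictionary keys (`siren`, `nic`, `name`), the sample values, and the choice of the W3C Organization Ontology (`org:`), the Registered Organization Vocabulary (`rov:`) and SKOS. It is not the model expected from submissions, and it does not reflect the actual column names of the Sirene file.

```python
# Hypothetical sketch: serialize one Sirene-like record as Turtle.
# The ex: namespace, the record keys and the vocabulary choices are
# assumptions for illustration, not an official Insee model.

PREFIXES = {
    "org": "http://www.w3.org/ns/org#",
    "rov": "http://www.w3.org/ns/regorg#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "ex": "http://example.org/sirene/",  # placeholder base URI
}


def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def record_to_turtle(record: dict) -> str:
    """Return a Turtle document describing a legal unit and one establishment."""
    siren = record["siren"]        # 9-digit identifier of the legal unit
    siret = siren + record["nic"]  # SIRET = SIREN followed by the 5-digit NIC
    lines = [f"@prefix {prefix}: <{uri}> ." for prefix, uri in PREFIXES.items()]
    lines.append("")
    # The enterprise (legal unit) is modelled as a registered organization.
    lines.append(f"ex:unit-{siren} a rov:RegisteredOrganization ;")
    lines.append(f'    rov:legalName "{escape_literal(record["name"])}" ;')
    lines.append(f"    org:hasSite ex:establishment-{siret} .")
    lines.append("")
    # The establishment (local unit) is modelled as a site of that organization.
    lines.append(f"ex:establishment-{siret} a org:Site ;")
    lines.append(f'    skos:notation "{siret}" .')
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up placeholder values, not a real register entry.
    print(record_to_turtle(
        {"siren": "123456789", "nic": "00012", "name": "Example Company"}
    ))
```

A real proposal would also need to cover the remaining register variables (activity codes, addresses, legal categories, dates) and document each modelling decision, since the clarity of the documentation is one of the evaluation criteria.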

Classifications track

This challenge track was proposed for SemStats 2016. The data made available as RDF include the central economic classifications published by the UN and Eurostat, as well as some national classifications that are articulated with the central classifications. Previous versions of the central classifications are also included.

More detailed information is available here.

Suggested use cases for the classification data include: checking coherence and quality, linking to other metadata sets, proposing visualization tools, etc.

Open track

An open track is also available for the SemStats challenge. For this track, submissions can be based on any publicly available data sets and propose any uses relevant for the SemStats workshop. Novelty and sound methodology will be considered for evaluating the submissions in this track.


See the main Call for Contributions page for details on the challenge submissions and on the submission process and deadline. Challenge articles should be no longer than 12 pages. The challenge data can also be used as a basis for the realization of application and demo articles (up to 6 pages). Remember that these two categories of submissions are awarded specific prizes!

If you are interested in submitting a contribution but would like more preliminary information, please contact