SemStats 2020 Call for Contributions

Authors: Sarven Capadisli¹; Franck Cotton²; Armin Haller³; Evangelos Kalampokis⁴; Raphaël Troncy⁵

¹TIB Leibniz Information Centre for Science and Technology, Germany
²INSEE, France
³ANU, Australia
⁴University of Macedonia, Greece
⁵EURECOM, France

Document ID: http://semstats.org/2020/call-for-contributions

Published: 2020-02-16

Modified: 2020-02-16

License: CC BY 4.0

Event: 8th International Workshop on Semantic Statistics co-located with the 19th International Semantic Web Conference (ISWC 2020)
Location: Athens, Greece
Date: November 2, 2020 or November 3, 2020

Important dates

Contributions deadline: June 26, 2020, 23:59 Hawaii time
Notifications to authors: July 17, 2020, 23:59 Hawaii time
Camera-ready version: September 7, 2020, 23:59 Hawaii time

Workshop Summary

The goal of this workshop is to explore and strengthen the relationship between the Semantic Web and statistical communities, to provide better access to the data and metadata held by statistical offices. It focuses on ways in which statisticians can use Semantic Web technologies and standards to formalize, publish, document and link their data and metadata, and also on how statistical methods can be applied to linked data. This is the eighth workshop in a series that started at the International Semantic Web Conference in 2013 (SemStats 2013) and has run every year since at ISWC (2014, 2015, 2016, 2017, 2018 and 2019).

The statistical community shows more and more interest in the Semantic Web. In particular, initiatives have been launched to develop semantic vocabularies representing statistical classifications, discovery metadata, business models, etc. Tools have been created by statistical organizations to support the publication of dimensional data conforming to the Data Cube W3C Recommendation. But statisticians still see challenges in the Semantic Web: how can data and concepts be linked in a statistically rigorous fashion? How can we avoid fuzzy semantics leading to wrong analyses? How can we preserve data confidentiality? How can we use linked statistical data in machine learning models?

The workshop will also cover the question of how to apply statistical methods or treatments to linked data, and how to develop new methods and tools for this purpose. Except for visualization techniques and tools, this question is relatively unexplored, but the subject will obviously grow in importance in the near future.

Motivation

The interest of the statistical community in linked data continues to increase:

The UNECE High-level group for the modernization of official statistics (HLG), a group of directors of national and international statistical institutes (NSIs) around the world, is increasing its activities on linked statistical metadata: after its work on semantizing classifications and statistical models, the HLG is currently developing a core ontology for official statistics.
Eurostat has conducted a major strategic project focusing on digital communication, user analytics and innovative products, which contained different tasks related directly to linked data. In particular, an exploratory project called “Linked Open Statistics” produced valuable deliverables presented during a very successful final event in November 2019. Eurostat is now working on the publication of semantic assets (statistical classifications and glossaries) as linked metadata.
Insee, the French NSI, organized a successful linked data hackathon in September 2018, which attracted more than 50 participants. Insee subsequently decided to initiate an overhaul of its data dissemination chain based on the use of linked data.
In January 2019, the London workshop on Linked Data organized by the ONS (the British NSI) also proved highly successful.
Several NSIs (Canada, Italy, France…) have recently set up a working group on linked statistical metadata. The work plan includes use cases for revisions in Data Cube and XKOS and reflections on the representation of process or quality metadata, among others.
The NSIs of Japan, Ireland, and Italy along with the Scottish Government and DCLG in the UK have opened up their statistical data using linked data technologies.
There is also a significant interest in exploiting linked statistical data inside public administrations in order to create innovative public services for citizens and businesses: see for example the EU-funded H2020 OpenGovIntelligence project.
Time series and statistical data are a common resource in today’s era of knowledge graphs empowering machine learning models for AI applications.

This growing interest is a tremendous opportunity for the SemStats community to leverage the work done in the previous years and to continue to elaborate solutions needed for these initiatives.

Topics

The workshop will address topics related to statistics and linked data. This includes but is not limited to:

How to publish linked statistics?

What are the relevant vocabularies for publishing statistical data (including spatio-temporal data) and metadata (code lists and classifications, descriptive metadata, provenance and quality information, etc.)?
What are the existing tools? Can the usual statistical software packages (e.g. R, SAS, Stata) do the job?
How do we include linked data production and publication in the data lifecycle?
How do we establish, document and share best practices?

How to use linked data for statistics?

Where and how can we find statistical data: catalogues, dataset descriptions, discovery?
How do we assess data quality (collection methodology, traceability, lineage, etc.)?
How to perform data reconciliation, ontology matching and instance matching with statistical data?
How can we apply statistical processes on linked data: data analysis, descriptive statistics, estimation, correction?
How to build machine learning models with linked statistical data?
How to intuitively represent statistical linked data: visual analytics, results of data mining?

How to use statistical methods on IoT data streams?

How can statistical processes be applied to sensor streaming data at runtime, and how can the results of these processes be stored and accessed?
How can statistical and machine learning algorithms be used on time series data produced by IoT devices?

Contributions

This workshop is aimed at an interdisciplinary audience of researchers and practitioners involved or interested in Statistics and the Semantic Web. All contributions must represent original and unpublished work that is not currently under review. Contributions will be evaluated according to their significance, originality, technical content, style, clarity, and relevance to the workshop.

At least one author of each accepted contribution is expected to attend the workshop. Workshop participation is available to ISWC 2020 attendees at an additional cost.

The CASD will award a prize to the best contribution of the workshop.

Full and Short articles (up to 12 and 6 ‘pages’)

The workshop welcomes full and short scientific articles related to the topics mentioned above. Full articles describe mature research work, where ideas have been implemented and evaluated. Short articles present brave new ideas or position statements describing a vision for the Semantic Statistics community.

Challenge articles (up to 12 ‘pages’)

The workshop will also feature a data challenge based on census and employment data that will be made available on the SemStats web site by the end of May. It is expected that data from at least Bulgaria, Ireland, France, and Italy will be available. The challenge will consist of building mashups or visualizations, as well as comparisons, alignments and enrichments of the data and concepts involved.

Application and Demo articles (up to 6 ‘pages’)

This year, the workshop calls for contributions more generally. This includes interactive demonstrations of applications, or useful and relevant software libraries and repositories, described in short articles. All application and demo articles should include a link where readers can experiment with the live software. Additional pointers, such as a source code repository, are also welcome.

Writing your contribution

All contributions must be written in English and must be formatted according to the information for LNCS Authors (see http://www.springer.com/computer/lncs?SGWID=0-164-6-793341-0). Please note that HTML+RDFa contributions are also welcome as long as the layout complies with the LNCS style. Authors are welcome to use dokieli or similar systems. Contributions are not anonymous. Please share your contributions through EasyChair before June 26, 2020, 23:59 Hawaii time. All accepted articles will be archived in electronic proceedings published by CEUR-WS.org.

Contribution date: June 26, 2020
Notifications: July 17, 2020

If you are interested in sharing a contribution but would like more preliminary information, please contact semstats2020@easychair.org.

Open Peer Review

The workshop adopts an open peer review policy with the following criteria:

The reviewers and editors for each article will be attributed and publicised by default on the workshop website. However, reviewers and editors can explicitly opt out, in which case they are attributed as "Anonymous".
All content pertaining to (meta)reviews will be made public on the workshop website under the Creative Commons Attribution 4.0 International (CC BY) license. The reviews can be publicly archived.
Reviewers and editors may independently self-publish their responses to the articles. The review must be licensed with CC BY, and the URL it is made available from must be publicly archivable.

Program Committee (to be confirmed)

Stefano Abruzzini, EC - DG Connect
Ghislain Auguste Atemezing, Mondeca
Oscar Corcho, Universidad Politécnica de Madrid
Cinzia Daraio, University of Rome "La Sapienza"
Miguel Expósito Martín, Instituto Cántabro de Estadística
Peter Haase, metaphacts
Paul Hermans, ProXML
Areti Karamanou, University of Macedonia
Laurent Lefort, Australian Bureau of Statistics
Andrei Melis, Eau de Web
Albert Meroño-Peñuela, VU University Amsterdam
Jindřich Mynarz, MSD IT
Bill Roberts, Swirrl IT Limited
Hideaki Takeda, National Institute of Informatics

Organizing Committee

Sarven Capadisli, TIB Leibniz Information Centre for Science and Technology, Germany
Franck Cotton, INSEE, France
Armin Haller, ANU, Australia
Evangelos Kalampokis, CERTH/ITI and University of Macedonia, Greece
Raphaël Troncy, EURECOM, France