SemStats 2019 Program

CC BY 4.0


7th International Workshop on Semantic Statistics co-located with the 18th International Semantic Web Conference (ISWC 2019)
Auckland, New Zealand


  • 9:20-9:30: Opening by the Chairs
  • 9:30-10:10: Keynote Talk
    Title: The library/archive semantic world versus the statistics world: parallel or complementary universes
    Summary: The GLAM (galleries, libraries, archives, museums) world has embraced the semantic world and built linked data into its daily work. The statistics world seems slower to embrace linked data at a practical level. This talk will examine how different cultures and different objects affect the uptake of semantic approaches. It will highlight case studies which bring the two worlds together and look at ways in which adopting ideas from the GLAM world could improve linked data in the statistics world.
    Speaker: Claire Stent (Statistics New Zealand)
    Bio: Claire Stent is a Senior Advisor – Data and Information Management at Stats NZ. She manages the Stats NZ DDI repository (DataInfo+) and the Stats NZ Store House. DataInfo+ houses metadata about Stats NZ statistical outputs. The Stats NZ Store House houses technical publications, questionnaires and links to conference papers and research using data from the IDI (Integrated Data Infrastructure).
    Claire spent the first part of her professional career working for the New Zealand National Bibliography, creating discovery metadata for books, art prints, sound recordings and maps. At Stats NZ, she led a project to digitise the New Zealand Yearbooks from 1893 to 2012 in XML and transform them into HTML, enabling historical Stats NZ data to be re-used.
  • 10:10-10:50: Research Papers
  • 10:50-11:20: Coffee Break
  • 11:20-11:40: Franck Cotton: International initiatives
  • 11:40-12:40: Invited Talk
    Title: Presentation on the Integrated Data Infrastructure (IDI)
    Summary: New Zealand's Integrated Data Infrastructure is a world-leading data resource that holds information about people and households. Government and academic researchers use the IDI to gain insight into society and the economy that can help us understand complex issues affecting New Zealand.
    This session will explore the IDI and the challenges involved in creating a single linked record for individuals from disparate administrative sources.
    The IDI is built using probabilistic record matching methods to join up information about individuals held in a range of administrative and survey datasets (such as tax records, immigration records and health records). Stats NZ links the data together using five key linking variables (First Name, Last Name, Date of Birth, Sex and Address). After linking, data is de-identified and made available in a secure environment to approved researchers.
    The IDI relies on messy data, collected across a range of circumstances and environments. Challenges that complicate data linking include: a lack of shared data definitions; a lack of common data concepts; overlapping concepts; simple errors and mistakes in data; limited metadata; and changes in data over time.
    Semantic approaches may be able to improve the quality of our linking and the quality of the final IDI. Workshop participants will be invited to discuss the IDI as an existing real-world example, and consider how semantic approaches could be used to improve the way the IDI operates.
    Speaker: Hamish James (Statistics New Zealand)
    Bio: Hamish is General Manager, Customer Channels at Stats NZ, where he leads teams responsible for customer-facing services and products, including New Zealand's Integrated Data Infrastructure. Over the last 14 years, Hamish has worked in a variety of roles related to information management, strategy and customer support at Stats NZ. Hamish began his career working on quantitative history projects at the University of Otago, before spending a number of years in the UK, working at the UK Data Archive and at the Arts and Humanities Data Service.
  • 12:40-14:00: Lunch Break
  • 14:00-14:40: Keynote Talk
    Title: Use of knowledge graphs and relational machine learning in the Australian Bureau of Statistics
    Summary: Governments increasingly need more targeted information solutions to assist decision-making in difficult areas of policy formulation, service delivery, regulatory compliance, and infrastructure investment. However, there are two significant challenges to meet. The first is that almost all of the important and enduring problems now confronting governments arise from the dynamics of complex economic, social and environmental systems that are deeply interconnected at a regional, national and global level. Such problems are notoriously difficult to isolate, decompose and objectively specify. The second challenge is that diverse data sources – including those associated with Internet-mediated digital connectivity and interaction – need to be brought together to create a dynamic and purpose-specific evidence base.
    In this talk, Ric will outline innovative work in the Australian Bureau of Statistics (ABS) to build next-generation analytical capability using knowledge graphs and relational machine learning. This approach will be illustrated through a discussion of four interconnected policy themes – export success, job creation, a skilled workforce, and social welfare – that emerge from what is essentially a single complex system of systems, and so reflect distinct but correlated sets of concerns for government.
    Speaker: Ric Clarke (Australian Bureau of Statistics)
    Bio: Ric Clarke is the Director of Machine Intelligence and Novel Data Sources (MINDS) in the Australian Bureau of Statistics (ABS), where he leads a multidisciplinary research and development team in advancing the use of new methods, technologies and data sources in official statistics. Ric has over 20 years of public sector work experience in a range of technical and business roles: data science, systems development and support, strategic planning, program management, enterprise architecture, and client relations. Originally a theoretical physicist, Ric also has postgraduate qualifications in computer science.
  • 14:40-15:20: SemStats 2019 Challenge
  • 15:20-16:00: Coffee Break
  • 16:00-16:40: Research Papers
  • 16:40-17:20: Awards and Next Steps

The SemStats workshop is sponsored by CASD, a secure data hub for researchers.