In this project, we briefly explored the TERMDAT API and collected some basic data from each Swiss government level and department, comparing the results with Wikipedia and media sources. Our data and Python notebooks can be found in the Sources.
See also:
Challenge
Show the pulse of what the Swiss Government is working on, by combining up-to-date terminology databases with texts published by the Government.
Words, phrases, and terms in a governmental context have one special feature: as they often relate to legal concepts and are the basis of far-reaching decisions, standing terms (also called “named entities” in computer science) need to be well defined, often by a legal basis. To make it possible to work with these domain-specific vocabularies, the standing terms are organized in terminology databases and can be accessed through TermDat (BkTermdatUi). The goal of this challenge is to harness the expressivity and freshness of the terminologies provided by TermDat to create a high-quality map of what topics the Swiss Government is currently working on.
"The Swiss Confederation currently has almost 40k employees (~36k full time equivalents). It is not only difficult for the citizens of Switzerland to grasp the breadth and depth of which topics are worked on, the same is valid for the employees within the Swiss Confederation. Therefore, it is important to have a good high-quality overview of the ongoing work and the change of focus in regard of ongoing themes globally. Conversely, the specialists working on the terminology database are not able to read all new texts being published every day and would welcome a tool which would allow them to harvest newly coined terms or terms being used in a new context or with a different translation. Finally, end-users of texts might want to be able to click on a given (technical) word and receive a definition of that word."
Organization: BK Federal Chancellery
Prepared Material:
- 200GB of WebArchive in JSON format from *.admin.ch
- TermDat API: https://api.termdat.ch/swagger/index.html
- BigData Machine with Jupyter Notebooks (1.5TB RAM)
Living Topics (Proto Alpha)
TL;DR: In this hackathon experiment, we examine data describing Swiss government functions, comparing the results with Wikipedia and media sources. Some notebooks with the initial set-up for machine learning, along with the results of a crawl, can be found in this repository.
See also:
Hackathon journey
This code project is based on the "Living Topics" challenge, proposed at #GovTechHack23 on March 23, 2023. Here is the gist of it:
The goal of this challenge is to harness the expressivity and freshness of the terminologies provided by TermDat to create a high-quality map of what topics the Swiss Government is currently working on.
Looking at this problem statement, we first have to take a step back: what is the structure of the Swiss government, what is the scope of 'topics', where would you start - in other words, what would be the high-level 1:1000000 map of the administrations?
1:1 Million Map of Switzerland (swisstopo shop)
After some discussion, we came up with a slightly more accessible version of the Living Topics challenge: instead of working bottom up, at the current topic levels as originally stated, let us begin at the top level of government and obtain definitions of the functions and responsibilities of government departments. The more detail we have, the better we will be able to classify a topic as belonging to one office or another. From here, step by step, we would be able to identify specific current affairs.
Organigram from the 2020 edition, see also 2023 update.
We begin at the top. Helpfully, the Federal Chancellery produces an illustrated guide to the political and administrative system in Switzerland (ch-info.swiss), available in print, online and in an app. This gives a brief overview of the departments, with some detail on their function. Unfortunately, we could not find the source code or any way to bulk-download from this website. We keep searching.
172.010.1 Government and Administrative Organization Ordinance
The FEDLEX service provides us with the legal documents that serve as the mandated basis for the administrations. We find the interface clumsy, and the document layouts not machine-readable. Even when we export the XML version, we get impractical HTML tables inside. Nevertheless, our discussion leads us to explore the State Calendar as an alternative source of hierarchical structure, which in turn leads us to quickly update a long-overdue public bodies open data source.
What does 'the Internet' have to say about all this?
Screenshot of ChatGPT by OpenAI.
Hmm, wonder where 'the Internet' gets this data from?
Screenshot of four language editions (EN-IT-FR-DE) of a Wikipedia article.
The Wikipedia page Federal administration of Switzerland provides a similar overview. We found the very complete content in the German edition to be somewhat out of date, the English edition nearly as complete, the French significantly shorter, and the Italian practically empty. Using the MediaWiki API, also via a handy Python wrapper, it is possible to quickly get the contents of Wikipedia pages. And in an Edit-a-thon, we could update them and improve the translations.
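As a minimal sketch of the MediaWiki route, the snippet below builds a plain-text extract request and parses the reply using only the Python standard library. It assumes the public `api.php` endpoint of each language edition with `prop=extracts` (as offered on Wikipedia); the function names and the User-Agent string are ours and purely illustrative, and this is not necessarily how the project notebooks do it.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_extract_url(lang: str, title: str) -> str:
    """Build a MediaWiki query URL asking for the plain-text extract of one page."""
    params = {
        "action": "query",
        "prop": "extracts",
        "explaintext": 1,    # plain text instead of HTML
        "redirects": 1,      # follow redirects to the canonical title
        "titles": title,
        "format": "json",
        "formatversion": 2,  # 'pages' is returned as a list
    }
    return f"https://{lang}.wikipedia.org/w/api.php?{urlencode(params)}"


def parse_extract(payload: dict) -> str:
    """Return the extract of the first page, or '' if the page is missing."""
    pages = payload.get("query", {}).get("pages", [])
    if not pages or pages[0].get("missing"):
        return ""
    return pages[0].get("extract", "")


def fetch_extract(lang: str, title: str, timeout: float = 30.0) -> str:
    """Download and parse one page (requires network access)."""
    request = Request(
        build_extract_url(lang, title),
        headers={"User-Agent": "living-topics-sketch/0.1"},  # illustrative value
    )
    with urlopen(request, timeout=timeout) as response:
        return parse_extract(json.load(response))
```

Calling `fetch_extract("en", "Federal administration of Switzerland")` would return the article text, which could then be saved as a `wikipedia.txt` file for later comparison.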
Screenshot of successive Linguee.com searches.
What else could we try? A series of searches on Linguee (a dictionary service that is part of DeepL) provided some clues about various government websites and media repositories describing the responsibilities of the federal, cantonal and municipal governments.
Screenshot of Nicht Sache der Kantone (NZZ 2009).
Finally, we explore the media landscape. At other hackathons like the recent Rethink Journalism event, we had a chance to work with press databases - some of which would be excellent resources to understand expectations and questions about the function of government from the outside in. We leave this avenue for a future foray, though we trust that the web services of the Confederation would be the best starting point.
Which brings us to the point of departure of the hackathon: the I14Y Interoperability Platform. We decide to use the TERMDAT API to work through the main levels and units of government in sequence, though all three of the available endpoints, such as the News Service API, are interesting:
Screenshot of I14Y Interoperability Platform
Continuing with the questions we explored above, we first try out the relatively straightforward web interface, punching in some test searches, which seemed to give promising, if limited, results:
Screenshot of TERMDAT
It becomes clear that we would need to be very precise, and correct, in our queries. First, we create a simple folder structure: bund (Federal), kantone (Cantonal) and gemeinde (Municipal) for the three levels of government. Then bk, uvek, edi ... for the main government departments. In these folders we can put text files (termdat.txt, wikipedia.txt, ...) that help us to create a classifier for topics related to these departments.
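The folder layout described above could be set up with a few lines of Python. This is only a sketch: nesting the department folders under bund is our assumption (the text does not say where they live), only the three departments named in the text are listed, and `create_structure` is a hypothetical helper name.

```python
from pathlib import Path

# Government levels named in the text.
LEVELS = ("bund", "kantone", "gemeinde")
# Only the departments named in the text; the real list is longer.
DEPARTMENTS = ("bk", "uvek", "edi")


def create_structure(root: Path) -> list[Path]:
    """Create level folders, plus department folders under 'bund' (assumed nesting)."""
    created = []
    for level in LEVELS:
        created.append(root / level)
    for department in DEPARTMENTS:
        created.append(root / "bund" / department)
    for folder in created:
        # exist_ok makes the function safe to re-run on an existing tree
        folder.mkdir(parents=True, exist_ok=True)
    return created
```

Running `create_structure(Path("data"))` would then give each department a place to hold its termdat.txt and wikipedia.txt files.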
We write a simple aggregator to repeatedly query the TERMDAT API and save the descriptions (or any available notes) about the departments into these folders. One issue we experienced was minor inconsistencies in the data schema (missing description fields), which our code works around.
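A rough sketch of such an aggregator is shown below. It deliberately leaves the actual TERMDAT request to a `search` callable passed in by the caller, because the endpoint paths and response schema are not reproduced here. The `description` and `notes` key names, and the helper names `entry_text` and `aggregate`, are assumptions for illustration rather than the real TERMDAT schema.

```python
from pathlib import Path
from typing import Callable, Iterable


def entry_text(entry: dict) -> str:
    """Return the description, falling back to notes; '' if neither is usable."""
    for key in ("description", "notes"):  # assumed field names
        value = entry.get(key)
        if isinstance(value, str) and value.strip():
            return value.strip()
    return ""


def aggregate(
    queries: dict[str, str],
    search: Callable[[str], Iterable[dict]],
    root: Path,
) -> dict[str, int]:
    """Run one search per folder and save the usable texts as termdat.txt.

    `queries` maps a folder path relative to `root` (e.g. 'bund/bk') to a
    search term; `search` is whatever function actually calls the API.
    Returns the number of texts saved per folder.
    """
    counts = {}
    for folder, query in queries.items():
        # Entries without a usable description or note are skipped, not fatal.
        texts = [text for text in (entry_text(e) for e in search(query)) if text]
        target = root / folder
        target.mkdir(parents=True, exist_ok=True)
        (target / "termdat.txt").write_text("\n".join(texts), encoding="utf-8")
        counts[folder] = len(texts)
    return counts
```

Keeping the fallback logic in `entry_text` means a schema surprise only costs one skipped entry instead of aborting the whole crawl.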
Screenshot of API docs, Jupyter notebook, search results.
At this point, we look into the question of how best to classify these texts. Using a Sentence Similarity model like gBERT-large-sts-v2, which has a fine-tuned version by Deutsche Telekom, we can use a cloud-based API, or run our own inference service, to work out the appropriate department. We have some initial code, but could not get results until a few hours after the deadline.
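To illustrate the classification step without depending on a model download, here is a sketch of nearest-department matching by cosine similarity. The `embed` function is a simple bag-of-words stand-in that we made up for the example. In practice it would be replaced by the sentence embeddings of a model such as the one mentioned above, and the sample department texts are invented placeholders, not TERMDAT data.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in embedding: lowercase word counts (a real model would go here)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text: str, department_texts: dict[str, str]) -> tuple[str, dict[str, float]]:
    """Return the best-matching department and the score for every department."""
    query = embed(text)
    scores = {
        department: cosine(query, embed(reference))
        for department, reference in department_texts.items()
    }
    return max(scores, key=scores.get), scores


# Invented placeholder texts standing in for the collected termdat.txt files.
reference_texts = {
    "uvek": "umwelt verkehr energie kommunikation",
    "edi": "gesundheit kultur statistik sozialversicherung",
}
best, scores = classify("neue strategie für energie und verkehr", reference_texts)
```

Word overlap is a crude proxy for meaning, so the stand-in only shows the shape of the pipeline; the ranking logic in `classify` stays the same once real embeddings are plugged in.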
Screenshot of sentence-transformers notebook
We are nevertheless motivated to continue on this idea, and would be happy to hear feedback & suggestions via GitHub Discussions.
License
MIT
We worked with the TERMDAT API, found minor inconsistencies, got help from Raphaël (BK) to collect the data we needed.