Active projects and challenges as of 22.12.2024 18:51.
Article boundary detection
We extract article headlines based on font size and perform topic modeling using TF-IDF scores.
Our multilingual diachronic corpus contains more than 120 years of digitized issues of the Credit Suisse Bulletin magazine. To extract the wealth of linguistic information in this data, we need proper alignment: for each article in German, we must be able to automatically find the same article in French, English, Italian and Spanish. First of all, this requires a reliable article boundary detection algorithm. This is the challenge that we present to the public.
Our team decided to tackle the problem from two angles. On the one hand, we are writing an algorithm to detect article titles. Our starting point is font size, which we find in the rich layout XML files that ABBYY's Optical Character Recognition (OCR) software produced while processing the original scans.
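The font-size idea can be sketched as follows. This assumes the ABBYY FineReader XML layout, where each `<line>` contains `<formatting>` runs carrying the font size in an `fs` attribute, with the recognised characters inside (as text or `<charParams>` children). The `factor` threshold is a hypothetical value that would need tuning on the Bulletin scans; this is not the team's actual script.

```python
import statistics
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the XML namespace: '{http://...}line' -> 'line'."""
    return tag.rsplit("}", 1)[-1]


def lines_with_font_size(xml_text):
    """Yield (text, font_size) for each <line> element (xml_text: str or bytes).

    Assumes ABBYY-style markup: <line> holds <formatting fs="..."> runs whose
    content (plain text or <charParams> children) is the recognised text.
    """
    root = ET.fromstring(xml_text)
    for line in root.iter():
        if local_name(line.tag) != "line":
            continue
        sizes, parts = [], []
        for run in line:
            if local_name(run.tag) != "formatting":
                continue
            if run.get("fs"):
                sizes.append(float(run.get("fs")))  # "24." parses as 24.0
            parts.append("".join(run.itertext()))
        text = "".join(parts).strip()
        if text and sizes:
            yield text, max(sizes)


def headline_candidates(xml_text, factor=1.5):
    """Return lines whose font size is at least `factor` times the median size.

    The median approximates the body-text size; `factor` is a hypothetical
    threshold, not a value taken from the project.
    """
    lines = list(lines_with_font_size(xml_text))
    if not lines:
        return []
    body_size = statistics.median(size for _, size in lines)
    return [text for text, size in lines if size >= factor * body_size]
```

Using the median rather than the mean keeps a page with a few very large titles from inflating the estimated body-text size.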
Sometimes articles are published out of order, which makes alignment harder. Our second approach is an attempt to overcome this difficulty: we use topic modeling to align articles based on their themes rather than their position. We extract the text from the files and then use the TF-IDF score to weigh how relevant each word is to a given article. We will first evaluate this approach on manually separated articles. If the idea proves itself, the algorithm could be extended to automate article boundary detection.
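The TF-IDF step could look like the minimal sketch below. It uses a plain regex tokenizer and the textbook weighting (tf = count / length, idf = ln(N / df)), and compares articles by cosine similarity. Because the vectors are built from words, the comparison only works within one language. This is an illustration, not the team's code. scikit-learn's `TfidfVectorizer` is a ready-made alternative, although it applies a smoothed idf by default.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lower-cased word tokens; \w also matches umlauts and accented letters
    return re.findall(r"\w+", text.lower())


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf = term count / document length, idf = ln(N / document frequency).
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens) or 1
        vectors.append({term: (count / length) * math.log(n_docs / df[term])
                        for term, count in counts.items()})
    return vectors


def top_terms(vector, k=5):
    """The k highest-weighted terms, a crude description of an article's topic."""
    return [term for term, _ in sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]


def cosine(a, b):
    """Cosine similarity of two sparse TF-IDF vectors (0.0 = no shared weighted terms)."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0
```

For example, with the documents "gold price gold market", "gold market bank" and "football match goal", the top term of the first is "price" (it appears in only one document). The first document is also more similar to the second than to the third.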
Once both scripts produce the desired results, we can combine them into one robust algorithm for automatic article boundary detection.
dicziunari
An app for the "Pledari Grond" dictionary on Android and iOS that already includes the database (available under a free license) and therefore also works offline
Kulturprofi
Making the developmental stages of intercultural sensitivity visible
During an exchange, students encounter "critical" events (critical incidents) that confront them with the cultural differences between their own culture and the host culture. According to Milton J. Bennett's Developmental Model of Intercultural Sensitivity (DMIS), the ways people deal with such situations can be divided into six developmental stages. The questions created on the Typeform.com platform show where a student stands in their development.
Milton J. Bennett: http://www.buildingthelifeyouwant.com/blog/intercultural-sensitivity-is-not-natural https://woca.afs.org/education/m/icl-for-afs--friends/6799/download
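The six DMIS stages are Denial, Defense, Minimization, Acceptance, Adaptation and Integration. One way the questionnaire results could be mapped to a stage is sketched below. It assumes each answer option was tagged in advance with the stage it reflects. That tagging and the `dmis_stage` function are hypothetical; they are not Typeform features or the project's actual scoring.

```python
from statistics import median_low

# Bennett's six DMIS stages, from ethnocentric to ethnorelative
DMIS_STAGES = ("Denial", "Defense", "Minimization",
               "Acceptance", "Adaptation", "Integration")


def dmis_stage(answer_stages):
    """Map a student's answers to a single DMIS stage.

    answer_stages: one stage index (0-5) per question, assuming every answer
    option was tagged beforehand with the stage it reflects (hypothetical scheme).
    """
    if not answer_stages:
        raise ValueError("no answers given")
    if any(not 0 <= s < len(DMIS_STAGES) for s in answer_stages):
        raise ValueError("stage index out of range")
    # median_low always returns one of the given indices, so a single outlier
    # answer moves the result less than a mean would
    return DMIS_STAGES[median_low(answer_stages)]
```

For example, answers tagged `[1, 2, 2, 4]` place the student at "Minimization".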
Language Adventure
Cooperative game to promote exchange and language skills in multilingual groups
A cooperative game in which mixed-language groups complete missions together using an app. Solving the tasks requires the group to coordinate, which trains both active and passive foreign-language skills.
Areas of use: foreign-language teaching, class exchanges, language cafés
Language versions:
- All players play in their foreign language (monolingual group)
- All players play in their native language (multilingual group)
Romansh dictionary in Wikidata
Add Romansh dictionaries to Wikidata.org
Take the database behind http://pledari.ch/, convert it to Wikidata lexemes and upload them to wikidata.org, so that lexical data for the Romansh varieties becomes available to the general public as open data under Wikidata's CC0 license.
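The conversion step might produce one JSON payload per dictionary entry, as in the sketch below. Several things here are assumptions to verify on wikidata.org before any upload:

- the item IDs (Q13199 for Romansh, Q1084 for noun, Q24905 for verb);
- the language code `rm` (a variety-specific code may be needed for Rumantsch Grischun);
- the way a Pledari Grond entry is split into lemma, category and German gloss.

The payload shape follows the Wikibase lexeme JSON model. It could then be submitted through the MediaWiki API module `wbeditentity` with `new=lexeme`; check the WikibaseLexeme documentation for the exact sense format.

```python
import json

# Assumed Wikidata item IDs; verify them on wikidata.org before uploading
ROMANSH = "Q13199"                                        # the Romansh language
LEXICAL_CATEGORY = {"noun": "Q1084", "verb": "Q24905"}


def pledari_entry_to_lexeme(lemma, category, german_gloss, lang_code="rm"):
    """Build a new-lexeme JSON payload from one dictionary entry.

    The entry fields are a hypothetical reading of the Pledari Grond data;
    lang_code may need to be a variety-specific code.
    """
    return {
        "type": "lexeme",
        "lemmas": {lang_code: {"language": lang_code, "value": lemma}},
        "language": ROMANSH,
        "lexicalCategory": LEXICAL_CATEGORY[category],
        "senses": [
            {"glosses": {"de": {"language": "de", "value": german_gloss}}}
        ],
    }


if __name__ == "__main__":
    # "chasa" is Rumantsch Grischun for "Haus" (house)
    print(json.dumps(pledari_entry_to_lexeme("chasa", "noun", "Haus"),
                     ensure_ascii=False, indent=2))
```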
#röstigrabenism
Explore news data to help understand and trump clichés
More details about this project, aggregated results and discussion are available on the School of Data CH forum.
This is a project started at the #plurilinguism hackathon, in response to challenge #5 Röstigrabendetektor.
We set out to use the open web and machine learning tools to explore the dimensions of social geography together. What is the röstigraben? It is a kind of meme used to distinguish the interests of the two largest language regions in Switzerland, one of those curious things about Switzerland that you learn about over time. Whether a röstigraben exists at all, and what character it has, is hotly debated (nzz.ch), typically with statements like this:
"In social and foreign policy, the Romands tend to favour government regulation (influenced by the centralistic political mentality prevailing in France) and an active foreign policy (somewhat discarding Switzerland's neutrality), especially in relation to the European Union." -- https://en.m.wikipedia.org/wiki/Röstigraben
There is also the concept of a Polentagraben with respect to the Romansh- and Italian-speaking regions, which could be explored in the same way. For more background on the topic, we suggest reading the Swissinfo article, which links to further analysis and books.
Additional inspirations for this project:
- Parliament Impact Project (opendata.ch)
- What one artist learned about America from 19 million dating profiles (ted.com)
- How Connected Is Your Community to Everywhere Else in America? (nytimes.com)
- Why Journalists Should Talk About Geography (lse.ac.uk)
- How East and West think in profoundly different ways (bbc.com)
- Wikinews on Newsworthiness (wikinews.org)
The hack
The Jupyter notebook we wrote at the event, in the Python programming language, explores the TextRazor API, which performs language detection and entity extraction on free-form text. It even supports classifiers from the IPTC NewsCodes ontology and semantic metadata out of the box. It's a pretty cool API and easy to get started with, though not completely open (there are open alternatives to explore as well), and it charges a fee beyond 500 requests per day.
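Calling TextRazor without its client library could look like the sketch below. It assumes the REST interface:

- a form-encoded POST to `https://api.textrazor.com/`;
- the key passed in an `X-TextRazor-Key` header;
- a JSON reply whose `response` object holds `language` and an `entities` list with `matchedText`, `wikidataId` and `freebaseTypes` fields.

Check these names against the current TextRazor documentation; the function names are our own.

```python
import json
import urllib.parse
import urllib.request

TEXTRAZOR_ENDPOINT = "https://api.textrazor.com/"


def textrazor_request(api_key, article_url):
    """Build (but do not send) a request asking TextRazor to analyse a web page."""
    data = urllib.parse.urlencode({"url": article_url, "extractors": "entities"}).encode()
    return urllib.request.Request(
        TEXTRAZOR_ENDPOINT, data=data, headers={"X-TextRazor-Key": api_key}, method="POST"
    )


def parse_textrazor_reply(body):
    """Return the detected language and (matched text, Wikidata ID, Freebase types) per entity."""
    response = json.loads(body).get("response", {})
    entities = [
        (ent.get("matchedText"), ent.get("wikidataId"), ent.get("freebaseTypes", []))
        for ent in response.get("entities", [])
    ]
    return {"language": response.get("language"), "entities": entities}


# Sending it (needs a valid key and network access):
#   with urllib.request.urlopen(textrazor_request(KEY, url)) as resp:
#       result = parse_textrazor_reply(resp.read())
```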
The output provides Freebase identifiers, which make it easy to filter the list down to people, locations and organizations, as well as Wikidata identifiers (such as Q214086 for Suisse Romande). We expand these through the open Wikidata API to obtain the geographic coordinates of headquarters or birthplaces. A simple calculation at the end yields a score indicating how strongly röstigrabenised the article is.
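A sketch of the filtering and expansion steps follows. It assumes Freebase type strings such as `/location/location` appear in each entity's type list. It uses the Wikidata properties P625 (coordinate location), P159 (headquarters location) and P19 (place of birth), and reads item JSON from `Special:EntityData`. The `fetch` parameter is our own addition so the lookup can be tested offline. The User-Agent string is a placeholder.

```python
import json
import urllib.request

# Freebase types kept in the analysis: people, locations, organizations
KEEP_TYPES = ("/people/person", "/location/location", "/organization/organization")
COORD, HEADQUARTERS, BIRTHPLACE = "P625", "P159", "P19"


def keep_entity(freebase_types):
    return any(t in KEEP_TYPES for t in freebase_types)


def fetch_entity(qid):
    """Download one item's JSON from Wikidata's Special:EntityData endpoint."""
    url = f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"
    req = urllib.request.Request(url, headers={"User-Agent": "roestigraben-sketch/0.1"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["entities"][qid]


def claim_value(entity, prop):
    """Return the first main-snak value of `prop`, or None."""
    for claim in entity.get("claims", {}).get(prop, []):
        snak = claim.get("mainsnak", {})
        if snak.get("snaktype") == "value":
            return snak["datavalue"]["value"]
    return None


def coordinates(qid, fetch=fetch_entity):
    """(lat, lon) for an item: its own P625, else that of its headquarters or birthplace."""
    entity = fetch(qid)
    value = claim_value(entity, COORD)
    if value:
        return value["latitude"], value["longitude"]
    for prop in (HEADQUARTERS, BIRTHPLACE):
        place = claim_value(entity, prop)  # item value: {"entity-type": "item", "id": "Q..."}
        if place:
            place_coord = claim_value(fetch(place["id"]), COORD)
            if place_coord:
                return place_coord["latitude"], place_coord["longitude"]
    return None
```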
Once the user provides a link to an article, the tool (accessed through a web interface, a Twitter/chat bot, etc.) runs and presents the results of the analysis visually. Additionally, the user should be able to see the specific entities in the text that the score is based on, decide to ignore them in the calculation to filter out false positives, or even add their own opinion.
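The text does not specify the "simple calculation", so the score below is purely hypothetical. Each geolocated entity is assigned to whichever of two assumed reference points is closer: roughly Lausanne (46.52, 6.63) and roughly Zurich (47.37, 8.54). The score is the share of entities on the Romand side. Entities the user flags as false positives are left out. A real version would also need to drop places outside Switzerland, which this sketch does not do.

```python
import math

# Hypothetical reference points, one per side of the röstigraben (lat, lon)
REFERENCE = {"romand": (46.52, 6.63), "alemannic": (47.37, 8.54)}  # ~Lausanne, ~Zurich


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def side(coord):
    """Name of the nearest reference point."""
    return min(REFERENCE, key=lambda name: haversine_km(coord, REFERENCE[name]))


def roestigraben_score(located_entities, ignored=()):
    """Share of entities closer to the Romand reference point, in [0, 1].

    located_entities: {entity name: (lat, lon)}; `ignored` holds names the
    user marked as false positives. Returns None if no entity is left.
    """
    kept = [c for name, c in located_entities.items() if name not in ignored]
    if not kept:
        return None
    return sum(side(c) == "romand" for c in kept) / len(kept)
```

A score near 0.5 would mean an article mentions both sides evenly; values near 0 or 1 would suggest a one-sided perspective.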
The end result should look something like this sketch:
Ultimately we should be able to crowdsource responses about a variety of news sources, and construct a map of their polarisation towards or against a cultural bias.
Note
The current version uses the old-fashioned Wikidata API service instead of SPARQL queries, and could be improved using a query like this one, possibly linked to Q214086 (Suisse Romande). An example project that uses this approach is wiki-climate.
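A single SPARQL request could replace the per-item API calls, as sketched below. The query asks the Wikidata Query Service for each item's own coordinates (P625) and for those of its headquarters (P159) and birthplace (P19), using property paths. The function name and the ID check are our own. Items with several matching values come back as several result rows.

```python
import re
import urllib.parse

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

QUERY_TEMPLATE = """SELECT ?item ?coord ?hqCoord ?birthCoord WHERE {{
  VALUES ?item {{ {items} }}
  OPTIONAL {{ ?item wdt:P625 ?coord . }}
  OPTIONAL {{ ?item wdt:P159/wdt:P625 ?hqCoord . }}
  OPTIONAL {{ ?item wdt:P19/wdt:P625 ?birthCoord . }}
}}"""


def coordinates_query_url(qids):
    """URL for one SPARQL request covering all entities found in an article."""
    for q in qids:
        if not re.fullmatch(r"Q\d+", q):  # refuse anything that is not an item ID
            raise ValueError(f"not a Wikidata item ID: {q!r}")
    query = QUERY_TEMPLATE.format(items=" ".join(f"wd:{q}" for q in qids))
    return WDQS_ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
```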
We also considered expanding the reach of our classifications using a tool for social network analysis (see O'Reilly, socnetv).
Team
- Celine Zund
- Karlen Kathrin
- Oleg Lavrovsky
"Tinder" für Sprachaufenthalte
Plattform für Organisation von Austauschen
Many people from Switzerland's different language regions would like to take part in an exchange. A platform that puts two interested parties in touch with each other would make the process much more efficient. Anyone who wants to organize an exchange with a partner from another language region can sign up on the website and post an exchange project that others will find on the platform. They can also search for a suitable project themselves, using a filter that selects the projects that match them best.
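The matching filter could work roughly as sketched below. The `ExchangeProject` fields and the ranking are hypothetical, since the prototype's data model is not described. A project matches if its language pair mirrors the searcher's. Matches are then ranked by how many interests they share.

```python
from dataclasses import dataclass, field


@dataclass
class ExchangeProject:
    # Hypothetical fields; the prototype's real data model is not described
    title: str
    home_language: str    # language region of the person posting, e.g. "fr"
    wanted_language: str  # language region they want to exchange with
    interests: set = field(default_factory=set)


def matching_projects(projects, my_language, wanted_language, my_interests, limit=5):
    """Projects whose language pair mirrors mine, most shared interests first."""
    candidates = [p for p in projects
                  if p.home_language == wanted_language and p.wanted_language == my_language]
    # sort() is stable, so equally good matches keep their original order
    candidates.sort(key=lambda p: len(p.interests & my_interests), reverse=True)
    return candidates[:limit]
```

For example, a French speaker looking for a German-speaking partner who likes music would see a German speaker's choir project ahead of a ski week. A project from an Italian speaker would not appear at all.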
Here is the link to the website prototype, landing page: https://ariane847.wixsite.com/exchange
Here is the link to the website prototype, profile page: https://ariane847.wixsite.com/exchange3
Here is the project presentation:
Challenges
French-speaking artists in German-speaking Switzerland
Promoting French-speaking artists in the German-speaking part of Switzerland
What problem are you solving?
Create a network between cultural associations in Switzerland. See challenge #9 Franco Luzern.
How are you solving it?
In two steps:
- By visiting associations in Switzerland and using Franco Luzern's networks to "sell" the project
- By creating an online platform
What have you accomplished during the Hackathon?
We developed the project (CoArt), planned the next steps and bought a new domain.
Who's in your team?
Taisa Mara Schlatter and Serge Robert
Emojis per Rumantsch (RG)
Translation of the standardized Emojis into Rumantsch Grischun.