Gateway to Legal Data
A unified search entry point into today's highly fragmented legal database landscape and a one-stop shop for legal data.
I. The project
The project aims to create a legal data showcase, in other words:
- a unified search entry point into today's highly fragmented landscape of legal databases, and
- at the same time a low-threshold, accessible one-stop-shop for legal data.
Traditionally, libraries have been the gatekeepers for access to legal data, especially legal texts, but also legal data in the broadest sense. Libraries not only made this data physically accessible, but also added metadata that made the data itself searchable and discoverable. This role of libraries has changed significantly in recent years. Today, legal data are often made available in databases by different actors, with differing access conditions and degrees of accessibility.
The current fragmentation of access to legal data affects national and international research and its visibility. The project "Gateway to Legal Data" tries to create a counterbalance. While preserving the existing and desirable diversity of data sources, it aims to create a unified search entry point as well as a one-stop shop for legal data. Its architecture can be described as follows:
II. The Challenge
A running prototype can be found at www.scigate.online. The system is partly modularized and should be modularized further. In particular, more data sources should be connected and data aggregation added, along with more search functionality. The linchpin of scigate.online is its so-called proxies, whose task is to query data sources, translate their responses and homogenize the data as far as possible to allow unified search and access via scigate.online.
- Part of the challenge will be to build more proxies to connect additional data sources, such as fedlex and other legal data sources, to the platform. This data will be harmonized as much as possible so that it can be made available via a uniform API. In the future, this should minimize the need to write a new scraper for each legal data research project.
- Another part of the challenge will be to present the data as search results on the platform. The proxies currently collect three lines for each entry plus a link to display the entry. There is room to optimize what is displayed for each entry, how it is displayed, and which existing functionality of the source systems can be used to make the search as user-friendly as possible. The search could also be extended with facets or autocompletion.
- Finally, the retrieved hitlists and documents (so far only for entscheidsuche) could be used to provide additional functionality. They could be fed into an AI system to mark the most relevant passages, to generate an automated summary or to answer a natural language query.
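The homogenizing step a proxy performs can be pictured with a minimal sketch. The field names of the source record (`title`, `authors`, `year`, `url`) and the `Hit` class are hypothetical and chosen only for illustration; the one thing taken from the description above is the target shape of three lines of text plus a link.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Unified entry: three descriptive lines plus a link to the source."""
    description: list
    url: str


def normalize(record):
    """Map one source-specific record onto the unified three-line format.

    The keys read from `record` are assumptions; every real data source
    would need its own mapping inside its proxy.
    """
    title = record.get("title") or "(no title)"
    authors = ", ".join(record.get("authors", []))
    year = str(record.get("year", ""))
    return Hit(description=[title, authors, year], url=record["url"])
```

With such a mapping per source, the platform itself only ever deals with `Hit` objects, regardless of which database an entry came from.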
https://challenges.openlegallab.ch/project/54
III. Resources
Running prototype: scigate.online
The different code bases can be found here:
- The common search interface: https://github.com/Velofisch/crossrecherche-ui
- The proxies to connect the different search engines: https://github.com/Velofisch/Crossrecherche-proxies
- The API to bulk download results: https://github.com/Velofisch/Scigate-API
- The UI of entscheidsuche: https://github.com/entscheidsuche/entscheidsuche-vue
Questions can be addressed to joern@erbguth.net
Crossrecherche
Cross-search for legal content in academic databases
Purpose and Requirements
This is a prototype for a combined search across different academic databases with a focus on legal content. The requirements can be found in "Leistungsbeschrieb.pdf" in this repository. Proposed layouts can be found here: https://www.figma.com/proto/9ZuPfkKwlPOt3vIWMW1FVr/Sci-Gate?node-id=2%3A2&scaling=min-zoom&page-id=0%3A1&starting-point-node-id=5%3A716 The first layout to be implemented will be the tabs view (page 3). The layouts on page 2 (parallel view) and page 5 (mixed view) might be added at a later stage.
Architecture
A serverless web application will access proxies for the different search engines. These proxies are available for Boris, Zora, Swisscovery and entscheidsuche.ch.
Proxy-API
These proxies have a JSON REST API:
General usage
- address: http://v2202109132150164038.luckysrv.de:8080/
- Input JSON
  - type: search|hitlist
  - engine: entscheidsuche|swisscovery|boris|zora
- Output JSON
  - status: ok|error
  - error: error message (only present if status=error)
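A minimal sketch of a client for the general request and response envelope follows. The parameter names and values come from the list above; that the proxy expects the JSON document as the body of an HTTP POST is an assumption, and the helper names (`build_payload`, `check_response`, `call_proxy`, `ProxyError`) are hypothetical.

```python
import json
from urllib import request

PROXY_URL = "http://v2202109132150164038.luckysrv.de:8080/"
ENGINES = {"entscheidsuche", "swisscovery", "boris", "zora"}
REQUEST_TYPES = {"search", "hitlist"}


class ProxyError(Exception):
    """Raised when the proxy answers with status=error."""


def build_payload(request_type, engine, **extra):
    """Assemble the input JSON, rejecting undocumented types and engines."""
    if request_type not in REQUEST_TYPES:
        raise ValueError(f"unknown type: {request_type!r}")
    if engine not in ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")
    return {"type": request_type, "engine": engine, **extra}


def check_response(response):
    """Return the response if status=ok, otherwise raise with the error text."""
    if response.get("status") != "ok":
        raise ProxyError(response.get("error", "unknown error"))
    return response


def call_proxy(payload, url=PROXY_URL, timeout=10):
    """Send the payload to the proxy and return the checked output JSON.

    Assumption: the proxy accepts the JSON document as a POST body.
    """
    data = json.dumps(payload).encode("utf-8")
    req = request.Request(url, data=data,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req, timeout=timeout) as resp:
        return check_response(json.loads(resp.read().decode("utf-8")))
```

Keeping payload construction and response checking separate from the network call makes both easy to exercise without a running proxy.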
type=search
(Only additional parameters described here)
- Input JSON
  - term: search term or search terms; no syntax translation is currently done for the search engines
- Output JSON
  - hits: number of hits for the search
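Since type=search returns only a hit count, it lends itself to comparing engines before fetching any hitlist, for example to label the tabs of the UI. The sketch below assumes a caller-supplied `send` function (hypothetical) that takes the input JSON as a dict and returns the output JSON as a dict; how it talks to the proxy is left open.

```python
def count_hits(term, engines, send):
    """Ask every engine for its hit count for `term`.

    `send` is a hypothetical transport callable: dict in, dict out.
    Engines that report status=error are mapped to None instead of a count.
    """
    counts = {}
    for engine in engines:
        response = send({"type": "search", "engine": engine, "term": term})
        if response.get("status") == "ok":
            counts[engine] = response["hits"]
        else:
            counts[engine] = None
    return counts
```

Because no syntax translation is done, the same `term` string is passed unchanged to each engine, so counts for queries using engine-specific operators may not be comparable.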
type=hitlist
(Only additional parameters described here)
- Input JSON
  - term: search term or search terms; no syntax translation is currently done for the search engines
  - start: position in the hitlist to start from (default=0). A position beyond the length of the hitlist will generate an error
  - count: number of hits to fetch (default=10). There is no maximum
- Output JSON
  - start: offset where the list starts
  - searchterm: search term used
  - hitlist: list of hits. Every hit has the following attributes:
    - description: list of 3 strings describing the hit. These may contain markup: tags with the classes hl1 and hl2 for bold and italic.
    - url: URL to the hit at the search engine. It can be opened without context and should be opened in a new tab
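Paging through a hitlist could look like the following sketch. It first issues a type=search request to learn the total, then requests pages only while `start` is below that total, because a position beyond the length of the hitlist generates an error. As before, `send` is a hypothetical transport callable (dict in, dict out), and `iter_hits` is not part of the repository.

```python
def iter_hits(term, engine, send, page_size=10):
    """Yield every hit for `term` from one engine, page by page."""
    total = send({"type": "search", "engine": engine, "term": term})
    if total.get("status") != "ok":
        raise RuntimeError(total.get("error", "search failed"))

    start = 0
    while start < total["hits"]:
        page = send({"type": "hitlist", "engine": engine, "term": term,
                     "start": start, "count": page_size})
        if page.get("status") != "ok":
            raise RuntimeError(page.get("error", "hitlist failed"))
        hits = page["hitlist"]
        if not hits:
            # Defensive stop in case the engine returns fewer hits than announced.
            break
        for hit in hits:
            yield hit
        start += len(hits)
```

Advancing by the number of hits actually returned, rather than by `page_size`, keeps the loop correct if an engine delivers shorter pages than requested.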
Open Legal Lab 2023