European Map with the 11 EUniWell partner universities

European University for Well-Being

The "European University for Well-Being" (EUniWell) is one of 50 European university alliances and comprises eleven partner universities from Germany, France, the United Kingdom, Italy, Sweden, Spain, Hungary and Ukraine. The alliance's central aim is to improve the well-being of all through joint activities in research, teaching and transfer.

Opportunities and Events (to be announced soon)

Turning PDFs into Research Data

When
Tuesday, 27 August 2024
4:30 pm to 5:30 pm CEST

Where
Online

Organized by
BERD@NFDI

Speaker(s):
Jack Collins

Do you ever find that the data you need for your research is accessible, but locked in documents such as company reports or building plans rather than in a convenient table?

Perhaps the information you need is spread out across many different documents?

If only we could read and extract structured data from thousands of written documents. 

In this course, we explore how to accomplish this task by combining web scraping, Optical Character Recognition (OCR), and Natural Language Processing (NLP). Over four weeks, we provide online lessons and interactive sessions to learn the fundamentals of these key technologies.

Topics

  • Methods for extracting text and files from websites using tools such as Selenium and how to avoid common pitfalls.
  • Methods for extracting text from images, such as scans of written documents. 
  • Exploring technologies that can help automate data extraction from harvested text and a critical review of common data quality issues. 
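
As a rough illustration of the first and third topics, the sketch below uses only the Python standard library. It is not taken from the course materials: the class `PdfLinkCollector`, the function `extract_fields`, the example.org URL, and the "Label: EUR amount" line format are all assumptions made for this example. In practice the HTML would come from a scraper such as Selenium and the text from an OCR engine.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkCollector(HTMLParser):
    """Collect absolute URLs of <a> links that point to PDF files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".pdf"):
            # Resolve relative links against the page URL.
            self.pdf_links.append(urljoin(self.base_url, href))


# Hypothetical line format, e.g. "Revenue: EUR 1,250,000".
# The letters O/o are allowed in the amount so that a common
# OCR misreading of zero can be repaired afterwards.
FIELD_PATTERN = re.compile(
    r"^(?P<label>[A-Za-z ]+):\s*EUR\s*(?P<value>[\dOo,]+)\s*$"
)


def extract_fields(text):
    """Turn 'Label: EUR amount' lines into a dict of integer amounts."""
    record = {}
    for line in text.splitlines():
        match = FIELD_PATTERN.match(line.strip())
        if not match:
            continue  # skip lines that do not fit the assumed format
        raw = match.group("value").replace("O", "0").replace("o", "0")
        record[match.group("label").strip().lower()] = int(raw.replace(",", ""))
    return record


# Made-up page fragment standing in for scraped HTML.
html = (
    '<a href="/reports/2023.pdf">Report</a> '
    '<a href="/about.html">About</a> '
    '<a href="files/plan.PDF">Plan</a>'
)
collector = PdfLinkCollector("https://example.org/docs/")
collector.feed(html)
print(collector.pdf_links)

# Made-up OCR output; note the letter O inside the first amount.
ocr_text = """Annual Report 2023
Revenue: EUR 1,25O,000
Employees: 340
Net income: EUR 98,500
"""
record = extract_fields(ocr_text)
print(record)
```

Rule-based patterns like this are brittle: the "Employees" line is silently skipped because it does not match the assumed format, which is one example of the data quality issues that need checking after extraction.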

Format

This is an online course. 

  • Week 1: Watch pre-recorded video lectures covering the relevant theory and demonstrations of example exercises. The topic is web scraping and OCR (~45 min). Interactive Online Session (~60 min).
  • Week 2: Apply last week's lessons to the example coding exercise or your own project (~30 min). Interactive Online Session (~60 min).
  • Week 3: Watch pre-recorded video lectures covering the relevant theory and demonstrations of example exercises. The topic is NLP and common data extraction issues (~30 min). Interactive Online Session (~60 min).
  • Week 4: Apply last week's lessons to the example coding exercise or your own project (~30 min). Interactive Online Session (~60 min).

Weekly Meetings

The course includes four live online meetings, in which you will discuss the week's content with the instructor and fellow participants:

Meeting 1: Aug 27, 2024, 4:30pm – 5:30pm CEST
Meeting 2: Sep 03, 2024, 4:30pm – 5:30pm CEST
Meeting 3: Sep 10, 2024, 4:30pm – 5:30pm CEST
Meeting 4: Sep 17, 2024, 4:30pm – 5:30pm CEST

Prerequisites

  • Basic programming knowledge (R, Python, …)
    • Note that the course will be taught in Python, but if you only know R, that is fine. The code examples are simple and run entirely on Google Colab, so you will not have to install anything. The course is a good opportunity to try Python for the first time, and you can also take the self-paced BERD introduction to Python course.
  • Willingness to learn new technical skills
  • A Google Account

 

Further information and registration

Contact

Dr. Daniela Kromrey
Coordination European Universities Alliance
Office C 338

Send an email