Processing Complex Data

Lecturer: Javier Garcia-Bernardo, Assistant Professor of Social Data Science, Department of Methodology & Statistics, Utrecht University.

About the course

Contrary to what most introductory data science and statistics courses suggest, real-world scientific data come in an enormous variety of formats, sizes, structures, and procedures: simple tables, spatiotemporal arrays, normalized relational schemas, nested API responses, raw scraped web pages, networks, sampled time series, and domain-specific scientific standards. This course gives students hands-on experience with handling, processing, and modelling several families of complex data. It runs in a hackathon-style format in which each group goes deep on one data type and teaches it to the rest of the class.

The narrative spine of the course is "from raw traces to defensible claims". Each group works through a single pipeline: raw source → operationalized clean object → baseline model with one sensitivity check → presentation.
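To make the pipeline stages concrete, here is a minimal sketch in Python. It is not part of the course materials. The records, field names, outlier threshold, and the choice of a least-squares slope as the baseline model are all illustrative assumptions; each group's real pipeline depends on its data family.

```python
from statistics import mean

# Hypothetical raw records (made-up values, purely illustrative).
RAW = [
    {"id": "a", "x": "1.0", "y": "2.1"},
    {"id": "b", "x": "2.0", "y": "3.9"},
    {"id": "c", "x": "", "y": "5.0"},      # missing predictor
    {"id": "d", "x": "3.0", "y": "6.2"},
    {"id": "e", "x": "4.0", "y": "30.0"},  # suspiciously large outcome
]


def clean(raw):
    """Raw source -> clean object: parse numbers, drop unparseable rows."""
    rows = []
    for rec in raw:
        try:
            rows.append((float(rec["x"]), float(rec["y"])))
        except ValueError:
            continue  # skip rows with missing or malformed values
    return rows


def ols_slope(rows):
    """Baseline model: least-squares slope of y on x."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    return sxy / sxx


def sensitivity(rows, y_cap):
    """Sensitivity check: refit after excluding rows with y above y_cap."""
    return ols_slope([(x, y) for x, y in rows if y <= y_cap])


rows = clean(RAW)
baseline = ols_slope(rows)
robust = sensitivity(rows, y_cap=10.0)  # the cap is an arbitrary assumption
print(f"baseline slope: {baseline:.2f}, without outlier: {robust:.2f}")
```

If the two slopes differ substantially, as they do for these made-up values, the claim depends on a single cleaning decision, which is exactly what the presentation should report.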

Course materials

Lectures

| Week | Title | Lecture |
| --- | --- | --- |
| 1 | What Makes Data Complex? | TBD |
| 2 | From Complex Data to Clean Data | TBD |
| 3 | Scaling Up Modeling | TBD |
| 4 | Communicating Research | TBD |
| 5 | Presentations | |
| 6 | Short Report | |

Group projects

| Variant | Data family | Example research question |
| --- | --- | --- |
| Geospatial | projects/geospatial.md | What is the relation between municipal land use and population composition? |
| Networks | projects/networks.md | What is the relationship between gender and cross-program relations in high school? |
| Messy web text | projects/messy_web_text.md | Do company sustainability pages differ linguistically from public-interest climate information pages? |
| Relational database | projects/relational_database.md | Which driver, constructor, grid, circuit, and season characteristics are associated with F1 finishing points? |
| Scientific programming | projects/scientific_programming.md | In one NSD session, are mean beta responses different in V1 and hV4? |
| Time series | projects/time_series.md | Is gaze movement lower while an NSD target image is on screen than nearby periods when it is not? |
| API data | projects/api_data.md | Which study attributes are associated with completed versus ongoing clinical trials? |