Aarhus University

21-22 May 2019

9 am - 4 pm

Instructors: Adela Sobotkova

Helpers: Joachim Holmen Frost

General Information

Thank you for your interest in the Fundamental Data Skills workshop.

The ever-increasing digital nature of research requires researchers, postgraduate students, and research-support staff to equip themselves with the skills to create, manipulate and manage data in digital format. This can involve complex research data management techniques.

Today, researchers and students can perform simple to complex data management through open source tools and techniques which do not require highly specialised skills.

This workshop will help researchers, postgraduate students, and research-support staff learn more about such tools and techniques.

The workshop is organised and funded as part of the Center for Digital History Aarhus (CEDHAR) in cooperation with the Carpentries.

Workshop Aims

This workshop aims to give researchers in the arts, humanities, and social sciences a broad introduction to the concepts and tools listed in the Syllabus below.

Who: The course is aimed at graduate students and other researchers. You don't need to have any previous knowledge of the tools that will be presented at the workshop.

Where: Building 1485, room 218, Jens Chr. Skous Vej, Aarhus University. Get directions with OpenStreetMap or Google Maps.

When: 21-22 May 2019. Add to your Google Calendar.

Requirements: Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) that they have administrative privileges on. They should have a few specific software packages installed (listed below). They are also required to abide by Data Carpentry's Code of Conduct.

Accessibility: We are committed to making this workshop accessible to everybody. The workshop organisers have checked that:

Materials will be provided in advance of the workshop, and large-print handouts are available if you notify the organisers in advance. If we can help make learning easier for you (e.g. sign-language interpreters, lactation facilities), please get in touch (using the contact details below) and we will attempt to provide them.

Contact: Please email adela@fedarch.org for more information.


Schedule

Surveys

Please be sure to complete these surveys before and after the workshop.

Pre-workshop Survey

Post-workshop Survey

21 May 2019

09:00 Data organization in spreadsheets
10:30 Coffee
12:30 Lunch (catered)
13:00 OpenRefine for data cleaning
14:30 Coffee
16:00 Wrap-up

22 May 2019

09:00 Introduction to R
10:30 Coffee
12:30 Lunch (catered)
13:00 Data analysis and visualization in R
14:30 Coffee
16:00 Wrap-up

We will use this collaborative document for chatting, taking notes, and sharing URLs and bits of code.


Syllabus

Data Carpentry

Data Organisation in Spreadsheets

  • Data organisation and management
  • Good data formatting practices
  • Avoiding formatting mistakes
  • Quality control and data manipulation in spreadsheets

Data Cleaning with OpenRefine

  • Introduction to OpenRefine
  • Importing data
  • Basic functions
  • Advanced functions
  • Reference...

Introduction to R

  • R and RStudio
  • Reproducibility in R

Starting with data in R

  • Describe what a data frame is.
  • Load external data from a .csv file into a data frame in R.
  • Summarize the contents of a data frame in R.
  • Manipulate categorical data in R.
  • Change how character strings are handled in a data frame.
  • Format dates in R.
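As a rough illustration of the objectives above, the following base R sketch writes a small .csv file, loads it into a data frame, summarises it, and handles a categorical column and a date column. The data, the column names (site, visit_date, finds) and the temporary file are hypothetical and are not the dataset used in the workshop.

```r
# Hypothetical toy data written to a temporary .csv file so the
# example is self-contained. Column names are illustrative only.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "site,visit_date,finds",
  "north,2019-05-01,3",
  "south,2019-05-02,7",
  "north,2019-05-03,10",
  "east,2019-05-04,5"
), csv_path)

# Load the .csv file into a data frame. stringsAsFactors = FALSE keeps
# text columns as character strings instead of converting them to factors.
visits <- read.csv(csv_path, stringsAsFactors = FALSE)

# Summarize the contents of the data frame.
str(visits)
summary(visits)

# Turn a character column into a factor (categorical data) and inspect it.
visits$site <- factor(visits$site)
levels(visits$site)
table(visits$site)

# Parse the date strings into Date objects, then print them in another format.
visits$visit_date <- as.Date(visits$visit_date, format = "%Y-%m-%d")
format(visits$visit_date, "%d/%m/%Y")
```

Running str(visits) before and after the conversions is a simple way to see how the column types change.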

Data aggregation with dplyr

  • Select certain columns in a data frame with the dplyr function select.
  • Select certain rows in a data frame according to filtering conditions with the dplyr function filter.
  • Link the output of one dplyr function to the input of another function with the ‘pipe’ operator %>%.
  • Add new columns to a data frame that are functions of existing columns with mutate.
  • Use the split-apply-combine concept for data analysis. Use summarize, group_by, and tally to split a data frame into groups of observations, apply summary statistics to each group, and then combine the results.
  • Reshape a data frame from long to wide format and back with the spread and gather commands from the tidyr package.
  • Export a data frame to a .csv file.
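The objectives above can be sketched in a few lines, assuming the dplyr and tidyr packages are installed. The finds data frame and its columns (site, period, count, area_m2) are hypothetical and only serve to illustrate the functions named in the list.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data. The column names are illustrative only.
finds <- data.frame(
  site    = c("A", "A", "B", "B", "B"),
  period  = c("early", "late", "early", "late", "late"),
  count   = c(4, 6, 10, 2, 8),
  area_m2 = c(2, 3, 5, 1, 4)
)

# select columns, filter rows, and add a new column with mutate,
# chaining the steps with the pipe operator %>%.
dense <- finds %>%
  select(site, period, count, area_m2) %>%
  filter(count > 3) %>%
  mutate(density = count / area_m2)

# Split-apply-combine: split by site, compute a summary per group,
# and combine the results into one row per site.
per_site <- finds %>%
  group_by(site) %>%
  summarize(mean_count = mean(count), n = n())

# tally counts the rows in each group.
finds %>%
  group_by(site, period) %>%
  tally()

# Total count per site and period, in long format.
totals <- finds %>%
  group_by(site, period) %>%
  summarize(total = sum(count)) %>%
  ungroup()

# Reshape from long to wide with spread, and back with gather.
wide <- totals %>% spread(key = period, value = total)
long <- wide %>% gather(key = "period", value = "total", -site)

# Export a data frame to a .csv file (here a temporary file).
out_path <- tempfile(fileext = ".csv")
write.csv(per_site, file = out_path, row.names = FALSE)
```

Note that in tidyr 1.0.0 and later, spread and gather are superseded by pivot_wider and pivot_longer; the older functions still work and are used here because the syllabus names them.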

Data visualization with ggplot2

  • Produce scatter plots, boxplots, and time series plots using ggplot.
  • Set universal plot settings.
  • Describe what faceting is and apply faceting in ggplot.
  • Modify the aesthetics of an existing ggplot plot (including axis labels and color).
  • Build complex and customized plots from data in a data frame.
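A minimal sketch of these plotting objectives, assuming the ggplot2 package is installed, might look as follows. The yearly data frame and its columns (site, year, count) are hypothetical.

```r
library(ggplot2)

# Hypothetical toy data: yearly counts for two sites.
yearly <- data.frame(
  site  = rep(c("A", "B"), each = 4),
  year  = rep(2015:2018, times = 2),
  count = c(4, 6, 5, 9, 10, 2, 8, 7)
)

# Set a universal plot setting: every later plot uses this theme.
theme_set(theme_bw())

# Time series plot: points joined by lines, one panel per site (faceting),
# with custom axis labels and colour by site.
p_series <- ggplot(yearly, aes(x = year, y = count, color = site)) +
  geom_line() +
  geom_point() +
  facet_wrap(~ site) +
  labs(x = "Year", y = "Number of finds", color = "Site")

# Boxplot of the same counts, grouped by site.
p_box <- ggplot(yearly, aes(x = site, y = count)) +
  geom_boxplot() +
  labs(x = "Site", y = "Number of finds")

print(p_series)
print(p_box)
```

Because each plot is stored in an object, further layers or labels can be added later with +, which is how an existing plot is modified.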

Setup

To participate in a Data Carpentry workshop, you will need access to the software described below. In addition, you will need an up-to-date web browser.

We maintain a list of common installation issues on the Configuration Problems and Solutions wiki page. It is written as a reference for instructors, but you may find it useful too.

OpenRefine

You can download OpenRefine from here. OpenRefine 3.1 is recommended. There are versions for Windows, Mac OS X and Linux.

Windows or Linux

When you download OpenRefine for Windows or Linux from the address above, you are downloading a zip file. To install OpenRefine, unzip the downloaded file wherever you want to install the program. This can be a personal directory or an applications or software directory; OpenRefine should run wherever you put the unzipped folder. The location must be a "local" drive, as problems have been reported when running OpenRefine from a network drive.

Mac OS X

If you are downloading OpenRefine for Mac, you are downloading a 'dmg' (disk image) file. Open it, then drag the OpenRefine application to an appropriate folder on your computer.

R

R is a programming language that is especially powerful for data exploration, visualization, and statistical analysis. To interact with R, we use RStudio.

Windows

Video Tutorial

Install R by downloading and running this .exe file from CRAN. Also, please install the RStudio IDE. Note that if you have separate user and admin accounts, you should run the installers as administrator (right-click on the .exe file and select "Run as administrator" instead of double-clicking). Otherwise problems may occur later, for example when installing R packages.

Mac OS X

Video Tutorial

Install R by downloading and running this .pkg file from CRAN. Also, please install the RStudio IDE.

Linux

You can download the binary files for your distribution from CRAN. Or you can use your package manager (e.g. for Debian/Ubuntu run sudo apt-get install r-base and for Fedora run sudo yum install R). Also, please install the RStudio IDE.