Our tutorial focuses on blind spots in social data. This figure presents a small example of real blind spots: staring at the plus sign causes the colored dots to disappear due to the natural blind spots in human vision.
Image Source: http://www.psy.ritsumei.ac.jp/~akitaoka/kieru-e.html.

Overview

To build machine learning models that make accurate predictions about the world around them, we need to train these models on representative datasets. When this condition is not met, major errors can occur. These errors range from camera software that over-predicts Asians as "blinking" [1] to models that over-predict African-Americans as likely criminals [2]. Non-representative datasets can also lead to errors when interpreting social media data [3]. In this tutorial we will present different ways in which a dataset can be biased and the potential issues that can arise from using biased datasets in a predictive setting. Finally, we will present solutions to this problem, ranging from adjusting existing datasets to approaches for collecting data so that bias is minimized in the resulting dataset.

This tutorial is designed for researchers and practitioners in the area of artificial intelligence. Attendees will leave with an understanding of various sources of bias, the effects of using biased data to train machine learning models, and approaches for mitigating and correcting bias in their own data.

References

Timeslot

Saturday, February 4th, from 4:15 to 6:00 PM.

Syllabus

Section (Duration)
Introduction - Blindspots and Bias in Social Data (25 minutes)
  • Blindspot identification exercise
  • Machine learning examples
Bias in Social Media Data (50 minutes)
  • Overview of bias in social media data
  • Social Data Bias
  • Correction techniques
  • Mitigation techniques
Social Bias in Other Areas of Machine Learning (15 minutes)
  • When to stop collecting data?
  • When are learned patterns valid?
  • The cost of false positives
Conclusion and Q&A (10 minutes)

Slides

[Tutorial Slides]

Handouts and Reference Materials

[Tutorial Handout]

Presenters

Page last updated February 3, 2017.