Assignment 3: White Hat/Black Hat Visualization

It is tempting to think of data and data visualization as neutral actors. An emphasis on a minimalist aesthetic, particularly through the use of clean, precise geometric lines, lends an air of objective, transparent reporting that masks visualization's persuasive power. Given the growing ubiquity of visualization as a medium for recording, analyzing, and communicating data, we have a responsibility to examine how our design choices can influence the way a visualization is read, and what insights a reader walks away with.

In this assignment, we will grapple with these ethical concerns by visualizing a single dataset from two different perspectives: the "white hat" and the "black hat." These terms originated in the symbolism used by early Western (genre) movies: the heroes wore white hats, and the villains wore black hats. They have since been adopted in computer security to refer to two different kinds of hackers: a white hat hacker uses their skills for good (e.g., to uncover the vulnerabilities in software as a means to draw attention and fix the issue), whereas a black hat hacker violates computer security for malicious ends (e.g., their own personal gain).

For this assignment, we will consider a white hat visualization to be one where:

  • The visualization is clear and easy to interpret for the intended audience (often the general population);
  • Any data transformations (e.g., filters, additional computations, etc.) are clearly and transparently communicated; and
  • The sources of the data, including potential biases, are communicated.

A black hat visualization, on the other hand, exhibits one or several of the following characteristics:

  • The visual representation is intentionally inappropriate, overly complex and/or too cluttered for the audience;
  • Labels, axes, and legends are misleading;
  • Titles are skewed to intentionally influence the viewer’s perception;
  • The data has been transformed, filtered, or processed in an intentionally misleading way; or
  • The source and provenance of the data are not clear to the viewer.
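
To make the point about misleading axes concrete, the sketch below computes how much a truncated axis baseline exaggerates the drawn difference between two bars. The values and the helper name `visual_ratio` are hypothetical and chosen purely for illustration; they are not drawn from any of the assigned datasets.

```python
def visual_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """Ratio of drawn bar heights for values a and b when the axis starts at `baseline`."""
    if baseline >= min(a, b):
        raise ValueError("baseline must be below both values")
    return (b - baseline) / (a - baseline)


# Hypothetical values: the two quantities differ by only 4%.
low, high = 100.0, 104.0

honest = visual_ratio(low, high)                     # axis starts at zero
truncated = visual_ratio(low, high, baseline=98.0)   # axis starts at 98

print(f"zero baseline: taller bar is drawn {honest:.2f}x as tall")
print(f"baseline at 98: taller bar is drawn {truncated:.2f}x as tall")
```

With a zero baseline the taller bar is drawn 1.04 times as tall as the shorter one; starting the axis at 98 makes it appear 3 times as tall, even though the underlying data are unchanged.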

Although we might never imagine ourselves to be (nor aspire to be) black hat hackers, we are going to temporarily don this hat to better appreciate the extent of the rhetorical force of visualization, and build our critical reading skills.

Part One: Design

You will be working with a single dataset, one of three, which we have assigned to you. The datasets were intentionally chosen to cover politically charged topics, because this is typically the kind of data where ethical visualization is important. Note that you do not have to visualize the entire dataset (i.e., you may choose a subset of the data to visualize).

The three datasets are the following:

  • The DEA Pain Pills Database. The Washington Post has published a significant portion of a database maintained by the Drug Enforcement Administration (DEA) that tracks every opioid pill from its manufacturer, through distributors, and into pharmacies in towns and cities across the United States. This is an enormous database, and you can choose to work with it at any level of detail (e.g., state-wide, individual counties, or national summaries).

  • Greenhouse Gas Emissions 1990–2017. The Organisation for Economic Co-operation and Development (OECD) has compiled data for the emissions of all participating countries broken out by pollutant (e.g., carbon dioxide, methane, etc.) and by source (e.g., energy, agriculture, etc.). The linked interface can be a little difficult to use, but you can access various slices of the data either by choosing alternate themes in the left-hand side menu or by customizing the pollutants and variables in the dropdown menus in the main view.

  • Gender Equality Indicators 1960–2017. The World Bank tracks a number of different measures, including fertility rate, literacy, employment and ownership of businesses, and wages, to study the extent of gender equality around the world. The linked dataset curates a smaller subset of the overall set of gender indicators; you are welcome to use the overall set as well.

Deliverables

You will be visualizing your dataset from two perspectives: the white hat and black hat. As a result, you will be generating two static visualizations – one for each hat. We construe "visualization" broadly (e.g., a single visualization may comprise several small multiple views). You are free to use any visualization technique and any visualization tool and you do not need to use the same tools/techniques to generate both visualizations.

As with prior assignments, you should carefully consider not only visual encoding decisions but also how you might transform your data (e.g., groupings, aggregations, and log transforms), and what annotations and labels might help best convey the message from a particular perspective. Document all of these decisions and describe your rationale in a short write-up (no more than 4 paragraphs per visualization).
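
The transformations mentioned above (groupings, aggregations, and log transforms) can be sketched in a few lines. This is only a minimal illustration using the Python standard library; the records, the field layout, and the function names `total_by_region` and `log10_transform` are hypothetical and do not correspond to any of the assigned datasets.

```python
import math
from collections import defaultdict

# Hypothetical records of (region, year, count); not real data.
records = [
    ("A", 2010, 1_000),
    ("A", 2011, 9_000),
    ("B", 2010, 50),
    ("B", 2011, 50),
    ("C", 2010, 1_000_000),
]


def total_by_region(rows):
    """Group rows by region and sum the counts (a grouping plus an aggregation)."""
    totals = defaultdict(int)
    for region, _year, count in rows:
        totals[region] += count
    return dict(totals)


def log10_transform(totals):
    """Apply a base-10 log so values spanning orders of magnitude fit on one axis."""
    return {region: math.log10(value) for region, value in totals.items()}


totals = total_by_region(records)
logged = log10_transform(totals)
for region in sorted(totals):
    print(region, totals[region], round(logged[region], 2))
```

Whichever transforms you apply, record them in your write-up. A log transform, for instance, compresses large differences between values, so a reader needs to know it was used in order to interpret the chart correctly.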

Grading Criteria

This part of the assignment will be scored out of a maximum of 10 points. We will use the following rubric to grade your assignment. Note, rubric cells may not map exactly to specific point scores.

| Hat | Component | Excellent | Satisfactory | Poor |
|---|---|---|---|---|
| White | Marks & Encodings | All design choices are effective. The visualization can be read and understood effortlessly. | Design choices are largely effective, but minor errors hinder comprehension. | Ineffective mark or encoding choices are distracting or potentially misleading. |
| White | Data Transformation | More advanced transformations were used to extend the dataset in interesting or useful ways. | Simple transforms (e.g., sorting, filtering) were primarily used. | The raw dataset was used directly, with little to no additional transformation. |
| White | Titles & Labels | Titles and labels helpfully describe and contextualize the visualization. | Most necessary titles and labels are present, but they could provide more context. | Many titles or labels are missing, or do not provide human-understandable information. |
| White | Write Up | Your write up is well-crafted and provides reasoned justification for all design choices. | Most design decisions are described, but rationale could be explained at a greater level of detail. | Missing or incomplete. Several design choices are left unexplained. |
| Black | Marks & Encodings | Subtle ineffective choices for marks or encodings require close and careful reading to identify. | Ineffective marks or encodings were chosen but could be immediately identified. | Design choices are largely effective, and can be read and understood effortlessly. |
| Black | Data Transformation | More advanced transformations were used inappropriately or to intentionally mislead. | Simple transforms (e.g., sorting, filtering) were primarily used. | The raw dataset was used directly, with little to no additional transformation. |
| Black | Titles & Labels | Titles and labels subtly skew reading the visualization. | Titles and labels leave out important information, but in such a way as to raise flags for an astute reader. | Titles and labels are largely present, visible, and facilitate reading the visualization. |
| Black | Write Up | Your write up is well-crafted and provides reasoned justification for all design choices. | Most design decisions are described, but rationale could be explained at a greater level of detail. | Missing or incomplete. Several design choices are left unexplained. |
|  | Creativity & Originality | You exceeded the parameters of the assignment, with original insights or a particularly engaging design. | You met all the parameters of the assignment. | You met most of the parameters of the assignment. |

Submission Details

You must work individually for this part of the assignment. It is due by noon on Monday, 3/2. Submit your assignment using this form. The form expects your visualization to be a single image (either a .png or .jpg). Please make sure your image is sized for a reasonable viewing experience – readers should not have to zoom or scroll in order to view your submission.

Resubmissions. You will have 7 days after grades are released to resubmit this assignment. Resubmissions will be regraded by teaching staff, and you may earn back up to 50% of the points lost in the original submission. To resubmit this assignment, please follow the same submission process described above. Include a short, one-paragraph description summarizing the changes from the initial submission. Resubmissions without this summary will not be regraded. Slack days may not be applied to extend the resubmission deadline. The teaching staff will only begin to regrade assignments once the Final Project phase begins, so please be patient.

Part Two: Critical Reading (In Class, Mon 3/9)

Design, however, is only half the story. To ensure we do not succumb to visualizations' allure of authority, we must hone our critical reading skills and learn to determine whether a given visualization was designed in a clear and transparent way (i.e., by a white hat designer) or seeks to mislead the reader (i.e., was created by a black hat designer).

This part of the assignment will be conducted in class, Monday, 3/9. You can work on this part of the assignment individually or in pairs.

At the beginning of class, we will assign you 5 visualizations from a dataset that neither you nor your partner worked with. For each visualization, your task is to:

  1. determine whether it is an example of a white hat or black hat visualization
  2. provide a short (4-5 sentence) description of why

Submission Details

Please submit this form once per visualization. If you do not finish analyzing the 5 visualizations within the in-class session, please submit the remainder by noon on Monday, 3/30 (originally Friday, 3/13). Resubmissions will not be available for this part of the assignment.

Grading Criteria

This part of the assignment will be scored out of a maximum of 5 points. Each visualization will be worth 1 point, and we will grade you based on how well you justify your determination that the visualization was white hat or black hat.