Study Protocol (60 min.)

Note: This protocol interleaves two kinds of text: actions taken by the study investigator, and the script read to participants.

Pre-study preparation

Make sure the scoring rule explanation document is open in a tab for the participant.

Introduction (3 min.)

Hi, I’m __________ and I’ll be leading the study today.

Confirm that the participant filled out the consent form. If they have not, ask if they have any questions before filling it out.
Change the participant’s Zoom name to an anonymous participant ID and turn off their video.
Ask to record.

In this study you will be given a set of visualizations and asked to answer some questions about the data. Your goal is to answer questions about the data as accurately as possible. You will earn $25 just for completing the study, and you can earn additional payment during the study. There is an upper bound of $30 on the additional payment. Please note that it is an upper bound, so you may only receive a portion of that payment depending on your answers.

Before we begin, I’ll give you some additional background related to the task and walk you through a tutorial question. You can ask me questions along the way.

Tutorial (7 min.)

Oftentimes data analyses are conducted on sensitive or private data (for example, health data). Today I’ll show you a tool for analyzing such data in a privacy-preserving manner.

Specifically, the tool displays privacy-preserving graphs of some data. These graphs are created by injecting statistical noise into the data to protect confidentiality. So, each graph shows the data with some amount of noise added, meaning the data you see will be somewhat inaccurate. We think of each graph as a “measurement” of the true data.

In conducting a data analysis, there are often questions you want to answer. How much you care about the accuracy of the answers you get might differ across questions. Therefore, the tool gives you the option of “remeasuring” a graph to get slightly better accuracy. Each time you remeasure, you will increase the accuracy of the graph.
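
Investigator reference (not read to participants): the sketch below illustrates why remeasuring can improve accuracy. It is not the tool's implementation. It assumes, purely for illustration, that each measurement adds independent zero-mean Gaussian noise to the true count and that the displayed estimate is the average of all measurements so far. The function names, the noise scale, and the true count are hypothetical.

```python
import math
import random

def measure(true_count, noise_sd, rng):
    """One noisy measurement: true count plus zero-mean Gaussian noise (assumed)."""
    return true_count + rng.gauss(0.0, noise_sd)

def remeasured_estimate(true_count, noise_sd, n_measurements, rng):
    """Average of n independent measurements (assumed combination rule)."""
    samples = [measure(true_count, noise_sd, rng) for _ in range(n_measurements)]
    return sum(samples) / n_measurements

def expected_rmse(noise_sd, n_measurements):
    """RMSE of the average of n independent measurements: sd / sqrt(n)."""
    return noise_sd / math.sqrt(n_measurements)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
for n in (1, 2, 4):
    estimate = remeasured_estimate(300, 20.0, n, rng)
    print(n, round(estimate, 1), round(expected_rmse(20.0, n), 1))
```

Under these assumptions the error shrinks with the square root of the number of measurements, so each additional remeasure helps a little less than the one before it.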

Share screen (for the interface).

This is a tutorial question. In the main study you’ll be asked to answer multiple questions, but for now let’s look at just this one. The question is listed at the top and below you’ll find two visualizations that will help you answer the question. This dataset is based on the Census Current Population Survey.

The error bars represent the root mean squared error (RMSE) of the estimate. These can help you get a sense of how far the estimate may be from the actual value.
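
Investigator reference (not read to participants): RMSE is the square root of the average squared difference between estimates and the true value. The sketch below uses made-up numbers, and `rmse` is a hypothetical helper rather than part of the tool.

```python
import math

def rmse(estimates, true_value):
    """Root mean squared error of repeated estimates of one true value."""
    squared_errors = [(e - true_value) ** 2 for e in estimates]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up example: three noisy estimates of a true count of 100.
print(rmse([97, 100, 103], 100))
```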

You can select bars on either visualization to filter the data. Clicking a bar on the left means that the visualization on the right will only show data for the clicked bar. You can clear the filter either by clicking that bar again, or by clicking this “clear filter” button. Note: You can filter on both the left and the right visualization.

I’ll pause here to let you think about how you might answer this question.

To improve the accuracy of the estimates, you can “remeasure” the visualization.

Click the remeasure button.

Notice that there are now gray error bars beneath the solid black error bars. These represent the errors from before the remeasurement. Currently they are centered around the new estimate.

However, if you want to see the previous errors centered around the previous estimate, you can use this toggle button.

You can also change the y-axis scales to get a more “zoomed in” view of the data.

You can hover over bars to get exact numerical information about estimates and errors.

Notice that we specify how to provide an answer. In this question we ask for an interval in which you are 95% confident that the true answer lies.

Give participant remote control of screen and allow them to test out the tutorial question.

Do you have any questions?

Tasks (45 min.)

Now we’ll move on to the main portion of this study. There will be three sections. Each section has four questions and an upper bound of $10 that you can earn. Each question is worth the same amount. We’ll spend about 10-12 minutes on each section. There isn’t a strict time limit on sections, but I’ll give you reminders about how much time you should plan to spend before wrapping up, so that you finish the study on time.

Each section will ask you questions about a different dataset. Your task is to answer these questions as accurately as possible using a limited number of remeasures – you’ll get six remeasures per section. You may allocate remeasures across visualizations however you like. For example, you can spend all remeasures on only one question, or you could split them more evenly across questions.

You’ll see two question types: quantitative and binary. Quantitative questions will be like what you saw in the tutorial (“how many…?”), while binary questions will ask for the probability you believe the answer is yes vs. no. For quantitative questions, make sure your answer is between 0 and 1000, because there are only 1000 records in each dataset. For binary questions, make sure the probabilities sum to 1.
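
Investigator reference (not read to participants): the two answer constraints above can be checked as in the sketch below. It assumes that a quantitative answer is an interval, as in the tutorial question. `valid_interval` and `valid_binary` are hypothetical helpers, not part of the study interface.

```python
import math

N_RECORDS = 1000  # each dataset has 1000 records (from the protocol)

def valid_interval(lower, upper):
    """A quantitative answer is an interval inside [0, N_RECORDS]."""
    return 0 <= lower <= upper <= N_RECORDS

def valid_binary(p_yes, p_no):
    """A binary answer is two probabilities in [0, 1] that sum to 1."""
    in_range = 0.0 <= p_yes <= 1.0 and 0.0 <= p_no <= 1.0
    # Tolerance absorbs floating-point rounding (e.g. 0.7 + 0.3).
    return in_range and math.isclose(p_yes + p_no, 1.0, abs_tol=1e-9)

print(valid_interval(120, 180), valid_binary(0.7, 0.3))
```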

Explain the scoring rules at a high level and point the participant to the additional document.

Your goal is to decide how to use your remeasures so as to get the best possible score, and therefore increase the additional payment you receive.

These tasks are challenging, so we don’t expect anyone to answer everything perfectly. Just do your best. Please also think aloud as you complete the tasks. It’s especially helpful for us to hear your decision-making process. If you are struggling with something, please talk through it. I’ll also keep reminding you to think aloud during the task.

Participant completes tasks in three blocks (Census, Diabetes, Student). The order of blocks was counterbalanced across participants. Questions were randomly ordered within each block.
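
Investigator reference (not read to participants): the protocol does not state the exact counterbalancing scheme. The sketch below shows one common option, assumed here for illustration only: cycle through all six orderings of the three blocks by participant index, and shuffle the question order within each block. The function names and the seed are hypothetical.

```python
import itertools
import random

BLOCKS = ("Census", "Diabetes", "Student")
ORDERS = list(itertools.permutations(BLOCKS))  # all 6 possible block orders

def block_order(participant_index):
    """Cycle through the six orders so each is used equally often (assumed scheme)."""
    return ORDERS[participant_index % len(ORDERS)]

def question_order(n_questions, rng):
    """Random order of question indices within one block."""
    order = list(range(n_questions))
    rng.shuffle(order)
    return order

rng = random.Random(0)  # hypothetical seed
for pid in range(3):
    print(pid, block_order(pid), [question_order(4, rng) for _ in BLOCKS])
```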

For the first block, I want to remind you that you can look over all the questions before spending remeasures, if you’d like.

Exit interview (5 min.)

Describe your strategy for deciding where to allocate remeasures.
How well do you think you did on the task?
How would you rate your confidence in your answers on a scale from 0 to 100, where 100 is the most confident?
How much of the $30 bonus do you think you earned?
Briefly describe which parts of the interface you found challenging to use (and didn't seem to help you with the task).
Briefly describe which parts of the interface you found easy to use (and helped you with the task).
What information would you have liked the interface to have included to better support you with the task?
If you were to re-do the task, what would you do differently?
How would you describe your level of familiarity with differential privacy?

Scale

1 – Not at all familiar
2 – Slightly familiar
3 – Somewhat familiar
4 – Moderately familiar
5 – Extremely familiar

Briefly describe what kind of analysis you tend to do or have a background in.

Provide info on when the gift card will be sent.