
[Greenhouse] Structuring an experiment

Asheesh Laroia asheesh at asheesh.org
Tue Jul 9 01:12:29 UTC 2013


Hey all,

One goal of the greenhouse project is to have some sort of quantitative 
basis to answer the question, "Does getting in touch with contributors 
improve their activity level for the project?"

To do that, I wrote up a document, and got some feedback from a friend.

Editable version here: https://etherpad.mozilla.org/mentor-research

Dave, I think it makes sense for it to be your job to make sure that 
document stays up to date with our plan over the summer as things change.

For convenience, here is that doc exported to text:


The purpose of this document is to be a statement of the
research goals for mentorship in Debian.

The reason I am writing it up is so that Dave, Asheesh, and 
anyone else interested can review our plan with clarity and 
provide feedback on how to improve it.

Hypothesis: When Debian contributors are contacted within some 
short amount of time (say, one week) of their first successful 
package upload, by someone other than the person who sponsored 
their package, they will be measurably more active within 
Debian. We should be able to detect the increased 
activity as:

  * For packages maintained by that contributor, the average 
time between a bug being filed and the contributor's first 
response will be lower than for non-contacted contributors.
    * We can measure this as soon as we start systematically 
contacting contributors, by looking at bugs filed against their 
packages (see the sketch after this list).
  * They are active across a wider range of tasks within Debian 
than non-contacted contributors.
    * One measure of this is whether they upload a greater 
variety of packages than non-contacted contributors.
    * We can measure this if we have data inputs to the tool 
from ...
  * They become Debian developers more speedily than 
non-contacted contributors.
    * Note that this will be difficult to measure in any kind of 
speedy fashion, since it takes 1-5 years to go from "Did first 
upload" to "is a Debian developer".
  * They stay active in Debian for longer, on average, than 
non-contacted contributors.
    * Similarly difficult to measure.
  * Other ideas for quantifiable definitions of "increased 
activity" are welcome!
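
To pin down the first of those measures, here is a minimal sketch 
in Python; the data structures are illustrative assumptions, not 
the tool's actual schema.

    # Sketch: mean days from bug filing to the maintainer's first
    # response, for one contributor. Assumed inputs, not real data.
    from datetime import datetime
    from statistics import mean

    def mean_response_days(bug_events):
        """bug_events: list of (filed, first_response) datetimes;
        first_response may be None if nobody has responded yet."""
        deltas = [(resp - filed).days for filed, resp in bug_events if resp]
        return mean(deltas) if deltas else None

    # Example with made-up dates:
    events = [
        (datetime(2013, 6, 1), datetime(2013, 6, 3)),
        (datetime(2013, 6, 10), None),  # no response yet; excluded
    ]
    print(mean_response_days(events))  # -> 2

We would compute this per contributor and compare the contacted 
group's distribution against the non-contacted group's.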

In the past, we had thought about opting in only about 80% of 
the non-DD Debian package maintainers.

I want us to talk to someone well-versed in statistics, such as 
mako, about precisely what sorts of statistical measures are 
appropriate to the questions above, and then we should do a 
bunch of research to find out what our baseline is so that we 
can calculate the sample size we need to detect the effect we 
expect at the confidence we want (see the sketch below).
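
To make that concrete, here is a minimal sketch of such a power 
calculation using the open-source statsmodels library (the kind of 
tool Preeya mentions below); the effect size, alpha, and power 
values are placeholder assumptions, not measured numbers.

    # Sketch: how many contributors per group we would need to
    # detect an assumed effect. All numbers here are placeholders.
    from statsmodels.stats.power import TTestIndPower

    n_per_group = TTestIndPower().solve_power(
        effect_size=0.3,  # assumed standardized difference (Cohen's d)
        alpha=0.05,       # significance level
        power=0.8,        # chance of detecting the effect if it's real
    )
    print("Need roughly %.0f contributors per group" % n_per_group)

Once we have our baseline research done, we would replace the 
assumed effect size with a real estimate.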

Glossary:

  * Debian: a project to create a freely redistributable and 
modifiable operating system, based on existing free/open source 
software components.
  * Debian contributor: someone who does some work with the 
intent that the work helps Debian achieve its goals.
  * Package: ...
  * Sponsor of a package: Since, within Debian, only Debian 
Developers can upload packages directly, a sponsor is a Debian 
Developer who reviews and uploads a package on behalf of a 
contributor who does not yet have upload rights.
  * Contacted contributor: someone whom the mentorship team has 
contacted.

List of relevant people:

  * Asheesh Laroia, mentor for Dave
  * Dave Lu, student doing most of the programming and the like
  * Nathan Yergler, a friend of Asheesh's who is willing to 
provide Django advice
  * Mako Hill, a communications faculty member at the University 
of Washington who is willing to advise on statistics


List of possible people:
  * Chris Chan, friend of Asheesh and statistics-savvy

Things to fix in this document:
  * There might be other ways we can detect increased/decreased 
activity.
  * We need to look into statistical measures.
  * We should write more about our planned methods.
  * We should write something about ethics, if we really have to.




Semi-finally, here is some feedback I got from a friend.

Preeya:
     this is actually very similar to things I used to do! :)
Asheesh:
     ooohhh
Preeya:
     (your debian package thing)
Asheesh:
     Can you help us make it not be full of fail?
Preeya:
     sure!
Preeya:
     My immediate reactions:
     - how are you going to choose who's in the test and control groups?
     - how are you going to ensure that those groups aren't meaningfully
     different in ways that could confound this?
     - there is (I think open-source) software that will help you figure
     out your sample size for a given statistical power if you can estimate
     the effect size
Preeya:
     Also, are people going to know that they are part of your study?
     because that would definitely confound the results
Preeya:
     Also, what is your time frame for this study? I think you either need to
     set an established time limit for measuring data (e.g., you do this
     with people and track them for a year, and maybe after 5 years you
     are done? I don't know how quickly new debian developers show up) or
     you may want to look at rates of these things happening instead of
     raw numbers, which changes the statistical tests you'll want to use.
Preeya:
     Although even with rates you need to establish a time limit, I guess,
     so actually never mind that.
Asheesh:
     Yeah, thinking of a 2 or 3 month initial time limit... omg I hate
     making decisions.
Preeya:
     You should probably figure out, like, the rate of new people
     submitting packages.
Preeya:
     Then that along with the needed sample size will help you figure out
     how long this might take
Preeya:
     Overall though, this seems like a fairly simple design -- you have two
     groups, you have some measurable and continuous outcomes. I think your
     biggest problem will be figuring out confounding factors to account
     for.
Preeya:
     (which, tbf, is always the biggest problem in social science)
Preeya:
     A popular way to solve this is to use multivariate regression to
     mathematically account for confounding factors. I'm not sure if this
     would be totally appropriate for your thing; I bet mako or maybe
     Chris would actually know more.
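
To make Preeya's last suggestion concrete, here is a minimal 
sketch of such a multivariate regression with statsmodels; every 
column name and number below is a hypothetical placeholder, not 
real data.

    # Sketch: OLS regression of an outcome on the treatment
    # indicator plus possible confounders. Data is fabricated
    # for shape only.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.DataFrame({
        "response_days": [2.0, 5.5, 1.0, 7.0, 3.0, 6.5, 2.5, 8.0],
        "contacted":     [1, 0, 1, 0, 1, 0, 1, 0],   # treatment
        "tenure_days":   [30, 400, 10, 200, 60, 350, 20, 500],
        "n_packages":    [3, 1, 5, 2, 4, 1, 3, 1],
    })

    model = smf.ols(
        "response_days ~ contacted + tenure_days + n_packages",
        data=df,
    ).fit()
    print(model.summary())  # coefficient on "contacted" is the estimate

Whether plain OLS is appropriate here is exactly the question for 
mako or Chris.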

Finally, I think these are the action items (Dave, it's up to you 
to edit our research doc accordingly):

* Write out the specific measures we want in the Google Doc, and 
where we'll get data for them.

* Estimate their current values, before our intervention. This way, we 
can make a guess as to what kind of effect we expect, and then use 
"statistical power" calculations to estimate the required sample size.

* Write up (in the doc) how we plan to create the control/experiment 
groups. For readers' sake, the plan is: as mentors mark people as 
contacted, we add a new person's data to what the system displays, 
picking randomly which person to show. This way, if mentors have 
enough bandwidth to reach everyone, there is no control group; and if 
they don't have enough bandwidth, the people they contact are randomly 
chosen (see the sketch below).
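
Here is a minimal sketch of that random selection, with made-up 
names; the real tool's data model will differ.

    # Sketch: the system surfaces one uniformly-randomly chosen
    # not-yet-contacted contributor at a time, so whoever mentors
    # actually reach is a random subset of the pool.
    import random

    def next_contributor_to_show(uncontacted_pool):
        """Return a random not-yet-contacted contributor, or None
        if the pool is empty (everyone reached: no control group)."""
        if not uncontacted_pool:
            return None
        return random.choice(uncontacted_pool)

    # Example usage with hypothetical names:
    print(next_contributor_to_show(["alice", "bob", "carol"]))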

-- Asheesh.

