[Greenhouse] Structuring an experiment
Asheesh Laroia
asheesh at asheesh.org
Tue Jul 9 01:12:29 UTC 2013
Hey all,
One goal of the greenhouse project is to have some sort of quantitative
basis to answer the question, "Does getting in touch with contributors
improve their activity level for the project?"
To do that, I wrote up a document, and got some feedback from a friend.
Editable version here: https://etherpad.mozilla.org/mentor-research
Dave, I think it makes sense for it to be your job to make sure that
document stays up to date with our plan over the summer as things change.
For convenience, here is that doc exported to text:
The purpose of this document is to be a statement of the
research goals for mentorship in Debian.
The reason I am writing it up is so that Dave, Asheesh, and
anyone else interested can review our plan with great clarity
and provide feedback about how to improve it.
Hypothesis: When Debian contributors get contacted within some
short amount of time (say, one week) of their first successful
package upload, by someone other than the person who sponsored
their package, they will be measurably more active within
Debian. We should be able to detect the increased activity as
follows:
* For a package maintained by that contributor, the average
time between bug filing and the contributor's first response to
the bug will be lower than for non-contacted contributors.
* We can measure this as soon as we start contacting
contributors, by looking at bugs on their packages (see the
sketch after this list).
* They are more active in a wider range of tasks within Debian
than non-contacted contributors.
* One measure of this is whether they upload a greater
variety of packages than non-contacted contributors.
* We can measure this if we have data inputs to the tool from
* They become Debian developers more speedily than
non-contacted contributors.
* Note that this will be difficult to measure in any kind of
speedy fashion, since it takes 1-5 years to go from "Did first
upload" to "is a Debian developer".
* They stay active in Debian for longer, on average, than
non-contacted contributors.
* Similarly difficult to measure.
* Other ideas for quantifiable definitions of "increased
activity" are welcome!
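As a concrete illustration of the first measure, here is a minimal
sketch in Python. It assumes we can export each contributor's bugs
as (filed_at, first_response_at) datetime pairs; the data format and
the function name are hypothetical, not anything the tool does today.

    # Average days from bug filing to the maintainer's first response.
    # The (filed_at, first_response_at) input format is hypothetical.
    from datetime import datetime

    def mean_first_response_days(bugs):
        deltas = [(response - filed).total_seconds() / 86400
                  for filed, response in bugs
                  if response is not None]  # skip bugs with no response yet
        return sum(deltas) / len(deltas) if deltas else None

    bugs = [
        (datetime(2013, 6, 1), datetime(2013, 6, 3)),
        (datetime(2013, 6, 10), datetime(2013, 6, 12)),
        (datetime(2013, 6, 20), None),  # no response yet
    ]
    print(mean_first_response_days(bugs))  # -> 2.0

Comparing this number between contacted and non-contacted
contributors is then a straightforward two-sample comparison.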
In the past, we had thought about opting in only about 80% of
the non-DD Debian package maintainers.
I want us to talk to someone well-versed in statistics, such
as mako, about precisely what sorts of statistical measures are
appropriate to the above questions. Then we should do a
bunch of research to find out what our baseline is, so that we
can calculate the sample size we need to measure the effect we
expect at the confidence we want.
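As a hedged sketch of that sample-size calculation: statsmodels can
solve for the number of participants per group given an assumed
effect size. The Cohen's d of 0.3 below is a made-up placeholder
until we estimate the baseline.

    # Sample size for a two-sample t-test at a given statistical power.
    # The effect size is a placeholder, not an estimate from our data.
    from statsmodels.stats.power import TTestIndPower

    n_per_group = TTestIndPower().solve_power(
        effect_size=0.3,  # placeholder Cohen's d
        alpha=0.05,       # significance level
        power=0.8)        # desired statistical power
    print(round(n_per_group))  # ~175 contributors per group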
Glossary:
* Debian: a project to create a freely redistributable and
modifiable operating system, based on existing free/open source
software components.
* Debian contributor: someone who does some work with the
intent that the work helps Debian achieve its goals.
* Package: ...
* Sponsor of a package: since, within Debian, only Debian
Developers can upload packages directly, a sponsor is a
Developer who reviews and uploads a package on behalf of a
contributor who lacks upload rights.
* Contacted contributor: someone whom the mentorship team has
contacted.
List of relevant people:
* Asheesh Laroia, mentor for Dave
* Dave Lu, student doing most of the programming and the like
* Nathan Yergler, a friend of Asheesh's who is willing to
provide Django advice
* Mako Hill, a Communication faculty member at the University
of Washington, willing to advise on statistics
List of possible people:
* Chris Chan, friend of Asheesh and statistics-savvy
Things to fix in this document:
* There might be other ways we can detect increased/decreased
activity.
* We need to look into statistical measures.
* We should write more about our planned methods.
* We should write something about ethics, if we really have to.
Semi-finally, here is some feedback I got from a friend.
Preeya:
this is actually very similar to things I used to do! :)
Asheesh:
ooohhh
Preeya:
(your debian package thing)
Asheesh:
Can you help us make it not be full of fail?
Preeya:
sure!
Preeya:
My immediate reactions:
- how are you going to choose who's in the test and control groups?
- how are you going to ensure that those groups aren't meaningfully
different in ways that could confound this?
- there is (I think open-source) software that will help you figure
out your sample size for a given statistical power if you can estimate
the effect size
Preeya:
Also, are people going to know that they are part of your study?
because that would definitely confound the results
Preeya:
Also, what is your time frame for this study? I think you either need
to set an established time limit for measuring data (e.g., you do this
with people and track them for a year, and maybe after 5 years you
are done? I don't know how quickly new Debian developers show up) or
you may want to look at rates of these things happening instead of
raw numbers, which changes the statistical tests you'll want to use.
Preeya:
Although even with rates you need to establish a time limit, I guess,
so actually never mind that.
Asheesh:
Yeah, thinking of a 2 or 3 month initial time limit... omg I hate
making decisions.
Preeya:
You should probably figure out, like, the rate of new people
submitting packages.
Preeya:
Then that along with the needed sample size will help you figure out
how long this might take
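(A back-of-the-envelope version of that calculation; every number
here is made up until we measure the real rate of first-time
uploaders.)

    # Estimated enrollment time: all numbers are hypothetical.
    new_contributors_per_week = 10     # made-up enrollment rate
    needed_per_group = 176             # from the power sketch above
    total_needed = 2 * needed_per_group
    print(total_needed / new_contributors_per_week)  # ~35 weeks to enroll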
Preeya:
Overall though, this seems like a fairly simple design -- you have two
groups, you have some measurable and continuous outcomes. I think your
biggest problem will be figuring out confounding factors to account
for.
Preeya:
(which, tbf, is always the biggest problem in social science)
Preeya:
A popular way to solve this is to use multivariate regression to
mathematically account for confounding factors. I'm not sure if this
would be totally appropriate for your thing; I bet mako or maybe
Chris would actually know more.
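To make Preeya's regression suggestion concrete, here is a minimal
sketch, not a claim that this is the right model for us: an ordinary
least squares fit with the treatment indicator plus one confounder as
covariates. The DataFrame, its column names, and the toy values are
all hypothetical.

    # Adjusting the treatment effect for a confounder via OLS.
    # All column names and values are hypothetical toy data.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.DataFrame({
        "response_days": [2.0, 5.5, 1.0, 7.0],  # outcome per contributor
        "contacted":     [1, 0, 1, 0],          # treatment indicator
        "prior_uploads": [3, 1, 0, 2],          # possible confounder
    })
    model = smf.ols("response_days ~ contacted + prior_uploads",
                    data=df).fit()
    print(model.params["contacted"])  # effect adjusted for prior_uploads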
Finally, I think here are the action items (Dave, it's up to you to edit
our research doc accordingly):
* Write out the specific measures we want in the research doc, and
where we'll get data for them.
* Estimate their current values, before our intervention. This way, we
make a guess as to what kind of effect we expect, and then use
"statistical power" tests to estimate the required sample size.
* Write up (in the doc) how we plan to create the control and
experiment groups. For readers' sake: as mentors mark people as
contacted, we add a new person's data to what the system displays,
but we pick randomly which person to show. This way, if mentors have
enough bandwidth to reach everyone, there is no control group; and if
they don't have enough bandwidth, the people they contact are randomly
chosen (see the sketch below).
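A minimal sketch of that randomized-display mechanism; the function
and the pool structure are hypothetical, not the tool's actual code:

    # When the dashboard needs a new contributor to surface to mentors,
    # pick uniformly at random from the not-yet-displayed pool.
    import random

    def pick_next_to_display(undisplayed_pool):
        return random.choice(undisplayed_pool)

    pool = ["contributor-a", "contributor-b", "contributor-c"]
    print(pick_next_to_display(pool))

Because mentors only ever see randomly chosen people, whoever they
lack the bandwidth to reach forms the control group.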
-- Asheesh.