What sort of perspective do you want w/ the analysis?
- informative vs influencing action
- exposing areas of improvement vs highlighting strengths
- community impact vs business impact
Data science / ML workflow
- codify problems & metrics
- data collection & cleaning
- feature engineering
- model training & tuning
- model validation
- model deployment
- monitoring / validation
You may know how many contributors you have over time. You can classify them as active vs drifting (e.g. leaving) or repeat vs fly-by (<4 contributions) contributors.
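A minimal Python sketch of that classification, assuming contribution events arrive as (contributor, timestamp) pairs. The fly-by cut-off follows the <4 rule above; the 90-day "drifting" window is a hypothetical threshold the notes do not specify, and `classify_contributors` is an illustrative name.

```python
from collections import Counter
from datetime import datetime, timedelta

FLY_BY_MAX = 3                     # "<4 contributions" counts as fly-by, per the notes
DRIFT_AFTER = timedelta(days=90)   # hypothetical "drifting" threshold; tune per project

def classify_contributors(events, now):
    """Label each contributor as (repeat | fly-by, active | drifting).

    `events` is an iterable of (contributor, datetime) pairs, one per contribution.
    """
    events = list(events)
    counts = Counter(who for who, _ in events)
    last_seen = {}
    for who, when in events:
        if who not in last_seen or when > last_seen[who]:
            last_seen[who] = when
    labels = {}
    for who, n in counts.items():
        volume = "fly-by" if n <= FLY_BY_MAX else "repeat"
        activity = "drifting" if now - last_seen[who] > DRIFT_AFTER else "active"
        labels[who] = (volume, activity)
    return labels

events = [
    ("ana", datetime(2024, 1, 5)), ("ana", datetime(2024, 2, 1)),
    ("ana", datetime(2024, 3, 1)), ("ana", datetime(2024, 3, 20)),
    ("bo", datetime(2023, 10, 1)),
]
print(classify_contributors(events, now=datetime(2024, 4, 1)))
# {'ana': ('repeat', 'active'), 'bo': ('fly-by', 'drifting')}
```

Both thresholds are constants so they can be adjusted to the project's rhythm; a slow-moving project might reasonably use a longer drift window.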
Commits over time (e.g. 100 commits one month, 40 the next) can be broken down further:
- depth of commits over time (e.g. LOC changed per month)
- commits by subset of contributors (is the project maintained by a very small group of people?)
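One way to compute both breakdowns, sketched in Python under assumed inputs: each commit is a hypothetical (author, date, lines_added, lines_deleted) record, such as you might assemble from `git log --numstat` output. "LOC changed" is taken here as added plus deleted lines, and the top-k share is just one possible measure of how concentrated the work is.

```python
from collections import Counter, defaultdict
from datetime import date

def monthly_summary(commits):
    """Commit count and lines changed (added + deleted) per calendar month.

    `commits` holds hypothetical (author, date, lines_added, lines_deleted) records.
    """
    summary = defaultdict(lambda: {"commits": 0, "loc_changed": 0})
    for _, day, added, deleted in commits:
        month = (day.year, day.month)
        summary[month]["commits"] += 1
        summary[month]["loc_changed"] += added + deleted
    return dict(sorted(summary.items()))

def top_k_share(commits, k=3):
    """Fraction of all commits authored by the k most active contributors."""
    per_author = Counter(author for author, *_ in commits)
    total = sum(per_author.values())
    if total == 0:
        return 0.0
    return sum(n for _, n in per_author.most_common(k)) / total

commits = [
    ("ana", date(2024, 1, 3), 120, 10), ("ana", date(2024, 1, 9), 5, 5),
    ("bo", date(2024, 1, 20), 30, 0), ("ana", date(2024, 2, 2), 2, 1),
    ("cy", date(2024, 2, 15), 40, 40),
]
print(monthly_summary(commits))
# {(2024, 1): {'commits': 3, 'loc_changed': 170}, (2024, 2): {'commits': 2, 'loc_changed': 83}}
print(top_k_share(commits, k=1))  # 0.6
```

A top-1 or top-3 share that stays close to 1.0 month after month is one signal that a very small group is carrying the project.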
“Community campaign impact measurement”
- establish goals <-> determine what can be measured to detect impact
Step 1: break down the focus area
- if you could answer anything, what would it be? (e.g. a magic 8-ball question)
- what will be the data source?
- given the data, which questions could be answered to bring you closer to your 8-ball question?
Step 2: converting a question to a metric
- Specific data points needed
- visualization to represent the data
- Make a hypothesis re: insights and actions that will come from this
Step 3: analysis in action
- does this align with prior knowledge?
  - If so, double-check your assumptions to make sure your own bias didn’t shape the result.
  - If not, did we misunderstand the community? Or was there a data error?
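When a result contradicts prior knowledge, a few cheap checks can rule out the most common data errors before concluding that the community was misunderstood. This sketch assumes events are dicts with hypothetical `id`, `author`, and `timestamp` keys; the `[bot]` suffix matches GitHub App account logins, but other automation accounts would need their own rules.

```python
from collections import Counter

def data_quality_report(events):
    """Count common data problems in a list of event dicts.

    Assumes hypothetical "id", "author", and "timestamp" keys on each event.
    """
    id_counts = Counter(e.get("id") for e in events if e.get("id") is not None)
    return {
        "duplicate_ids": sorted(i for i, n in id_counts.items() if n > 1),
        "missing_timestamps": sum(1 for e in events if e.get("timestamp") is None),
        # GitHub App accounts have logins ending in "[bot]"; other automation needs its own rules.
        "bot_events": sum(1 for e in events if str(e.get("author", "")).endswith("[bot]")),
    }

events = [
    {"id": 1, "author": "ana", "timestamp": "2024-01-01T10:00:00Z"},
    {"id": 1, "author": "ana", "timestamp": "2024-01-01T10:00:00Z"},  # duplicated row
    {"id": 2, "author": "dependabot[bot]", "timestamp": "2024-01-02T09:00:00Z"},
    {"id": 3, "author": "bo", "timestamp": None},
]
print(data_quality_report(events))
# {'duplicate_ids': [1], 'missing_timestamps': 1, 'bot_events': 1}
```

Nonzero counts do not automatically invalidate a finding, but they tell you which numbers to recompute before trusting a surprise.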
Then implement community initiatives informed by the data analysis. Each initiative should be measurable.
- observe community initiatives
- are we measuring the right thing?
- does the initiative need to change?
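To make an initiative measurable, one simple option is to compare a monthly metric in the months before and after its launch. A sketch, assuming the metric is a dict keyed by (year, month); `before_after` is a hypothetical helper and the three-month window is an arbitrary choice. A before/after difference is a direction to investigate, not proof that the initiative caused it: seasonality or unrelated changes can produce the same shift.

```python
from statistics import mean

def before_after(monthly, launch, window=3):
    """Mean of a monthly metric in the `window` months before `launch`
    versus the `window` months starting at `launch`.

    `monthly` maps (year, month) tuples to a number, e.g. new contributors.
    """
    months = sorted(monthly)
    before = [monthly[m] for m in months if m < launch][-window:]
    after = [monthly[m] for m in months if m >= launch][:window]
    if not before or not after:
        raise ValueError("need data on both sides of the launch month")
    return mean(before), mean(after)

new_contributors = {
    (2024, 1): 4, (2024, 2): 6, (2024, 3): 5,
    (2024, 4): 9, (2024, 5): 7, (2024, 6): 8,
}
print(before_after(new_contributors, launch=(2024, 4)))
```

Tuples compare element by element, so `(2024, 3) < (2024, 4)` orders months correctly without converting to dates.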
Examples:
- 8-ball: are people having an experience that converts many of them into consistent contributors?
- data: contributor activity on repos w/ timestamps
- metrics:
  - contributor’s first action
  - fly-by vs repeat contributor
  - conversion rate from first-time to active/repeat contributor
- initiatives:
  - improve the contribution guidelines
  - PR “buddy” for a contributor’s first PR
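The conversion-rate metric from this example could be sketched as follows, assuming events arrive as (contributor, timestamp) pairs and treating 4+ contributions as "repeat" (mirroring the <4 fly-by cut-off earlier in these notes). Grouping contributors by the month of their first contribution lets you compare cohorts that arrived before and after an initiative such as the PR buddy. Recent cohorts will look artificially low because they have had less time to convert. `conversion_by_cohort` is an illustrative name.

```python
from collections import defaultdict
from datetime import datetime

REPEAT_MIN = 4  # mirrors the "<4 contributions = fly-by" rule in these notes

def conversion_by_cohort(events):
    """Share of each first-contribution-month cohort that became repeat contributors.

    `events` is an iterable of (contributor, datetime) pairs, one per contribution.
    """
    first_seen, counts = {}, defaultdict(int)
    for who, when in events:
        counts[who] += 1
        if who not in first_seen or when < first_seen[who]:
            first_seen[who] = when
    cohorts = defaultdict(lambda: [0, 0])  # (year, month) -> [contributors, converted]
    for who, first in first_seen.items():
        cohort = cohorts[(first.year, first.month)]
        cohort[0] += 1
        cohort[1] += int(counts[who] >= REPEAT_MIN)
    return {m: converted / total for m, (total, converted) in sorted(cohorts.items())}

events = [
    ("ana", datetime(2024, 1, 5)), ("ana", datetime(2024, 1, 10)),
    ("ana", datetime(2024, 2, 1)), ("ana", datetime(2024, 3, 1)),
    ("bo", datetime(2024, 1, 20)), ("cy", datetime(2024, 2, 3)),
]
print(conversion_by_cohort(events))  # {(2024, 1): 0.5, (2024, 2): 0.0}
```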
Limitations:
- numbers/metrics are not facts. We can make them say things.
- Keep the internal skeptic alive.
- “the graph points you in a direction to investigate” is a win.
- data analysis is not always the answer. This can help