Open Source Summit

What sort of perspective do you want w/ the analysis?

  • informative vs influencing action
  • exposing areas of improvement vs highlighting strengths
  • community impact vs business impact

Data science / ML workflow

  • codify problems & metrics
  • data collection & cleaning
  • feature engineering
  • model training & tuning
  • model validation
  • model deployment
  • monitoring / validation

You may know how many contributors you have over time. You can also classify them as active vs drifting (i.e. leaving) or repeat vs fly-by (<4 contributions) contributors.
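A minimal sketch of that classification, assuming contributions are available as (contributor, date) pairs. The fly-by cutoff of fewer than 4 contributions comes from the notes above; the 90-day "drifting" window and the sample names are hypothetical choices for illustration only.

```python
from collections import defaultdict
from datetime import date, timedelta

FLY_BY_THRESHOLD = 4               # from the notes: fewer than 4 contributions = fly-by
DRIFT_WINDOW = timedelta(days=90)  # hypothetical: no activity within 90 days = drifting

def classify_contributors(events, as_of):
    """events: iterable of (contributor, date) pairs, one per contribution.
    Returns {contributor: (repeat/fly-by label, active/drifting label)}."""
    counts = defaultdict(int)
    last_seen = {}
    for who, when in events:
        counts[who] += 1
        if who not in last_seen or when > last_seen[who]:
            last_seen[who] = when
    result = {}
    for who, n in counts.items():
        kind = "repeat" if n >= FLY_BY_THRESHOLD else "fly-by"
        status = "active" if as_of - last_seen[who] <= DRIFT_WINDOW else "drifting"
        result[who] = (kind, status)
    return result

# Hypothetical sample data
events = [
    ("alice", date(2024, 1, 5)), ("alice", date(2024, 2, 1)),
    ("alice", date(2024, 3, 3)), ("alice", date(2024, 5, 20)),
    ("bob", date(2024, 1, 10)),
    ("carol", date(2023, 9, 1)), ("carol", date(2023, 9, 2)),
    ("carol", date(2023, 10, 1)), ("carol", date(2023, 11, 1)),
]
for who, labels in sorted(classify_contributors(events, as_of=date(2024, 6, 1)).items()):
    print(who, labels)
```

The two labels are independent: a repeat contributor can still be drifting (like "carol" here), which is often the more interesting signal than raw headcount.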

Commits over time (e.g. 100 commits one month, 40 the next) ->

  • depth of commits over time (e.g. LOC change over time)
  • commits by subset of contributors (is the project being maintained by a very small group of folks?)
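A rough sketch of both follow-up views, assuming commit records have already been extracted (for example from the project's git history) into (author, date, lines_added, lines_deleted) tuples. The record layout, the sample data, and the function names are hypothetical; "depth" is approximated here as total lines added plus deleted.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical commit records: (author, commit_date, lines_added, lines_deleted)
commits = [
    ("alice", date(2024, 1, 3), 120, 10),
    ("alice", date(2024, 1, 17), 40, 5),
    ("bob", date(2024, 1, 20), 3, 1),
    ("alice", date(2024, 2, 2), 300, 150),
    ("carol", date(2024, 2, 14), 12, 0),
]

def monthly_activity(commits):
    """Return {(year, month): (commit_count, lines_changed)}, sorted by month."""
    stats = defaultdict(lambda: [0, 0])
    for _, when, added, deleted in commits:
        key = (when.year, when.month)
        stats[key][0] += 1
        stats[key][1] += added + deleted  # depth: total LOC touched that month
    return {k: tuple(v) for k, v in sorted(stats.items())}

def top_contributor_share(commits, top_n=1):
    """Fraction of all commits authored by the top_n most active authors."""
    if not commits:
        return 0.0
    per_author = Counter(author for author, _, _, _ in commits)
    top = sum(n for _, n in per_author.most_common(top_n))
    return top / len(commits)

print(monthly_activity(commits))
print(top_contributor_share(commits))
```

A month with fewer commits but more lines changed (February in the sample) may not be a slowdown at all, and a high top-contributor share is a prompt to ask whether the project depends on a very small group.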

“Community campaign impact measurement”

  • establish goals <-> determine what can be measured to detect impact

Step 1: break down the focus area

  • if you could answer anything, what would it be? (i.e. the “magic 8-ball” question)
  • what will be the data source?
  • given the data, which questions could be answered to bring you closer to your 8-ball question?

Step 2: converting a question to a metric

  • Specific data points needed
  • visualization to represent the data
  • Make a hypothesis re: insights and actions that will come from this

Step 3: analysis in action

  • does this align with prior knowledge?
    • If so, double-check your assumptions to make sure your own bias didn’t creep in.
    • If not, did we misunderstand the community? Or was there a data error?

-> Implement community initiatives informed by the data analysis. They should be measurable.

  • observe community initiatives
    • are we measuring the right thing?
    • does the initiative need to change?

Examples:

  • 8-ball: are people having an experience that converts many of them into consistent contributors?
  • data: contributor activity on repos w/ timestamps
  • metrics:
    • contributors’ first action
    • fly-by vs repeat contributors
    • conversion rate from first to active/repeat contributor
  • Initiatives:
    • improve contribution guidelines
    • PR “buddy” for a contributor’s first PR
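One way to sketch the conversion-rate metric, assuming the same (contributor, date) event data: group contributors into cohorts by the month of their first contribution, then compute the fraction of each cohort that reached repeat status (4+ contributions, per the fly-by cutoff above). Cohorting by month, the function name, and the sample data are assumptions for illustration, not a prescribed method.

```python
from collections import defaultdict
from datetime import date

REPEAT_THRESHOLD = 4  # from the notes: 4+ contributions = repeat contributor

def conversion_by_cohort(events):
    """Group contributors by the month of their first contribution and return
    {(year, month): fraction of that cohort that became repeat contributors}."""
    first_seen = {}
    counts = defaultdict(int)
    for who, when in events:
        counts[who] += 1
        if who not in first_seen or when < first_seen[who]:
            first_seen[who] = when
    cohorts = defaultdict(list)
    for who, first in first_seen.items():
        cohorts[(first.year, first.month)].append(counts[who] >= REPEAT_THRESHOLD)
    return {k: sum(v) / len(v) for k, v in sorted(cohorts.items())}

# Hypothetical sample data
events = (
    [("dana", date(2024, 3, d)) for d in (1, 8, 15, 22)]      # converts
    + [("eli", date(2024, 3, 5))]                              # fly-by
    + [("fay", date(2024, 4, 2)), ("fay", date(2024, 4, 9))]  # not (yet) repeat
    + [("gus", date(2024, 4, d)) for d in (3, 10, 17, 24)]    # converts
    + [("hal", date(2024, 4, 30))]                             # fly-by so far
)
print(conversion_by_cohort(events))
```

Comparing cohorts from before and after an initiative (such as a first-PR buddy) is one way to make it measurable. Note that recent cohorts have had less time to reach the threshold, so a lower rate there may reflect timing rather than the initiative.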

Limitations:

  • Numbers/metrics are not facts. We can make them say what we want.
  • Keep your internal skeptic alive.
  • “The graph points you in a direction to investigate” is a win.
  • Data analysis is not always the answer. This can help