Figuring out owners for GitHub repos

I had a task to find out who owned a pile of GitHub repos. Recovering that information across many repositories wasn’t straightforward. To get there, I queried which users belong to which team, then which users contributed to which repos. With both in hand, I can guess which team “owns” each repo: the team whose members show up most often among its contributors.

Getting repos to contributors

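# Assumes $org is already set to the organization name.
# Redirect the output of this script to repo-to-contributor.txt.
REPOS=$(gh api \
  -H "Accept: application/vnd.github+json" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  "/orgs/$org/repos" --paginate | jq -r '.[] | .name' | sort)

for repo in $REPOS; do
    echo "$repo"
    for user in $(gh api \
        -H "Accept: application/vnd.github+json" \
        -H "X-GitHub-Api-Version: 2022-11-28" \
        "/repos/$org/$repo/contributors" --paginate | jq -r '.[] | .login' | sort); do
        # printf emits a real tab; bash's builtin echo prints a literal \t without -e
        printf '\t%s\n' "$user"
    done
    sleep 1  # brief pause between repos
done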

Getting team membership

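# Use the team slug rather than the display name: the members endpoint
# expects a slug, and display names may contain spaces that break the loop.
# Redirect the output of this script to teams.txt.
TEAMS=$(gh api \
  -H "Accept: application/vnd.github+json" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  --paginate \
  "/orgs/$org/teams" | jq -r '.[] | .slug')

for team in $TEAMS; do
    echo "$team"
    for member in $(gh api \
        -H "Accept: application/vnd.github+json" \
        -H "X-GitHub-Api-Version: 2022-11-28" \
        --paginate \
        "/orgs/$org/teams/$team/members" | jq -r '.[] | .login'); do
        printf '\t%s\n' "$member"
    done
done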

Getting the mapping

Repo contributors go in repo-to-contributor.txt, and the team-to-member list lives in teams.txt. The repositories you want to look at go in unmapped-repos.txt (one repo name per line).
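Both outline files share the layout the shell loops above print: a repo or team name flush left, followed by one tab-indented login per line. Before running the mapping script it can be worth checking that the indentation is a real tab, since some shells' echo leaves a literal backslash-t in the output. A minimal sketch of such a check, where the helper name `outline_stats` and the sample repo and user names are made up for illustration:

```python
def outline_stats(text):
    """Count headings, tab-indented members, and lines starting with a literal backslash-t."""
    headings = members = literal_tabs = 0
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("\t"):
            members += 1
        elif line.startswith("\\t"):
            # A backslash followed by "t": echo did not expand the escape.
            literal_tabs += 1
        else:
            headings += 1
    return headings, members, literal_tabs


# Made-up sample in the expected layout.
sample = "api-server\n\talice\n\tbob\ndocs-site\n\tcarol\n"
print(outline_stats(sample))  # (2, 3, 0)
```

A non-zero third number means the file needs regenerating (or a search-and-replace) before the parser will see any members.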

from collections import Counter, defaultdict
from pprint import pprint

# Repos that still need an owner, one name per line.
with open('unmapped-repos.txt') as f:
    unmapped = set(f.read().split())

with open('repo-to-contributor.txt') as f:
    raw = f.read()


def map_from_outline(s):
    """Parse a tab-indented outline into {heading: [indented entries]}."""
    _mapping = defaultdict(list)
    current = None
    for line in s.split('\n'):
        if line.strip() == '':
            continue
        if line.startswith('\t'):
            # person
            _mapping[current].append(line.strip())
        else:
            # repo or team heading; register it even if nobody is listed under it
            current = line.strip()
            _mapping.setdefault(current, [])
    return _mapping


mapping = map_from_outline(raw)

# How many repos each person contributed to (not used below, handy for eyeballing).
all_users = Counter()
for repo, contributors in mapping.items():
    all_users.update(contributors)

# People who contributed to more than one repo, most active first.
frequent_contributors = sorted(
    ((user, n) for user, n in all_users.items() if n > 1),
    key=lambda x: -x[1],
)

with open('teams.txt') as f:
    team_setup = map_from_outline(f.read())

# Invert team -> members into person -> teams.
people_teams = defaultdict(list)
for team, members in team_setup.items():
    for member in members:
        people_teams[member].append(team)

# For each unmapped repo, count how many contributors sit on each team
# and keep the two most common teams as ownership guesses.
guess = {}
for repo, contributors in mapping.items():
    if repo not in unmapped:
        continue
    cnt = Counter()
    for p in contributors:
        cnt.update(people_teams.get(p, []))
    guess[repo] = dict(cnt.most_common(2))

pprint(guess)

print("\n\n")
print("No guesses for ownership:")
for k, v in guess.items():
    if len(v) == 0:
        print(f"\t{k}")

print("\n\n")
print("Multiple guesses for ownership:")
for k, v in guess.items():
    if len(v) > 1:
        print(f"\t{k}: {v}")

print("\n\n")
print("Folks who should own it:")
for k, v in guess.items():
    if len(v) == 1:
        print(f"\t{k}: {v}")
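
To see how the team vote behaves, here is a small self-contained sketch of the same counting on made-up data. The function name `guess_owners` and all repo, team, and user names are hypothetical; this illustrates the idea and is not a replacement for the script above.

```python
from collections import Counter, defaultdict


def guess_owners(repo_contributors, team_members, top_n=2):
    """Return {repo: {team: votes}}, keeping the top_n teams per repo."""
    # Invert team -> members into person -> teams.
    person_teams = defaultdict(list)
    for team, members in team_members.items():
        for member in members:
            person_teams[member].append(team)

    guesses = {}
    for repo, contributors in repo_contributors.items():
        votes = Counter()
        for person in contributors:
            # Each contributor casts one vote per team they belong to.
            votes.update(person_teams.get(person, []))
        guesses[repo] = dict(votes.most_common(top_n))
    return guesses


# Made-up data for illustration only.
team_members = {"platform": ["alice", "bob"], "web": ["carol"]}
repo_contributors = {
    "api-server": ["alice", "bob", "carol"],  # platform: 2 votes, web: 1
    "docs-site": ["carol"],                   # web: 1 vote
    "old-tool": ["dave"],                     # dave is on no team
}
print(guess_owners(repo_contributors, team_members))
# {'api-server': {'platform': 2, 'web': 1}, 'docs-site': {'web': 1}, 'old-tool': {}}
```

Note that with two teams kept per repo, `api-server` would land under "Multiple guesses" even though one team has a clear majority, so the vote counts are worth reading before assigning an owner.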

The notes of Justin Abrahms
