Skip to main content

Collaboration Development Index

Open source projects, as a typical manifestation of human group intelligence, the ability to establish collaborative development management is a key element contributing to the success of the project. And code, as the final output of a project, is the essence of the entire community's contribution. So we evaluate how well the development process is managed and how well the community does collaborative development around a series of indirect metrics related to code contribution.

Metrics in the Metrics Model

Code Contributor Count

  • Definition: Determine how many active pr creators, code reviewers, commit authors there are in the past 90 days.
  • Weight: 19.987%
  • Threshold: 1000

Here we focus on the number of contributors directly related to the code contribution. As a result of this model, it identifies a well-managed community of development collaborations that are gathering more and more active code contributors. We believe that some types of the Issue are also related to code contributions, such as Bug , new requirements, and so on, and will eventually introduce code contributions. But as a general indicator, it is difficult to identify issues in a uniform way (such as a Bug type) within the specific type of each project, because each project has a different definition and understanding of the Bug type. So we made a trade-off, choosing only PR-related and Code Commit-related contributors as insight objects.

Commit Frequency

  • Definition: Determine the average number of commits per week in the past 90 days.
  • Weight: 16.363%
  • Threshold: 1000

As an outcome indicator of this model, it identifies the sustainability and quantity of project contributions. Refers to the overall workload of the project.

Is Maintained

  • Definition: Percentage of weeks with at least one code commit in the past 90 days(single repository). Percentage of code repositories with at least one code commit in the last 30 days(multiple repositories).
  • Weight: 13.853%
  • Threshold: 100%
  • Note: definitions of single repository and multiple repositories are differerent.

This metric is used to determine whether the repository is being maintained on an ongoing basis. An actively maintained projects may be more resistant to risks such as vulnerabilities. It is related to the Commit Frequency, but the focus is different. The former focuses on the number of code contributions during the cycle, while the latter focuses on the continuity of the code contributions. But we should not draw any direct conclusions about projects that are low on this indicator, and further insights are needed.

Commit PR Linked Ratio

  • Definition: Determine the percentage of new code commit link pull request in the last 90 days.
  • Weight: 12.612%
  • Threshold: 100%

This indicator is used to determine whether the code submission has gone through the PR process. To determine whether a project is open source, perhaps we can verify the License it declares. But whether open source projects use open community collaboration model to develop, and whether the community is willing to build community with open source contributors and organizational partners, it is not enough to rely on License alone. Here we examine the project's willingness to collaborate by informing contributors in an open and transparent manner about the purpose of the code submission, whether it is a PR or not, and by being reviewed.

PR Issue Linked Ratio

  • Definition: Determine the percentage of new pull request link issues in the last 90 days.
  • Weight: 11.319%
  • Threshold: 100%

This metric is used to see if the code contributions are based upon open discussion, for example using the Issue. Note that not all issues result in code contributions, such as an Issue for the advisory type, there would be no code modifications generally . At the same time, we should also note that if PR goes through public discussion, Issue is not the only way, it may come from discussion in the forum. Therefore, we can not blindly pursue the high proportion of this indicator.

Code Review Ratio

  • Definition: Determine the percentage of code commits with at least one reviewer (not PR creator) in the last 90 days.
  • Weight: 10.113%
  • Threshold: 100%

If a PR is merged without being reviewed by others, the probability of a code defect or vulnerability being introduced, intentionally or unintentionally, is greatly increased. Code reviews do not fully protect against this risk, but they greatly improve the introduction of risk.

Code Merge Ratio

  • Definition: Determine the percentage of PR Mergers and PR authors who are not the same person in the last 90 days of commits.
  • Weight: 10.113%
  • Threshold: 100%

This metric was observed in conjunction with the Code Review Ratio. When we introduced a third-party review, but if the PR creator intentionally or unintentionally ignored the third-party review and merged the code directly, there are also risks. It's important to note that the Code Review Ratio and Code Merge Ratio are one best practices we observe in opensource communities, but rather the only standards. We also see in some communities based on trust in some good, long-term contributors, these people are granted the right to merge their own code direclty.

Lines of Code Frequency

  • Definition: Determine the average number of lines touched (lines added plus lines removed) per week in the past 90 days.
  • Weight: 5.640%
  • Threshold: 300000

The number of lines of source code is indeed strongly correlated with the amount of work, but it is less correlated with the value created. What forms of code can be counted in the LOC, are uncertain factors. We do not care so much about the code form(program language, configuration files etc), only use it as the description of the workload, so its weight in the overall metrics model is low.

Metric Model Algorithm

Weight

We use AHP to calculate weight of each metric.

AHP Input Data

Metric NameCode Contributor CountCommit FrequencyIs MaintainedCommit PR Linked RatioPR Issue Linked RatioCode Review RatioCode Merge RatioLines of Code Frequency
Code Contributor Count1.0001.1111.2501.4291.6672.0002.0005.000
Commit Frequency0.9001.0001.2501.2501.4291.6671.6672.500
Is Maintained0.8000.8001.0001.1111.2501.4291.4292.000
Commit PR Linked Ratio0.7000.8000.9001.0001.1111.2501.2502.000
PR Issue Linked Ratio0.6000.7000.8000.9001.0001.1111.1112.000
Code Review Ratio0.5000.6000.7000.8000.9001.0001.0002.000
Code Merge Ratio0.5000.6000.7000.8000.9001.0001.0002.000
Lines of Code Frequency0.2000.4000.5000.5000.5000.5000.5001.000

AHP Analysis Result

Metrics NameEigenvectorWeight
Contributor Count1.59919.987%
Commit Frequency1.30916.363%
Is Maintained1.10813.853%
Commit PR Linked Ratio1.00912.612%
PR Issue Linked Ratio0.90611.319%
Code Review Ratio0.80910.113%
Code Merge Ratio0.80910.113%
Lines of Code Frequency0.4515.640%

Consistency Test Results

Largest EigenvalueCI ValueRI ValueCR ValueConsistency Test
8.0340.0051.4100.003PASS

Threshold

The threshold we chose is based on the big-data observations from different types of open source projects.

References

Contributors

Frontend

  • Shengxiang Zhang
  • Feng Zhong
  • Chaoqun Huang
  • Huatian Qin
  • Xingyou Lai

Backend

  • Yehui Wang
  • Chenqi Shan
  • Shengbao Li
  • Huatian Qin

Metric Model

  • Yehui Wang
  • Liang Wang
  • Chenqi Shan
  • Matt Germonprez
  • Sean Goggins
  • Vinod Ahuja

Copyright © 2023 OSS compass. All Rights Reserved.