README.md: 8 additions & 6 deletions
@@ -45,14 +45,16 @@ This work is an *initial* demonstrator delivered at month 18 of the [OPENNEXT](h
Work package 2 of OPENNEXT is gathering theoretical and practical insights on best practices for company-community collaboration when developing open source hardware. This includes running [Delphi studies](https://www.edelphi.org/) to develop a maturity model to describe the collaboration and developing a precise definition for what the "source" is in open source hardware. In particular, task 2.2 in this work package is developing a project status dashboard with "health" indicators showing the evolution of a project within the maturity model; design activities; or progress towards success based on project goals.
-
To that end, the month 18 deliverable for task 2.2 is focused on establishing the underlying infrastructure to mine data about open source hardware projects from version control repositories that they are hosted on (`osmine`). The Python scripts in this repository currently query the public [application programming interfaces](https://en.wikipedia.org/wiki/API) (APIs) of [GitHub](https://www.github.com/) and [Wikifactory](https://www.wikifactory.com/). Both host version control repositories with the latter having a focus on supporting open source hardware projects. There is also a user-facing demonstration dashboard (`osdash`) which computes core metrics from the mined data and presents interactive visualisations. Currently, post-month-18 development is envisaged to include, but is not limited to:
+
To that end, the month 18 deliverable for task 2.2 is focused on establishing the underlying infrastructure to mine data about open source hardware projects from version control repositories that they are hosted on (`osmine`). The Python scripts in this repository currently query the public [application programming interfaces](https://en.wikipedia.org/wiki/API) (APIs) of [GitHub](https://www.github.com/) and [Wikifactory](https://www.wikifactory.com/). Both host version control repositories with the latter having a focus on supporting open source hardware projects. There is also a user-facing demonstration dashboard (`osdash`) which computes core metrics from the mined data and presents interactive visualisations. Currently, post-month-18 development is envisaged to include, but is not limited to (elaborated in the [Future work](#future-work) section):
* Modules to query other platforms such as [GitLab](https://gitlab.com/) or generic Git repositories;
* Logging
* Network visualisations of file co-edition histories and participation in tickets (e.g. GitHub Issues) with cluster analyses
* Compute indicators for the dashboard derived from the company-community collaboration maturity model under development
* Validate with OPENNEXT SME partners
+
There is other excellent open source software for open source project analytics and data visualisation, with [Grimoirelab](https://chaoss.github.io/grimoirelab/) being a prime example. However, the full Grimoirelab pipeline requires a full server stack necessitating advanced skills in heavy-duty (but potentially complicated) web technologies such as [Kibana](https://www.elastic.co/products/kibana) or [Elasticsearch](https://www.elastic.co/products/elasticsearch). This project aims to create a lighter, more focused solution needing only the use of Python.
+
This documentation aims to demonstrate practices that facilitate design reuse, including of this repository. In addition to the [Install](#install) and [Usage](#usage) sections that increase reproducibility, [Design notes](#design-notes) and [Future work](#future-work) communicate the thought process and lessons learned while developing the dashboard. Together, they constitute an intangible body of "know-how" that is very often undocumented. For example, the motivations for the internal data model, or the approach to compressing data described at the end of the section [Internal data structure](#internal-data-structure) which reduces disk usage, are of practical benefit. But "snippets" of practical experience like these are seldom recorded.
In addition, this repository aims to follow international standards and good practices in open source development such as, but not limited to:
@@ -214,7 +216,7 @@ The rest of this section describes the thought-process and decisions made when d
### Internal data structure
-
The GitHub and Wikifactory APIs respond to queries with JSON-formatted data strings. However, the data fields and how they are structured differ between the two platforms. To standardise the internal data structure used by `osmine` and `osdash` in a non-arbitrary way, the ForgeFed data model was used as a starting point. [ForgeFed](https://forgefed.peers.community/) is a new "[federation](https://en.wikipedia.org/wiki/Federation_(information_technology)) protocol for enabling interoperability between version control services". Relevant to the work in this repository is the standard [ForgeFed data model](https://forgefed.peers.community/modeling.html) that describes the essential elements of a version control repository. This includes:
+
The GitHub and Wikifactory APIs respond to queries with JSON-formatted data strings. However, the data fields and how they are structured differ between the two platforms. To standardise the internal data structure used by `osmine` and `osdash` in a non-arbitrary way, the [ForgeFed data model](https://forgefed.peers.community/modeling.html) was used as a starting point. [ForgeFed](https://forgefed.peers.community/) is a new "[federation](https://en.wikipedia.org/wiki/Federation_(information_technology)) protocol for enabling interoperability between version control services". Relevant to the work in this repository is the standard [ForgeFed data model](https://forgefed.peers.community/modeling.html) that describes the essential elements of a version control repository. This includes:
* `Repository` - Basic information about the whole repository.
* `Branch` - A named reference to a point along a version-controlled repository's revision history. In practice, this allows developers to work on different parts of a complex software project which can later be "[merged](https://git-scm.com/book/en/v2/Git-Branching-Basic-Branching-and-Merging)" into a primary branch. This is implemented in Git (and hence GitHub) but not currently by Wikifactory.
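
As a rough illustration of the standardisation idea described above, the sketch below maps two differently shaped API responses onto one common `Repository` record. It is not the actual `osmine` code. The GitHub keys (`name`, `description`, `html_url`) follow the public GitHub REST API, but the Wikifactory keys and the internal record layout are hypothetical and chosen only for this example.

```python
def from_github(raw):
    """Map a GitHub REST API repository object to a common record."""
    return {
        "type": "Repository",
        "name": raw["name"],
        "summary": raw.get("description"),
        "url": raw["html_url"],
        "platform": "GitHub",
    }


def from_wikifactory(raw):
    """Map a Wikifactory project object to the same common record.

    The input keys used here are hypothetical placeholders, not the
    documented Wikifactory schema.
    """
    return {
        "type": "Repository",
        "name": raw["name"],
        "summary": raw.get("description"),
        "url": raw["url"],
        "platform": "Wikifactory",
    }
```

Because both functions return records with identical keys, downstream code such as a dashboard would not need to know which platform a repository came from.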
@@ -276,6 +278,8 @@ One design goal of `osmine` is modularity in the supported version control platf
The most widely used Python scientific plotting libraries such as [Matplotlib](https://matplotlib.org/) are focused on producing static images for offline viewing. By design, the `osdash` dashboard is to be delivered online with interactive and dynamic data visualisations. To that end, we decided to utilise the [Dash](https://dash.plotly.com/) framework which was created for building "web analytic applications". Dash allows the creation of web applications with complex, dynamic, and interactive data visualisations arranged on a unified canvas using Python. These tasks were traditionally done with web-focused languages such as HTML, [CSS](https://www.w3.org/Style/CSS/), or [JavaScript](https://en.wikipedia.org/wiki/JavaScript). Dash allowed us to conduct all development in Python, the only language we are familiar with, thereby saving substantial time and effort. Other benefits of Dash are its maturity, active development, and strong community support.
+
In addition, the Dash framework includes components (such as [Cytoscape](https://dash.plotly.com/cytoscape)) for visualising and interacting with network graphs. This was an important factor when we chose Dash as the basis of our `osdash` dashboard module as our future work aspires to implement these visualisations.
+
Notably, a live Dash web app automatically reloads when the underlying Python code is changed. This saves time during development since we can edit the code of the `osdash` module and changes (or errors) are immediately reflected in the web browser. Lastly, the Dash framework is a piece of commercially successful open source software. This aligns with the theme of OPENNEXT which is to study models of commercially-produced open source hardware.
There is no strict limitation on where Dash web apps can be hosted. However, since the [demo instance](https://psaltyi.pythonanywhere.com/) of the `osdash` module is currently hosted on [PythonAnywhere](https://eu.pythonanywhere.com/), the version numbers of the dependencies for this repository listed in [`requirements.txt`](./requirements.txt) are based on [those offered](https://eu.pythonanywhere.com/batteries_included/) by their default Python 3.8 environment.
@@ -292,8 +296,6 @@ To our knowledge, the most promising solution at time of writing is to use Git's
It should also be noted that a graph of community interactions need not be based exclusively on file co-edition histories. Since we also mine data on tickets (i.e. GitHub and Wikifactory issues), user interactions there can also form edges in the graph. Including these activities could form a more complete representation of community structure.
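
To make the idea of combining the two edge sources concrete, here is a minimal sketch using only the Python standard library. It is an illustration, not the planned implementation: the input shapes (commits as dictionaries with `author` and `files`, tickets with `participants`) and the function names are assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def coedition_edges(commits):
    """Count, for each pair of users, the files that both have edited."""
    editors_by_file = {}
    for commit in commits:
        for path in commit["files"]:
            editors_by_file.setdefault(path, set()).add(commit["author"])
    edges = Counter()
    for editors in editors_by_file.values():
        # Sorting gives each undirected pair one canonical key.
        for pair in combinations(sorted(editors), 2):
            edges[pair] += 1
    return edges


def ticket_edges(tickets):
    """Count, for each pair of users, the tickets both took part in."""
    edges = Counter()
    for ticket in tickets:
        for pair in combinations(sorted(set(ticket["participants"])), 2):
            edges[pair] += 1
    return edges


def interaction_graph(commits, tickets):
    """Merge both edge sources into one weighted edge list."""
    return coedition_edges(commits) + ticket_edges(tickets)
```

The resulting mapping of user pairs to weights could then be handed to a graph library or visualisation component. Whether the two sources should be weighted equally is an open design question.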
-
In addition, the Dash framework includes components (such as [Cytoscape](https://dash.plotly.com/cytoscape)) for visualising and interacting with network graphs. This was an important factor when we chose Dash as the basis of our `osdash` dashboard module.
-
### Enhanced project health indicators
After month 18, we hope to gather feedback from the OPENNEXT SMEs producing open source hardware on which indicators they believe would aid their development process. Certain indicators may require development of methods to acquire deeper insights from repository metadata. For example, by studying the evolution of a large number of open source hardware repositories, we might see typical structures that reflect different stages of development. This could be used to derive a "project stage" indicator showing if a project is in the ideation, prototyping, or production stages.
@@ -308,7 +310,7 @@ There could also be indicators that require self-reporting. While existing files
Task 3.3 in the OPENNEXT project is deploying a [Wikibase](https://www.wikiba.se/) graph database to record metadata on open source hardware products. This database will implement a new, standardised data model – partially derived from the [Open Know-How Manifest specification](https://app.standardsrepo.com/MakerNetAlliance/OpenKnowHow/src/branch/master/1) – for describing all aspects of a product such as its bill of materials (BOM) or manufacturing information (e.g. materials, production method). We plan to collaborate with task 3.3 by contributing our data-mining efforts while benefiting from the web hosting that they will set up. If `osdash` can be hosted with the Wikibase instance, then we can replace the current PythonAnywhere hosting which is running on a lower-performance plan.
-
As noted above, `osmine` saves the timestamp of when each repository was mined. During development, it was realised that since it can take several minutes to mine a repository, it is conceivable that new commits or tickets were added during that time. Therefore, the "last mined" timestamp is currently set to the end of the mining process. This prevents future runs of `osmine` from retrieving data that have previously been saved. However, if new commits or tickets are indeed created during mining, then they will be missed in future runs. To solve this, an upcoming revision of `osmine` will set the "last mined" timestamp to the beginning of the run and check for duplicated results against previously saved data.
+
As noted above in the [Usage](#usage) section, `osmine` saves the timestamp of when each repository was mined. During development, it was realised that since it can take several minutes to mine a repository, it is conceivable that new commits or tickets were added during that time. Therefore, the "last mined" timestamp is currently set to the end of the mining process. This prevents future runs of `osmine` from retrieving data that have previously been saved. However, if new commits or tickets are indeed created during mining, then they will be missed in future runs. To solve this, an upcoming revision of `osmine` will set the "last mined" timestamp to the beginning of the run and check for duplicated results against previously saved data.
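
The proposed revision could look roughly like the sketch below. It is only an illustration of the approach under stated assumptions: the record layout (`items`, `last_mined`), the `id` key used to detect duplicates, and the `fetch_since` callable standing in for a platform query are all hypothetical and do not reflect the current `osmine` internals.

```python
from datetime import datetime, timezone


def merge_new_items(saved, fetched):
    """Append fetched items whose 'id' is not yet saved; return how many were added."""
    seen = {item["id"] for item in saved}
    added = 0
    for item in fetched:
        if item["id"] not in seen:
            saved.append(item)
            seen.add(item["id"])
            added += 1
    return added


def mine_repository(record, fetch_since, now=None):
    """Mine one repository, stamping it with the START time of the run.

    Anything created while mining is in progress is newer than the stored
    timestamp, so the next run fetches it again and the duplicate check
    discards whatever was already saved.
    """
    run_started = now or datetime.now(timezone.utc)
    fetched = fetch_since(record.get("last_mined"))
    added = merge_new_items(record.setdefault("items", []), fetched)
    record["last_mined"] = run_started
    return added
```

The trade-off is that some items may be fetched twice, which costs a few extra API calls but avoids silently missing data.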
Other technical improvements may include, but are not limited to:
@@ -318,7 +320,7 @@ Other technical improves may include, but are not limited to:
* Optionally support using task 3.3's Wikibase instance as data storage instead of a local data file.
* If resources permit, incorporate [unit testing](https://en.wikipedia.org/wiki/Unit_testing) across all Python code in this repository. This is a software testing technique where dedicated testing code is written for all elements of a program. Comprehensive unit testing will take an intensive effort to implement but greatly improve the reliability and long-term maintainability of the code.
* Further improve documentation by:
-
* Creating dedicated documentation beyond this README file.
+
* If it is helpful for OPENNEXT SMEs or other open source development communities, creating additional documentation beyond this README.
* Improve on the current README file by attaining level four or five in the [README Maturity Model](https://github.com/LappleApple/feedmereadmes/blob/master/README-maturity-model.md#level-five-product-oriented-readme).