Add Compound proteinIdType and entity-agnostic grounding for metabolite networks by swaraj-neu · Pull Request #105 · Vitek-Lab/MSstatsBioNet

swaraj-neu · 2026-06-10T04:56:01Z

Add Compound proteinIdType and entity-agnostic grounding columns:
Generalize the protein-only HgncId/HgncName contract into EntityNamespace/EntityId/EntityName grounded through Gilda, keeping multi-grounding as semicolon-joined aligned lists that fan out into the INDRA query. Gene-only annotations are skipped for compounds, and the new contract flows through annotateProteinInfoFromIndra, getSubnetworkFromIndra, and cytoscapeNetwork.

Motivation and Context

Please include relevant motivation and context of the problem along with a short summary of the solution.

Changes

Please provide a detailed bullet point list of your changes.

Testing

Please describe any unit tests you added or modified to verify your changes.

Checklist Before Requesting a Review

I have read the MSstats contributing guidelines
My changes generate no new warnings
Any dependent changes have been merged and published in downstream modules
Ran styler::style_pkg(transformers = styler::tidyverse_style(indent_by = 4))
Ran devtools::document()


coderabbitai · 2026-06-10T04:56:08Z

Warning

Review limit reached

@swaraj-neu, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 39 minutes and 58 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: fbdf590d-d363-4556-8c30-1a03c1ac0a8c

📥 Commits

Reviewing files that changed from the base of the PR and between d1e6219 and 24fff4a.

⛔ Files ignored due to path filters (2)

inst/extdata/groupComparisonModel.csv is excluded by !**/*.csv
inst/extdata/groupComparisonModel_compound.csv is excluded by !**/*.csv

📒 Files selected for processing (30)

R/annotateProteinInfoFromIndra.R
R/cytoscapeNetwork.R
R/getSubnetworkFromIndra.R
R/utils_annotateProteinInfoFromIndra.R
R/utils_cytoscapeNetwork.R
R/utils_getSubnetworkFromIndra.R
man/annotateProteinInfoFromIndra.Rd
man/cytoscapeNetwork.Rd
man/dot-populateEntityIdsInDataFrame.Rd
man/dot-populateEntityNamesInDataFrame.Rd
man/dot-populateHgncIdsInDataFrame.Rd
man/dot-populateHgncNamesInDataFrame.Rd
man/dot-populateKinaseInfoInDataFrame.Rd
man/dot-populatePhophataseInfoInDataFrame.Rd
man/dot-populateTranscriptionFactorInfoInDataFrame.Rd
man/dot-populateUniprotIdsInDataFrame.Rd
man/dot-validateAnnotateProteinInfoFromIndraInput.Rd
man/exportNetworkToHTML.Rd
man/getSubnetworkFromIndra.Rd
man/previewNetworkInBrowser.Rd
tests/testthat/test-annotateProteinInfoFromIndra.R
tests/testthat/test-exportNetworkToHTML.R
tests/testthat/test-getSubnetworkFromIndra.R
tests/testthat/test-multi-grounding.R
tests/testthat/test-utils_annotateProteinInfoFromIndra.R.R
tests/testthat/test-utils_cytoscapeNetwork.R
tests/testthat/test-utils_getSubnetworkFromIndra.R
vignettes/Cytoscape-Visualization.Rmd
vignettes/MSstatsBioNet.Rmd
vignettes/PTM-Analysis.Rmd


codecov-commenter · 2026-06-10T05:01:53Z

Codecov Report

❌ Patch coverage is 92.40506% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.13%. Comparing base (d1e6219) to head (24fff4a).

Files with missing lines	Patch %	Lines
R/utils_getSubnetworkFromIndra.R	85.71%	9 Missing ⚠️
R/utils_annotateProteinInfoFromIndra.R	90.00%	3 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##            devel     #105      +/-   ##
==========================================
+ Coverage   75.35%   77.13%   +1.77%     
==========================================
  Files           9        9              
  Lines        1047     1124      +77     
==========================================
+ Hits          789      867      +78     
+ Misses        258      257       -1


swaraj-neu · 2026-06-10T16:47:37Z

@coderabbitai review

coderabbitai · 2026-06-10T16:47:44Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

tonywu1999 · 2026-06-11T16:59:46Z

+#' This function annotates a data frame with entity (protein or compound)
+#' grounding information from INDRA / Gilda, plus gene-only flags
+#' (transcription factor / kinase / phosphatase) for the protein paths.


nitpick: rewrite the first part to:

"This function standardizes entity identifiers from protein, compound, or gene inputs to a unified namespace using ID conversion from INDRA cogex or Gilda grounding."

tonywu1999 · 2026-06-11T17:01:53Z

+#'   \item{GlobalProtein}{Character. The input identifier with the
+#'       MSstats mnemonic suffix stripped, used as the grounding key.}


I'd describe this as the post-translational modification (PTM) suffix being stripped. That suffix is typically _<amino acid><site number>, e.g. _S148.

tonywu1999 · 2026-06-11T17:13:02Z

-                        if (!is.null(nameMapping[[hgncId]])) {
-                                df$HgncName[df$HgncId == hgncId] <- nameMapping[[hgncId]]
+#' @return A data frame with populated entity names.
+.populateEntityNamesInDataFrame <- function(df) {


This function name is misleading: it only populates HGNC names.

I'd create a new function that combines .populateEntityIdsInDataFrame and .populateEntityNamesInDataFrame into a single function. The structure would look like:

.populateEntityInformationInDataFrame
    if (uniprot || uniprot_mnemonic) --> .populateEntityInformationWithIndraCogex
    else --> .populateEntityInformationWithGilda
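
A minimal sketch of the dispatcher outlined above, written only to illustrate the control flow. The two helper bodies are placeholders, the `GroundingSource` column is purely illustrative, and the `proteinIdType` values `"Uniprot"` and `"Uniprot_Mnemonic"` are assumptions about the package's enum rather than something confirmed in this thread.

```r
# Placeholder for the INDRA CoGEx ID-conversion path (hypothetical body).
.populateEntityInformationWithIndraCogex <- function(df) {
    df$GroundingSource <- "indra_cogex" # illustrative column only
    df
}

# Placeholder for the Gilda grounding path (hypothetical body).
.populateEntityInformationWithGilda <- function(df) {
    df$GroundingSource <- "gilda" # illustrative column only
    df
}

# Single entry point that fills EntityNamespace / EntityId / EntityName by
# delegating to one of the two paths based on the identifier type.
.populateEntityInformationInDataFrame <- function(df, proteinIdType) {
    # Assumed enum values; adjust to whatever the package actually accepts.
    if (proteinIdType %in% c("Uniprot", "Uniprot_Mnemonic")) {
        .populateEntityInformationWithIndraCogex(df)
    } else {
        .populateEntityInformationWithGilda(df)
    }
}
```

With this shape, callers no longer need to know which backend fills the entity columns, and the function name no longer implies an HGNC-only behavior.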

tonywu1999 · 2026-06-11T17:17:03Z

+        if (proteinIdType == "Compound") {
+                return(df)
+        }
+        validNameMask <- !is.na(df$EntityName)


I'm not sure how to handle this properly right now, but as a short-term hack, could you also make this mask exclude rows whose EntityName contains a semicolon? The same applies when populating kinase and phosphatase information.
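
One way the requested mask could look, as a sketch rather than the final implementation. The helper name `singleGroundingMask` and the example data are hypothetical; the only assumption carried over from the diff is that `EntityName` holds semicolon-joined values for multi-grounded rows.

```r
# Hypothetical helper: TRUE only for rows that have exactly one grounded name,
# i.e. the name is not NA and does not contain a ';' separator.
singleGroundingMask <- function(entityName) {
    !is.na(entityName) & !grepl(";", entityName, fixed = TRUE)
}

# Made-up example rows: a single name, a missing name, a multi-grounded name.
df <- data.frame(
    EntityName = c("TP53", NA, "ATP;ADP", "MAPK1"),
    stringsAsFactors = FALSE
)
validNameMask <- singleGroundingMask(df$EntityName)
```

The same helper could be reused in the kinase and phosphatase population steps so the three gene-only annotations skip multi-grounded rows consistently.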

tonywu1999 · 2026-06-11T17:25:54Z

@@ -72,7 +75,7 @@ getSubnetworkFromIndra <- function(input,
    direction = match.arg(direction)
    input <- .filterGetSubnetworkFromIndraInput(input, pvalueCutoff, logfc_cutoff, force_include_other, include_infinite_fc, direction)


I added a differential abundance analysis results table here. It's labeled as data-2026-06-10.csv

I noticed that getSubnetworkFromIndra fails with this dataset, but it works once I filter out all rows that have NA in the EntityName/EntityId/EntityNamespace columns. Could you look into the root cause? One option is to filter out NA EntityId rows in .filterGetSubnetworkFromIndraInput, but only if the NAs really are what causes the failure.
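
For reference, the NA filtering described above can be sketched as follows. The helper name `dropUngroundedRows` is hypothetical, and whether this belongs inside .filterGetSubnetworkFromIndraInput depends on the root-cause check.

```r
# Hypothetical helper: keep only rows where every grounding column is present.
dropUngroundedRows <- function(input,
                               cols = c("EntityNamespace", "EntityId", "EntityName")) {
    missingCols <- setdiff(cols, colnames(input))
    if (length(missingCols) > 0) {
        stop("Missing grounding columns: ", paste(missingCols, collapse = ", "))
    }
    # complete.cases() is TRUE only for rows with no NA in the selected columns.
    keep <- stats::complete.cases(input[, cols, drop = FALSE])
    input[keep, , drop = FALSE]
}
```

Comparing `nrow(input)` before and after this call gives a quick count of how many rows would be dropped, which may help decide whether the NAs are the actual cause.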

tonywu1999 · 2026-06-11T17:29:10Z

        list(
-            text = hgnc_name,
+            text = text_input,
            organisms = list("9606")


Could you check whether the results change if we remove the organisms parameter, using the dataset linked in the Google Drive? Counting the number of rows with NA EntityName should be sufficient. My thinking is that we might accidentally be missing chemicals from other organisms (e.g. bacteria).
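
A small sketch of how that comparison could be set up without touching the network code. Both helper names are hypothetical; the payload fields `text` and `organisms` are the ones shown in the diff above, and making `organisms` optional is the assumption being tested.

```r
# Hypothetical payload builder: include `organisms` only when it is supplied,
# so the same code path can be run with and without the restriction.
buildGildaPayload <- function(text_input, organisms = NULL) {
    payload <- list(text = text_input)
    if (!is.null(organisms)) {
        payload$organisms <- as.list(organisms)
    }
    payload
}

# Hypothetical metric for the comparison: number of rows left ungrounded.
countUngrounded <- function(df) {
    sum(is.na(df$EntityName))
}
```

Running the annotation once with `organisms = "9606"` and once with `organisms = NULL`, then comparing `countUngrounded()` on the two results, would show whether the restriction drops any groundings.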

tonywu1999 · 2026-06-11T17:36:44Z

+    # `emitted_cpds` and `node_type = "compound"` below refer to Cytoscape
+    # grouping containers used to parent PTM satellite nodes around a protein.
+    # This Cytoscape "compound" concept is UNRELATED to the chemical
+    # `proteinIdType = "Compound"` analyte type in annotateProteinInfoFromIndra.


For now, let's use metabolite instead of compound as an enum for proteinIdType. Could you make this change? And then this comment could get removed.

tonywu1999 · 2026-06-11T17:45:16Z

+#' Splits each row's semicolon-joined \code{EntityNamespace} / \code{EntityId}
+#' positionally, fans out each pair into its own grounding node, then appends
+#' any \code{force_include_other} entries (parsed as \code{"namespace:id"}),
+#' returning the unique set. Extracted from \code{.callIndraCogexApi} to keep


nitpick: you can remove the sentence "Extracted from \code{.callIndraCogexApi} to keep the network-free portion unit-testable."

tonywu1999 · 2026-06-11T17:51:18Z

+    ns_split <- strsplit(as.character(namespaces), ";")
+    id_split <- strsplit(as.character(ids),        ";")
+    if (length(ns_split) != length(id_split)) {
+        stop("EntityNamespace and EntityId must have the same length")
+    }
+
+    pairs <- list()
+    for (i in seq_along(ns_split)) {
+        ns_i <- ns_split[[i]]
+        id_i <- id_split[[i]]
+        if (length(ns_i) != length(id_i)) {
+            stop("EntityNamespace and EntityId entries must be positionally aligned ",
+                 "after splitting on ';' (mismatch at row ", i, ")")
+        }
+        for (k in seq_along(ns_i)) {
+            pairs <- c(pairs, list(list(ns_i[k], id_i[k])))
+        }
+    }


With respect to readability, I'm a little confused by this loop.

It looks like you split the whole character vector on ";" first and then process each chunk. Would it read more intuitively to iterate over each value in namespaces and split it on ";" inside the loop?
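
A sketch of the row-by-row variant suggested here, intended to produce the same pairs as the loop in the diff. The function name `buildGroundingPairs` is hypothetical, and the error messages are kept close to the originals.

```r
# Hypothetical helper: walk the rows one at a time, split each row's
# namespace/id strings on ';', and emit one list(namespace, id) per position.
buildGroundingPairs <- function(namespaces, ids) {
    if (length(namespaces) != length(ids)) {
        stop("EntityNamespace and EntityId must have the same length")
    }
    pairs <- list()
    for (i in seq_along(namespaces)) {
        ns_i <- strsplit(as.character(namespaces[i]), ";", fixed = TRUE)[[1]]
        id_i <- strsplit(as.character(ids[i]), ";", fixed = TRUE)[[1]]
        if (length(ns_i) != length(id_i)) {
            stop(
                "EntityNamespace and EntityId entries must be positionally ",
                "aligned after splitting on ';' (mismatch at row ", i, ")"
            )
        }
        # Pair the k-th namespace with the k-th id of this row.
        rowPairs <- Map(function(ns, id) list(ns, id), ns_i, id_i)
        pairs <- c(pairs, unname(rowPairs))
    }
    pairs
}
```

For example, `buildGroundingPairs(c("HGNC;CHEBI", "HGNC"), c("1097;15422", "1097"))` yields three pairs, two from the first row and one from the second. Deduplication, if wanted, can be applied to the result afterwards.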

swaraj-neu requested a review from tonywu1999 June 10, 2026 04:56

swaraj-neu self-assigned this Jun 10, 2026

tonywu1999 reviewed Jun 11, 2026
