From 27099220b94a561dbc91abd691f948d8e03c17d9 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Wed, 1 Jul 2026 08:13:59 +0000
Subject: [PATCH 1/3] docs: add FIPC key literature and development notes to
 AGENTS.md

---
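Note for reviewers: development note 3 in this patch lists robust-Z among the
IPD detection methods. The sketch below shows the usual form of that statistic
in base R, under stated assumptions. `robust_z()` is a hypothetical helper and
not part of aFIPC, the estimation noise is made up for illustration, and the
1.645 cutoff is an assumed screening threshold rather than a documented aFIPC
default. The intercepts mirror the IPD anchor fixture used later in this
series.

```r
# Robust-z screening of anchor items (hypothetical helper, not part of aFIPC).
# Each old-vs-new parameter difference is centered on the median difference and
# scaled by 0.74 * IQR, so a single drifted item cannot inflate the scale.
robust_z <- function(old_par, new_par) {
  delta <- new_par - old_par
  (delta - stats::median(delta)) / (0.74 * stats::IQR(delta))
}

# Intercepts for six anchor items; values mirror the IPD anchor fixture.
old_d <- c(-1.10, -0.55, -0.15, 0.20, 0.70, 1.05)

# Small made-up estimation noise on the new form, then a large drift on item 3.
new_d <- old_d + c(0.03, -0.02, 0, 0.01, -0.03, 0.02)
new_d[3] <- 2.20

z <- robust_z(old_d, new_d)

# Assumed cutoff: items whose |z| exceeds 1.645 are flagged as drifted.
flagged <- which(abs(z) > 1.645)
print(flagged)
```

Scaling by the median and IQR instead of the mean and SD is the point of the
method: the drifted item stands out without distorting the yardstick used to
judge the other anchors.
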
 AGENTS.md | 103 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 103 insertions(+)

diff --git a/AGENTS.md b/AGENTS.md
index 95d9c93..35480bb 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -38,3 +38,106 @@ parameter calibration and test linking.
 - Isolate operational fixes (workflow/docs/dependency policy) from algorithmic
   edits.
 - Document assumptions and risk in commit/PR summaries.
+
+---
+
+## Key Literature: Fixed Item Parameter Calibration (FIPC)
+
+This software implements FIPC-based test linking. The following articles are
+canonical references for understanding and validating algorithmic choices.
+
+### Foundational / Original Source
+
+- **Kang, T., & Petersen, N. S. (2009).** *Linking item parameters to a base
+  scale.* ACT Research Report Series, 2009-2. ERIC: ED510480.
+  https://files.eric.ed.gov/fulltext/ED510480.pdf
+  > **This is the primary source for this software.** Compares
+  > concurrent calibration, separate calibration with linking, and FIPC using
+  > BILOG-MG and PARSCALE. Key finding: **PARSCALE updates the prior ability
+  > distribution during FIPC whereas BILOG-MG does not.** Only the PARSCALE
+  > implementation (when used correctly) produced results comparable to the
+  > other methods.
+
+### Classic / Methodological Foundations
+
+- **Stocking, M. L., & Lord, F. M. (1983).** Developing a common metric in
+  item response theory. *Applied Psychological Measurement, 7*(2), 201–210.
+  https://doi.org/10.1177/014662168300700208
+  > Classic source for scale transformation in IRT linking. Estimates the
+  > linear transformation that aligns test characteristic curves across forms.
+
+- **Ban, J.-C., Hanson, B. A., Wang, T., Yi, Q., & Harris, D. J. (2001).**
+  A comparative study of on-line pretest item calibration/scaling methods in
+  computerized adaptive testing. *Journal of Educational Measurement, 38*(3),
+  191–212. https://www.jstor.org/stable/1435120 (also ERIC: ED449201)
+  > Compares five online calibration methods for CAT including fixed-parameter
+  > approaches. Important for understanding FIPC in adaptive testing contexts.
+
+- **Kolen, M. J., & Brennan, R. L. (2014).** *Test equating, scaling, and
+  linking: Methods and practices* (3rd ed.). Springer.
+  https://link.springer.com/book/10.1007/978-1-4939-0317-7
+  > Definitive textbook. Chapters on IRT linking provide theoretical
+  > foundations for FIPC, anchor-item design, and scale maintenance.
+
+- **Kim, S. H., Cohen, A. S., & Kim, H. (2011).** Fixed-parameter calibration
+  of item banks. *Applied Psychological Measurement, 35*(7), 559–578.
+  https://doi.org/10.1177/0146621611401805
+  > Evaluates FIPC effectiveness in large-scale CAT and item banking;
+  > confirms accuracy with sufficient well-distributed anchor items.
+
+### Recent Advances (2020–2025)
+
+- **Robitzsch, A. (2024).** Bias and linking error in fixed item parameter
+  calibration. *AppliedMath, 4*(3), 1181–1191.
+  https://doi.org/10.3390/appliedmath4030063
+  > Analytically derives bias and linking error of FIPC under random DIF
+  > (2PL model). Shows that as DIF variance grows, both bias and variance of
+  > group distribution estimates increase substantially.
+
+- **Robitzsch, A. (2025).** Linking error estimation in fixed item parameter
+  calibration: Theory and application in large-scale assessment studies.
+  *Foundations, 5*(1), 4. https://doi.org/10.3390/foundations5010004
+  > Proposes a bias-corrected linking error estimator. Conventional jackknife
+  > resampling estimates are positively biased; this correction is critical for
+  > valid statistical inference in large-scale assessments (e.g., PISA).
+
+---
+
+## Software Development Notes (FIPC-Specific)
+
+The following algorithmic and design constraints flow directly from the
+literature above and must be respected in all code changes:
+
+1. **Prior ability distribution update during FIPC.**
+   Kang & Petersen (2009) found that the key behavioral difference between
+   BILOG-MG and PARSCALE was whether the prior ability distribution is updated
+   during FIPC. `mirt`-based calibration in this package must be audited for
+   which behavior it implements before changing estimation calls.
+
+2. **Anchor/common item quality is critical.**
+   At least 20–30 well-distributed anchor items spanning the θ continuum are
+   recommended (Kim et al., 2011). The `checkIPD` step in `autoFIPC()` exists
+   to detect and flag drifted anchor items before linking; do not disable or
+   weaken this check without regression evidence.
+
+3. **Item Parameter Drift (IPD) must be detected before FIPC.**
+   IPD (change in item parameters over time or across cohorts) biases FIPC
+   linking when drifted items remain in the anchor set. Detection methods
+   include robust-Z, D2/WRMSD, RMSD, and likelihood-ratio tests. Items with
+   significant drift should be excluded from the anchor set.
+
+4. **DIF and linking error.**
+   Unmodeled DIF in anchor items inflates both bias and linking error
+   (Robitzsch, 2024). Any future reporting of linking uncertainty must use
+   the bias-corrected linking error (Robitzsch, 2025), not just sampling error.
+
+5. **Comparison reference package.**
+   The R package `irtQ` (CRAN) independently implements FIPC for
+   dichotomous (1PL/2PL/3PL) and polytomous (GRM/GPCM) models and can be
+   used to cross-validate numerical results from `aFIPC`.
+
+6. **Interactive prompts limit automation.**
+   The `confirmCommonItems` parameter was added to bypass the interactive
+   common-item confirmation step. Any further removal of interactive prompts
+   must preserve identical numerical behavior and be covered by regression
+   fixtures before merging.

From 3c2e568c1277d93a2ead6261fa30d2e20fb10383 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Wed, 1 Jul 2026 08:38:12 +0000
Subject: [PATCH 2/3] Add fixture-backed regression tests for prior update and
 IPD anchor filtering

---
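Note for reviewers: the first test in this patch checks that a freely
estimated group mean moves away from zero when the new-form sample is more
able than the base sample. The base-R sketch below illustrates why, under
stated assumptions. It simulates 2PL responses in slope-intercept form,
P = plogis(a * theta + d), holds the anchor parameters fixed, and re-estimates
only the mean and SD of a normal prior by EM over a quadrature grid.
`update_prior()` is a hypothetical helper. This is one common way to realise a
prior update and is not claimed to be what aFIPC, mirt, or PARSCALE do
internally.

```r
set.seed(20260701)

# Fixed 2PL anchor parameters (slope a, intercept d); values mirror the fixture.
a <- c(0.88, 1.05, 1.19, 0.93, 1.12, 0.99)
d <- c(-1.20, -0.65, -0.20, 0.10, 0.55, 0.95)

# New-form examinees come from a shifted, wider ability distribution.
theta <- rnorm(1800, mean = 0.85, sd = 1.15)
p_true <- plogis(outer(theta, a) + matrix(d, length(theta), length(a), byrow = TRUE))
resp <- matrix(rbinom(length(p_true), 1, p_true), nrow = length(theta))

# Likelihood of each response pattern at each quadrature node.
# Item parameters stay fixed, so this matrix is computed only once.
nodes <- seq(-6, 6, length.out = 61)
p_node <- plogis(outer(nodes, a) + matrix(d, length(nodes), length(a), byrow = TRUE))
lik <- exp(resp %*% t(log(p_node)) + (1 - resp) %*% t(log(1 - p_node)))

# Hypothetical helper: EM updates for the prior mean and SD only.
update_prior <- function(lik, nodes, n_iter = 200) {
  mu <- 0
  sigma <- 1
  for (iter in seq_len(n_iter)) {
    # E-step: posterior weight of each node for each person.
    post <- sweep(lik, 2, dnorm(nodes, mu, sigma), `*`)
    post <- post / rowSums(post)
    # M-step: moments of the averaged posterior become the new prior.
    mu <- sum(post %*% nodes) / nrow(post)
    sigma <- sqrt(sum(post %*% (nodes - mu)^2) / nrow(post))
  }
  c(mean = mu, sd = sigma)
}

est <- update_prior(lik, nodes)
print(round(est, 2))
```

Skipping the update amounts to keeping the prior at N(0, 1) whatever the data
say, which is the contrast the `forceNormalZeroOne = TRUE` branch of the test
asserts on.
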
 ARCHITECTURE.md                               |   4 +-
 .../fixtures/fipc-regression-fixtures.R       |  23 ++
 tests/testthat/test-regression-fixtures.R     | 208 ++++++++++++++++++
 3 files changed, 233 insertions(+), 2 deletions(-)
 create mode 100644 tests/testthat/fixtures/fipc-regression-fixtures.R
 create mode 100644 tests/testthat/test-regression-fixtures.R

diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md
index 3880f15..3b62fe9 100644
--- a/ARCHITECTURE.md
+++ b/ARCHITECTURE.md
@@ -94,8 +94,8 @@ package metadata, and CI workflow definitions in Git.
 
 ## 9. Future Considerations / Roadmap
 
-- Add non-interactive regression fixtures for historically trusted
-  FIPC results.
+- Expand regression fixtures beyond current prior-update/IPD-anchor
+  scenarios to additional historically trusted FIPC configurations.
 - Reduce interactive prompts in `autoFIPC()` for automation friendliness.
 - Evaluate migration path from historical `packrat/` to a modern
   lock workflow.
diff --git a/tests/testthat/fixtures/fipc-regression-fixtures.R b/tests/testthat/fixtures/fipc-regression-fixtures.R
new file mode 100644
index 0000000..fcfc4ab
--- /dev/null
+++ b/tests/testthat/fixtures/fipc-regression-fixtures.R
@@ -0,0 +1,23 @@
+regression_fixture_prior_update <- list(
+  seed = 20260701L,
+  n_old = 1800L,
+  n_new = 1800L,
+  old_theta_mean = 0,
+  old_theta_sd = 1,
+  new_theta_mean = 0.85,
+  new_theta_sd = 1.15,
+  common_count = 6L,
+  unique_count = 2L,
+  itemtype = "2PL",
+  expect_shifted_mean_abs_gt = 0.2
+)
+
+regression_fixture_ipd_anchor <- list(
+  seed = 20260702L,
+  n_old = 2200L,
+  n_new = 2200L,
+  common_count = 6L,
+  unique_count = 2L,
+  drift_common_index = 3L,
+  itemtype = "2PL"
+)
diff --git a/tests/testthat/test-regression-fixtures.R b/tests/testthat/test-regression-fixtures.R
new file mode 100644
index 0000000..18f4055
--- /dev/null
+++ b/tests/testthat/test-regression-fixtures.R
@@ -0,0 +1,208 @@
+source(testthat::test_path("fixtures", "fipc-regression-fixtures.R"), local = TRUE)
+
+extract_group_parameter <- function(model, parameter_name_pattern) {
+  values <- mirt::mod2values(model)
+  values[values$item == "GROUP" & grepl(parameter_name_pattern, values$name), , drop = FALSE]
+}
+
+test_that("prior-update fixture distinguishes free-mean and fixed-normal linking", {
+  skip_if_not_installed("mirt")
+
+  fx <- regression_fixture_prior_update
+  set.seed(fx$seed)
+
+  old_common_items <- paste0("old_common_", seq_len(fx$common_count))
+  new_common_items <- paste0("new_common_", seq_len(fx$common_count))
+  old_unique_items <- paste0("old_unique_", seq_len(fx$unique_count))
+  new_unique_items <- paste0("new_unique_", seq_len(fx$unique_count))
+
+  old_item_names <- c(old_common_items, old_unique_items)
+  new_item_names <- c(new_common_items, new_unique_items)
+
+  old_a <- matrix(c(0.88, 1.05, 1.19, 0.93, 1.12, 0.99, 1.28, 0.76), ncol = 1)
+  old_d <- c(-1.20, -0.65, -0.20, 0.10, 0.55, 0.95, -0.35, 0.60)
+  new_a <- matrix(c(0.88, 1.05, 1.19, 0.93, 1.12, 0.99, 1.38, 0.70), ncol = 1)
+  new_d <- c(-1.20, -0.65, -0.20, 0.10, 0.55, 0.95, 0.25, 1.10)
+
+  theta_old <- matrix(
+    rnorm(fx$n_old, mean = fx$old_theta_mean, sd = fx$old_theta_sd),
+    ncol = 1
+  )
+  theta_new <- matrix(
+    rnorm(fx$n_new, mean = fx$new_theta_mean, sd = fx$new_theta_sd),
+    ncol = 1
+  )
+
+  old_data <- as.data.frame(mirt::simdata(
+    a = old_a,
+    d = old_d,
+    itemtype = rep(fx$itemtype, length(old_item_names)),
+    Theta = theta_old
+  ))
+  new_data <- as.data.frame(mirt::simdata(
+    a = new_a,
+    d = new_d,
+    itemtype = rep(fx$itemtype, length(new_item_names)),
+    Theta = theta_new
+  ))
+  names(old_data) <- old_item_names
+  names(new_data) <- new_item_names
+
+  old_model <- mirt::mirt(
+    old_data,
+    1,
+    itemtype = fx$itemtype,
+    method = "EM",
+    SE = FALSE,
+    verbose = FALSE,
+    technical = list(NCYCLES = 600)
+  )
+  new_model <- mirt::mirt(
+    new_data,
+    1,
+    itemtype = fx$itemtype,
+    method = "EM",
+    SE = FALSE,
+    verbose = FALSE,
+    technical = list(NCYCLES = 600)
+  )
+
+  linked_free <- aFIPC::autoFIPC(
+    newformXData = new_model,
+    oldformYData = old_model,
+    newformCommonItemNames = new_common_items,
+    oldformCommonItemNames = old_common_items,
+    itemtype = fx$itemtype,
+    checkIPD = FALSE,
+    tryEM = TRUE,
+    freeMEAN = TRUE,
+    forceNormalZeroOne = FALSE,
+    confirmCommonItems = TRUE
+  )
+
+  linked_fixed <- aFIPC::autoFIPC(
+    newformXData = new_model,
+    oldformYData = old_model,
+    newformCommonItemNames = new_common_items,
+    oldformCommonItemNames = old_common_items,
+    itemtype = fx$itemtype,
+    checkIPD = FALSE,
+    tryEM = TRUE,
+    freeMEAN = TRUE,
+    forceNormalZeroOne = TRUE,
+    confirmCommonItems = TRUE
+  )
+
+  free_mean <- extract_group_parameter(linked_free$LinkedModel, "MEAN")
+  fixed_mean <- extract_group_parameter(linked_fixed$LinkedModel, "MEAN")
+  fixed_cov <- extract_group_parameter(linked_fixed$LinkedModel, "COV")
+
+  expect_true(any(free_mean$est))
+  expect_gt(abs(free_mean$value[1]), fx$expect_shifted_mean_abs_gt)
+  expect_false(any(fixed_mean$est))
+  expect_equal(fixed_mean$value[1], 0, tolerance = 1e-8)
+  expect_false(any(fixed_cov$est))
+  expect_equal(fixed_cov$value[1], 1, tolerance = 1e-8)
+})
+
+test_that("IPD fixture quantitatively filters drifted anchors before linking", {
+  skip_if_not_installed("mirt")
+
+  fx <- regression_fixture_ipd_anchor
+  set.seed(fx$seed)
+
+  old_common_items <- paste0("old_anchor_", seq_len(fx$common_count))
+  new_common_items <- paste0("new_anchor_", seq_len(fx$common_count))
+  old_unique_items <- paste0("old_unique_", seq_len(fx$unique_count))
+  new_unique_items <- paste0("new_unique_", seq_len(fx$unique_count))
+
+  old_item_names <- c(old_common_items, old_unique_items)
+  new_item_names <- c(new_common_items, new_unique_items)
+
+  old_a <- matrix(c(0.90, 1.07, 1.18, 0.96, 1.10, 0.87, 1.33, 0.78), ncol = 1)
+  old_d <- c(-1.10, -0.55, -0.15, 0.20, 0.70, 1.05, -0.40, 0.50)
+  new_a <- old_a
+  new_d <- old_d
+
+  drift_index <- fx$drift_common_index
+  new_a[drift_index, 1] <- 1.85
+  new_d[drift_index] <- 2.20
+
+  old_data <- as.data.frame(mirt::simdata(
+    a = old_a,
+    d = old_d,
+    itemtype = rep(fx$itemtype, length(old_item_names)),
+    N = fx$n_old
+  ))
+  new_data <- as.data.frame(mirt::simdata(
+    a = new_a,
+    d = new_d,
+    itemtype = rep(fx$itemtype, length(new_item_names)),
+    N = fx$n_new
+  ))
+  names(old_data) <- old_item_names
+  names(new_data) <- new_item_names
+
+  old_model <- mirt::mirt(
+    old_data,
+    1,
+    itemtype = fx$itemtype,
+    method = "EM",
+    SE = FALSE,
+    verbose = FALSE,
+    technical = list(NCYCLES = 600)
+  )
+  new_model <- mirt::mirt(
+    new_data,
+    1,
+    itemtype = fx$itemtype,
+    method = "EM",
+    SE = FALSE,
+    verbose = FALSE,
+    technical = list(NCYCLES = 600)
+  )
+
+  linked <- aFIPC::autoFIPC(
+    newformXData = new_model,
+    oldformYData = old_model,
+    newformCommonItemNames = new_common_items,
+    oldformCommonItemNames = old_common_items,
+    itemtype = fx$itemtype,
+    checkIPD = TRUE,
+    tryEM = TRUE,
+    confirmCommonItems = TRUE
+  )
+
+  expect_true("IPDData" %in% names(linked))
+  expect_true("IPDCommonItemList" %in% names(linked))
+
+  retained_old <- as.character(unlist(linked$IPDCommonItemList[1, , drop = TRUE]))
+  retained_new <- as.character(unlist(linked$IPDCommonItemList[2, , drop = TRUE]))
+  drifted_old <- old_common_items[drift_index]
+  drifted_new <- new_common_items[drift_index]
+
+  expect_lt(length(retained_old), length(old_common_items))
+  expect_false(drifted_old %in% retained_old)
+  expect_false(drifted_new %in% retained_new)
+
+  old_values <- mirt::mod2values(old_model)
+  linked_values <- mirt::mod2values(linked$LinkedModel)
+  common_count <- length(retained_old)
+  mean_abs_distance <- numeric(common_count)
+
+  for (i in seq_len(common_count)) {
+    old_item <- retained_old[i]
+    new_item <- retained_new[i]
+    old_fixed <- old_values[
+      old_values$item == old_item & old_values$name %in% c("a1", "d"),
+      "value"
+    ]
+    linked_fixed <- linked_values[
+      linked_values$item == new_item & linked_values$name %in% c("a1", "d"),
+      "value"
+    ]
+    mean_abs_distance[i] <- mean(abs(old_fixed - linked_fixed))
+  }
+
+  expect_lt(mean(mean_abs_distance), 1e-6)
+})

From 08ecba0397ae0e768ca9703e3218c5971f50481d Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Wed, 1 Jul 2026 08:39:01 +0000
Subject: [PATCH 3/3] Align parameter names in IPD anchor distance regression
 assertion

---
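Note for reviewers: this patch replaces a positional subtraction with a merge
keyed on the parameter name. The base-R sketch below shows the failure mode
being guarded against. The two data frames are made up for illustration, and
whether `mirt::mod2values()` can actually return `a1` and `d` in a different
order for two models is an assumption, not something verified here.

```r
# Same item, same parameter values, but the rows arrive in a different order.
old_fixed <- data.frame(name = c("a1", "d"), value = c(0.90, -1.10))
linked_fixed <- data.frame(name = c("d", "a1"), value = c(-1.10, 0.90))

# Positional comparison pairs a1 with d and reports a large spurious distance.
positional <- mean(abs(old_fixed$value - linked_fixed$value))

# Name-keyed comparison pairs a1 with a1 and d with d.
aligned <- merge(old_fixed, linked_fixed, by = "name", sort = FALSE)
by_name <- mean(abs(aligned$value.x - aligned$value.y))

print(c(positional = positional, by_name = by_name))
```

With the merge, the 1e-6 tolerance in the assertion measures whether anchor
parameters were held fixed and no longer depends on row order.
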
 tests/testthat/test-regression-fixtures.R | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tests/testthat/test-regression-fixtures.R b/tests/testthat/test-regression-fixtures.R
index 18f4055..739d8f9 100644
--- a/tests/testthat/test-regression-fixtures.R
+++ b/tests/testthat/test-regression-fixtures.R
@@ -195,13 +195,14 @@ test_that("IPD fixture quantitatively filters drifted anchors before linking", {
     new_item <- retained_new[i]
     old_fixed <- old_values[
       old_values$item == old_item & old_values$name %in% c("a1", "d"),
-      "value"
+      c("name", "value")
     ]
     linked_fixed <- linked_values[
       linked_values$item == new_item & linked_values$name %in% c("a1", "d"),
-      "value"
+      c("name", "value")
     ]
-    mean_abs_distance[i] <- mean(abs(old_fixed - linked_fixed))
+    aligned <- merge(old_fixed, linked_fixed, by = "name", sort = FALSE)
+    mean_abs_distance[i] <- mean(abs(aligned$value.x - aligned$value.y))
   }
 
   expect_lt(mean(mean_abs_distance), 1e-6)