From 27099220b94a561dbc91abd691f948d8e03c17d9 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Wed, 1 Jul 2026 08:13:59 +0000
Subject: [PATCH 1/3] docs: add FIPC key literature and development notes to AGENTS.md

---
 AGENTS.md | 103 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 103 insertions(+)

diff --git a/AGENTS.md b/AGENTS.md
index 95d9c93..35480bb 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -38,3 +38,106 @@ parameter calibration and test linking.
 - Isolate operational fixes (workflow/docs/dependency policy) from
   algorithmic edits.
 - Document assumptions and risk in commit/PR summaries.
+
+---
+
+## Key Literature: Fixed Item Parameter Calibration (FIPC)
+
+This software implements FIPC-based test linking. The following articles are
+canonical references for understanding and validating algorithmic choices.
+
+### Foundational / Original Source
+
+- **Kang, T., & Petersen, N. S. (2009).** *Linking item parameters to a base
+  scale.* ACT Research Report Series, 2009-2. ERIC: ED510480.
+  https://files.eric.ed.gov/fulltext/ED510480.pdf
+  > **This is the primary (original) source for this software.** Compares
+  > concurrent calibration, separate calibration with linking, and FIPC using
+  > BILOG-MG and PARSCALE. Key finding: **PARSCALE updates the prior ability
+  > distribution during FIPC whereas BILOG-MG does not.** Only the PARSCALE
+  > implementation (when used correctly) produced results comparable to the
+  > other methods.
+
+### Classic / Methodological Foundations
+
+- **Stocking, M. L., & Lord, F. M. (1983).** Developing a common metric in
+  item response theory. *Applied Psychological Measurement, 7*(2), 201–210.
+  https://doi.org/10.1177/014662168300700208
+  > Classic source for scale transformation in IRT linking. Introduces the
+  > characteristic-curve method for placing separate calibrations on one metric.
+
+- **Ban, J.-C., Hanson, B. A., Wang, T., Yi, Q., & Harris, D. J. (2001).**
+  A comparative study of on-line pretest item calibration/scaling methods in
+  computerized adaptive testing. *Journal of Educational Measurement, 38*(3),
+  191–212. https://www.jstor.org/stable/1435120 (also ERIC: ED449201)
+  > Compares five online calibration methods for CAT including fixed-parameter
+  > approaches. Important for understanding FIPC in adaptive testing contexts.
+
+- **Kolen, M. J., & Brennan, R. L. (2014).** *Test equating, scaling, and
+  linking: Methods and practices* (3rd ed.). Springer.
+  https://link.springer.com/book/10.1007/978-1-4939-0317-7
+  > Definitive textbook. Chapters on IRT linking provide theoretical
+  > foundations for FIPC, anchor-item design, and scale maintenance.
+
+- **Kim, S. H., Cohen, A. S., & Kim, H. (2011).** Fixed-parameter calibration
+  of item banks. *Applied Psychological Measurement, 35*(7), 559–578.
+  https://doi.org/10.1177/0146621611401805
+  > Evaluates FIPC effectiveness in large-scale CAT and item banking;
+  > confirms accuracy with sufficient well-distributed anchor items.
+
+### Recent Advances (2020–2025)
+
+- **Robitzsch, A. (2024).** Bias and linking error in fixed item parameter
+  calibration. *AppliedMath, 4*(3), 1181–1191.
+  https://doi.org/10.3390/appliedmath4030063
+  > Analytically derives bias and linking error of FIPC under random DIF
+  > (2PL model). Shows that as DIF variance grows, both bias and variance of
+  > group distribution estimates increase substantially.
+
+- **Robitzsch, A. (2025).** Linking error estimation in fixed item parameter
+  calibration: Theory and application in large-scale assessment studies.
+  *Foundations, 5*(1), 4. https://doi.org/10.3390/foundations5010004
+  > Proposes a bias-corrected linking error estimator. Conventional jackknife
+  > resampling estimates are positively biased; this correction is critical for
+  > valid statistical inference in large-scale assessments (e.g., PISA).
+
+---
+
+## Software Development Notes (FIPC-Specific)
+
+The following algorithmic and design constraints flow directly from the
+literature above and must be respected in all code changes:
+
+1. **Prior ability distribution update during FIPC.**
+   Kang & Petersen (2009) found that the key behavioral difference between
+   BILOG-MG and PARSCALE was whether the prior ability distribution is updated
+   during FIPC. `mirt`-based calibration in this package must be audited for
+   which behavior it implements before changing estimation calls.
+
+2. **Anchor/common item quality is critical.**
+   At least 20–30 well-distributed anchor items spanning the θ continuum are
+   recommended (Kim et al., 2011). The `checkIPD` step in `autoFIPC()` exists
+   to detect and flag drifted anchor items before linking; do not disable or
+   weaken this check without regression evidence.
+
+3. **Item Parameter Drift (IPD) must be detected before FIPC.**
+   IPD (change in item parameters over time or across cohorts) biases FIPC
+   linking when drifted items remain in the anchor set. Detection methods
+   include robust-Z, D2/WRMSD, RMSD, and likelihood-ratio tests. Items with
+   significant drift should be excluded from the anchor set.
+
+4. **DIF and linking error.**
+   Unmodeled DIF in anchor items inflates both bias and linking error
+   (Robitzsch, 2024). Any future reporting of linking uncertainty must
+   account for bias-corrected linking error, not just sampling error.
+
+5. **Comparison reference package.**
+   The R package `irtQ` (CRAN) independently implements FIPC for
+   dichotomous (1PL/2PL/3PL) and polytomous (GRM/GPCM) models and can be
+   used to cross-validate numerical results from `aFIPC`.
+
+6. **Interactive prompts limit automation.**
+   The `confirmCommonItems` parameter was added to bypass the interactive
+   common-item confirmation step. Any further removal of interactive prompts
+   must preserve identical numerical behavior and be covered by regression
+   fixtures before merging.
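
Note on development note 3 (IPD detection): the robust-Z screen named there can be sketched in base R as below. This is a minimal illustration under stated assumptions, not the check that `autoFIPC()` performs. The helper name `robust_z_flags`, the 1.96 cutoff, and the toy estimates are all hypothetical; the only formula used is the common robust-Z form, (difference minus median difference) divided by 0.74 times the interquartile range of the differences.

```r
# Hypothetical helper (not part of aFIPC): robust-Z screen for one parameter
# type. old_est and new_est hold estimates for the same common items, in the
# same order. The cutoff is an assumption and should be set per study.
robust_z_flags <- function(old_est, new_est, critical = 1.96) {
  stopifnot(length(old_est) == length(new_est))
  diffs <- new_est - old_est
  spread <- 0.74 * IQR(diffs)  # robust stand-in for the SD of the differences
  stopifnot(spread > 0)
  z <- (diffs - median(diffs)) / spread
  data.frame(diff = diffs, robust_z = z, flagged = abs(z) > critical)
}

# Toy intercepts for six anchors; the third one is made to drift strongly.
old_d <- c(-1.10, -0.55, -0.15, 0.20, 0.70, 1.05)
new_d <- c(-1.05, -0.59, 2.20, 0.22, 0.64, 1.13)

screen <- robust_z_flags(old_d, new_d)
drifted <- which(screen$flagged)  # positions of anchors to drop before linking
print(screen)
```

With only a handful of anchors the interquartile range is itself noisy, so a screen like this is best read alongside the other methods listed in note 3 rather than on its own.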
From 3c2e568c1277d93a2ead6261fa30d2e20fb10383 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Wed, 1 Jul 2026 08:38:12 +0000
Subject: [PATCH 2/3] Add fixture-backed regression tests for prior update and IPD anchor filtering

---
 ARCHITECTURE.md                               |   4 +-
 .../fixtures/fipc-regression-fixtures.R       |  23 ++
 tests/testthat/test-regression-fixtures.R     | 208 ++++++++++++++++++
 3 files changed, 233 insertions(+), 2 deletions(-)
 create mode 100644 tests/testthat/fixtures/fipc-regression-fixtures.R
 create mode 100644 tests/testthat/test-regression-fixtures.R

diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md
index 3880f15..3b62fe9 100644
--- a/ARCHITECTURE.md
+++ b/ARCHITECTURE.md
@@ -94,8 +94,8 @@ package metadata, and CI workflow definitions in Git.
 
 ## 9. Future Considerations / Roadmap
 
-- Add non-interactive regression fixtures for historically trusted
-  FIPC results.
+- Expand regression fixtures beyond current prior-update/IPD-anchor
+  scenarios to additional historically trusted FIPC configurations.
 - Reduce interactive prompts in `autoFIPC()` for automation friendliness.
 - Evaluate migration path from historical `packrat/` to a modern lock
   workflow.
diff --git a/tests/testthat/fixtures/fipc-regression-fixtures.R b/tests/testthat/fixtures/fipc-regression-fixtures.R
new file mode 100644
index 0000000..fcfc4ab
--- /dev/null
+++ b/tests/testthat/fixtures/fipc-regression-fixtures.R
@@ -0,0 +1,23 @@
+regression_fixture_prior_update <- list(
+  seed = 20260701L,
+  n_old = 1800L,
+  n_new = 1800L,
+  old_theta_mean = 0,
+  old_theta_sd = 1,
+  new_theta_mean = 0.85,
+  new_theta_sd = 1.15,
+  common_count = 6L,
+  unique_count = 2L,
+  itemtype = "2PL",
+  expect_shifted_mean_abs_gt = 0.2
+)
+
+regression_fixture_ipd_anchor <- list(
+  seed = 20260702L,
+  n_old = 2200L,
+  n_new = 2200L,
+  common_count = 6L,
+  unique_count = 2L,
+  drift_common_index = 3L,
+  itemtype = "2PL"
+)
diff --git a/tests/testthat/test-regression-fixtures.R b/tests/testthat/test-regression-fixtures.R
new file mode 100644
index 0000000..18f4055
--- /dev/null
+++ b/tests/testthat/test-regression-fixtures.R
@@ -0,0 +1,208 @@
+source(testthat::test_path("fixtures", "fipc-regression-fixtures.R"), local = TRUE)
+
+extract_group_parameter <- function(model, parameter_name_pattern) {
+  values <- mirt::mod2values(model)
+  values[values$item == "GROUP" & grepl(parameter_name_pattern, values$name), , drop = FALSE]
+}
+
+test_that("prior-update fixture distinguishes free-mean and fixed-normal linking", {
+  skip_if_not_installed("mirt")
+
+  fx <- regression_fixture_prior_update
+  set.seed(fx$seed)
+
+  old_common_items <- paste0("old_common_", seq_len(fx$common_count))
+  new_common_items <- paste0("new_common_", seq_len(fx$common_count))
+  old_unique_items <- paste0("old_unique_", seq_len(fx$unique_count))
+  new_unique_items <- paste0("new_unique_", seq_len(fx$unique_count))
+
+  old_item_names <- c(old_common_items, old_unique_items)
+  new_item_names <- c(new_common_items, new_unique_items)
+
+  old_a <- matrix(c(0.88, 1.05, 1.19, 0.93, 1.12, 0.99, 1.28, 0.76), ncol = 1)
+  old_d <- c(-1.20, -0.65, -0.20, 0.10, 0.55, 0.95, -0.35, 0.60)
+  new_a <- matrix(c(0.88, 1.05, 1.19, 0.93, 1.12, 0.99, 1.38, 0.70), ncol = 1)
+  new_d <- c(-1.20, -0.65, -0.20, 0.10, 0.55, 0.95, 0.25, 1.10)
+
+  theta_old <- matrix(
+    rnorm(fx$n_old, mean = fx$old_theta_mean, sd = fx$old_theta_sd),
+    ncol = 1
+  )
+  theta_new <- matrix(
+    rnorm(fx$n_new, mean = fx$new_theta_mean, sd = fx$new_theta_sd),
+    ncol = 1
+  )
+
+  old_data <- as.data.frame(mirt::simdata(
+    a = old_a,
+    d = old_d,
+    itemtype = rep(fx$itemtype, length(old_item_names)),
+    Theta = theta_old
+  ))
+  new_data <- as.data.frame(mirt::simdata(
+    a = new_a,
+    d = new_d,
+    itemtype = rep(fx$itemtype, length(new_item_names)),
+    Theta = theta_new
+  ))
+  names(old_data) <- old_item_names
+  names(new_data) <- new_item_names
+
+  old_model <- mirt::mirt(
+    old_data,
+    1,
+    itemtype = fx$itemtype,
+    method = "EM",
+    SE = FALSE,
+    verbose = FALSE,
+    technical = list(NCYCLES = 600)
+  )
+  new_model <- mirt::mirt(
+    new_data,
+    1,
+    itemtype = fx$itemtype,
+    method = "EM",
+    SE = FALSE,
+    verbose = FALSE,
+    technical = list(NCYCLES = 600)
+  )
+
+  linked_free <- aFIPC::autoFIPC(
+    newformXData = new_model,
+    oldformYData = old_model,
+    newformCommonItemNames = new_common_items,
+    oldformCommonItemNames = old_common_items,
+    itemtype = fx$itemtype,
+    checkIPD = FALSE,
+    tryEM = TRUE,
+    freeMEAN = TRUE,
+    forceNormalZeroOne = FALSE,
+    confirmCommonItems = TRUE
+  )
+
+  linked_fixed <- aFIPC::autoFIPC(
+    newformXData = new_model,
+    oldformYData = old_model,
+    newformCommonItemNames = new_common_items,
+    oldformCommonItemNames = old_common_items,
+    itemtype = fx$itemtype,
+    checkIPD = FALSE,
+    tryEM = TRUE,
+    freeMEAN = TRUE,
+    forceNormalZeroOne = TRUE,
+    confirmCommonItems = TRUE
+  )
+
+  free_mean <- extract_group_parameter(linked_free$LinkedModel, "MEAN")
+  fixed_mean <- extract_group_parameter(linked_fixed$LinkedModel, "MEAN")
+  fixed_cov <- extract_group_parameter(linked_fixed$LinkedModel, "COV")
+
+  expect_true(any(free_mean$est))
+  expect_gt(abs(free_mean$value[1]), fx$expect_shifted_mean_abs_gt)
+  expect_false(any(fixed_mean$est))
+  expect_equal(fixed_mean$value[1], 0, tolerance = 1e-8)
+  expect_false(any(fixed_cov$est))
+  expect_equal(fixed_cov$value[1], 1, tolerance = 1e-8)
+})
+
+test_that("IPD fixture quantitatively filters drifted anchors before linking", {
+  skip_if_not_installed("mirt")
+
+  fx <- regression_fixture_ipd_anchor
+  set.seed(fx$seed)
+
+  old_common_items <- paste0("old_anchor_", seq_len(fx$common_count))
+  new_common_items <- paste0("new_anchor_", seq_len(fx$common_count))
+  old_unique_items <- paste0("old_unique_", seq_len(fx$unique_count))
+  new_unique_items <- paste0("new_unique_", seq_len(fx$unique_count))
+
+  old_item_names <- c(old_common_items, old_unique_items)
+  new_item_names <- c(new_common_items, new_unique_items)
+
+  old_a <- matrix(c(0.90, 1.07, 1.18, 0.96, 1.10, 0.87, 1.33, 0.78), ncol = 1)
+  old_d <- c(-1.10, -0.55, -0.15, 0.20, 0.70, 1.05, -0.40, 0.50)
+  new_a <- old_a
+  new_d <- old_d
+
+  drift_index <- fx$drift_common_index
+  new_a[drift_index, 1] <- 1.85
+  new_d[drift_index] <- 2.20
+
+  old_data <- as.data.frame(mirt::simdata(
+    a = old_a,
+    d = old_d,
+    itemtype = rep(fx$itemtype, length(old_item_names)),
+    N = fx$n_old
+  ))
+  new_data <- as.data.frame(mirt::simdata(
+    a = new_a,
+    d = new_d,
+    itemtype = rep(fx$itemtype, length(new_item_names)),
+    N = fx$n_new
+  ))
+  names(old_data) <- old_item_names
+  names(new_data) <- new_item_names
+
+  old_model <- mirt::mirt(
+    old_data,
+    1,
+    itemtype = fx$itemtype,
+    method = "EM",
+    SE = FALSE,
+    verbose = FALSE,
+    technical = list(NCYCLES = 600)
+  )
+  new_model <- mirt::mirt(
+    new_data,
+    1,
+    itemtype = fx$itemtype,
+    method = "EM",
+    SE = FALSE,
+    verbose = FALSE,
+    technical = list(NCYCLES = 600)
+  )
+
+  linked <- aFIPC::autoFIPC(
+    newformXData = new_model,
+    oldformYData = old_model,
+    newformCommonItemNames = new_common_items,
+    oldformCommonItemNames = old_common_items,
+    itemtype = fx$itemtype,
+    checkIPD = TRUE,
+    tryEM = TRUE,
+    confirmCommonItems = TRUE
+  )
+
+  expect_true("IPDData" %in% names(linked))
+  expect_true("IPDCommonItemList" %in% names(linked))
+
+  retained_old <- as.character(unlist(linked$IPDCommonItemList[1, , drop = TRUE]))
+  retained_new <- as.character(unlist(linked$IPDCommonItemList[2, , drop = TRUE]))
+  drifted_old <- old_common_items[drift_index]
+  drifted_new <- new_common_items[drift_index]
+
+  expect_lt(length(retained_old), length(old_common_items))
+  expect_false(drifted_old %in% retained_old)
+  expect_false(drifted_new %in% retained_new)
+
+  old_values <- mirt::mod2values(old_model)
+  linked_values <- mirt::mod2values(linked$LinkedModel)
+  common_count <- length(retained_old)
+  mean_abs_distance <- numeric(common_count)
+
+  for (i in seq_len(common_count)) {
+    old_item <- retained_old[i]
+    new_item <- retained_new[i]
+    old_fixed <- old_values[
+      old_values$item == old_item & old_values$name %in% c("a1", "d"),
+      "value"
+    ]
+    linked_fixed <- linked_values[
+      linked_values$item == new_item & linked_values$name %in% c("a1", "d"),
+      "value"
+    ]
+    mean_abs_distance[i] <- mean(abs(old_fixed - linked_fixed))
+  }
+
+  expect_lt(mean(mean_abs_distance), 1e-6)
+})

From 08ecba0397ae0e768ca9703e3218c5971f50481d Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Wed, 1 Jul 2026 08:39:01 +0000
Subject: [PATCH 3/3] Align parameter names in IPD anchor distance regression assertion

---
 tests/testthat/test-regression-fixtures.R | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tests/testthat/test-regression-fixtures.R b/tests/testthat/test-regression-fixtures.R
index 18f4055..739d8f9 100644
--- a/tests/testthat/test-regression-fixtures.R
+++ b/tests/testthat/test-regression-fixtures.R
@@ -195,13 +195,14 @@ test_that("IPD fixture quantitatively filters drifted anchors before linking", {
     new_item <- retained_new[i]
     old_fixed <- old_values[
       old_values$item == old_item & old_values$name %in% c("a1", "d"),
-      "value"
+      c("name", "value")
     ]
     linked_fixed <- linked_values[
       linked_values$item == new_item & linked_values$name %in% c("a1", "d"),
-      "value"
+      c("name", "value")
     ]
-    mean_abs_distance[i] <- mean(abs(old_fixed - linked_fixed))
+    aligned <- merge(old_fixed, linked_fixed, by = "name", sort = FALSE)
+    mean_abs_distance[i] <- mean(abs(aligned$value.x - aligned$value.y))
   }
 
   expect_lt(mean(mean_abs_distance), 1e-6)
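
Note on PATCH 3/3: the move from positional subtraction to a `merge()` on `name` can be illustrated with a small base-R sketch. The two data frames below are toy stand-ins shaped like the `name`/`value` columns of `mirt::mod2values()`, with made-up values. The reversed row order in the second table is an assumption chosen to make the failure mode visible; it is not a claim that `mod2values()` reorders rows in practice.

```r
# Toy parameter tables (hypothetical values); same item, rows in different order.
old_fixed <- data.frame(name = c("a1", "d"), value = c(1.18, -0.15))
linked_fixed <- data.frame(name = c("d", "a1"), value = c(-0.15, 1.18))

# Positional comparison: silently pairs a1 with d when row order differs.
positional <- mean(abs(old_fixed$value - linked_fixed$value))

# Name-aligned comparison, mirroring the updated assertion: merge() matches
# rows on "name" and suffixes the clashing value columns with .x and .y.
aligned <- merge(old_fixed, linked_fixed, by = "name", sort = FALSE)
by_name <- mean(abs(aligned$value.x - aligned$value.y))

print(c(positional = positional, by_name = by_name))
```

Here the parameters are identical, so the name-aligned distance is zero while the positional one is large. Aligning by name keeps the `1e-6` threshold in the regression assertion meaningful regardless of row order.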