103 changes: 103 additions & 0 deletions AGENTS.md
@@ -38,3 +38,106 @@ parameter calibration and test linking.
- Isolate operational fixes (workflow/docs/dependency policy) from algorithmic
edits.
- Document assumptions and risk in commit/PR summaries.

---

## Key Literature: Fixed Item Parameter Calibration (FIPC)

This software implements FIPC-based test linking. The following articles are
canonical references for understanding and validating algorithmic choices.

### Foundational / Original Source

- **Kang, T., & Petersen, N. S. (2009).** *Linking item parameters to a base
scale.* ACT Research Report Series, 2009-2. ERIC: ED510480.
https://files.eric.ed.gov/fulltext/ED510480.pdf
> **This is the primary source for this software.** Compares
> concurrent calibration, separate calibration with linking, and FIPC using
> BILOG-MG and PARSCALE. Key finding: **PARSCALE updates the prior ability
> distribution during FIPC whereas BILOG-MG does not.** Only the PARSCALE
> implementation (when used correctly) produced results comparable to the
> other methods.

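The prior-update distinction reported by Kang & Petersen can be illustrated with `mirt` directly. The following is a minimal sketch, not the `aFIPC` implementation: it assumes that "updating the prior" corresponds to freeing the `GROUP` parameters `MEAN_1` and `COV_11` while anchor item parameters stay fixed. The helper `fit_fipc_sketch()`, the anchor values, and the sample size are hypothetical.

```r
# Hypothetical sketch: FIPC in mirt with and without a prior update.
# Assumption: a prior update is expressed by freeing MEAN_1 / COV_11.
fit_fipc_sketch <- function(dat, anchor_values, update_prior = TRUE) {
  # Starting-value table with one row per model parameter.
  sv <- mirt::mirt(dat, 1, itemtype = "2PL", pars = "values")

  # Fix each anchor parameter at its base-scale value.
  for (k in seq_len(nrow(anchor_values))) {
    row <- sv$item == anchor_values$item[k] & sv$name == anchor_values$name[k]
    sv$value[row] <- anchor_values$value[k]
    sv$est[row] <- FALSE
  }

  # Free (update_prior = TRUE) or keep fixed the latent mean and variance.
  group_rows <- sv$item == "GROUP" & sv$name %in% c("MEAN_1", "COV_11")
  sv$est[group_rows] <- update_prior

  mirt::mirt(dat, 1, itemtype = "2PL", pars = sv, verbose = FALSE)
}

if (requireNamespace("mirt", quietly = TRUE)) {
  set.seed(1)
  a <- matrix(c(1.0, 1.2, 0.9, 1.1, 1.3, 0.8), ncol = 1)
  d <- c(-1.0, -0.5, 0.0, 0.5, 1.0, 0.2)
  # New-form examinees are more able than the base-scale N(0, 1) population.
  theta <- matrix(rnorm(1000, mean = 0.8, sd = 1.1), ncol = 1)
  dat <- as.data.frame(mirt::simdata(a = a, d = d, itemtype = "dich", Theta = theta))

  # The first four items act as anchors with known base-scale parameters.
  anchors <- data.frame(
    item = rep(colnames(dat)[1:4], each = 2),
    name = rep(c("a1", "d"), times = 4),
    value = as.vector(rbind(a[1:4, 1], d[1:4]))
  )

  mod_update <- fit_fipc_sketch(dat, anchors, update_prior = TRUE)
  mod_fixed <- fit_fipc_sketch(dat, anchors, update_prior = FALSE)

  group_update <- subset(mirt::mod2values(mod_update), item == "GROUP")
  print(group_update[, c("name", "value", "est")])
}
```

Comparing the `GROUP` rows of `mod_update` and `mod_fixed` is one way to audit which behavior a given estimation call implements.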
### Classic / Methodological Foundations

- **Stocking, M. L., & Lord, F. M. (1983).** Developing a common metric in
item response theory. *Applied Psychological Measurement, 7*(2), 201–210.
https://doi.org/10.1177/014662168300700208
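> Classic source for scale transformation in IRT linking. Introduces the
> characteristic-curve method, which estimates the linear transformation
> between separately calibrated forms by matching test characteristic
> curves on the common items.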

- **Ban, J.-C., Hanson, B. A., Wang, T., Yi, Q., & Harris, D. J. (2001).**
A comparative study of on-line pretest item calibration/scaling methods in
computerized adaptive testing. *Journal of Educational Measurement, 38*(3),
191–212. https://www.jstor.org/stable/1435120 (also ERIC: ED449201)
> Compares five online calibration methods for CAT including fixed-parameter
> approaches. Important for understanding FIPC in adaptive testing contexts.

- **Kolen, M. J., & Brennan, R. L. (2014).** *Test equating, scaling, and
linking: Methods and practices* (3rd ed.). Springer.
https://link.springer.com/book/10.1007/978-1-4939-0317-7
> Definitive textbook. Chapters on IRT linking provide theoretical
> foundations for FIPC, anchor-item design, and scale maintenance.

- **Kim, S. H., Cohen, A. S., & Kim, H. (2011).** Fixed-parameter calibration
of item banks. *Applied Psychological Measurement, 35*(7), 559–578.
https://doi.org/10.1177/0146621611401805
> Evaluates FIPC effectiveness in large-scale CAT and item banking;
> confirms accuracy with sufficient well-distributed anchor items.

### Recent Advances (2020–2025)

- **Robitzsch, A. (2024).** Bias and linking error in fixed item parameter
calibration. *AppliedMath, 4*(3), 1181–1191.
https://doi.org/10.3390/appliedmath4030063
> Analytically derives bias and linking error of FIPC under random DIF
> (2PL model). Shows that as DIF variance grows, both bias and variance of
> group distribution estimates increase substantially.

- **Robitzsch, A. (2025).** Linking error estimation in fixed item parameter
calibration: Theory and application in large-scale assessment studies.
*Foundations, 5*(1), 4. https://doi.org/10.3390/foundations5010004
> Proposes a bias-corrected linking error estimator. Conventional jackknife
> resampling estimates are positively biased; this correction is critical for
> valid statistical inference in large-scale assessments (e.g., PISA).

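For orientation, the conventional item-level jackknife that Robitzsch (2025) describes as positively biased can be sketched in base R. This is only the uncorrected textbook estimator; the bias correction itself is not reproduced here. The names `jackknife_linking_error()` and `estimate_fn` are hypothetical, and in real use `estimate_fn` would rerun FIPC on the reduced anchor set rather than take a simple mean.

```r
# Hypothetical sketch: leave-one-anchor-out jackknife standard error.
# estimate_fn maps a set of anchor items to a scalar linking estimate.
jackknife_linking_error <- function(anchor_items, estimate_fn) {
  n_items <- length(anchor_items)
  stopifnot(n_items >= 2)

  # Re-estimate with each anchor item removed in turn.
  leave_one_out <- vapply(
    seq_len(n_items),
    function(i) estimate_fn(anchor_items[-i]),
    numeric(1)
  )

  # Standard jackknife variance formula, returned on the SE scale.
  sqrt((n_items - 1) / n_items * sum((leave_one_out - mean(leave_one_out))^2))
}

# Stand-in data: per-item difficulty shifts, with the mean as the estimate.
item_shifts <- c(0.10, -0.05, 0.20, 0.00, 0.15, -0.10)
jackknife_linking_error(item_shifts, mean)
```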
---

## Software Development Notes (FIPC-Specific)

The following algorithmic and design constraints flow directly from the
literature above and must be respected in all code changes:

1. **Prior ability distribution update during FIPC.**
Kang & Petersen (2009) found that the key behavioral difference between
BILOG-MG and PARSCALE was whether the prior ability distribution is updated
during FIPC. `mirt`-based calibration in this package must be audited for
which behavior it implements before changing estimation calls.

2. **Anchor/common item quality is critical.**
At least 20–30 well-distributed anchor items spanning the θ continuum are
recommended (Kim et al., 2011). The `checkIPD` step in `autoFIPC()` exists
to detect and flag drifted anchor items before linking; do not disable or
weaken this check without regression evidence.

3. **Item Parameter Drift (IPD) must be detected before FIPC.**
IPD (change in item parameters over time or across cohorts) biases FIPC
linking when drifted items remain in the anchor set. Detection methods
include robust-Z, D2/WRMSD, RMSD, and likelihood-ratio tests. Items with
significant drift should be excluded from the anchor set.

4. **DIF and linking error.**
Unmodeled DIF in anchor items inflates both bias and linking error
(Robitzsch, 2024). Any future reporting of linking uncertainty must
account for bias-corrected linking error, not just sampling error.

5. **Comparison reference package.**
The R package `irtQ` (CRAN) independently implements FIPC for
dichotomous (1PL/2PL/3PL) and polytomous (GRM/GPCM) models and can be
used to cross-validate numerical results from `aFIPC`.

6. **Interactive prompts limit automation.**
The `confirmCommonItems` parameter was added to bypass the interactive
common-item confirmation step. Any further removal of interactive prompts
must preserve identical numerical behavior and be covered by regression
fixtures before merging.
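
Of the IPD detection methods named in item 3, robust-Z is the simplest to sketch. The base-R example below is illustrative only and is not the `checkIPD` implementation in `autoFIPC()`: the function names, the 1.96 cutoff, and the example values are assumptions, and the statistic is shown for a single parameter type (intercepts).

```r
# Hypothetical sketch: robust-Z screening of anchor items for drift.
# z_i = (diff_i - median(diff)) / (0.74 * IQR(diff))
robust_z <- function(old_par, new_par) {
  stopifnot(length(old_par) == length(new_par))
  par_diff <- new_par - old_par
  spread <- 0.74 * stats::IQR(par_diff)
  if (spread == 0) {
    stop("IQR of parameter differences is zero; robust-Z is undefined.")
  }
  (par_diff - stats::median(par_diff)) / spread
}

# The cutoff is an assumed value; operational programs choose their own.
flag_drifted_anchors <- function(old_par, new_par, cutoff = 1.96) {
  abs(robust_z(old_par, new_par)) > cutoff
}

# Six anchor intercepts; the third has drifted on the new form.
old_d <- c(-1.10, -0.55, -0.15, 0.20, 0.70, 1.05)
new_d <- old_d + c(0.03, -0.02, 2.35, 0.01, -0.03, 0.02)

drifted <- flag_drifted_anchors(old_d, new_d)
which(drifted)
```

Items flagged this way would be dropped from the anchor set before the FIPC run, in line with item 3.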
4 changes: 2 additions & 2 deletions ARCHITECTURE.md
@@ -94,8 +94,8 @@ package metadata, and CI workflow definitions in Git.

## 9. Future Considerations / Roadmap

-- Add non-interactive regression fixtures for historically trusted
-  FIPC results.
+- Expand regression fixtures beyond current prior-update/IPD-anchor
+  scenarios to additional historically trusted FIPC configurations.
- Reduce interactive prompts in `autoFIPC()` for automation friendliness.
- Evaluate migration path from historical `packrat/` to a modern
lock workflow.
23 changes: 23 additions & 0 deletions tests/testthat/fixtures/fipc-regression-fixtures.R
@@ -0,0 +1,23 @@
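# Prior-update scenario: the new-form population is shifted and more variable
# than the old-form population, so a freed latent mean should move away from 0.
regression_fixture_prior_update <- list(
  seed = 20260701L,
  n_old = 1800L,
  n_new = 1800L,
  old_theta_mean = 0,
  old_theta_sd = 1,
  new_theta_mean = 0.85,
  new_theta_sd = 1.15,
  common_count = 6L,
  unique_count = 2L,
  itemtype = "2PL",
  expect_shifted_mean_abs_gt = 0.2
)

# IPD scenario: one common item (drift_common_index) receives drifted
# parameters on the new form and should be dropped from the anchor set.
regression_fixture_ipd_anchor <- list(
  seed = 20260702L,
  n_old = 2200L,
  n_new = 2200L,
  common_count = 6L,
  unique_count = 2L,
  drift_common_index = 3L,
  itemtype = "2PL"
)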
209 changes: 209 additions & 0 deletions tests/testthat/test-regression-fixtures.R
@@ -0,0 +1,209 @@
source(testthat::test_path("fixtures", "fipc-regression-fixtures.R"), local = TRUE)

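# Return the GROUP rows of a fitted mirt model whose parameter name matches
# the given pattern (for example "MEAN" or "COV").
extract_group_parameter <- function(model, parameter_name_pattern) {
  values <- mirt::mod2values(model)
  values[values$item == "GROUP" & grepl(parameter_name_pattern, values$name), , drop = FALSE]
}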

test_that("prior-update fixture distinguishes free-mean and fixed-normal linking", {
  skip_if_not_installed("mirt")

  fx <- regression_fixture_prior_update
  set.seed(fx$seed)

  old_common_items <- paste0("old_common_", seq_len(fx$common_count))
  new_common_items <- paste0("new_common_", seq_len(fx$common_count))
  old_unique_items <- paste0("old_unique_", seq_len(fx$unique_count))
  new_unique_items <- paste0("new_unique_", seq_len(fx$unique_count))

  old_item_names <- c(old_common_items, old_unique_items)
  new_item_names <- c(new_common_items, new_unique_items)

  # The first six (common) items share parameters across forms; only the two
  # unique items differ.
  old_a <- matrix(c(0.88, 1.05, 1.19, 0.93, 1.12, 0.99, 1.28, 0.76), ncol = 1)
  old_d <- c(-1.20, -0.65, -0.20, 0.10, 0.55, 0.95, -0.35, 0.60)
  new_a <- matrix(c(0.88, 1.05, 1.19, 0.93, 1.12, 0.99, 1.38, 0.70), ncol = 1)
  new_d <- c(-1.20, -0.65, -0.20, 0.10, 0.55, 0.95, 0.25, 1.10)

  theta_old <- matrix(
    rnorm(fx$n_old, mean = fx$old_theta_mean, sd = fx$old_theta_sd),
    ncol = 1
  )
  theta_new <- matrix(
    rnorm(fx$n_new, mean = fx$new_theta_mean, sd = fx$new_theta_sd),
    ncol = 1
  )

  old_data <- as.data.frame(mirt::simdata(
    a = old_a,
    d = old_d,
    itemtype = rep(fx$itemtype, length(old_item_names)),
    Theta = theta_old
  ))
  new_data <- as.data.frame(mirt::simdata(
    a = new_a,
    d = new_d,
    itemtype = rep(fx$itemtype, length(new_item_names)),
    Theta = theta_new
  ))
  names(old_data) <- old_item_names
  names(new_data) <- new_item_names

  old_model <- mirt::mirt(
    old_data,
    1,
    itemtype = fx$itemtype,
    method = "EM",
    SE = FALSE,
    verbose = FALSE,
    technical = list(NCYCLES = 600)
  )
  new_model <- mirt::mirt(
    new_data,
    1,
    itemtype = fx$itemtype,
    method = "EM",
    SE = FALSE,
    verbose = FALSE,
    technical = list(NCYCLES = 600)
  )

  linked_free <- aFIPC::autoFIPC(
    newformXData = new_model,
    oldformYData = old_model,
    newformCommonItemNames = new_common_items,
    oldformCommonItemNames = old_common_items,
    itemtype = fx$itemtype,
    checkIPD = FALSE,
    tryEM = TRUE,
    freeMEAN = TRUE,
    forceNormalZeroOne = FALSE,
    confirmCommonItems = TRUE
  )

  # Same call except forceNormalZeroOne = TRUE, which is expected to pin the
  # latent distribution at N(0, 1) even though freeMEAN = TRUE.
  linked_fixed <- aFIPC::autoFIPC(
    newformXData = new_model,
    oldformYData = old_model,
    newformCommonItemNames = new_common_items,
    oldformCommonItemNames = old_common_items,
    itemtype = fx$itemtype,
    checkIPD = FALSE,
    tryEM = TRUE,
    freeMEAN = TRUE,
    forceNormalZeroOne = TRUE,
    confirmCommonItems = TRUE
  )

  free_mean <- extract_group_parameter(linked_free$LinkedModel, "MEAN")
  fixed_mean <- extract_group_parameter(linked_fixed$LinkedModel, "MEAN")
  fixed_cov <- extract_group_parameter(linked_fixed$LinkedModel, "COV")

  expect_true(any(free_mean$est))
  expect_gt(abs(free_mean$value[1]), fx$expect_shifted_mean_abs_gt)
  expect_false(any(fixed_mean$est))
  expect_equal(fixed_mean$value[1], 0, tolerance = 1e-8)
  expect_false(any(fixed_cov$est))
  expect_equal(fixed_cov$value[1], 1, tolerance = 1e-8)
})

test_that("IPD fixture quantitatively filters drifted anchors before linking", {
  skip_if_not_installed("mirt")

  fx <- regression_fixture_ipd_anchor
  set.seed(fx$seed)

  old_common_items <- paste0("old_anchor_", seq_len(fx$common_count))
  new_common_items <- paste0("new_anchor_", seq_len(fx$common_count))
  old_unique_items <- paste0("old_unique_", seq_len(fx$unique_count))
  new_unique_items <- paste0("new_unique_", seq_len(fx$unique_count))

  old_item_names <- c(old_common_items, old_unique_items)
  new_item_names <- c(new_common_items, new_unique_items)

  old_a <- matrix(c(0.90, 1.07, 1.18, 0.96, 1.10, 0.87, 1.33, 0.78), ncol = 1)
  old_d <- c(-1.10, -0.55, -0.15, 0.20, 0.70, 1.05, -0.40, 0.50)
  new_a <- old_a
  new_d <- old_d

  # Inject large drift into a single common item on the new form.
  drift_index <- fx$drift_common_index
  new_a[drift_index, 1] <- 1.85
  new_d[drift_index] <- 2.20

  old_data <- as.data.frame(mirt::simdata(
    a = old_a,
    d = old_d,
    itemtype = rep(fx$itemtype, length(old_item_names)),
    N = fx$n_old
  ))
  new_data <- as.data.frame(mirt::simdata(
    a = new_a,
    d = new_d,
    itemtype = rep(fx$itemtype, length(new_item_names)),
    N = fx$n_new
  ))
  names(old_data) <- old_item_names
  names(new_data) <- new_item_names

  old_model <- mirt::mirt(
    old_data,
    1,
    itemtype = fx$itemtype,
    method = "EM",
    SE = FALSE,
    verbose = FALSE,
    technical = list(NCYCLES = 600)
  )
  new_model <- mirt::mirt(
    new_data,
    1,
    itemtype = fx$itemtype,
    method = "EM",
    SE = FALSE,
    verbose = FALSE,
    technical = list(NCYCLES = 600)
  )

  linked <- aFIPC::autoFIPC(
    newformXData = new_model,
    oldformYData = old_model,
    newformCommonItemNames = new_common_items,
    oldformCommonItemNames = old_common_items,
    itemtype = fx$itemtype,
    checkIPD = TRUE,
    tryEM = TRUE,
    confirmCommonItems = TRUE
  )

  expect_true("IPDData" %in% names(linked))
  expect_true("IPDCommonItemList" %in% names(linked))

  retained_old <- as.character(unlist(linked$IPDCommonItemList[1, , drop = TRUE]))
  retained_new <- as.character(unlist(linked$IPDCommonItemList[2, , drop = TRUE]))
  drifted_old <- old_common_items[drift_index]
  drifted_new <- new_common_items[drift_index]

  expect_lt(length(retained_old), length(old_common_items))
  expect_false(drifted_old %in% retained_old)
  expect_false(drifted_new %in% retained_new)

  # Retained anchors must carry the old-form parameter values unchanged.
  old_values <- mirt::mod2values(old_model)
  linked_values <- mirt::mod2values(linked$LinkedModel)
  common_count <- length(retained_old)
  mean_abs_distance <- numeric(common_count)

  for (i in seq_len(common_count)) {
    old_item <- retained_old[i]
    new_item <- retained_new[i]
    old_fixed <- old_values[
      old_values$item == old_item & old_values$name %in% c("a1", "d"),
      c("name", "value")
    ]
    linked_fixed <- linked_values[
      linked_values$item == new_item & linked_values$name %in% c("a1", "d"),
      c("name", "value")
    ]
    aligned <- merge(old_fixed, linked_fixed, by = "name", sort = FALSE)
    mean_abs_distance[i] <- mean(abs(aligned$value.x - aligned$value.y))
  }

  expect_lt(mean(mean_abs_distance), 1e-6)
})