[#746] establish default order for replicas listed by an iRODSDataObject #815

Open
d-w-moore wants to merge 21 commits into irods:main from d-w-moore:746.m

Conversation

@d-w-moore
Collaborator

@d-w-moore d-w-moore commented Apr 15, 2026

The parent data object's modify_time and replica_status fields, as well as some others, actually pertain more to individual replicas.

#747 was an old PR meant to address the issue and contains much discussion as well.

On consideration, I think a minor release is the proper place to address this, and I'm doing it by

  • setting a default sorter that, for data_objects.get (or any time the iRODSDataObject constructor runs), sorts replicas of the data object first by replica "goodness" and second by reverse chronology of the replica modify_time (i.e. most recent first). The replica at array position [0] will then determine the values of the fields discussed above.
  • deciding for the time being not to deprecate anything. yet. To me it makes natural sense to allow modify_time and replica_status to be accessed from the "head" object.
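
For illustration, the ordering described in the first bullet can be sketched as a sort key. This is a minimal sketch, not the code in this PR: the `Replica` stand-in class, the `fitness_key` name, and the rule that status "1" means good and ranks ahead of every other status are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Replica:
    # Hypothetical stand-in for a replica record, not the python-irodsclient class.
    number: int
    status: str  # assumption: "1" means good, anything else ranks lower
    modify_time: datetime


def fitness_key(replica):
    # Hypothetical key: good replicas first, then most recently modified first.
    rank = 0 if replica.status == "1" else 1
    return (rank, -replica.modify_time.timestamp())


replicas = [
    Replica(0, "0", datetime(2026, 4, 1)),
    Replica(1, "1", datetime(2026, 4, 2)),
    Replica(2, "1", datetime(2026, 4, 3)),
]

ordered = sorted(replicas, key=fitness_key)
head = ordered[0]  # the replica whose fields the data object would report
print([r.number for r in ordered])
```

With this key, the newest good replica lands at position [0], so fields read from the head replica describe the copy most likely to be wanted.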

So, this PR replaces the old one, #747, because it is new work and is based on top of source code that is now conveniently ruff-formatted.

@korydraughn
Contributor

  • setting a default sorter that, for data_objects.get (or any time the iRODSDataObject constructor runs), sorts replicas of the data object first by replica "goodness" and second by reverse chronology of the replica modify_time (i.e. most recent first). The replica at array position [0] will then determine the values of the fields discussed above.

Keep in mind that for a minor release, we cannot change the behavior of any public APIs. If the default sorter results in the output being different, then that's a no go. The default sorter must mirror the original behavior.

  • deciding for the time being not to deprecate anything. yet.

What are you referring to in regard to deprecation?

To me it makes natural sense to allow modify_time and replica_status to be accessed from the "head" object.

What does this mean?

@d-w-moore
Collaborator Author

  • setting a default sorter that, for data_objects.get (or any time the iRODSDataObject constructor runs), sorts replicas of the data object first by replica "goodness" and second by reverse chronology of the replica modify_time (i.e. most recent first). The replica at array position [0] will then determine the values of the fields discussed above.

Keep in mind that for a minor release, we cannot change the behavior of any public APIs. If the default sorter results in the output being different, then that's a no go. The default sorter must mirror the original behavior.

  • deciding for the time being not to deprecate anything. yet.

What are you referring to in regard to deprecation?

We'd discussed in the old issue/PR conversations whether we might just deprecate the iRODSDataObject fields, like replica_status and modify_time, that are really just a reflection of the corresponding attribute of replicas[0].

To me it makes natural sense to allow modify_time and replica_status to be accessed from the "head" object.

What does this mean?

Just that .replicas[0].FIELD is mirrored in .FIELD, but that is pretty natural.
I guess we could actually just make them properties, rather than duplicating the data. But that is low priority.
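
A minimal sketch of the property idea, assuming a stand-in class rather than the real iRODSDataObject: the class name `DataObjectSketch` and the replica attribute names are hypothetical and chosen only for illustration.

```python
from types import SimpleNamespace


class DataObjectSketch:
    # Hypothetical stand-in, not the real iRODSDataObject.
    def __init__(self, replicas):
        self.replicas = list(replicas)

    @property
    def modify_time(self):
        # Read through to the head replica instead of storing a copy.
        return self.replicas[0].modify_time

    @property
    def replica_status(self):
        return self.replicas[0].status


obj = DataObjectSketch(
    [
        SimpleNamespace(number=0, status="0", modify_time=100),
        SimpleNamespace(number=1, status="1", modify_time=200),
    ]
)
before = (obj.replica_status, obj.modify_time)

# Reordering the list changes what the properties report; nothing is duplicated.
obj.replicas.reverse()
after = (obj.replica_status, obj.modify_time)
```

Because the properties are computed on access, they cannot drift out of sync with replicas[0], which is the main argument for them over copied fields.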

@d-w-moore
Collaborator Author

@korydraughn - I'm fine with changing the default order back to sorting on replica number for this minor release, even if that allows attributes such as dataObject.modify_time to continue to misrepresent the "information advertised". It's only a minor code change to let the application writer sort differently if they so desire.

@korydraughn
Contributor

We'd discussed in the old issue/PR conversations whether we might just deprecate the iRODSDataObject fields, like replica_status and modify_time, that are really just a reflection of the corresponding attribute of replicas[0].

Oh right. That still sounds like an acceptable approach.

Just that .replicas[0].FIELD is mirrored in .FIELD, but that is pretty natural. I guess we could actually just make them properties, rather than duplicating the data. But that is low priority.

I'm not yet convinced that is the proper approach. Feels like it should be handled via support functions which simplify the find-replica step.

Do instances of iRODSDataObject always have the list of replicas? If so, then they can sort/search the list of replicas for what they need. Perhaps that's how the iRODSDataObject constructor works in this PR?

@d-w-moore d-w-moore force-pushed the 746.m branch 2 times, most recently from 0cc7227 to a9c4e99 (April 27, 2026 13:38)
@d-w-moore
Collaborator Author

d-w-moore commented May 5, 2026

Have I addressed this already, @korydraughn ?

I'm not yet convinced that is the proper approach. Feels like it should be handled via support functions which simplify the find-replica step.

If not, perhaps you could elaborate? Should we be changing the interface with regard to how replicas are actually found, or do we actually want only one replica represented by any iRODSDataObject instance? I could not tell which interpretation you meant.

@korydraughn
Contributor

I don't know if you've addressed it.

If not, perhaps you could elaborate?

I'm highlighting that providing the list of replicas is all that's needed. We do not need to mirror properties from replica N to the iRODSDataObject instance. Doing that will be wasteful if the user doesn't use it. The user can filter the list of replicas however they like.
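
As a rough sketch of what filtering the list could look like on the caller's side, assuming replicas expose `number`, `status`, and `resource_name` attributes: the helper name `find_replicas`, the sample resource names, and the "1 means good" convention are assumptions for the example, not part of the library.

```python
from types import SimpleNamespace


def find_replicas(replicas, predicate):
    # Hypothetical helper: return the replicas for which predicate is true.
    return [r for r in replicas if predicate(r)]


# Stand-in replica records; a real caller would use the data object's replicas list.
replicas = [
    SimpleNamespace(number=0, status="0", resource_name="rescA"),
    SimpleNamespace(number=1, status="1", resource_name="rescB"),
    SimpleNamespace(number=2, status="1", resource_name="rescA"),
]

good = find_replicas(replicas, lambda r: r.status == "1")
on_resc_a = find_replicas(replicas, lambda r: r.resource_name == "rescA")
good_numbers = [r.number for r in good]
resc_a_numbers = [r.number for r in on_resc_a]
```

Nothing is computed until the caller asks for it, which avoids the wasted work of mirroring fields that may never be read.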

I didn't see a response to this question.

Do instances of iRODSDataObject always have the list of replicas? If so, then they can sort/search the list of replicas for what they need. Perhaps that's how the iRODSDataObject constructor works in this PR?

@d-w-moore
Collaborator Author

d-w-moore commented May 6, 2026

I didn't see a response to this question.

Do instances of iRODSDataObject always have the list of replicas? If so, then they can sort/search the list of replicas for what they need. Perhaps that's how the iRODSDataObject constructor works in this PR?

The replica list is always there, and - yes - initialized and sorted in the iRODSDataObject constructor, which now takes the option of replica_sort_function. The sorting default doesn't change with this PR; it is still numerically by ascending replica number.

In this PR we at least allow the possibility of feeding in an alternative value for replica_sort_function, either directly into that constructor or indirectly via <session>.data_objects.get(path, ...). We also add _REPLICA_FITNESS_SORT_KEY_FN as a sort option (see the new test), which differs from the default sort order in that it sorts primarily by the "goodness" of replica status and secondarily by mtime, descending. This seems likely to become the default in v4.0.0.
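
To make the shape of that option concrete, here is a minimal sketch using a stand-in class. Only the parameter name replica_sort_function comes from this PR; the class `SortedDataObjectSketch`, the `by_replica_number` default, and the assumption that the argument is applied as a sort `key` are hypothetical, since the actual calling convention is not shown in this conversation.

```python
from types import SimpleNamespace


def by_replica_number(replica):
    # Assumed default: ascending replica number, matching the pre-existing order.
    return replica.number


class SortedDataObjectSketch:
    # Hypothetical stand-in, not the real iRODSDataObject constructor.
    def __init__(self, replicas, replica_sort_function=by_replica_number):
        self.replicas = sorted(replicas, key=replica_sort_function)


reps = [
    SimpleNamespace(number=2, status="1"),
    SimpleNamespace(number=0, status="0"),
    SimpleNamespace(number=1, status="1"),
]

default_order = [r.number for r in SortedDataObjectSketch(reps).replicas]

# An alternative key: good replicas (status "1", by assumption) first, then by number.
good_first = [
    r.number
    for r in SortedDataObjectSketch(
        reps, replica_sort_function=lambda r: (r.status != "1", r.number)
    ).replicas
]
```

Keeping ascending replica number as the default leaves existing callers unaffected, while callers who want a different head replica opt in explicitly.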

@korydraughn
Contributor

Makes sense to me.

The leading underscore on the sort functions - consider making those public (i.e. remove leading underscore?). That will keep users from needing to implement their own version of them.

@d-w-moore
Collaborator Author

Makes sense to me.

The leading underscore on the sort functions - consider making those public (i.e. remove leading underscore?). That will keep users from needing to implement their own version of them.

Sounds good. I will wait for the tests to pass before squashing.

@alanking
Contributor

alanking commented May 6, 2026

Please notify when ready for one last pass over.
