Read/write support for INSDC XML records — BioProject, Study, Sample, Experiment, Run, Analysis, Submission, Receipt — as a direct dependency of BioFSharp.

| Package | Purpose |
|---|---|
| BioFSharp.FileFormats.INSDC | C# type model auto-generated from the ENA SRA XSDs via dotnet-xscgen. |
| BioFSharp.IO.INSDC | F# wrapper exposing read / readString / write / writeString per INSDC entity. |

The C# split exists because there is no F# equivalent of XmlSchemaClassGenerator. Both packages target netstandard2.0.

    .
    ├── build/                            FAKE build project
    ├── docs/                             Placeholder; no fsdocs site is published from this repo
    ├── plans/implementation.md           Authoritative implementation plan
    ├── src/
    │   ├── BioFSharp.FileFormats.INSDC/  C# generated type model
    │   │   ├── schemas/                  Committed ENA XSDs
    │   │   └── Generated/                Tool output; do not hand-edit
    │   └── BioFSharp.IO.INSDC/           F# wrapper
    └── tests/BioFSharp.INSDC.Tests/      xunit tests, with committed ENA fixtures

First-time setup:

    dotnet tool restore   # installs the pinned dotnet-xscgen

Then:

    build.cmd    # Windows
    ./build.sh   # macOS / Linux

Other targets:

    build.cmd runtests
    build.cmd pack
    build.cmd regenerateInsdcTypes   # only when the XSDs change

`dotnet xscgen` derives C# type names mechanically from the XSDs, which produces verbose names like `AnalysisTypeAnalysisTypeTranscriptomeAssembly`. We clean these up via `src/BioFSharp.FileFormats.INSDC/schemas/typename-substitutions.txt`, passed to the tool with `--tnsf`. The substitution file:

- Has one rule per line in the form `A:<xscgen-default-name>=<substitute>` (the `A:` prefix matches any type/member; lines starting with `#` are comments).
- Documents its naming rules (A–F) in a header block. Read those before adding rules so renames stay consistent.
- Is the only place to change a generated type's name; never hand-edit files under `Generated/`.

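
As a rough illustration of that line format, the sketch below lists the active rules of a substitution file as `<default> -> <substitute>` pairs. The default name is the one quoted above; the substitute `TranscriptomeAssemblyAnalysis`, the helper name `list_rules`, and the throwaway sample file are hypothetical and not taken from the real `typename-substitutions.txt`.

```shell
# Print "<default> -> <substitute>" for each active rule in a substitution file.
list_rules() {
  # Drop comment and blank lines, then split "A:<default>=<substitute>" at the first '='.
  grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1" |
    sed -n 's/^A:\([^=]*\)=\(.*\)$/\1 -> \2/p'
}

# Hypothetical sample content; the substitute is made up for illustration.
sample="$(mktemp)"
printf '%s\n' \
  '# naming rules would be documented in a header block like this' \
  'A:AnalysisTypeAnalysisTypeTranscriptomeAssembly=TranscriptomeAssemblyAnalysis' \
  > "$sample"

list_rules "$sample"
```

Pointing `list_rules` at the real file gives a quick overview of existing renames before adding a new one.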
To add or change a substitution:

1. Edit `typename-substitutions.txt`. The left side is the name xscgen would emit without any substitution (the original XSD-derived path); the right side is the desired C# identifier. Pick a substitute that does not collide with another type: xscgen falls back to a generic name (e.g. `<Name>Item`) if the substitute clashes with an existing default.
2. Run `build.cmd regenerateInsdcTypes` (or `./build.sh regenerateInsdcTypes`).
3. Commit both the updated substitution file and the regenerated files under `src/BioFSharp.FileFormats.INSDC/Generated/`.

Caveats:
- xscgen's substitution file does not accept regex or dotted/nested names; `Foo.Bar` would emit invalid C# (`class Foo.Bar`). Substitutes must be flat C# identifiers.
- Stay consistent with the rules already documented in the file's header. If a rename does not fit any existing rule, add a new lettered rule alongside the others.

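
The first caveat can be checked mechanically before regenerating. The following is a minimal sketch, not part of the repo's build: `check_substitutes` is a hypothetical helper, the identifier test is a conservative ASCII approximation (C# itself also allows Unicode letters and does not permit keywords), and it only detects a substitute reused by two rules. Clashes with xscgen's own default names cannot be seen without running the tool.

```shell
# Check the substitutes (right-hand sides) of a substitution file.
# Returns non-zero if any is not a flat identifier or is used by two rules.
check_substitutes() {
  status=0
  # Active rules only: drop comments and blank lines, keep the text after the first '='.
  subs="$(grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1" | sed 's/^[^=]*=//' || true)"

  # Conservative ASCII approximation of a flat C# identifier (assumption).
  bad="$(printf '%s\n' "$subs" | grep -v -E '^[A-Za-z_][A-Za-z0-9_]*$' || true)"
  if [ -n "$bad" ]; then
    printf '%s\n' "$bad" | sed 's/^/not a flat identifier: /'
    status=1
  fi

  # The same substitute on two rules would make two types compete for one name.
  dupes="$(printf '%s\n' "$subs" | sort | uniq -d)"
  if [ -n "$dupes" ]; then
    printf '%s\n' "$dupes" | sed 's/^/used more than once: /'
    status=1
  fi
  return "$status"
}

# Demo on a throwaway file; both rules are hypothetical.
demo="$(mktemp)"
printf '%s\n' '# header comment' 'A:FooBar=Foo.Bar' 'A:FooBaz=FooBaz' > "$demo"
check_substitutes "$demo" || echo "substitution file needs fixing"
```

In practice the function would be run against `src/BioFSharp.FileFormats.INSDC/schemas/typename-substitutions.txt` before `regenerateInsdcTypes`.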
See AGENTS.md for repo conventions and plans/implementation.md for the implementation roadmap.
Documentation lives in the base BioFSharp docs rather than in this repo.
