GOATnote

scribegoat2 Public

Open-source medical LLM safety evaluation pipeline with reproducible benchmarks and high-risk clinical failure analysis.

Python 4 1

prism42 Public

Managed Agents harness on Claude Opus 4.7 for kernel correctness research and clinical reasoning auditing.

Python 1

lostbench Public

Standalone benchmark for multi-turn safety persistence in medical LLM conversations. Measures recommendation monotonicity under sustained patient pressure.

Python

openem-corpus Public

The AI-native emergency medicine knowledge base. Agent-compiled, physician-verified, grep-friendly.

Python

safeshift Public

Does making the model faster make it less safe? Safety degradation benchmarking under inference optimization.

Python

radslice Public

Multimodal radiology LLM benchmark across CT, MRI, X-ray, and ultrasound.

Python
