Explore concepts like Self-Correct, Self-Refine, Self-Improve, Self-Contradict, Self-Play, and Self-Knowledge, alongside o1-like reasoning elevation🍓 and hallucination alleviation🍄.
Updated Dec 7, 2024 (Jupyter Notebook)
Awesome LLM Self-Consistency: a curated list of work on self-consistency in large language models.
CSC-SQL: Corrective Self-Consistency in Text-to-SQL via Reinforcement Learning
Black-box AI reliability certification via self-consistency sampling and conformal calibration
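Split conformal calibration turns the nonconformity scores of a held-out calibration set into a threshold with a finite-sample coverage guarantee. Below is a minimal Python sketch, not the repository's method. It assumes a hypothetical nonconformity score defined as 1 minus the fraction of sampled answers that agree with the candidate, and it uses the standard ⌈(n+1)(1−α)⌉-th smallest calibration score as the threshold.

```python
import math
from typing import Sequence


def consistency_score(samples: Sequence[str], answer: str) -> float:
    """Hypothetical nonconformity score: fraction of samples that disagree with `answer`."""
    if not samples:
        raise ValueError("need at least one sample")
    return 1.0 - samples.count(answer) / len(samples)


def conformal_threshold(cal_scores: Sequence[float], alpha: float) -> float:
    """Split-conformal quantile: the ceil((n+1)(1-alpha))-th smallest calibration score."""
    n = len(cal_scores)
    if n == 0:
        raise ValueError("need at least one calibration score")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be in (0, 1)")
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        # Too few calibration points for this alpha: no finite threshold guarantees coverage.
        return math.inf
    return sorted(cal_scores)[k - 1]
```

A new answer would be certified when its `consistency_score` is at or below the threshold computed from the calibration set. Otherwise the system abstains.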
The official PyTorch implementation of "Guided by Gut: Efficient Test-Time Scaling with Reinforced Intrinsic Confidence".
Object-oriented package for solving self-consistent mean-field theory of interacting lattice systems.
KG-RAG + ToT + multi-agent LLMs for evidence-grounded QA with Neo4j and fine-tuning; reproducible medical case study & evaluation.
Perl implementation of a Markov chain for the course BIO331.
Fixed-point solver for generic functions.
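Plain fixed-point iteration applies x ← g(x) repeatedly until successive iterates stop changing. It converges when g is a contraction near the fixed point. The following is a minimal scalar Python sketch that assumes g maps floats to floats. The tolerance and iteration cap are illustrative defaults, and the repository's solver may use a different scheme.

```python
import math
from typing import Callable


def fixed_point(
    g: Callable[[float], float],
    x0: float,
    tol: float = 1e-10,
    max_iter: int = 1000,
) -> float:
    """Iterate x <- g(x) until two successive iterates differ by less than tol."""
    x = x0
    for _ in range(max_iter):
        x_next = g(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("fixed-point iteration did not converge")
```

For example, `fixed_point(math.cos, 1.0)` converges to about 0.739085, which is the solution of cos x = x. Self-consistent field loops in mean-field and electronic-structure codes follow the same pattern, usually with damping or mixing added.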
GSM8K-Consistency is a benchmark database for analyzing the consistency of arithmetic reasoning on GSM8K.
Decoding-time harness that lifts small open-weight LLMs (Qwen 2.5 2B Q4) on verifier-friendly benchmarks via parallel self-consistency, per-sample program verifiers, and raw-anchored majority voting.
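In its simplest form, self-consistency samples several answers to the same prompt and keeps the most frequent one. Below is a minimal Python sketch of plain majority voting. It assumes the final answers have already been extracted and normalized into comparable values, and it does not reproduce the per-sample program verifiers or the "raw-anchored" voting described above.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def majority_vote(answers: Sequence[Hashable]) -> Tuple[Hashable, float]:
    """Return the most common answer and the fraction of samples that agree with it."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answers)
    # Counter.most_common breaks ties by first occurrence in `answers`.
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)
```

The agreement fraction can also serve as a crude confidence signal, for example to flag low-agreement questions for review.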
Coursework for the Electronic Structure course of my master's degree.
Tactical next-action + reasoning prediction on 348 football match contexts (Shipd Project Eris). 4-component ensemble with task-coupling: DeBERTa-v3-base / large, cross-encoder MCQ scorer, zero-shot NLI, and a three-pass Qwen3.5-35B-A3B-Int4 + Gemma-4-26B-A4B-it MoE fusion with PRM rerank. W&B-instrumented. Target combined score ≥ 0.80.
An evaluation of prompting techniques (Zero-Shot CoT, Few-Shot, Self-Consistency) on the Mistral-7B model for mathematical reasoning. This project systematically benchmarks 7 distinct methods on the GSM8K dataset.
Advanced prompt engineering techniques: Chain-of-Thought, Tree-of-Thoughts, ReAct, Self-Consistency
Self-consistent, model-based filter design for three-phase PLLs.
Evaluation framework for self-hosted LLMs. Systematic prompt ablation (baseline, CoT, few-shot, self-consistency voting) on Llama 3.1 8B via lm-evaluation-harness, with Wilson CI statistical analysis, determinism validation, and load testing under concurrency. Found that chain-of-thought degrades accuracy by 25 pp at small scale.
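The Wilson score interval is a confidence interval for a binomial proportion such as benchmark accuracy. At small n, or for accuracies near 0 or 1, it behaves better than the normal approximation. Below is a minimal Python sketch for an accuracy of `successes / n`, assuming the default z = 1.96 for a roughly 95% interval. It is not taken from the repository's code.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for the proportion successes / n."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be in [0, n]")
    p = successes / n
    denom = 1.0 + z * z / n
    # Center is pulled toward 0.5; half-width includes the z^2 / (4 n^2) correction.
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return center - half, center + half
```

For 50 correct answers out of 100, this gives roughly (0.404, 0.596). When two prompting methods have overlapping intervals, the gap between their accuracies is a weaker signal.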
CAP6640-Spring2026: Benchmarks GPT-3.5, GPT-4, Claude Haiku, and Gemini on GSM8K and TruthfulQA, measuring accuracy, self-consistency, and confidence calibration.
A consistency-based firewall for high-stakes Retrieval Augmented Generation (RAG). Queries the model multiple times and incinerates the output if entropy is high (divergent answers), preferring silence over hallucination.
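An entropy gate of this kind can be sketched as follows. The code computes the Shannon entropy of the empirical distribution over sampled answers and abstains when that entropy is high. This is a minimal Python sketch under stated assumptions: answers are compared by exact string match, and the 1.0-bit threshold is a hypothetical default. The repository's actual entropy measure and cutoff are not given here.

```python
import math
from collections import Counter
from typing import Optional, Sequence


def answer_entropy(answers: Sequence[str]) -> float:
    """Shannon entropy (bits) of the empirical distribution over distinct answers."""
    n = len(answers)
    counts = Counter(answers)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def gated_answer(answers: Sequence[str], max_entropy: float = 1.0) -> Optional[str]:
    """Return the majority answer, or None (abstain) when the samples disagree too much."""
    if not answers:
        return None
    if answer_entropy(answers) > max_entropy:
        return None  # divergent samples: prefer silence over a likely hallucination
    return Counter(answers).most_common(1)[0][0]
```

Unanimous samples have zero entropy and pass the gate. Four mutually distinct answers have 2 bits of entropy and are rejected.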
Developing an autonomous system for prompt selection for Large Language Models (LLMs), enhancing performance across tasks by balancing generality and specificity. This project automates diverse, high-quality prompt creation and selection, reducing manual intervention and maximizing LLM utility across applications.