Bug
On the Linux TransformersVlmEngine path, the falcon_ocr preset currently needs a small bundle of Falcon-specific compatibility fixes. Forcing eager attention is necessary, but it is not sufficient on its own.
This appears to be separate from the `TokenizersBackend` issue. Inference is local: the failures happen while loading and running the local model through Transformers with `trust_remote_code=True`.
Steps to reproduce
On Linux, with a Docling setup that resolves falcon_ocr to the Transformers engine:
```python
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import VlmConvertOptions, VlmPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.pipeline.vlm_pipeline import VlmPipeline

vlm_options = VlmConvertOptions.from_preset("falcon_ocr")
pipeline_options = VlmPipelineOptions(
    vlm_options=vlm_options,
    allow_external_plugins=True,
    enable_remote_services=True,
)
converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_cls=VlmPipeline,
            pipeline_options=pipeline_options,
        )
    }
)
converter.convert("sample.pdf")
```
Actual behavior
On the Linux/Transformers path we hit a sequence of Falcon-specific failures:
- attention backend dispatch: Falcon-OCR does not support SDPA yet, so the model must be loaded with the public `attn_implementation="eager"` option
- config initialization: the eager setting must be present on the actual Falcon config object used for model init, not just inferred later
- generation config loading: the Falcon repo does not ship `generation_config.json`, so a `GenerationConfig.from_model_config(...)` fallback is needed
- prompt formatting: Falcon does not ship a usable chat template for the generic Transformers VLM path
- inference path: Falcon's remote-code model needs to use its native OCR generation entrypoints instead of Docling's generic chat-template processor flow
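The first two failures above can be sketched as plain kwarg/config plumbing. This is a hypothetical illustration, not Docling's actual API: `falcon_load_kwargs` and `apply_to_config` are made-up names, and the preset set is assumed.

```python
# Hypothetical sketch: defaulting Falcon-OCR to eager attention while
# honoring an explicit user override, then mirroring the choice onto the
# config attributes used for model construction. Names are illustrative.

FALCON_OCR_PRESETS = {"falcon_ocr"}  # assumed preset name from the report


def falcon_load_kwargs(preset, user_kwargs=None):
    """Build from_pretrained(...) kwargs for a preset.

    Falcon-OCR does not support SDPA, so it defaults to eager attention,
    but an explicit user-supplied attn_implementation still wins.
    """
    kwargs = dict(user_kwargs or {})
    if preset in FALCON_OCR_PRESETS and "attn_implementation" not in kwargs:
        kwargs["attn_implementation"] = "eager"
    return kwargs


def apply_to_config(config_attrs, kwargs):
    """Set the attention choice on the config attributes before model
    init, rather than letting it be inferred later."""
    attrs = dict(config_attrs)
    if "attn_implementation" in kwargs:
        # Transformers stores the resolved backend as _attn_implementation
        attrs["_attn_implementation"] = kwargs["attn_implementation"]
    return attrs
```

Passing the resulting kwargs straight into `model_cls.from_pretrained(...)` keeps the override path and the default path in one place.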
Relevant stack shape:
```
docling.models.inference_engines.vlm.factory.create_vlm_engine
  AutoInlineVlmEngine
    TransformersVlmEngine
      model_cls.from_pretrained(...)
        FalconOCRForCausalLM
```
The underlying module is loaded from the HF cache, e.g.:
`.../.cache/huggingface/modules/transformers_modules/.../modeling_falcon_ocr.py`
Expected behavior
falcon_ocr should initialize and run successfully on Linux when Docling routes it through the Transformers engine.
Suggested fix direction
Docling should treat Falcon-OCR as a small Transformers compatibility special case on the Linux path:
- honor explicit public `attn_implementation` overrides
- default Falcon-OCR to eager attention on the Transformers preset
- preload the Falcon config with eager attention before model construction
- fall back to `GenerationConfig.from_model_config(...)` when `generation_config.json` is missing
- bypass the generic chat-template prompt path and use Falcon's native OCR generation flow
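The generation-config fallback can be sketched in isolation. This is an illustrative stand-in, not Docling's or Transformers' actual code: `load_generation_settings` is a made-up name, and the key list is an assumed subset of what `GenerationConfig.from_model_config(...)` would pick up.

```python
import json
from pathlib import Path

# Illustrative subset of generation-relevant keys a model config may carry.
GENERATION_KEYS = ("bos_token_id", "eos_token_id", "pad_token_id", "max_length")


def load_generation_settings(repo_dir, model_config):
    """Prefer generation_config.json; if the repo does not ship one
    (as with Falcon-OCR), derive settings from the model config,
    mirroring the GenerationConfig.from_model_config(...) fallback."""
    path = Path(repo_dir) / "generation_config.json"
    if path.is_file():
        return json.loads(path.read_text())
    return {k: model_config[k] for k in GENERATION_KEYS if k in model_config}
```

Keeping the fallback behind a single helper means the Transformers engine never raises just because a remote-code repo omits `generation_config.json`.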
Status
Tracked in #3279.
Docling version
2.86.0
Python version
3.13.13
Additional environment details
transformers==5.5.3
- observed on Linux CUDA path
- Apple Silicon MLX path is not affected because it avoids the Transformers engine for this preset