Falcon-OCR on Linux/Transformers needs multiple compatibility fixes, not just eager attention #3278

@geoHeil

Description

Bug

On the Linux TransformersVlmEngine path, the falcon_ocr preset currently needs a small bundle of Falcon-specific compatibility fixes. Forcing eager attention is necessary, but it is not sufficient on its own.

This appears to be separate from:

Inference is local. The failures happen while loading and running the local model through Transformers with trust_remote_code=True.

Steps to reproduce

On Linux, with a Docling setup that resolves falcon_ocr to the Transformers engine:

from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import VlmConvertOptions, VlmPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.pipeline.vlm_pipeline import VlmPipeline

vlm_options = VlmConvertOptions.from_preset("falcon_ocr")
pipeline_options = VlmPipelineOptions(
    vlm_options=vlm_options,
    allow_external_plugins=True,
    enable_remote_services=True,
)
converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_cls=VlmPipeline,
            pipeline_options=pipeline_options,
        )
    }
)
converter.convert("sample.pdf")

Actual behavior

On the Linux/Transformers path we hit a sequence of Falcon-specific failures:

  • attention backend dispatch: Falcon-OCR does not support SDPA yet, so the model must be loaded with the public attn_implementation="eager" override
  • config initialization: the eager setting must already be present on the actual Falcon config object used for model init, not merely inferred later
  • generation config loading: the Falcon repo does not ship a generation_config.json, so a GenerationConfig.from_model_config(...) fallback is needed
  • prompt formatting: Falcon does not ship a usable chat template for the generic Transformers VLM path
  • inference path: Falcon's remote-code model needs its native OCR generation entrypoints instead of Docling's generic chat-template processor flow
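The config-initialization and generation-config points can be sketched with the public transformers API. This is a minimal illustration using a stand-in PretrainedConfig; the real Falcon-OCR config would come from AutoConfig.from_pretrained(..., trust_remote_code=True), and `_attn_implementation` is the attribute that the public attn_implementation="eager" kwarg ultimately records on the config:

```python
from transformers import GenerationConfig, PretrainedConfig

# Stand-in for the Falcon-OCR config (the real one is loaded via
# AutoConfig.from_pretrained with trust_remote_code=True).
config = PretrainedConfig(eos_token_id=11)

# 1. Put eager attention on the config object *before* model construction,
#    so the remote-code model never tries to dispatch to SDPA.
config._attn_implementation = "eager"

# 2. The repo ships no generation_config.json, so instead of
#    GenerationConfig.from_pretrained(repo_id), derive one from the
#    model config itself.
gen_config = GenerationConfig.from_model_config(config)
```

The key point is ordering: the eager setting lives on the config used for model init, and the generation config is derived rather than fetched.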

Relevant stack shape:

  • docling.models.inference_engines.vlm.factory.create_vlm_engine
  • AutoInlineVlmEngine
  • TransformersVlmEngine
  • model_cls.from_pretrained(...)
  • FalconOCRForCausalLM

The underlying module is loaded from the HF cache, e.g.:

.../.cache/huggingface/modules/transformers_modules/.../modeling_falcon_ocr.py

Expected behavior

falcon_ocr should initialize and run successfully on Linux when Docling routes it through the Transformers engine.

Suggested fix direction

Docling should treat Falcon-OCR as a small Transformers compatibility special case on the Linux path:

  • honor explicit public attn_implementation overrides
  • default Falcon-OCR to eager attention on the Transformers preset
  • preload the Falcon config with eager attention before model construction
  • fall back when generation_config.json is missing
  • bypass the generic chat-template prompt path and use Falcon's native OCR generation flow

Status

Tracked in #3279.

Docling version

2.86.0

Python version

3.13.13

Additional environment details

  • transformers==5.5.3
  • observed on Linux CUDA path
  • Apple Silicon MLX path is not affected because it avoids the Transformers engine for this preset
