Commit a62c1ac

LayoutLMv2

1 parent c3a7196

10 files changed

Lines changed: 33439 additions & 0 deletions

deep-learning/.DS_Store

0 Bytes

deep-learning/Transformer-Tutorials/LayoutLMv2/CORD/Fine_tuning_LayoutLMv2ForTokenClassification_on_CORD.ipynb

Lines changed: 3001 additions & 0 deletions

deep-learning/Transformer-Tutorials/LayoutLMv2/CORD/Prepare_CORD_for_LayoutLMv2.ipynb

Lines changed: 2406 additions & 0 deletions

deep-learning/Transformer-Tutorials/LayoutLMv2/DocVQA/Fine_tuning_LayoutLMv2ForQuestionAnswering_on_DocVQA.ipynb

Lines changed: 3459 additions & 0 deletions

deep-learning/Transformer-Tutorials/LayoutLMv2/FUNSD/Fine_tuning_LayoutLMv2ForTokenClassification_on_FUNSD.ipynb

Lines changed: 7766 additions & 0 deletions

deep-learning/Transformer-Tutorials/LayoutLMv2/FUNSD/Fine_tuning_LayoutLMv2ForTokenClassification_on_FUNSD_using_HuggingFace_Trainer.ipynb

Lines changed: 4528 additions & 0 deletions

deep-learning/Transformer-Tutorials/LayoutLMv2/FUNSD/Inference_with_LayoutLMv2ForTokenClassification.ipynb

Lines changed: 3968 additions & 0 deletions

deep-learning/Transformer-Tutorials/LayoutLMv2/FUNSD/True_inference_with_LayoutLMv2ForTokenClassification_+_Gradio_demo.ipynb

Lines changed: 1158 additions & 0 deletions
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
# LayoutLMv2 notebooks

In this directory, you can find several notebooks that illustrate how to use LayoutLMv2 both for fine-tuning on custom data and for inference. I've split up the notebooks according to the downstream dataset:

- CORD (receipt understanding)
- DocVQA (visual question answering on documents)
- FUNSD (form understanding)
- RVL-CDIP (document image classification)

I've implemented LayoutLMv2 (and LayoutXLM) in the same way as other models in the Transformers library. You have:
- `LayoutLMv2ForSequenceClassification`, which you can use to classify document images (an example dataset is [RVL-CDIP](https://www.cs.cmu.edu/~aharley/rvl-cdip/)). This model adds a sequence classification head on top of the base `LayoutLMv2Model`, and returns `logits` of shape `(batch_size, num_labels)` (similar to `BertForSequenceClassification`).
- `LayoutLMv2ForTokenClassification`, which you can use to annotate words appearing in a document image (example datasets here are [CORD](https://github.com/clovaai/cord), [FUNSD](https://guillaumejaume.github.io/FUNSD/), [SROIE](https://rrc.cvc.uab.es/?ch=13), [Kleister-NDA](https://github.com/applicaai/kleister-nda)). This model adds a token classification head on top of the base `LayoutLMv2Model`, and treats form understanding as a sequence labeling/named-entity recognition (NER) problem. It returns `logits` of shape `(batch_size, sequence_length, num_labels)` (similar to `BertForTokenClassification`).
- `LayoutLMv2ForQuestionAnswering`, which you can use to perform extractive visual question answering on document images (an example dataset here is [DocVQA](https://docvqa.org/)). This model adds a question answering head on top of the base `LayoutLMv2Model`, and returns `start_logits` and `end_logits` (similar to `BertForQuestionAnswering`).
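To make the token-classification setup concrete: casting form understanding as sequence labeling means each word gets a BIO tag, and the number of tags determines the `num_labels` dimension of the logits. Here is a minimal, library-free sketch; the entity names loosely follow FUNSD and are illustrative, not the exact label set of any dataset.

```python
# Illustrative sketch: how word-level entities become BIO sequence labels
# for a token-classification head. Entity names loosely follow FUNSD;
# the exact label set depends on the dataset you fine-tune on.

entities = ["question", "answer", "header"]

# Build the label list: "O" for background words, plus B-/I- tags per entity.
labels = ["O"] + [f"{p}-{e.upper()}" for e in entities for p in ("B", "I")]
label2id = {label: i for i, label in enumerate(labels)}
id2label = {i: label for label, i in label2id.items()}

def bio_tags(words, spans):
    """Tag each word with a BIO label, given (start, end, entity) word spans."""
    tags = ["O"] * len(words)
    for start, end, entity in spans:
        tags[start] = f"B-{entity.upper()}"
        for i in range(start + 1, end):
            tags[i] = f"I-{entity.upper()}"
    return tags

words = ["Name", ":", "John", "Doe"]
spans = [(0, 1, "question"), (2, 4, "answer")]
print(bio_tags(words, spans))  # ['B-QUESTION', 'O', 'B-ANSWER', 'I-ANSWER']
print(len(labels))             # 7 -> the num_labels of the classification head
```

During fine-tuning, these per-word tags are propagated to the model's subword tokens (the notebooks show how), and the head predicts one of the `len(labels)` classes per token.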

The full documentation (which also includes tips on how to use `LayoutLMv2Processor`) can be found [here](https://huggingface.co/transformers/model_doc/layoutlmv2.html).
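One input-preparation detail worth knowing (the processor normally handles this for you): LayoutLMv2 expects each word's bounding box to be normalized to a 0–1000 scale relative to the page size. A minimal helper sketch, assuming pixel-space `(x0, y0, x1, y1)` boxes:

```python
def normalize_bbox(bbox, width, height):
    """Scale a pixel-space (x0, y0, x1, y1) box to LayoutLMv2's 0-1000 range."""
    x0, y0, x1, y1 = bbox
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]

# A word box on a 762x1000-pixel page:
print(normalize_bbox((381, 250, 762, 500), width=762, height=1000))
# -> [500, 250, 1000, 500]
```

If you use `LayoutLMv2Processor` end to end (with its built-in OCR), you don't need to do this yourself; it only matters when you supply your own words and boxes.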
15+
16+
The models on the hub can be found [here](https://huggingface.co/models?search=layoutlmv2).
17+
18+
Note that there's also a Gradio demo available for LayoutLMv2, hosted as a HuggingFace Space [here](https://huggingface.co/spaces/nielsr/LayoutLMv2-FUNSD).
