Make extraction models robust
A well-designed model extracts reliably across variations in document layout, labels, and formats.
Start with AI generation
The best way to create a robust model is to let Moby generate it:
- Click Generate with AI
- Provide a prompt, upload a workpaper, or select sample documents
- Review and refine the suggested fields
AI generation creates descriptive field names and handles common variations automatically.
Example of a good model
Here's what a well-designed invoice extraction model looks like:
| Field Name | Description |
|---|---|
| Invoice Number | Invoice reference, usually at top. May appear as "Invoice #", "Inv No.", or just a number. |
| Invoice Date | Date of invoice, typically near the invoice number. Format varies (DD/MM/YYYY, MM/DD/YYYY, etc.). |
| Vendor Name | Company that issued the invoice. Usually in header or letterhead. |
| Total Amount | Final amount due, usually at bottom right. May be labeled "Total", "Amount Due", or "Grand Total". Includes tax. |
| Due Date | Payment deadline. May appear as "Due Date", "Payment Due", or "Pay By". |
What makes this good:
- Field names are descriptive (not just "Amount" or "Date")
- Descriptions explain where to find the value
- Descriptions list common label variations
- Descriptions note format differences
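If you keep field definitions in a spreadsheet or script before entering them, the checklist above can be approximated with a quick review pass. The sketch below is illustrative only: `VAGUE_NAMES`, `review_field`, and the eight-word minimum are assumptions made for this example, not part of Moby.

```python
# Hypothetical review helper; the names and thresholds are assumptions.
VAGUE_NAMES = {"amount", "date", "number", "name", "total"}

def review_field(name, description, min_words=8):
    """Return a list of warnings for one field definition."""
    warnings = []
    # A bare generic word says nothing about which amount or date is meant.
    if name.strip().lower() in VAGUE_NAMES:
        warnings.append("name is too generic")
    # Very short descriptions rarely cover location and label variations.
    if len(description.split()) < min_words:
        warnings.append("description is short")
    return warnings

model = [
    ("Invoice Number",
     'Invoice reference, usually at top. May appear as "Invoice #", '
     '"Inv No.", or just a number.'),
    ("Amount", "Invoice total"),
]

report = {name: review_field(name, desc) for name, desc in model}
for name, warnings in report.items():
    print(name, "->", warnings or "ok")
```

A word count is a crude proxy. It catches one-line descriptions, but it cannot tell whether a description actually mentions where the value appears.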
Refine descriptions for edge cases
If extraction misses values, improve the field description:
Before: "Invoice total"
After: "Total amount due on the invoice, usually at the bottom right. May be labeled as 'Total', 'Amount Due', 'Grand Total', or 'Balance Due'. Includes tax and shipping if applicable."
Test across clients
A model that works for one client's invoices may need adjustment for another.
Testing workflow:
- Create initial model based on sample documents
- Test on 5-10 documents from each client type
- Note any extraction errors
- Refine descriptions to handle variations
- Re-test and iterate
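To see which fields need better descriptions, it can help to tally the test results per client and field. Here is a minimal sketch, assuming you record each check by hand as a `(client, field, extracted, expected)` tuple. The tuple layout, the `field_accuracy` helper, and the sample values are hypothetical, and nothing here calls Moby.

```python
from collections import defaultdict

def field_accuracy(results):
    """Return {(client, field): share of test documents extracted correctly}."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for client, field, extracted, expected in results:
        key = (client, field)
        totals[key] += 1
        # Compare after trimming whitespace; adapt this to your own rules.
        if extracted.strip() == expected.strip():
            correct[key] += 1
    return {key: correct[key] / totals[key] for key in totals}

# Made-up sample records for illustration.
results = [
    ("Client A", "Invoice Number", "INV-001", "INV-001"),
    ("Client A", "Total Amount", "120.00", "120.00"),
    ("Client B", "Invoice Number", "", "7731"),
    ("Client B", "Total Amount", "88.50", "88.50"),
]

accuracy = field_accuracy(results)
# Fields with any miss are candidates for a refined description.
needs_work = sorted(key for key, acc in accuracy.items() if acc < 1.0)
print(needs_work)
```

Grouping by client as well as by field shows whether a miss is specific to one client's layout or affects the field everywhere.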
When to create new models
Create a new model when:
- Document format is significantly different
- Different fields need to be extracted
- The existing model's accuracy on these documents is below 90%
Use the same model when:
- Documents are similar in structure
- Layout varies only in minor ways
- Same fields need to be extracted
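The criteria above can be read as a simple rule: any one of the "new model" conditions is enough to split. The sketch below is one way to write that down. The function name and parameters are hypothetical, and judging whether two formats are "significantly different" is still up to you.

```python
def should_create_new_model(same_fields, similar_structure, accuracy,
                            threshold=0.90):
    """Return True if any of the 'new model' criteria applies.

    accuracy is the existing model's measured accuracy on the new
    documents, as a fraction between 0 and 1.
    """
    return (not same_fields) or (not similar_structure) or accuracy < threshold

# Same fields, similar layout, 95% accuracy: keep using the existing model.
print(should_create_new_model(True, True, 0.95))
# Same fields and layout, but accuracy has dropped to 82%: create a new model.
print(should_create_new_model(True, True, 0.82))
```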