Make extraction models robust

Well-designed models work reliably across different document variations.

Start with AI generation

The best way to create a robust model is to let Moby generate it:

  1. Click Generate with AI
  2. Provide a prompt, upload a workpaper, or select sample documents
  3. Review and refine the suggested fields

AI generation creates descriptive field names and handles common variations automatically.

Example of a good model

Here's what a well-designed invoice extraction model looks like:

Field Name | Description
Invoice Number | Invoice reference, usually at top. May appear as "Invoice #", "Inv No.", or just a number.
Invoice Date | Date of invoice, typically near the invoice number. Format varies (DD/MM/YYYY, MM/DD/YYYY, etc.).
Vendor Name | Company that issued the invoice. Usually in header or letterhead.
Total Amount | Final amount due, usually at bottom right. May be labeled "Total", "Amount Due", "Grand Total". Includes tax.
Due Date | Payment deadline. May appear as "Due Date", "Payment Due", or "Pay By".

What makes this good:

  • Descriptive field names (not just "Amount" or "Date")
  • Descriptions explain where to find the value
  • Mentions common label variations
  • Notes format differences
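
The checklist above can be turned into a quick self-review before you test a model. The following is a minimal sketch in Python, not part of Moby: the Field class, the word lists, and the review_field function are all hypothetical, and the checks are crude text heuristics rather than anything the product enforces.

```python
from dataclasses import dataclass


@dataclass
class Field:
    name: str
    description: str


# Hypothetical heuristics for illustration only; these are not Moby rules.
VAGUE_NAMES = {"amount", "date", "number", "name", "total"}
LOCATION_HINTS = ("top", "bottom", "header", "letterhead", "near", "left", "right")


def review_field(field: Field) -> list[str]:
    """Return a list of warnings for one field definition."""
    warnings = []
    desc = field.description.lower()
    # A one-word generic name gives the extractor little to go on.
    if field.name.strip().lower() in VAGUE_NAMES:
        warnings.append("name is too generic")
    # Look for any word that hints at where the value sits on the page.
    if not any(hint in desc for hint in LOCATION_HINTS):
        warnings.append("description does not say where to find the value")
    # Quoted text is treated as a sign that label variations are listed.
    if '"' not in field.description and "'" not in field.description:
        warnings.append("description lists no label variations")
    return warnings


good = Field(
    "Total Amount",
    'Final amount due, usually at bottom right. May be labeled "Total", '
    '"Amount Due", "Grand Total". Includes tax.',
)
weak = Field("Amount", "Invoice total")

print(review_field(good))
print(review_field(weak))
```

The well-described field produces no warnings, while the weak one is flagged on all three checks. Treat the output as a prompt for manual review, not as a measure of extraction quality.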

Refine descriptions for edge cases

If extraction misses values, improve the field description:

Before: "Invoice total"

After: "Total amount due on the invoice, usually at the bottom right. May be labeled as 'Total', 'Amount Due', 'Grand Total', or 'Balance Due'. Includes tax and shipping if applicable."

Test across clients

A model that works for one client's invoices may need adjustment for another.

Testing workflow:

  1. Create an initial model based on sample documents
  2. Test on 5-10 documents from each client type
  3. Note any extraction errors
  4. Refine descriptions to handle variations
  5. Re-test and iterate
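
If you record test results outside the product, steps 2 and 3 can be summarized per client and per field. The sketch below assumes you have noted, for each tested document, the expected value and the value that was extracted; the tuple layout and the field_accuracy function are hypothetical and do not correspond to a Moby export format.

```python
from collections import defaultdict


def field_accuracy(results):
    """Summarize test results.

    results: iterable of (client, field, expected, extracted) tuples,
    where extracted is None when the value was missed.
    Returns {client: {field: fraction of documents extracted correctly}}.
    """
    # counts[client][field] holds [correct, total]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for client, field, expected, extracted in results:
        tally = counts[client][field]
        tally[1] += 1
        if extracted is not None and extracted.strip() == expected.strip():
            tally[0] += 1
    return {
        client: {field: correct / total for field, (correct, total) in fields.items()}
        for client, fields in counts.items()
    }


# Made-up sample data for illustration.
results = [
    ("Client A", "Total Amount", "120.00", "120.00"),
    ("Client A", "Total Amount", "75.50", "75.50"),
    ("Client B", "Total Amount", "310.00", None),
    ("Client B", "Total Amount", "42.00", "42.00"),
]

report = field_accuracy(results)
for client, fields in report.items():
    for field, acc in fields.items():
        print(f"{client} / {field}: {acc:.0%}")
```

A field that scores well for one client and poorly for another usually points to a label or layout variation that the description does not yet mention.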

When to create new models

Create a new model when:

  • Document format is significantly different
  • Different fields need to be extracted
  • Existing model accuracy is below 90%

Use the same model when:

  • Documents are similar in structure
  • Layout varies only in minor ways
  • Same fields need to be extracted
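
The two lists above can be read as a simple decision rule. The sketch below is one possible reading, assuming that any single "create a new model" condition is enough to trigger a new model; the guidance does not state how the conditions combine, and the recommend function and its parameters are hypothetical.

```python
def recommend(
    same_fields: bool,
    similar_structure: bool,
    accuracy: float,
    threshold: float = 0.90,
) -> str:
    """Suggest whether to reuse a model or create a new one.

    accuracy is the existing model's measured accuracy as a fraction (0.0 to 1.0).
    Assumption: any one failing condition is sufficient to recommend a new model.
    """
    if not same_fields or not similar_structure or accuracy < threshold:
        return "create a new model"
    return "reuse the existing model"


print(recommend(same_fields=True, similar_structure=True, accuracy=0.95))
print(recommend(same_fields=True, similar_structure=True, accuracy=0.85))
print(recommend(same_fields=False, similar_structure=True, accuracy=0.99))
```

When accuracy is the only failing condition, it may be worth refining the field descriptions and re-testing before committing to a separate model.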