The new feature targets documents such as invoices, purchase orders, resumes, and contracts, as well as use cases such as fraud detection. It is intended to replace traditional data extraction methods that rely on manual labeling, rigid templates, and layout-dependent rules.
According to Oracle, Generative Extraction enables businesses to specify the fields they wish to extract using natural language, without extensive model training or complex configuration. The system understands the meaning of the fields and consistently extracts them from semi-structured and unstructured documents, even when formats and layouts vary significantly.
How the New Technology Works
The solution leverages state-of-the-art multimodal vision models, which analyze document content and return structured outputs in JSON format.
Additionally, it incorporates specialized pre- and post-processing logic to improve accuracy, keep results consistent, and reduce the hallucinations that often occur with general-purpose AI models.
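To make the idea of post-processing concrete, the sketch below shows one simple kind of guardrail: parse the model's JSON, keep only the fields that were requested, and discard any value that cannot be found verbatim in the document text. This is an illustration only, not Oracle's implementation. The field names, the `postprocess` function, and the verbatim-match rule are all assumptions made for the example.

```python
import json

# Hypothetical field specification: field name -> natural-language description.
FIELDS = {
    "invoice_number": "The unique identifier printed on the invoice",
    "total_amount": "The final amount due, including tax",
    "due_date": "The date by which payment must be made",
}


def postprocess(raw_model_output, fields, source_text):
    """Return one entry per requested field, or None when a value is not trusted.

    Assumed guardrails (illustrative only):
    - output that is not a JSON object yields None for every field
    - keys that were not requested are dropped
    - a value is kept only if it is a string found verbatim in source_text
    """
    try:
        data = json.loads(raw_model_output)
    except json.JSONDecodeError:
        return {name: None for name in fields}
    if not isinstance(data, dict):
        return {name: None for name in fields}

    result = {}
    for name in fields:
        value = data.get(name)
        if isinstance(value, str) and value in source_text:
            result[name] = value
        else:
            result[name] = None  # missing, wrong type, or not grounded in the text
    return result
```

A production system would likely use fuzzier grounding than an exact substring match, since a model may legitimately reformat a value it read correctly.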
Key capabilities include:
- Understanding fields through natural language descriptions.
- Learning from a limited number of examples when higher accuracy is required.
- Supporting multi-page, multilingual, and mixed-layout documents.
- Normalizing values into a unified data schema.
- Remaining fully compatible with existing Custom KV workflows, without requiring changes to the processing pipeline.
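As a rough illustration of what normalizing values into a unified schema can look like, the following sketch converts dates and amounts written in different styles into one canonical form. It does not reflect Oracle's actual logic. The accepted date formats, the two-decimal amount format, and every function and field name here are assumptions chosen for the example.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation

# Assumed input date styles; a real system would cover many more.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y")


def normalize_date(value):
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def normalize_amount(value):
    """Strip a dollar sign and thousands separators; return a two-decimal string."""
    cleaned = value.replace(",", "").replace("$", "").strip()
    try:
        return str(Decimal(cleaned).quantize(Decimal("0.01")))
    except InvalidOperation:
        return None


# Hypothetical mapping from field name to the normalizer applied to it.
NORMALIZERS = {"due_date": normalize_date, "total_amount": normalize_amount}


def normalize_record(record):
    """Apply the matching normalizer to each string field; pass others through."""
    normalized = {}
    for name, value in record.items():
        normalizer = NORMALIZERS.get(name)
        if normalizer is not None and isinstance(value, str):
            normalized[name] = normalizer(value)
        else:
            normalized[name] = value
    return normalized
```

With this approach, two invoices that print the same due date as "15/03/2024" and "March 15, 2024" end up with identical values downstream.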
Why It Matters for Businesses
Oracle emphasizes that general-purpose generative AI models alone are insufficient for high-accuracy data extraction in environments with diverse document types. Generative Extraction is specifically designed for production use, with built-in guardrails to ensure predictable behavior and consistent results at scale.
Moreover, according to the company, it significantly reduces time-to-market for new applications by minimizing the need for labeling, model retraining, and rule maintenance. This allows businesses to automate document-intensive processes more quickly and scale operations more efficiently.
