Digitize Document Routing with Intelligent Indexing

Business Results

Automate document routing and reduce manual labor costs

96% reduction in training time

60% faster inferencing time

46% faster data preprocessing

65% prediction accuracy

Performance benchmark comparing stock API to oneAPI as performed by Accenture.

Background

Enterprises use intelligent document analysis (IDA) to examine documents (such as policies, contracts, and legal agreements) for specific terms, and then identify the documents that may pose a risk to the business. IDA can also identify a document's type (such as legal, finance, or marketing) so that the document can be categorized and routed to the appropriate department.

Paper-based documents still account for 46% of all records, which represents a substantial cost to public sector organizations. An average government agency receives and manually routes approximately 3.5 million documents annually.^† A person must read each letter or document before routing it, which takes seven to ten minutes per document. This manual process is time-consuming and costly.
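To put those figures in perspective, the following minimal sketch multiplies the annual document volume by the per-document reading time quoted above. The constant and function names are illustrative only, and the result is simple arithmetic on the cited figures, not a measured labor cost.

```python
# Figures quoted in the Background section.
DOCS_PER_YEAR = 3_500_000
MINUTES_LOW, MINUTES_HIGH = 7, 10


def annual_routing_hours(docs: int, minutes_per_doc: float) -> float:
    """Convert documents per year and minutes per document into hours per year."""
    return docs * minutes_per_doc / 60


low = annual_routing_hours(DOCS_PER_YEAR, MINUTES_LOW)
high = annual_routing_hours(DOCS_PER_YEAR, MINUTES_HIGH)
print(f"{low:,.0f} to {high:,.0f} staff hours per year")
# prints: 408,333 to 583,333 staff hours per year
```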

The majority of documents managed by intelligent document processing (IDP) solutions are structured or semi-structured, leaving a significant portion of unstructured documents unmanaged. AI can make automated processing and categorizing of documents—structured, semi-structured, and unstructured—more cost-effective.

Solution

Term frequency-inverse document frequency (TF-IDF) was used to quantify the importance or relevance of terms (string representations) in the documents. A support vector classification (SVC) model was trained to categorize the documents. The publicly available dataset^‡ used in the training contained about 200K topic-related documents obtained from HuffPost*. Dataset text was cleaned using stop word removal, stemming, and tokenization. The trained supervised model classifies each document, based on its headline, into one of 42 predetermined categories, such as entertainment or politics.
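The approach described above can be sketched with stock scikit-learn as follows. This is an illustration under stated assumptions, not the reference kit's code: the eight headlines and two labels are made up for the example, the linear kernel is an assumption, and stemming is omitted to avoid an extra dependency.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy headlines; the real dataset has about 200K rows and 42 categories.
headlines = [
    "Senate passes new budget bill after long debate",
    "President signs election reform law",
    "Governor vetoes state tax bill",
    "Congress debates election security law",
    "New superhero movie breaks box office record",
    "Pop star announces world concert tour",
    "Actor wins award for movie role",
    "Film festival celebrates best movie of the year",
]
labels = ["POLITICS"] * 4 + ["ENTERTAINMENT"] * 4

# TfidfVectorizer lowercases, tokenizes, and drops English stop words, then
# weights each remaining term by TF-IDF. The kernel choice is an assumption.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    SVC(kernel="linear"),
)
model.fit(headlines, labels)

predicted = model.predict(["Senate debates tax law", "Movie star wins award"])
print(list(predicted))
```

Wrapping both steps in one pipeline means the vocabulary learned during training is reused unchanged at prediction time.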

Data ingest and text processing were optimized using Intel® Distribution of Modin* and ran 46% faster than stock Modin. Training and inferencing of the SVC model were optimized using Intel® Extension for Scikit-learn*. The optimizations reduced training time by 96% and made inferencing 60% faster. The model classified documents with 65% accuracy. Intel Distribution of Modin and Intel Extension for Scikit-learn are part of Intel’s end-to-end AI software portfolio of tools and framework optimizations that are powered by oneAPI.
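As a rough sketch of how such optimizations are typically switched on, the snippet below patches scikit-learn with Intel Extension for Scikit-learn when it is available and otherwise falls back to the stock library. It assumes the extension is installed as the `scikit-learn-intelex` package (module `sklearnex`). The tiny dataset is made up, and the snippet does not reproduce the benchmark figures above. Modin is usually adopted in a similar drop-in way, by importing `modin.pandas` in place of `pandas`, which is noted only as a comment here.

```python
import numpy as np

# Assumption: the extension, if present, is importable as `sklearnex`.
try:
    from sklearnex import patch_sklearn

    patch_sklearn()  # swap in optimized implementations where available
    patched = True
except ImportError:
    patched = False  # fall back to stock scikit-learn

# Import estimators only after patching so the optimized versions are picked up.
from sklearn.svm import SVC

# Drop-in pattern for Modin (not executed here): import modin.pandas as pd

# Hypothetical two-class toy data, just to show the API is unchanged.
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 0.8]])
y = np.array([0, 0, 1, 1])

clf = SVC(kernel="linear").fit(X, y)
prediction = int(clf.predict(np.array([[0.95, 0.9]]))[0])
print("patched:", patched, "prediction:", prediction)
```

Because the patch replaces implementations behind the same scikit-learn interface, the training and prediction code stays the same whether or not the extension is installed.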

Technology

Optimized with Intel oneAPI for Better Performance

Data processing with TF-IDF and Intel Distribution of Modin
SVC with Intel Extension for Scikit-learn
Amazon EC2* M6i with 3rd generation Intel® Xeon® Scalable processor

Benefits

Data scientists can build a better IDP solution that handles semi-structured and unstructured documents. The time saved in training and inference lets data scientists put more AI models into production.

Government organizations can automate the processing and categorization of more incoming semi-structured and unstructured documents and realize cost savings.

Benefits include:

Less time needed to build the machine learning pipeline, with a set of instructions that covers data ingest, model development, and deployment
Compute savings from faster data preprocessing, model training, and inferencing time using oneAPI optimizations from Intel
Optimized performance using your compute of choice (such as CPU, GPU, or FPGA) with oneAPI interoperability across hardware architectures

Additional Resources

Intel Extension for Scikit-learn

Intel® AI Software Portfolio

Developer Resources from Intel and Accenture

References

† IDC Survey Spotlight: What Types of Documents Are Organizations Managing with Intelligent Document Processing (IDP) Solutions, April 2021 (Available by paid subscription only.)

‡ News Category Dataset, Kaggle, Inc. Licensed under Creative Commons 1.0 Universal (CC0 1.0) Public Domain Dedication
