How long does a backfile conversion project take?

It depends on the volume of records, the condition of the source material, and the level of indexing required. A small project — a few thousand documents — might take weeks. Large county projects with hundreds of thousands of records can take months or longer, especially when OCR quality requires significant human review.

What's the difference between backfile and day-forward indexing?

Backfile indexing means going back through previously stored or scanned documents to add metadata. Day-forward indexing is the ongoing process of indexing new documents as they arrive. Most offices run both in parallel — day-forward continues while backfile progresses in batches.

Do we need to re-scan our documents?

Not necessarily. If your documents were scanned at reasonable resolution (300 DPI or higher) and the images are legible, you can usually work from the existing scans. If scans are low-resolution, skewed, or heavily degraded, re-scanning may be needed for reliable OCR.

Can backfile conversion be done in-house?

Yes, some offices handle backfile projects internally. Others use vendors or a hybrid approach. The decision depends on staff availability, volume, timeline, and whether you have the software tools to support OCR, extraction, and QC at scale.

What Is Backfile Conversion?

A practical overview for county offices planning or evaluating a backfile project.

Definition

Backfile conversion is the process of taking previously recorded documents — often stored as unindexed scans, microfilm, or paper — and converting them into searchable, indexed digital records. The goal is to make historical records as findable and usable as newly recorded documents.

A backfile project typically involves several stages: scanning (if records aren't already digital), image cleanup, OCR, metadata extraction, quality review, and import into a records management or land records system.

Why public offices do it

Most county offices have years or decades of records that were scanned but never indexed with structured metadata. These documents exist as image files — technically digital, but practically unfindable without knowing exactly where to look.

Backfile conversion makes those records searchable. Once indexed, staff can find documents by name, date, document type, parcel number, or instrument number instead of browsing folder structures or relying on institutional knowledge.

Common triggers include:

Migrating to a new document management or land records system
Responding to audit findings about records accessibility
Reducing the time it takes to fulfill public records requests
Retiring legacy systems that are no longer supported
Consolidating records from multiple departments or offices

Backfile vs. day-forward processing

Day-forward processing is the ongoing work of indexing new documents as they come in — each deed, mortgage, or lien gets indexed as it's recorded. Most offices already have a day-forward workflow, even if it's manual.

Backfile conversion is the catch-up work: going back through months, years, or decades of previously recorded documents that were stored but never properly indexed. The two workflows often run in parallel — day-forward continues while backfile progresses in batches.

The practical challenge is that backfile work usually cannot be allowed to disrupt day-forward operations. Staff capacity, system access, and QC bandwidth all have to cover both workflows at once. Projects that plan for this from the start tend to go more smoothly.

What's involved

Scanning and image preparation

If records are still on paper or microfilm, the first step is scanning. For OCR to work reliably, scans should be at least 300 DPI, properly oriented, and free of heavy skew or noise. Image preparation — deskewing, cropping, despeckling — improves downstream extraction quality.

If records were scanned previously, the quality of those scans determines whether re-scanning is necessary. Low-resolution or poorly captured images often produce unreliable OCR output.
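As a rough illustration of the 300 DPI threshold, the resolution of an existing scan can be estimated from its pixel dimensions and the physical size of the original page. The sketch below assumes letter-size originals (8.5 x 11 inches) scanned in portrait orientation. The function names and the default page size are illustrative assumptions, not part of any particular scanning tool.

```python
MIN_DPI = 300  # threshold named in this guide


def effective_dpi(width_px: int, height_px: int,
                  page_width_in: float = 8.5,
                  page_height_in: float = 11.0) -> float:
    """Estimate scan resolution from pixel size and an assumed physical page size.

    Returns the smaller of the horizontal and vertical estimates, so a page
    that is under-resolved on either axis is caught.
    """
    return min(width_px / page_width_in, height_px / page_height_in)


def needs_rescan(width_px: int, height_px: int, **page_size: float) -> bool:
    """Return True when the estimated resolution falls below MIN_DPI."""
    return effective_dpi(width_px, height_px, **page_size) < MIN_DPI
```

Under these assumptions, a 2550 x 3300 pixel image of a letter-size page works out to 300 DPI, while a 1700 x 2200 pixel image works out to 200 DPI and would be flagged. Resolution is only one factor: skew and degradation still need a visual or software check.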

OCR and text extraction

Optical character recognition converts scanned images into machine-readable text. Accuracy depends on scan quality, document age, font clarity, and whether the text is typed or handwritten. Modern OCR engines handle clean, typed documents well. Older documents with faded ink, stamps over text, or handwritten entries produce lower accuracy and require more human review.

For a deeper look at how OCR fits into AI-assisted workflows, see the AI document indexing guide.

Metadata extraction and indexing

Once OCR text is available, metadata extraction identifies the structured fields needed for indexing: document type, recording date, grantor and grantee names, legal descriptions, parcel numbers, instrument numbers, and more.

This can be done manually (staff reading and keying in fields), semi-automatically (software suggests values, staff confirm), or with AI-assisted tools that extract fields and flag exceptions for review. The right approach depends on volume, budget, and accuracy requirements.
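To make the semi-automatic approach concrete, here is a minimal sketch in which software suggests values from OCR text and marks anything it could not find for staff to key in. The regular expressions, field labels, and the `Suggestion` structure are hypothetical. Real instruments vary widely by county and era, so actual patterns would need to be built from representative samples.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns for illustration only; real label wording and
# number formats vary by county and by era.
PATTERNS = {
    "instrument_number": re.compile(
        r"Instrument\s+(?:No\.?|Number)\s*:?\s*([\w-]+)", re.I),
    "recording_date": re.compile(
        r"Recorded\s*:?\s*(\d{1,2}/\d{1,2}/\d{4})", re.I),
    "parcel_number": re.compile(
        r"Parcel\s+(?:No\.?|Number)\s*:?\s*([\d-]+)", re.I),
}


@dataclass
class Suggestion:
    field: str
    value: Optional[str]   # None means no pattern matched
    needs_review: bool     # True when staff must key the value manually


def suggest_fields(ocr_text: str) -> list[Suggestion]:
    """Suggest one value per field; unmatched fields are marked for review."""
    suggestions = []
    for field, pattern in PATTERNS.items():
        match = pattern.search(ocr_text)
        value = match.group(1) if match else None
        suggestions.append(Suggestion(field, value, needs_review=value is None))
    return suggestions
```

In this sketch every suggestion would still be confirmed by staff. The `needs_review` flag only distinguishes fields with a candidate value from fields that must be keyed from scratch.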

Quality control and exception review

No extraction method is perfect. Quality control involves reviewing extracted data against the source document, correcting errors, and handling edge cases — documents the system couldn't classify, fields it couldn't extract, or values that don't match expected patterns.

A well-designed QC workflow catches errors before data reaches the target system. The reindexing and QC guide covers this stage in detail.
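One way to picture the "values that don't match expected patterns" check is a small rule set that returns the reasons a record needs human review. The required fields, the date format, and the instrument number format below are assumptions made for the example. Each office would define its own rules.

```python
import re
from datetime import date, datetime
from typing import Optional

# Assumed rules for illustration; each office defines its own.
REQUIRED_FIELDS = ("document_type", "recording_date", "grantor", "grantee")
INSTRUMENT_RE = re.compile(r"^\d{4}-\d{6}$")  # hypothetical format: YYYY-NNNNNN


def find_exceptions(record: dict, today: Optional[date] = None) -> list[str]:
    """Return the reasons this record needs human review (empty list if none)."""
    today = today or date.today()
    reasons = []

    # Fields the system could not extract at all.
    for name in REQUIRED_FIELDS:
        if not str(record.get(name) or "").strip():
            reasons.append(f"missing {name}")

    # Dates that do not parse, or that cannot be right.
    raw_date = record.get("recording_date")
    if raw_date:
        try:
            parsed = datetime.strptime(raw_date, "%m/%d/%Y").date()
            if parsed > today:
                reasons.append("recording_date is in the future")
        except ValueError:
            reasons.append("recording_date is not MM/DD/YYYY")

    # Values that do not match the expected pattern.
    instrument = record.get("instrument_number")
    if instrument and not INSTRUMENT_RE.match(instrument):
        reasons.append("instrument_number does not match expected pattern")

    return reasons
```

Records that return an empty list can move on to spot-check sampling, while records with one or more reasons go to an exception queue along with the source image.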

Import into the target system

The final step is loading indexed data into the destination system — a land records platform, a document management system, or a state portal. This requires mapping extracted fields to the target schema, validating data against business rules, and handling records that fail validation.

Import is often more complex than it appears. Field formats, naming conventions, and required fields can differ between systems. Testing with sample batches before a full import helps catch mapping errors early.
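A sample-batch test can be as simple as renaming extracted fields to the target system's column names and separating rows that are missing required columns. The column names and required set below are invented for the example. A real mapping would come from the destination system's import specification, and this sketch does not cover format conversion such as dates.

```python
# Hypothetical mapping from extraction field names to target column names;
# a real mapping comes from the destination system's import specification.
FIELD_MAP = {
    "document_type": "DocType",
    "recording_date": "RecDate",
    "grantor": "Grantor",
    "grantee": "Grantee",
    "instrument_number": "InstrNo",
}
TARGET_REQUIRED = {"DocType", "RecDate", "InstrNo"}  # assumed required columns


def to_target_row(record: dict) -> dict:
    """Rename extracted fields to target column names, dropping unmapped fields."""
    return {FIELD_MAP[k]: v for k, v in record.items() if k in FIELD_MAP}


def split_batch(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate rows that are ready to import from rows missing required columns."""
    ready, rejected = [], []
    for record in records:
        row = to_target_row(record)
        missing = sorted(c for c in TARGET_REQUIRED if not row.get(c))
        if missing:
            rejected.append({"row": row, "missing": missing})
        else:
            ready.append(row)
    return ready, rejected
```

Running a small batch through a check like this before the full import shows how many records would fail and why, which is much cheaper to learn from a sample than from a full production load.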

Planning a backfile project

Before starting, offices should consider:

Scope: Which document types and date ranges to include. Trying to do everything at once often leads to delays.
Source material condition: The quality of existing scans or physical records directly affects OCR accuracy and project timeline.
Index fields: Define exactly which metadata fields are needed for each document type before extraction begins.
Acceptance criteria: What accuracy rate is acceptable? How will QC be measured? Define this upfront.
Target system requirements: Understand the import format and validation rules of the destination system before starting extraction.
Staffing: Determine whether the project will be handled in-house, by a vendor, or a combination.
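The acceptance criteria item above can be made measurable by comparing extracted values against reviewer-verified values on a QC sample, field by field. The sketch below is one possible way to do that. The matching rule (exact match after trimming whitespace and ignoring case) and the function names are assumptions, and the accuracy target is something each office has to set for itself.

```python
def field_accuracy(sample: list[tuple[dict, dict]],
                   fields: list[str]) -> dict[str, float]:
    """Compare extracted records to reviewer-verified records, field by field.

    `sample` holds (extracted, verified) pairs drawn from a QC sample.
    Returns the share of matching values per field, where a match is an
    exact match after trimming whitespace and ignoring case.
    """
    if not sample:
        raise ValueError("sample must contain at least one record pair")

    def norm(value) -> str:
        return str(value or "").strip().upper()

    accuracy = {}
    for name in fields:
        matches = sum(norm(e.get(name)) == norm(v.get(name)) for e, v in sample)
        accuracy[name] = matches / len(sample)
    return accuracy


def meets_target(accuracy: dict[str, float], target: float) -> bool:
    """True only if every measured field reaches the agreed accuracy target."""
    return all(rate >= target for rate in accuracy.values())
```

Measuring per field rather than per document makes it visible when one field, such as names on handwritten instruments, is dragging results down while others are fine.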

Common risks

Backfile projects often take longer and cost more than initial estimates suggest. The most common reasons:

Exception volume is higher than expected. Teams estimate based on clean documents but undercount the records that need manual review — older documents, poor scans, and unusual formats all increase exceptions.
Target system import is treated as an afterthought. Starting extraction before defining the target schema and field mapping leads to rework. Test imports early, not at the end.
OCR accuracy varies across document types and eras. A model that works well on 2010-era typed deeds may struggle with 1970s handwritten instruments. Pilot across representative samples, not just easy ones.
Multi-page and attachment handling is overlooked. Documents that span multiple pages or include embedded exhibits often break workflows built for single-page instruments.
No pilot phase. Going straight to full-volume production without piloting on a representative sample is a frequent source of project delays.
Staff capacity isn't accounted for. If the same team handles both day-forward and backfile work, the hours actually available for backfile will be lower than a plan that assumes full-time effort.
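The effect of exception volume and staff capacity on the schedule can be seen with simple arithmetic. The sketch below is a back-of-the-envelope estimate, not a planning tool. Every input (per-document minutes, exception rate, weekly staff hours) is a hypothetical figure that a pilot would need to supply.

```python
def estimate_weeks(total_docs: int,
                   exception_rate: float,
                   minutes_per_clean_doc: float,
                   minutes_per_exception: float,
                   staff_hours_per_week: float) -> float:
    """Estimate review duration in weeks from volume, exception rate, and capacity.

    All inputs are assumptions to be replaced with figures from a pilot.
    """
    exceptions = total_docs * exception_rate
    clean = total_docs - exceptions
    total_minutes = (clean * minutes_per_clean_doc
                     + exceptions * minutes_per_exception)
    return total_minutes / 60 / staff_hours_per_week


# Same 100,000 documents, two different exception rates.
optimistic = estimate_weeks(100_000, 0.05, 0.5, 5, 40)
realistic = estimate_weeks(100_000, 0.20, 0.5, 5, 40)
```

With these made-up inputs, moving the exception rate from 5% to 20% raises the estimate from about 30 weeks to about 58 weeks. Halving the weekly hours available for backfile work doubles either figure.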

Disclaimer: This guide is educational in nature. It is not legal advice or a substitute for consulting with your office's legal counsel or state records management agency.

Frequently Asked Questions

Related Guides

AI Document Indexing for County Records

How AI-assisted indexing works — OCR, extraction, exception review, and realistic expectations.


Reindexing, Quality Control, and Imports

Cleaning up legacy index data, building QC workflows, and importing into downstream systems.


Public Records Indexing in Ohio

State-specific guide for Ohio county recorders — auditor pre-approval, ORC formatting standards, and NHPRC digitization grants.


Public Records Indexing in Illinois

State-specific guide for Illinois county recorders of deeds — race-notice recording, Cook County merger, and digitization grants.
