When is reindexing necessary?

Reindexing is necessary when migrating to a new system with different metadata requirements, when legacy index data is incomplete or inconsistent after decades of data entry, when merging records from multiple sources, or when adopting new metadata standards the existing index doesn't support.

What's the difference between reindexing and backfile conversion?

Backfile conversion indexes documents that were never indexed — scanned images with no metadata. Reindexing corrects, expands, or reformats index data that already exists but is inadequate for current needs. Both may involve OCR and extraction, but reindexing starts with existing data that needs repair rather than a blank slate.

How do you measure indexing quality?

Common metrics include field-level accuracy rate (percentage of fields matching the source document), exception rate (percentage of documents requiring manual review), and completeness rate (percentage of required fields populated). Define acceptance criteria before starting — not after the first batch ships.
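As a rough illustration, the three rates can be computed from a hand-verified sample along these lines. The record layout (a dict of field values plus a needs_review flag) and the field names are assumptions made for this sketch, not a prescribed schema.

```python
# Hypothetical list of fields the target system requires.
REQUIRED_FIELDS = ["instrument_number", "recording_date", "document_type"]

def quality_metrics(indexed, truth):
    """Compare indexed records against hand-verified values for the same documents.

    `indexed` is a list of {"fields": {...}, "needs_review": bool};
    `truth` is a parallel list of dicts holding the verified field values.
    Both lists are assumed to be non-empty and in the same order.
    """
    total_fields = matched = populated = exceptions = 0
    for rec, ref in zip(indexed, truth):
        for field in REQUIRED_FIELDS:
            total_fields += 1
            value = (rec["fields"].get(field) or "").strip()
            if value:
                populated += 1          # counts toward completeness
            if value == ref[field]:
                matched += 1            # counts toward field-level accuracy
        if rec["needs_review"]:
            exceptions += 1             # document was routed to manual review
    return {
        "field_accuracy": matched / total_fields,
        "completeness": populated / total_fields,
        "exception_rate": exceptions / len(indexed),
    }
```

Acceptance criteria can then be written as thresholds on these numbers, agreed before the first batch is processed.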

What format is index data typically exported in?

Common formats include CSV, XML, and fixed-width text files. The required format depends on the target system. Some land records platforms accept direct API imports; others require specific file formats with particular field delimiters and encoding.

Why run a pilot batch before full production?

A pilot batch — typically 500 to 2,000 records — reveals the real exception rate, surfaces edge cases in document types and formatting, and exposes mismatches between your extraction schema and the target system's import requirements. It is far cheaper to discover these issues on a small batch than to rework tens of thousands of records.

What is metadata normalization?

Metadata normalization converts inconsistent field values into a single standard format — standardizing dates, expanding abbreviations, correcting common misspellings, and mapping legacy codes to current classification schemes. The goal is consistent, searchable data across the entire record set.

Reindexing, Quality Control, and Import Workflows

The downstream stages of an indexing project — where accuracy is built and data reaches production.

When reindexing is necessary

Reindexing is the process of correcting, expanding, or reformatting metadata on documents that were previously indexed. It's distinct from backfile conversion (indexing documents for the first time) and is usually triggered by one of several situations:

System migration: The new system requires fields the old system didn't capture, or uses different formats. A platform from 2005 may have stored grantor/grantee names in a single field; the replacement may require separate first, middle, and last name fields.
Inconsistent legacy data: Decades of different staff, different systems, and different standards have left index data that doesn't conform to a single schema.
Merging data sources: Combining records from multiple departments or systems into a single repository, each with different field names and validation rules.
Metadata standard changes: Adopting state-mandated or industry metadata standards that differ from what was previously used — common when counties move to statewide recording platforms.

Reindexing projects often involve both automated re-extraction and manual review. The goal is to bring legacy index data up to a consistent standard before importing it into the target system.

Common quality issues in legacy index data

Before building a QC workflow, it helps to understand the error profile of your existing data. The most common issues:

Inconsistent date formats: The same office may have records indexed as MM/DD/YYYY, YYYY-MM-DD, and Month DD, YYYY depending on who entered the data and when.
Truncated or abbreviated names: "Wm." for William, "Jas." for James, partial last names cut off by field-length limits in older systems.
Missing required fields: Instrument numbers, recording dates, or document types left blank — sometimes because the field didn't exist in an earlier system version.
Duplicate records: The same document indexed twice under different instrument numbers, often caused by system migrations that didn't de-duplicate.
Incorrect document-type codes: A deed recorded as a mortgage, or a lien release classified as "miscellaneous" because the clerk didn't have a matching code.
OCR artifacts in index fields: If earlier indexing relied on OCR without human review, errors like "l" for "1" and "O" for "0" may be embedded in the data.

Understanding what's wrong with your existing data is the first step toward scoping the effort to fix it.
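One inexpensive way to start is to profile a single field across the whole index. The sketch below counts which date formats appear in a column of raw values; the three patterns mirror the formats mentioned above, and anything else is reported as unrecognized. It only checks the shape of each value, not whether the date is valid.

```python
import re
from collections import Counter

# Shape-only patterns for the formats named in the list above.
DATE_PATTERNS = {
    "MM/DD/YYYY": re.compile(r"\d{1,2}/\d{1,2}/\d{4}"),
    "YYYY-MM-DD": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "Month DD, YYYY": re.compile(r"[A-Za-z]+\.? \d{1,2}, \d{4}"),
}

def profile_dates(values):
    """Count how many raw values fall into each known date format."""
    counts = Counter()
    for raw in values:
        value = (raw or "").strip()
        if not value:
            counts["blank"] += 1
            continue
        for label, pattern in DATE_PATTERNS.items():
            if pattern.fullmatch(value):
                counts[label] += 1
                break
        else:
            # No pattern matched: a candidate for manual inspection.
            counts["unrecognized"] += 1
    return counts
```

The same approach works for instrument numbers or document-type codes: count the distinct shapes first, then decide which ones deserve a rule.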

Exception handling and validation

Exception handling routes documents that don't pass automated checks to a human reviewer. It is the most important mechanism for maintaining data quality at scale.

What triggers an exception

Low-confidence extraction: The system extracted a value but isn't confident it's correct — common with handwritten entries, faded ink, or stamps overlapping text
Validation rule failures: A date falls outside the expected range, a legal description doesn't match the expected format, or a party name contains unexpected characters
Missing required fields: A mandatory field couldn't be populated from the source document
Classification uncertainty: The system can't determine the document type with sufficient confidence

For more on how AI-assisted extraction handles confidence scoring, see the AI indexing guide.
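A minimal sketch of the routing decision, assuming each extracted field arrives as a value with a confidence score between 0 and 1. The thresholds, field names, and record layout are hypothetical; real values would come from your extraction tool and pilot results. Validation rule failures are a separate check and are not covered here.

```python
CONFIDENCE_THRESHOLD = 0.90       # hypothetical; tune from pilot results
CLASSIFICATION_THRESHOLD = 0.80   # hypothetical
REQUIRED_FIELDS = ("instrument_number", "recording_date")

def exception_reasons(record):
    """Return the reasons a record needs human review (empty list means none)."""
    reasons = []
    for name in REQUIRED_FIELDS:
        field = record.get(name)
        if field is None or not field["value"]:
            reasons.append(f"missing:{name}")
        elif field["confidence"] < CONFIDENCE_THRESHOLD:
            reasons.append(f"low_confidence:{name}")
    doc_type = record.get("document_type")
    if doc_type is None or doc_type["confidence"] < CLASSIFICATION_THRESHOLD:
        reasons.append("classification_uncertain")
    return reasons

def route(record):
    """Send a record to the review queue if any exception reason applies."""
    return "review_queue" if exception_reasons(record) else "auto_accept"
```

Recording the reason alongside the record lets reviewers go straight to the field in question instead of re-checking the whole document.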

Building a validation layer

Validation rules should reflect the actual requirements of your target system, not abstract notions of "clean data." Practical validation includes:

Date-range checks (recording dates that predate the county's existence are likely errors)
Format validation (instrument numbers should follow a known pattern)
Cross-field consistency (a document typed as "Deed" should have grantor and grantee fields populated)
Duplicate detection (flag records with identical instrument numbers or suspiciously similar metadata)

The goal is not to eliminate all exceptions — it's to ensure the exceptions that reach reviewers are genuine ambiguities rather than easily catchable formatting issues.
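The four checks above might be sketched as follows. The founding date, the instrument-number pattern, and the field names are placeholders chosen for illustration; substitute the rules your target system actually enforces.

```python
import re
from datetime import date

COUNTY_FOUNDED = date(1850, 1, 1)                 # hypothetical founding date
INSTRUMENT_PATTERN = re.compile(r"\d{4}-\d{6}")   # hypothetical, e.g. 2001-000123

def validate(record, seen_instruments, today=None):
    """Return a list of rule failures for one record.

    `seen_instruments` is a set of instrument numbers already accepted;
    it is updated in place so duplicates are caught across a batch.
    """
    today = today or date.today()
    errors = []

    # Date-range check: recording_date is expected to be a datetime.date or None.
    rec_date = record.get("recording_date")
    if rec_date is None or not (COUNTY_FOUNDED <= rec_date <= today):
        errors.append("recording_date_out_of_range")

    # Format validation, then duplicate detection on well-formed numbers.
    number = record.get("instrument_number") or ""
    if not INSTRUMENT_PATTERN.fullmatch(number):
        errors.append("instrument_number_format")
    elif number in seen_instruments:
        errors.append("duplicate_instrument_number")
    else:
        seen_instruments.add(number)

    # Cross-field consistency: a deed needs both parties.
    if record.get("document_type") == "Deed" and not (
        record.get("grantor") and record.get("grantee")
    ):
        errors.append("deed_missing_party")
    return errors
```

This only detects exact duplicate instrument numbers. Flagging "suspiciously similar" metadata would need a fuzzier comparison, which is left out of the sketch.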

Metadata normalization

Normalization converts inconsistent field values into a single standard format. This is often the most labor-intensive part of a reindexing project, and the part most frequently underestimated.

Common normalization tasks

Name standardization: Expanding abbreviations ("Wm." → "William"), correcting common misspellings, and splitting combined name fields into first/middle/last components
Date format unification: Converting all dates to a single format — typically ISO 8601 (YYYY-MM-DD) for system interoperability
Document-type mapping: Creating a crosswalk between legacy codes and the target system's classification scheme — often not a one-to-one mapping
Legal description cleanup: Standardizing section/township/range formatting and expanding abbreviated lot and block references
Address standardization: Normalizing street names, directional prefixes, and unit designators to USPS or local conventions
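Two of these tasks, abbreviation expansion and document-type mapping, reduce to lookup tables. The sketch below is illustrative only: the abbreviation table uses the examples from this guide, and the legacy codes and target types are invented for the example.

```python
# Abbreviations taken from the examples in this guide; extend from your own data.
NAME_ABBREVIATIONS = {"Wm.": "William", "Jas.": "James"}

# Hypothetical legacy codes. Note the mapping is many-to-one, not one-to-one.
TYPE_CROSSWALK = {"DD": "DEED", "WD": "DEED", "MTG": "MORTGAGE"}

def expand_name(name):
    """Expand known abbreviations token by token; unknown tokens pass through."""
    return " ".join(NAME_ABBREVIATIONS.get(tok, tok) for tok in name.split())

def map_document_type(legacy_code):
    """Return (target_type, needs_review). Unmapped codes go to a reviewer."""
    target = TYPE_CROSSWALK.get(legacy_code.strip().upper())
    return (target, False) if target else (None, True)
```

Returning a needs_review flag instead of guessing a default keeps codes with no clear equivalent visible to a human rather than buried under a catch-all type.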

Automation vs. manual normalization

Some tasks are well suited to automated rules — date format conversion can usually be handled programmatically. Others, like resolving ambiguous name abbreviations or mapping document types with no clear equivalent, require human judgment. A realistic project plan accounts for both.
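Date conversion is a reasonable example of the automated side. The sketch below tries a short list of known formats and returns None when nothing matches, so unparseable values can be routed to a reviewer. It assumes English month names (Python's default C locale) and that slash-separated dates are month-first; a value such as 03/04/1987 is ambiguous if some staff entered day-first dates.

```python
from datetime import datetime

# Formats seen in the legacy data; order matters only if formats can overlap.
KNOWN_FORMATS = ("%m/%d/%Y", "%Y-%m-%d", "%B %d, %Y")

def to_iso_date(raw):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    value = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None   # leave for manual review rather than guessing
```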

Import and export considerations

Moving validated index data into the production records system is the final step — and often more complex than expected.

Field mapping

The fields in your extraction schema rarely match the target system's schema exactly. Field mapping defines how each extracted field translates to a field in the destination — including data type conversions, format transformations, and concatenation or splitting of values.
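A small sketch of what a mapping function can look like, covering a rename, a type conversion, a format transformation, and a split. Both the source field names and the target layout (InstrumentNo, RecordedDate, and so on) are hypothetical, and the name split is deliberately naive.

```python
def split_name(full_name):
    """Naive split of 'First Middle Last'.

    Real data needs review for suffixes, business entities, and
    multi-word surnames; this only illustrates the splitting step.
    """
    parts = full_name.split()
    if len(parts) < 2:
        return {"first": "", "middle": "", "last": full_name.strip()}
    return {"first": parts[0], "middle": " ".join(parts[1:-1]), "last": parts[-1]}

def map_record(src):
    """Translate one extracted record into the (hypothetical) target layout."""
    grantor = split_name(src["grantor_name"])
    return {
        "InstrumentNo": src["instrument_number"],                 # rename only
        "RecordedDate": src["recording_date"].replace("-", ""),   # YYYY-MM-DD to YYYYMMDD
        "PageCount": int(src["pages"]),                           # string to integer
        "GrantorFirst": grantor["first"],                         # one field split into three
        "GrantorMiddle": grantor["middle"],
        "GrantorLast": grantor["last"],
    }
```

Keeping the mapping in one function (or one table) makes it easy to review with whoever administers the target system.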

Export format requirements

The required format depends entirely on the target system. Key considerations:

Character encoding (UTF-8 vs. ASCII vs. Windows-1252 — legacy systems may not handle Unicode)
Field delimiters and text qualifiers (commas, tabs, pipes — and how the system handles values containing the delimiter)
Header rows and field ordering (some systems require fields in a specific sequence)
Maximum field lengths (depending on the system, values that exceed limits may be silently truncated or rejected outright)
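These considerations can be checked at export time rather than discovered at import. The following sketch writes a pipe-delimited file with a header row and fixed field order, and reports rows that would not survive the target encoding or length limits instead of truncating them. The field names, limits, delimiter, and the choice of Windows-1252 are assumptions for the example.

```python
import csv
import io

FIELD_ORDER = ["InstrumentNo", "GrantorLast"]           # hypothetical required order
MAX_LENGTHS = {"InstrumentNo": 12, "GrantorLast": 40}   # hypothetical limits

def export_rows(rows, encoding="cp1252", delimiter="|"):
    """Return (file_bytes, problems).

    Rows with unencodable or oversized values are reported in `problems`
    as (row_index, reason) and left out of the file, never truncated.
    """
    problems = []
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELD_ORDER, delimiter=delimiter,
                            quoting=csv.QUOTE_MINIMAL, lineterminator="\r\n")
    writer.writeheader()
    for i, row in enumerate(rows):
        try:
            "".join(row[f] for f in FIELD_ORDER).encode(encoding)
        except UnicodeEncodeError:
            problems.append((i, "unencodable"))
            continue
        too_long = [f for f in FIELD_ORDER if len(row[f]) > MAX_LENGTHS[f]]
        if too_long:
            problems.append((i, "too_long:" + ",".join(too_long)))
            continue
        # QUOTE_MINIMAL wraps values containing the delimiter in text qualifiers.
        writer.writerow({f: row[f] for f in FIELD_ORDER})
    return buffer.getvalue().encode(encoding), problems
```

Whether the target system accepts quoted values that contain the delimiter is itself something to confirm with a test import.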

Validation at import

Most target systems have their own validation rules. Records that passed your internal QC may still fail at import if the target system has different constraints. Test imports with sample batches early to discover mismatches before running full-volume loads.

Error handling and rollback

Define what happens when records fail to import:

Reject the individual record and log the error for manual correction
Reject the entire batch and investigate before retrying
Import what succeeds and queue failures for review

Whatever approach you choose, maintain a clear audit trail of what was imported, what was rejected, and what was corrected.
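As one possible shape for the third option (import what succeeds, queue failures), the sketch below wraps a per-record import call and writes an audit entry for every outcome. The import_one callable stands in for whatever import mechanism the target system provides; it is a placeholder, not a real API, and is assumed to raise ValueError when a record is rejected.

```python
def import_batch(records, import_one, audit_log):
    """Import records one at a time, logging every outcome.

    `import_one` is a placeholder for the target system's import call and
    is assumed to raise ValueError on rejection. `audit_log` is a list that
    receives one entry per record. Returns the records that failed.
    """
    failures = []
    for rec in records:
        record_id = rec["instrument_number"]
        try:
            import_one(rec)
            audit_log.append({"id": record_id, "status": "imported"})
        except ValueError as err:
            audit_log.append({"id": record_id, "status": "rejected", "error": str(err)})
            failures.append(rec)   # queue for manual correction and retry
    return failures
```

In practice the audit log would be written to durable storage, and corrected records would get a further entry when they are re-imported.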

Linking images to index records

Index data alone isn't enough — the target system needs to associate each record with the corresponding document image(s). This requires consistent file naming, path conventions, and often a manifest file that maps records to image files.
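A manifest can be generated directly from the file names when the naming convention is consistent. The sketch below assumes images are named with the instrument number, an underscore, and a page number (for example 2001-000123_001.tif); that convention and the two-column manifest layout are assumptions, so adapt both to what the target system expects.

```python
import csv
import io

def build_manifest(records, image_names):
    """Map each record to its page images.

    Assumes files are named <instrument_number>_<page>.<ext>, a hypothetical
    convention. Returns (manifest_csv_text, instrument_numbers_without_images).
    """
    by_instrument = {}
    for name in sorted(image_names):            # sorted so pages come out in order
        stem = name.rsplit(".", 1)[0]           # drop the file extension
        instrument, _, _page = stem.rpartition("_")
        by_instrument.setdefault(instrument, []).append(name)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["instrument_number", "image_file"])
    missing = []
    for rec in records:
        images = by_instrument.get(rec["instrument_number"], [])
        if not images:
            missing.append(rec["instrument_number"])   # record with no image
        for image in images:
            writer.writerow([rec["instrument_number"], image])
    return out.getvalue(), missing
```

The list of records without images is worth reviewing before import, since an index entry that points to nothing is hard to spot afterwards.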

Why pilot batches matter

A pilot batch is a small, representative sample, typically 500 to 2,000 records, processed through the full workflow before committing to production volume. A well-run pilot should:

Reveal the real exception rate: The percentage of records requiring manual intervention is the single most important variable in scoping a reindexing project. A pilot gives you an empirical number rather than a guess.
Surface edge cases: Unusual document types, unexpected formatting, and data quality issues that weren't visible in initial sampling.
Test the full import path: Validate that your export format, field mapping, and image linking work correctly in the target system.
Calibrate staffing and timelines: Measure manual review speed and throughput on real data rather than relying on estimates based on clean test documents.
Build stakeholder confidence: Showing decision-makers a concrete sample of cleaned, imported records is more persuasive than a project plan.

The cost of a pilot is small relative to the cost of discovering problems at scale. If a vendor or internal team resists running one, treat that as a risk signal.
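To show how a pilot result feeds into scoping, the sketch below projects review workload from a pilot's exception count. It uses a simple normal-approximation margin of error, which is only a rough guide and assumes the pilot sample is representative of the full record set. All input numbers in the usage note are invented for illustration.

```python
import math

def project_review_effort(pilot_size, pilot_exceptions, total_records, reviews_per_hour):
    """Project manual review workload for the full project from pilot results."""
    rate = pilot_exceptions / pilot_size
    # Approximate 95% margin of error on the exception rate (normal approximation).
    margin = 1.96 * math.sqrt(rate * (1 - rate) / pilot_size)
    expected = rate * total_records
    return {
        "exception_rate": rate,
        "margin": margin,
        "expected_exceptions": round(expected),
        "review_hours": expected / reviews_per_hour,
    }
```

For instance, a hypothetical pilot of 1,000 records with 120 exceptions, applied to 50,000 records at 30 reviews per hour, projects about 6,000 exceptions and roughly 200 hours of review, with the exception rate uncertain by about two percentage points either way.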

Disclaimer: This guide is educational in nature. It is not legal advice or a substitute for consulting with your office's legal counsel or state records management agency.

Related Guides

What Is Backfile Conversion?

A practical guide to planning and executing a backfile conversion project from scanning through import.

AI Document Indexing for County Records

How AI-assisted indexing works — OCR, extraction, exception review, and realistic expectations.

Public Records Indexing in Louisiana

State-specific guide for Louisiana parish clerks of court — civil law system, acts of sale, authentic acts, and separate mortgage records.

Public Records Indexing in Iowa

State-specific guide for Iowa county recorders — Iowa Land Records portal, e-recording in all 99 counties, and Declaration of Value requirements.
