CRM Data Cleanup Engine
A read-only Google Apps Script toolkit that audits and de-risks a 60,000+ contact CRM. It dedupes companies, rescues thousands of orphaned contacts, and uses an LLM to classify the lead base against an Ideal Customer Profile — all outputting to reviewable Google Sheets, with a hard guardrail that nothing mutates the live CRM until a human signs off.
01 Overview
The company's CRM had grown to 60,000+ contacts with duplicate companies, orphaned contacts with no company link, and an unscored lead base. That dirty data was quietly poisoning the lead-scoring model. This toolkit audits the data and proposes repairs safely.
The defining constraint: it is strictly read-only against the live CRM. Every proposed change lands in a reviewable Google Sheet, and nothing is written back until a human signs off.
02 How it works
Three tools make up the suite: a dedupe tool (domain, name, and fuzzy matching) to collapse duplicate companies; an orphan-contact recovery tool to re-link thousands of contacts that lost their company association; and a Gemini ICP classifier that reads each prospect's website and returns an ICP yes/no with reasoning and a calibrated confidence score.
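As a rough illustration of the domain, name, and fuzzy matching idea, here is a minimal sketch in plain JavaScript. Every function name, the record shape ({ id, name, domain }), the legal-suffix list, and the 0.85 similarity threshold are assumptions made for this example and are not taken from the toolkit itself. The fuzzy step uses Levenshtein edit distance as a stand-in; the real tool may use a different measure.

```javascript
// Hypothetical helpers for illustration only; none of these names come from the toolkit.

// "https://www.Acme.com/about" -> "acme.com"
function normalizeDomain(raw) {
  if (!raw) return "";
  return String(raw)
    .trim()
    .toLowerCase()
    .replace(/^[a-z]+:\/\//, "") // drop the scheme
    .replace(/^www\./, "")       // drop a leading "www."
    .split(/[/?#:]/)[0];         // drop path, query, fragment, port
}

// "Acme, Inc." -> "acme" (the suffix list is an assumed, incomplete sample)
function normalizeName(raw) {
  return String(raw || "")
    .toLowerCase()
    .replace(/[^a-z0-9 ]+/g, " ")
    .replace(/\b(inc|llc|ltd|corp|co|company|gmbh)\b/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Classic Levenshtein distance, keeping only two rows of the table.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

function similarity(a, b) {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - editDistance(a, b) / longest;
}

// Returns proposed duplicate groups only; nothing is merged or written anywhere.
function proposeDuplicateGroups(companies, threshold = 0.85) {
  const groups = []; // each: { domain, name, members: [{ id, reason }] }
  for (const c of companies) {
    const domain = normalizeDomain(c.domain);
    const name = normalizeName(c.name);
    // Rules are tried in order of decreasing certainty: domain, exact name, fuzzy name.
    const rules = [
      ["domain", (g) => domain !== "" && g.domain === domain],
      ["name", (g) => name !== "" && g.name === name],
      ["fuzzy", (g) => name !== "" && g.name !== "" && similarity(name, g.name) >= threshold],
    ];
    let hit = null;
    let reason = null;
    for (const [label, matches] of rules) {
      hit = groups.find(matches);
      if (hit) { reason = label; break; }
    }
    if (hit) hit.members.push({ id: c.id, reason });
    else groups.push({ domain, name, members: [{ id: c.id, reason: "first seen" }] });
  }
  return groups.filter((g) => g.members.length > 1);
}
```

Recording the match reason next to each member is a deliberate choice in this sketch: a reviewer scanning the output sheet can accept domain matches quickly and look harder at fuzzy ones. The linear scan over groups is fine for a demonstration, but at tens of thousands of companies it would likely need an index keyed by domain and name.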
To work at this scale within Apps Script's 6-minute execution cap, the toolkit builds a full read-only mirror of the CRM base that is resumable via cursor pagination, so a run that approaches the cap can stop and a later run picks up from the last saved cursor. A two-pass client-protection guard builds a multi-thousand-domain exclusion list while refusing to ever exclude a domain tied to a customer or partner record.
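The resumable mirror can be sketched roughly as follows. This is a minimal illustration, not the toolkit's code: `runMirrorBatch`, `fetchPage`, `writeRows`, `CURSOR_KEY`, and the 4.5-minute time budget are all hypothetical, and the CRM and sheet are passed in as plain functions so the logic stays independent of any particular API.

```javascript
// Assumed safety margin below the 6-minute cap; the real value is not stated in the source.
const TIME_BUDGET_MS = 4.5 * 60 * 1000;
const CURSOR_KEY = "mirror_cursor"; // hypothetical key name

// fetchPage(cursor) -> { records: [...], nextCursor: string | null }  (read-only call)
// writeRows(records) appends to the mirror sheet
// store needs getProperty / setProperty / deleteProperty
function runMirrorBatch({ fetchPage, writeRows, store, now = Date.now }) {
  const startedAt = now();
  let cursor = store.getProperty(CURSOR_KEY); // null on a fresh run
  let pages = 0;

  while (now() - startedAt < TIME_BUDGET_MS) {
    const page = fetchPage(cursor);
    writeRows(page.records);
    pages++;
    if (!page.nextCursor) {
      store.deleteProperty(CURSOR_KEY); // mirror complete, clear the checkpoint
      return { done: true, pages };
    }
    cursor = page.nextCursor;
    store.setProperty(CURSOR_KEY, cursor); // checkpoint after every page
  }
  return { done: false, pages }; // a later run resumes from the saved cursor
}
```

In Apps Script, `store` could be `PropertiesService.getScriptProperties()`, which exposes `getProperty`, `setProperty`, and `deleteProperty`, and a time-driven trigger could call the function again until it reports `done: true`. Because the checkpoint is saved after the rows are written, a crash between those two steps would replay one page on resume, so the mirror sheet would need to tolerate or filter repeated record IDs.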
03 Engineering highlights
- Three-tool suite — domain/name/fuzzy dedupe, orphan-contact recovery, and a Gemini ICP classifier with calibrated confidence + reasoning.
- Full read-only mirror of the base, resumable via cursor pagination to survive the 6-minute Apps Script execution cap.
- Two-pass client-protection guard builds a multi-thousand-domain exclusion list that never excludes a customer/partner domain.
- Collapsed a large paid LLM run into a free domain-join by baking pre-computed lender verdicts into the script.
- Phase-2 archive exporter bridges removed records to reporting so analytics survive deletion.
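One way to read the two-pass client-protection guard listed above is sketched below. The split into passes is an assumption made for this example (pass 1 collects protected domains, pass 2 filters exclusion candidates against them); the source does not spell out what each pass does. The record shape, the `type` values, and all function names are hypothetical.

```javascript
// Assumed record shape: { domain: string, type: "customer" | "partner" | "prospect" | ... }
const PROTECTED_TYPES = new Set(["customer", "partner"]);

function cleanDomain(d) {
  return String(d || "").trim().toLowerCase().replace(/^www\./, "");
}

function buildExclusionList(records, candidateDomains) {
  // Pass 1: collect every domain tied to a customer or partner record.
  const protectedDomains = new Set();
  for (const r of records) {
    const d = cleanDomain(r.domain);
    if (d && PROTECTED_TYPES.has(r.type)) protectedDomains.add(d);
  }

  // Pass 2: a candidate is excluded only if it is not protected.
  const excluded = new Set();
  const refused = new Set();
  for (const raw of candidateDomains) {
    const d = cleanDomain(raw);
    if (!d) continue;
    if (protectedDomains.has(d)) refused.add(d);
    else excluded.add(d);
  }

  // Refusals are returned rather than dropped so a reviewer can see what the guard blocked.
  return { excluded: [...excluded].sort(), refused: [...refused].sort() };
}
```

Completing the protected set before any candidate is considered means the outcome does not depend on record order: a customer record that appears late in the data still blocks an exclusion proposed early.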
04 Outcome
Dirty data was poisoning the company's lead-scoring model; this toolkit is the foundation that makes scoring trustworthy — with a human-in-the-loop guarantee that the live CRM is never touched without sign-off.