Ancestry's indexing experiment with firms in China
I follow genealogist Michele Lewis on TikTok. She recently found an unusual Ancestry.com transcription from the 1820 Federal Census. Check out the handwritten first name. What does it look like to you?
Now, I get it that a 200-year-old handwritten scrawl can be hard to read. But how could a transcriber even consider "Elizabether" in this case? I think I know the answer. In 2008, I worked for an online technology publication, The Industry Standard (no longer online).
I interviewed Tim Sullivan, CEO of The Generations Network, which was Ancestry.com's official corporate owner until 2009. The article was published on October 3, 2008, on the website of The Industry Standard (see image below). In the interview, Sullivan noted that computers of that period were "not even close" to being able to read handwritten records, especially those from disparate sources such as census records, which contain many different styles of handwriting. So Ancestry turned to human transcriptionists. Paid transcriptionists, not volunteers like those on FamilySearch. Sullivan told me:
"The vast majority of the investment we've made in the last 10 years is not in acquisitions costs or imaging costs, it's in the indexing costs."At the time, Sullivan said Ancestry was paying $10 million per year to transcribe old records. To cut costs, Ancestry hired overseas partners in China where English was not widely spoken, but they can get census records transcribed for less money:
So how did The Generations Network import the data from millions of old census forms into its online database? Sullivan says the company spent about $75 million over 10 years to build its "content assets" including the census data, and much of that cost went into partnering with Chinese firms whose employees read the data and entered it into Ancestry.com's database. The Chinese staff are specially trained to read the cursive and other handwriting styles from digitized paper records and microfilm. The task is ongoing with other handwritten records, at a cost of approximately $10 million per year, he adds.
If you have ever tried to read old handwriting in an unfamiliar language, I am sure you can appreciate how difficult this task would be. But the lack of quality checks, and the nonsensical transcriptions that result, are stunning.
Keep in mind that Ancestry charges customers lots of money (up to 25% more as of January) but its main focus is generating profit for a string of private equity firms. Its current owner is a Wall Street PE firm, Blackstone Inc.
It's not clear if Ancestry still outsources its transcriptions to overseas firms, or if OCR technology is now good enough to hand off the task to computers.
Regardless, what's especially frustrating is that Ancestry customers have attempted to correct this particular error. The actual name is "Christopher Orr." They've added the correct annotation multiple times, but Ancestry still shows the name from that 200-year-old census return as "Elizabether Orr." Lots of people searching for this ancestor will never find him, thanks to Ancestry's cost-cutting moves 15 years ago and its lack of quality checks to correct such errors.
As Lewis notes at the end of her video, "Maybe you're going to have to hand-search the indexes one at a time" to determine what the actual name is.
Archive of "Google stays mum on plans for public documents, Ancestry.com points to OCR hurdle." By Ian Lamont. Published 10/3/2008, The Industry Standard.