History and character

[Under construction, more to come here]

The CME is rather misleadingly named: it is not a 'corpus' in the sense that linguists use the term, but a hodge-podge accumulation of about 300 texts, taken mostly from out-of-copyright editions. Some were re-keyed afresh and some derived from other sites; all were gathered over several decades and XML-encoded in a variety of schemas, most of them loosely TEI-based. Reducing them to a common format for purposes of corpus analysis is therefore rather a chore, for which we apologize in advance.

The CME is in fact only one of three 'corpora' of early English that Michigan hosts. The second is the body of quotations contained as evidence within the MED itself (nearly a million of them); the third is the earliest tranche of printed books transcribed for our sister project, the Early English Books Online Text Creation Partnership (EEBO-TCP for short). The three accumulations of text are described in more detail in this document, prepared for a symposium on early English corpora sponsored by Mark Faulkner at Trinity College, Dublin in 2017. Its descriptions of the interfaces are now obsolete, but its descriptions of the content remain accurate.