Today in 1860, Herman Hollerith was born in Buffalo, New York.
Commemorating IBM’s 100th anniversary in 2011, The Economist wrote:
In 1886, Herman Hollerith, a statistician, started a business to rent out the tabulating machines he had originally invented for America’s census. Taking a page from train conductors, who then punched holes in tickets to denote passengers’ observable traits (e.g., that they were tall, or female) to prevent fraud, he developed a punch card that held a person’s data and an electric contraption to read it. The technology became the core of IBM’s business when it was incorporated as Computing Tabulating Recording Company (CTR) in 1911 after Hollerith’s firm merged with three others.
In his patent application, “Art of Compiling Statistics,” Hollerith explained the use of his machine in the context of a population survey, highlighting its usefulness in the statistical analysis of “big data”:
The returns of a census contain the names of individuals and various data relating to such persons, as age, sex, race, nativity, nativity of father, nativity of mother, occupation, civil condition, etc. These facts or data I will for convenience call statistical items, from which items the various statistical tables are compiled. In such compilation the person is the unit, and the statistics are compiled according to single items or combinations of items… it may be required to know the numbers of persons engaged in certain occupations, classified according to sex, groups of ages, and certain nativities. In such cases persons are counted according to combinations of items. A method for compiling such statistics must be capable of counting or adding units according to single statistical items or combinations of such items. The labor and expense of such tallies, especially when counting combinations of items made by the usual methods, are very great.
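In modern terms, the requirement Hollerith describes, counting persons by a single item or by a combination of items, is a cross-tabulation. The following is only a minimal sketch of that idea, not a model of his machine: the records, the field names, and the `tabulate` helper are all hypothetical and chosen purely for illustration.

```python
from collections import Counter

# Hypothetical census records; the field names and values are invented
# for illustration and do not come from any real census schedule.
records = [
    {"sex": "F", "age_group": "20-29", "occupation": "teacher"},
    {"sex": "M", "age_group": "20-29", "occupation": "farmer"},
    {"sex": "F", "age_group": "30-39", "occupation": "teacher"},
    {"sex": "M", "age_group": "20-29", "occupation": "farmer"},
]


def tabulate(records, items):
    """Count persons (the unit) by one statistical item or a combination.

    `items` is a list of field names; each distinct combination of values
    becomes a tuple key in the returned Counter.
    """
    return Counter(tuple(person[item] for item in items) for person in records)


# Counting by a single item.
by_sex = tabulate(records, ["sex"])

# Counting by a combination of items.
by_occupation_and_sex = tabulate(records, ["occupation", "sex"])
```

Each additional item in the combination multiplies the number of possible tallies, which is one way to read Hollerith's remark that the labor of counting combinations by hand was "very great."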
James Cortada, in Before the Computer, quotes Walter Willcox of the U.S. Bureau of the Census:
While the returns of the Tenth (1880) Census were being tabulated at Washington, John Shaw Billings [Director of the Division of Vital Statistics] was walking with a companion through the office in which hundreds of clerks were engaged in laboriously transferring data from schedules to record sheets by the slow and heartbreaking method of hand tallying. As they were watching the clerks he said to his companion, “there ought to be some mechanical way of doing this job, something on the principle of the Jacquard loom.”
Says Cortada: “It was a singular moment in the history of data processing, one historians could reasonably point to and say that things had changed because of it. It stirred Hollerith’s imagination and ultimately his achievements.” Cortada describes the results of the first large-scale mechanized data processing project:
The U.S. Census of 1890… was a milestone in the history of modern data processing…. No other occurrence so clearly symbolized the start of the age of mechanized data handling…. Before the end of that year, [Hollerith’s] machines had tabulated all 62,622,250 souls in the United States. Use of his machines saved the bureau $5 million over manual methods while cutting sharply the time to do the job. Additional analysis of other variables with his machines meant that the Census of 1890 could be completed within two years, as opposed to nearly ten years taken for fewer data variables and a smaller population in the previous census.