From Cave Paintings to the Internet: A Chronological and Thematic Database on the History of Information and Media. Indexing & Searching Information Timeline


300 BCE – 30 CE

The Earliest Surviving Monolingual Dictionary Circa 250 BCE

An edition of the Erya.

The earliest surviving monolingual dictionary is the Chinese dictionary called the Erya.

"The Erya has been described as a dictionary, glossary, synonymicon, thesaurus, and encyclopaedia. Karlgren (1931: 46) explains that the book "is not a dictionary in abstracto, it is a collection of direct glosses to concrete passages in ancient texts." The received text contains 2094 entries, covering about 4300 words, and a total of 13,113 characters. It is divided into nineteen sections, the first of which is subdivided into two parts. The title of each chapter combines shi ("explain; elucidate") with a term describing the words under definition. Seven chapters (4, 8, 9, 10, 12, 18, and 19) are organized into taxonomies. For instance, chapter 4 defines terms for: paternal clan (宗族), maternal relatives (母黨), wife's relatives (妻黨), and marriage (婚姻). The text is divided between the first three heterogeneous chapters defining abstract words and the last sixteen semantically-arranged chapters defining concrete words. The last seven – concerning grasses, trees, insects and reptiles, fish, birds, wild animals, and domestic animals – describe more than 590 kinds of flora and fauna. It is a valuable document of natural history and historical biogeography" (Wikipedia article on Eyra, accessed 05-08-2008).

Filed under: Indexing & Searching Information, Linguistics / Translation / Speech, Natural History, Organization of Information / Taxonomy

The Origins of Bibliography Circa 200 BCE

A digital recreation of the Library of Alexandria.

Kallimachos (Callimachus), a renowned poet and head of the Alexandrian Library, compiles a catalogue of its holdings which he calls Pinakes (Tables or Lists).

Supposedly extending to 120 papyrus scrolls, this catalogue amounted to a systematic survey of Greek literature up to its time. It also represented the origins of bibliography. Only a few fragments survived the eventual destruction of the library, together with a scattering of references to it in other ancient works.

Callimachus’s bibliographical methods would not be out of place in a modern library; an analysis of the eight remaining fragments of the Pinakes shows that Callimachus

"1. divided the authors into classes and within these classes if necessary into subdivisions;

"2. arranged the authors in the classes or subdivisions alphabetically;

"3. added to the name of each author (if possible) biographical data;

"4. listed under an author’s name the titles of his works, combining works of the same kind to groups (no more than that can be deduced from the eight citations); and

"5. cited the opening words of each work as well as

"6. its extent, i.e., the number of lines" (Blum, p. 152).

The surviving fragments of Kallimachos's Pinakes were first published in print in Hymni, epigrammata et fragmenta, edited by Theodor J. G. F. Graevius et al. (Utrecht, 1697). That edition included the first edition of the commentary by Ezechiel Spanheim, and also incorporated the 420 fragments collected and elucidated by the English theologian, classical scholar and critic Richard Bentley, whose reading of these fragments represents "the earliest example of a really critical method applied to such a work" (Dictionary of National Biography).

Breslauer & Folter, Bibliography. Its History and Development (1984) no. 1.  Blum, Kallimachos. The Alexandrian Library and the Origins of Bibliography. Translated by Hans H. Wellisch (1991).

Filed under: Bibliography, Indexing & Searching Information, Libraries, Survival of Information

30 CE – 500 CE

One of the Earliest, Most Widely-Used Cross-Indexing Systems Circa 280 CE – 340 CE

A portrait of Eusebius of Caesarea.

The Eusebian canons or Eusebian sections, also known as Ammonian Sections, are the system of dividing the four Gospels used between late Antiquity and the Middle Ages. The sections are indicated in the margin of nearly all Greek and Latin manuscripts of the Bible, and usually summarized in Canon Tables at the start of the Gospels. There are about 1165 sections: 355 for Matthew, 235 for Mark, 343 for Luke, and 232 for John; the numbers, however, vary slightly in different manuscripts. These tables represent a way for the reader to move back and forth between related sections in the texts, and are an early organizational structure and cross-indexing system.

"Until the nineteenth century it was mostly believed that these divisions were devised by Ammonius of Alexandria, at the beginning of the third century (c. 220), in connection with a Harmony of the Gospels, now lost, which he composed. It was traditionally believed that he divided the four Gospels into small numbered sections, which were similar in content where the narratives are parallel. He then wrote the sections of the three last Gospels, or simply the section numbers with the name of the respective evangelist, in parallel columns opposite the corresponding sections of the Gospel of Matthew, which he had chosen as the basis of his Harmony. Now it is believed that the work of Ammonius was restricted to what Eusebius of Caesarea (265-340) states concerning it in his letter to Carpianus, namely, that he placed the parallel passages of the last three Gospels alongside the text of Matthew, and the sections traditionally credited to Ammonius are now ascribed to Eusebius, who was always credited with the final form of the tables.

"The tables themselves were usually placed at the start of a Gospel Book, and in illuminated copies were placed in round-headed arcade-like frames of which the general form remained remarkably consistent through to the Romanesque period. This form was derived from Late Antique book-painting frames like those in the Chronography of 354. In many examples the tables are the only decoration in the whole book, perhaps other than some initials. In particular, canon tables, with Evangelist portraits, are very important for the study of the development of manuscript painting in the earliest part of the Early Medieval period, where very few manuscripts survive, and even the most decorated of those have fewer pages illuminated than was the case later" (Wikipedia article on Eusebian Canons, accessed 11-26-2008).

Wright, Alex. Glut: Mastering Information Through the Ages (2007) 83-85.

Filed under: Indexing & Searching Information, Manuscript Illumination, Manuscripts & Manuscript Copying, Organization of Information / Taxonomy, Religious Texts / Religion

800 – 900

The Book of Kells Circa 800

The decorated commencement of St. John's Gospel.

The Book of Kells, sometimes known as the Book of Columba, contains a richly decorated copy of the Four Gospels in a Latin text based on the Vulgate edition (completed by St Jerome in 384 CE). The gospels are preceded by prefaces, summaries of the gospel narratives and concordances of gospel passages—a kind of cross-indexing system—compiled in the fourth century by Eusebius of Caesarea.

The book "was transcribed by Celtic monks ca. 800. The text of the Gospels is largely drawn from the Vulgate, although it also includes several passages drawn from the earlier versions of the Bible known as the Vetus Latina. It is a masterwork of Western calligraphy and represents the pinnacle of Insular illumination. It is also widely regarded as Ireland's finest national treasure."

"The illustrations and ornamentation of the Book of Kells surpass that of other Insular Gospels in extravagance and complexity. The decoration combines traditional Christian iconography with the ornate swirling motifs typical of Insular art. Figures of humans, animals and mythical beasts, together with intricate knotwork and interlacing patterns in vibrant colours, enliven the manuscript's pages. Many of these minor decorative elements are imbued with Christian symbolism and so further emphasize the themes of the major illustrations.

"The manuscript today comprises 340 folios and, since 1953, has been bound in four volumes. The leaves are on high-quality calf vellum, and the unprecedentedly elaborate ornamentation that covers them includes ten full-page illustrations and text pages that are vibrant with historiated initials and interlinear miniatures and mark the furthest extension of the anti-classical and energetic qualities of Insular art. The Insular majuscule script of the text itself appears to be the work of at least three different scribes. The lettering is in iron-gall ink, and the colors used were derived from a wide range of substances, many of which were imports from distant lands" (Wikipedia article on The Book of Kells, accessed 11-22-2008).

The Book of Kells is preserved at Trinity College, Dublin.

Filed under: Art, Book History, Indexing & Searching Information, Manuscript Illumination, Manuscripts & Manuscript Copying, Religious Texts / Religion, Survival of Information

900 – 1000

Massive Byzantine Encyclopedic Dictionary Circa 950

The Suda, or Souda, a massive Byzantine encyclopedic dictionary of the Mediterranean world written in Greek, contains 30,000 entries, many drawn from ancient sources that have since been lost. Little is known regarding its compilation except that it must have been compiled before the time of the 12th-century writer Eustathius of Thessalonica, who frequently quotes from it.

"The Suda is somewhere between a grammatical dictionary and an encyclopedia in the modern sense. It explains the source, derivation, and meaning of words according to the philology of its period, using such earlier authorities as Harpocration and Helladios. There is nothing especially important about this aspect of the work. It is the articles on literary history that are valuable. These entries supply details and quotations from authors whose works are otherwise lost. They use older scholia to the classics (Homer, Thucydides, Sophocles, etc.), and for later writers, Polybius, Josephus, the Chronicon Paschale, George Syncellus, George Hamartolus, and so on.

"This lexicon represents a convenient work of reference for persons who played a part in political, ecclesiastical, and literary history in the East down to the tenth century. The chief source for this is the encyclopedia of Constantine VII Porphyrogenitus (912-59), and for Roman history the excerpts of John of Antioch (seventh century). Krumbacher (Byzantinische Literatur, 566) counts two main sources of the work: Constantine VII for ancient history, and Hamartolus (Georgios Monachos) for the Byzantine age" (Wikipedia article on Suda, accessed 02-02-2010).

The most significant edition of the Suda is Suda On Line: Byzantine Lexicography.

"The purpose of the Suda On Line is to open up this stronghold of information by means of a freely accessible, keyword-searchable, XML-encoded database with translations, annotations, bibliography, and automatically generated links to a number of other important electronic resources. To date over 170 scholars have contributed to the project from eighteen countries and four continents. Of the 30,000-odd entries in the lexicon, over 25,000 have been translated as of this date, and more translations are submitted every day." 

 

Filed under: Indexing & Searching Information, Linguistics / Translation / Speech, Organization of Information / Taxonomy

The Earliest Universal Bibliography 988 – 990

Muhammad ibn Ishaq (Abu al Faraj), called Ibn Abi al-Nadim (Abi Ya'qub Ishaq al-Warraq al-Baghdadi), a bookseller, stationer and "court companion" of Baghdad, publishes Al-Fihrist, an annotated index of the books of all nations extant in the Arabic language and script.

The English translator of al-Nadim's work, Bayard Dodge, suggests that al-Nadim, working in his father's bookshop, "wished to assemble a catalogue to show customers and to help in the procuring and copying of manuscripts to be sold to scholars and book collectors" (Dodge p. xxiii). This was the earliest universal bibliography.

"It is reasonable to believe that when al-Nadim died the original copy of his manuscript was placed in the royal library at Baghdad, while other copies made by scribes about the time of his death were assigned to his family bookstore, where some of them were probably sold to customers who came to purchase interesting books. Farmer says: ' Yagut (d. 626/1299) averred that he used a copy of the Fihrist in the handwriting of al-Nadim himself. The lexicographer al-Saghani (650/1252) made a similar claim. Either of these autograph copies may have been in the Caliph's library, which was destroyed utterly in the sacking of Baghdad in 656/1258)' "(Dodge p. xxii).

This work did not appear in print until an edition of the Arabic text was issued by orientalist Gustav Flügel in Leipzig, 1871-72.

The text was first edited from the earliest manuscripts and translated into English by Bayard Dodge as The Fihrist of al-Nadim. A Tenth-Century Survey of Muslim Culture, 2 vols., New York, 1970. For the translation of part one Dodge used MS 3315 in the Chester Beatty Library, Dublin:

"We know nothing about the history of the manuscript until it was placed in the library of the great mosque at 'Akka, when the notorious Ahmad Pasha-al-Jazzar was ruler there at the time of Napoleon Bonaparte. After the fall of Ahmad Pasha, the manuscript was evidently stolen from the mosque. It was probably at this time that it became divided, as the Beatty Manuscript includes on the first half of Al-Fihrist. In the course of time the dealer Yahudah sold his first half to Sir Chester Beatty, who placed it in his library at Dublin" (Dodge p. xxviii).

For the translation of part two Dodge used MS 1934 which "forms part of the Shahid 'Ali Pasha collection which is now cared for in the library adjacent to the Sulaymaniyah Mosque at Istanbul. In the library catalogue it is described as 'Suleymaniye G. Kutuphanesi kismi Shetit Ali Pasha 1934'" (Dodge p. xxx).

Dodge indicated that he believed that each separate portion represents half of the same manuscript made shortly after the death of al-Nadim.

Filed under: Bibliography, Book Trade, Destruction / Looting of Information, Indexing & Searching Information, Manuscripts & Manuscript Copying, Organization of Information / Taxonomy, Publishing, Survival of Information

1100 – 1200

The Emergence of Concordances and Subject Indexes Circa 1190 – 1290

"In the course of the thirteenth century a flood of texts appeared that belonged to a genre virtually unknown before, works such as the alphabetical collections of biblical distinctiones, the great verbal concordances to the scriptures, alphabetical subject indexes to the writings of Aristotle and the Fathers, and location lists of books. These are works designed to be used, rather than read. Moreover, in many cases -- for example, the concordance, or subject index to the works of Augustine -- these new tools helped one to use, rather than to read, the texts to which they were devoted. Tools such as these are unknown in classical antiquity. They are alien to the Hebrew and Byzantine traditions until imported from the Latins. And they emerge with striking suddenness in the West, to the point that one may say that before the 1190s such tools did not exist, and that by 1290 the dissemination and new creation of such tools were commonplace" (M. Rouse & R. Rouse, "The Development of Research Tools in the Thirteenth Century", Authentic Witnesses: Approaches to Medieval Texts and Manuscripts (1991) 221.)

Filed under: Indexing & Searching Information, Manuscripts & Manuscript Copying, Organization of Information / Taxonomy

1200 – 1300

Biblical Concordances, Tools for Preachers 1239

"The development of the concordance should be examined in the context of the methods used to 'distinguish' words found in the text of the Bible. The collections of biblical distinctiones that abound in western Europe from the end of the twelfth century are the earliest of alphabetical tools save the dictionaries. Distinction collections provide one with the various figurative and symbolic means of a noun that is found in Scripture, illustrating each meaning with a scriptural passage" Rouse & Rouse, Authentic Witnesses: Approaches to Medieval Texts and Manuscripts [1991] 222-23).

"The first concordance (Saint Jacques I), which was compiled at Saint Jacques in Paris under the direction of Hugh of Saint Cher, was probably already in existence by 1239. This pioneering work originated the reference system used thereafter: each appearance of a word was noted according to book of the Bible, chapter of the book (following the chapter divisions attributed to Stephen Langton), and relative location with the chapter, indicated by means of one of the first seven letters of the alphabet A--G. The production of this major work over a period time required an impressive organization of man-power. There survive, in the fifteenth-century bindings of manuscripts from Saint Jacques, four quires of what must be the penultimate draft of this concordance, revealing something of their methods: each quire was written by a different copyist responsible only for a fixed portion of the alphabet, as one can see from the blank space each left when he had finished his assigned task. Corrections were then noted, so that it would be ready for the final copy. A drawback of Saint Jacques I is the fact that its words are not cited in context. This version survives in eigthteen manuscripts, thirteen of which date from the thirteenth century" (Rouse & Rouse, op. cit., 224-25.)

Filed under: Indexing & Searching Information, Manuscripts & Manuscript Copying

The First Alphabetical Subject Indexes Circa 1250

"Paris was of course a major center of the devising and use of alphabetical tools in the thirteenth century. The several motive forces that created the various indexing tools, devices, and procedures flowed into and out from Paris. By the middle of the thirteenth century, it in fact becomes pointless to try to dinstiguish between Cistercian tools and university tools. The two communities shared at least one activity in common, that of preaching to the laity. After the foundation of a Cistercian house of studies at Paris, the Collège Saint-Bernard, the two institutions shared personnel as well. The indexing method that had been peculiarity Cistercian, the use of marginal letters and changing albphabets as reference systems, was picked up and used by the schools; the A – G reference system, developed by the Paris Dominicans for the concordances, was adapted for their particular needs by the Cistercians of Bruges. Books from the Paris schools invaded Cistercian (as well as Benedictine) libraries, to the point of eclipsing the monastic scriptoria, while indexed Cistercian florilegia from Villers and Clairvaux made their way into the studies of the masters, and the shops of the stationers, in Paris and Oxford.

"On of the archetypical contributions of the University of Paris in this field is the application of indexing techniques to the works of Aristotle. Distinctiones, biblical concordances, and Cistercian indexes were, as we have seen, devoted to those works which constitute the very core of the Christian tradition. At the Paris schools, however, we see for the first time the development of reference works designed to facilitate access to texts for strictly scholarly purposes, without the remotest connection to sermon-preparation. By mid-century, there were alphabetical indexes to the majority of works in the Latin Aristotelian corpus, Old Logic, New Logic, the Ethica, the Libri naturales. Since these reference tools are anonymous, it is obviously impossible to prove that they originate at Paris; but the combination of the two activities, Aristotelian studies and creation of indexes, can point nowhere else at this period" (Rouse & Rouse, Authentic Witnesses: Approaches to Medieval Texts and Manuscripts [1991] 228-28).

Filed under: Indexing & Searching Information, Manuscripts & Manuscript Copying, Organization of Information / Taxonomy

The Arrangement and Cataloguing of Books Circa 1270

Humbert de Romans, Dominican scholar who promulgated the notion of arranging books by subject matter.

"The arrangement and cataloguing of books within the individual colleges and other university institutions were also influenced by the changes in book usage reflected in the union catalogs and location lists. In monastic institutions, book collections had traditionally been kept in book chests or armaria — though the individual volumes themselves doubtless were, for much of the time, parceled out among the members of the house. We find, however, in the writings of the Dominican Humbert of Romans, about 1270, instructions that books in the armaria should be physically arranged by subject matter, and that certain ones of them should be chained at lecterns for the common use of all, rather than being either locked away in a chest or loaned for the use of only one person. Before the end of the thirteenth century, both the Collège de Sorbonne in Paris and University College in Oxford had such a collection of chained books attached to reading benches. Early in the next century, about 1320, a member of the Sorbonne compiled a subject catalog of the hundreds of individual texts bound together in some three hundred chained codexes of his college. This development — arrangement of manuscripts by subject matter, affixing chains to selected books, an index of the content of a whole collection — corresponds in its way, in both purpose and inguenuity, to the making of concordances, distinction collections, subject indexes, and union catalogs; and it is in such a context that it should be considered. The common goal of all these devices was to facilitate access to desired information" (Rouse & Rouse, Authentic Witnesses: Approaches to Medieval Texts and Manuscripts [1991] 238-39).

Filed under: Bibliography, Indexing & Searching Information, Libraries, Organization of Information / Taxonomy

Organization of the Sorbonne Library, and the Way it Was Physically Arranged 1290

"We have seen that the first catalog of the college [The Sorbonne] was classified; the text of the 1290 catalog provides a full view of this classification system. It was a system common to the intellectual world of the thirteenth century, namely, the Scriptures, glossed and postillated books; Peter Lombard's Sentences, and questions and summas on the Sentences, whole works on the saints and doctors of the Church; questions and distinctions of the master; and whole works of the ancient philosophers, followed by works outside the realm of theology and philosophy — medicine, the quadrivium, jurisprudence and perhaps verancular writings. In this scheme, constructed for theologians, the works are arranged in descending order of their relative authority: Holy scripture, Doctors of the Church, modern masters, and ancient philosophers. This hierarchy of authority was detailed for example by St. Bonaventure: 'Sunt ergo libri sunt sacrae scripturae. . .; secundi libri sunt orignalia sanctorum, tertii, sententiae magistrorum, quarti, doctrinarum mundialium sive philosophorum.' It was only natural that this hierarchy also appeared in the organization of medieval book collections such as that at the Sorbonne.

"It has been suggested, furthermore, on the basis of the first catalog, that the books were grouped by subject and author in armaria similar to those described by Humbert of Romans ca. 1270, and that the classification of the catalog is a reflection of this arrangement. It is impossible, however, to judge on the basis of the catalog alone whether or not it reflects the physical arrrangement of the books themselves. We are fortunate in this instance to have collateral evidence which reveals the arrangement of certain books in the library just after the turn of the century.

"In 1306, Thomas Hibernicus, a fellow of the Sorbonne, unintentionally but effectively preserved a picture of the arrangement of the manuscripts of the major authors in the armaria, in the process of completing his Manipulus florum. This is a collection of extracts from the authorities grouped according to some 265 topics alphabetically arranged— abstinencia, abusio, acceptio, accidia, adiutorium, etc. Under of the the some 265 topics the extracts appear in a set order without significant variation: quotations from Augustine, Ambrose, Jerome, Gregory, Bernard, Hilary, Chrysostom, Isidore, and so on, concluding with the ancients. At the end of the Manipulus florum Thomas has appended a bibliography of 476 works, each with incipit and explicit, compiled from the Sorbonne's manuscripts. The authors in the bibliography are presented in virtually the same order as the extracts, works of Augustine, Ambrose, Jerome, etc. The order preserved here, the order in which Thomas used the books, is apparently that of the grouping of the books in the armaria of the library. The order is virtually the same as the order of authors in the catalogs of 1290 and 1338, originalia Augustine, Ambrosii, Hieronimi, Gregorii, Bernardi, etc. The combined evidence of the 1290 catalog and the Manipulus florum certainly implies, if does not prove, that the organization of the catalog reflects the physical arrangement of the manuscripts in armaria" (Rouse & Rouse, "The Early Library of the Sorbonne," Authentic Witnesses: Approaches to Medieval Texts and Manuscripts [1991] 370-72).

Filed under: Bibliography, Indexing & Searching Information, Libraries, Manuscripts & Manuscript Copying, Organization of Information / Taxonomy

1300 – 1400

Logical Machines for the Production of Knowledge 1305

A portrait of Ramon Llull.

Majorcan writer and philosopher Ramon Llull (Ramon Lull) publishes in his Ars generalis ultima or Ars magna (the "Ultimate General Art") a method of combining religious and philosophical attributes selected from a number of lists, which he invented about 1275. It is believed that Llull's inspiration for the Ars magna came from observing Arab astrologers using a mechanical device called a zairja to calculate ideas.

Llull's method

"was intended as a debating tool for winning Muslims to the Christian faith through logic and reason. Through his detailed analytical efforts, Llull built an in-depth theological reference by which a reader could enter in an argument or question about the Christian faith. The reader would then turn to the appropriate index and page to find the correct answer.

"Llull also invented numerous 'machines' for the purpose. One method is now called the Lullian Circle, each of which consisted of two or more paper discs inscribed with alphabetical letters or symbols that referred to lists of attributes. The discs could be rotated individually to generate a large number of combinations of ideas. A number of terms, or symbols relating to those terms, were laid around the full circumference of the circle. They were then repeated on an inner circle which could be rotated. These combinations were said to show all possible truth about the subject of the circle. Llull based this on the notion that there were a limited number of basic, undeniable truths in all fields of knowledge, and that we could understand everything about these fields of knowledge by studying combinations of these elemental truths.

"The method was an early attempt to use logical means to produce knowledge. Llull hoped to show that Christian doctrines could be obtained artificially from a fixed set of preliminary ideas. For example, one of the tables listed the attributes of God: goodness, greatness, eternity, power, wisdom, will , virtue, truth and glory. Llull knew that all believers in the monotheistic religions - whether Jews, Muslims or Christians - would agree with these attributes, giving him a firm platform from which to argue.

"The idea was developed further by Giordano Bruno in the 16th century, and by Gottfried Leibniz in the 17th century for investigations into the philosophy of science.

"Leibniz gave Llull's idea the name ars combinatoria, by which it is now often known. Some computer scientists have adopted Llull as a sort of founding father, claiming that his system of logic was the beginning of information science" (Wikipedia article on Ramon Llull, accessed 04-02-2009).

Filed under: Computer / Internet Culture, Indexing & Searching Information, Mathematics / Logic, Religious Texts / Religion

Medieval Union Catalogue of Manuscripts Circa 1320

Oxford Franciscans compile, on the basis of on-site surveys, the Registrum Anglie de libris doctorum et auctorum ueterum — a manuscript union catalogue of some 1400 manuscript books in England, Scotland and Wales. It lists the works of 98 authors owned by 189 monastic or cathedral libraries.

"Although none of these libraries is Franciscan, the master list is organized geographically according to the division of Great Britain into the custodiae of the Franciscan order. The three surviving manuscripts of the Registrum date from the beginning of the fifteenth century; it is nevertheless possible to establish from external evidence that the Registrum must date from the first or second decade of the fourteenth century" (Rouse & Rouse, Authentic Witnesses. Approaches to Medieval Texts and Manuscripts [1991] 237-38).

Registrum Anglie de libris doctorum et auctorum veterum. Edited with an introduction and notes by Richard H. Rouse and Mary A. Rouse. The Latin text established by R. A. B. Mynors (1991).

Filed under: Bibliography, Indexing & Searching Information, Libraries, Manuscripts & Manuscript Copying

Medieval Union Catalogue of Manuscripts Names 694 Authors Circa 1350

The Benedictine monk Henry of Kirkestede, prior of the royal abbey of St. Edmund at Bury St Edmunds in Suffolk, and traditionally known as Boston Buriensis, compiled a union catalogue of manuscripts in English libraries entitled Catalogus de libris autenticis et apocrifis. He named 674 authors and assigned to them about 3900 works.

Richard H. Rouse & Mary A. Rouse, eds., Henry of Kirkestede, Catalogus de libris autenticis et apocrifis (2004).

Filed under: Bibliography, Indexing & Searching Information, Libraries, Manuscripts & Manuscript Copying

1400 – 1450

The First Bible Concordance in Hebrew 1448

French Jewish philosopher and controversialist Isaac Nathan ben Kalonymus publishes Meïr Netib, a concordance to the Hebrew Bible upon which he worked from 1437 to 1447, with a philosophico-exegetical introduction, Petiḥat Meïr Netib.

"The Meïr Netib was the first Bible concordance in Hebrew, and was distinguished from the similar Latin work of Arlotus of Prato in that its vocabulary was arranged in the order of the roots. In the introduction the author says that his work aimed to facilitate the study of Biblical exegesis and to prevent Jewish converts to Christianity from making, in their religious controversies, incorrect quotations from the Bible, as was often the case with Geronimo de Santa Fé. The "Meïr Netib," with its complete introduction, was first published at Venice (erroneously under the name of Mordecai Nathan) in 1523; in 1556 it was published at Basel by Buxtorf, but with only a part of the introduction."

Filed under: Indexing & Searching Information, Publishing, Religious Texts / Religion

1450 – 1500

The First Printed Book Issued with Pagination Circa 1473 – 1474

The first printed book to be issued with pagination rather than foliation is Werner Rolewinck's Fasciculus temporum published by Nicholaus Götz, probably in Cologne. ISTC no. ir00253000.

"Pagination began in England in the XIIIth century, making its way slowly from there to the continent where it was used, with very few exceptions, only in the northern parts of Europe and as far south as the middle and upper Rhine valley. Its first appearance in a printed book (Rolewinck's Fasciculus temporum, ca. 1473-4; H. 6917) was in Cologne, one of many examples of the influence of regional characteristics of manuscripts on printed books. In retrospect it seems surprising that the advantages of foliation, pagination and alphabetical indexing were realized so late, but the reasons are quite clear. A manuscript, being unique, served one or few readers, the printed book many. When texts were produced by printing, all copies were identical and care was taken regularly to number folios or pages and to prepare careful tables of contents and indexes. During the manuscript period citations were cumbersome, since they had to refer to chapters or other clearly defined parts of texts. Accurate citations developed as the direct result of printing, when it became clear that references by edition and folio (or page) were the simplest and most accurate form. This occurred first in the text, then in marginal notations and ultimately in footnotes" (Hirsch, Printing, Selling and Reading 1450-1550 [1967] 6).

Filed under: Book History, Indexing & Searching Information, Printing / Typography

1500 – 1550

Unprecedented Blending of Scientific Exposition, Art and Typography June 1543

At the age of only 29, physician, surgeon, and anatomist Andreas Vesalius publishes De humani corporis fabrica libri septem in Basel, revolutionizing the science and teaching of human anatomy.

Throughout this encyclopedic 400,000 word book on the structure and workings of the human body Vesalius provided a fuller and more detailed description of human anatomy than any of his predecessors, correcting errors in the traditional anatomical teachings of Galen, which had been obtained from primate rather than human dissection, and arguing that knowledge of human anatomy was to be obtained only from human sources. Even more revolutionary than his criticism of Galen and other medieval authorities was Vesalius's assertion that the dissection of cadavers must be performed by the physician himself, in direct contradiction of the medieval doctrine that dissection was a task to be performed by menials while the physician lectured from the traditional authorities. Only through actual dissection, Vesalius argued, could the physician learn human anatomy in sufficient detail to teach it accurately. This "hands-on" principle remained Vesalius's most lasting contribution to the teaching of anatomy; it is graphically represented in the Fabrica's woodcut title page (the earliest illustration of an anatomical theatre), which shows Vesalius with his right hand plunged into an opened cadaver, conducting an anatomical demonstration. Because it was then legal only to dissect the cadavers of executed criminals, and these cadavers were always in short supply, Vesalius urged physicians to take their own initiative in obtaining material for dissection. The Fabrica contains several amusing and unrepentant anecdotes of how students had robbed graves to obtain cadavers, especially those of women, since female criminals were rarely executed in those days.

The Fabrica also broke new ground in its unprecedented blending of scientific exposition, art and typography. Although earlier anatomical books, such as those by Berengario da Carpi, had contained some notable anatomical illustrations, they had never appeared in such number or been executed in such minute precision as in the Fabrica, and they had usually been introduced rather haphazardly with little or no relationship to the text. In contrast, Vesalius sent his woodblocks to the printer with precise instructions as to placement within the text, and with exact marginal references which brought about direct relationship of text to illustrations, or even details within illustrations. The series of historiated initials, in which putti and dwarfed men humorously perform some of the more grisly actions associated with dissection, have been called pictorial footnotes to the text. The book remains the typographic masterpiece of Johannes Oporinus of Basel, one of the most widely learned and iconoclastic of the scholar printers, whose success with this book apparently caused Vesalius to entrust to Oporinus all of his later publications.

The Fabrica's magnificent title page and the spectacular series of hundreds of anatomical woodcuts (full-page and smaller) spread throughout the book remain the most famous series of anatomical illustrations ever published. Although the illustrations were traditionally attributed to an associate of Titian, Jan Stephan von Calcar, who drew and possibly engraved the three woodcuts of skeletons in Vesalius's first series of anatomical charts, Tabulae anatomicae sex (1538), there is no reliable basis for this attribution. The Fabrica woodcuts were produced by an unknown artist or artists in Titian's workshop. Vesalius commissioned the illustrations and supervised their production. It is also quite possible that he personally drew some of the lesser illustrations for the Fabrica, as we know that he made the drawings for the first three of the Tabulae anatomicae sex. The woodblocks for the Fabrica were preserved in Munich until their destruction in World War II.

A notable feature of the Fabrica not usually considered is Vesalius's "Index of Notable Subjects and Words," published at the end of the work. Arranged alphabetically by subject, with persons entered somewhat inconsistently under either first name or surname, this index, which refers to the page number and the line number on that page, amounts to a detailed outline of what Vesalius considered his significant original contributions. For example, under Galen he indexed each specific anatomical detail on which he disagreed with Galen's writings.

♦ A digital facsimile of the 1543 Fabrica is available on the National Library of Medicine website.

Filed under: Art, Art and Science, Medicine, Technology, Book History, Book Illustration, Indexing & Searching Information, Medicine, Publishing

The First Universal Bibliography Since the Invention of Printing 1545 – 1555

Swiss physician, bibliographer, naturalist and alpinist Conrad Gessner (Gesner) issues the first volume of his Bibliotheca Universalis, sive Catalogus omnium scriptorum locupletissimus, in tribus linguis, Latina, Graeca, & Hebraica: extantium & non extantium veterum & recentiorum. . . (1545) at the press of Christopher Froschauer in Zurich. Froschauer published Gessner's Appendix bibliothecae, supplementing the work, in 1555.

The first "universal" bibliography published since the invention of printing, the Bibliotheca universalis was an international bibliography of authors who wrote in Latin, Greek, and Hebrew, alphabetically arranged by their first names in accordance with medieval usage. Short biographical data preceded the lists of works, with indications of printing places and dates, printers and editors, where applicable. Gessner listed about 12,000 titles in the Bibliotheca universalis, expanded to about 15,000 in his Appendix.

Escaping the Labyrinth

"The technique of book production had changed radically as a result of print, but problems of information had not been simplified. This moved publishers and scholars to develop tools equal to the new situation. But such tools did not prove completely adequate to the task of helping the reader faced with the problem of selection, a problem which had now become more complicated. The predicament suggested to Gesner an encompassing labyrinth made up of a multitude of books. He confessed the profound sense of freedom he experienced when he finished his massive work in 1545: 'In truth I rejoice and thank God because I have finally gotten out of the labyrinth in which I was trapped for almost three years' " (Balsamo, Bibliography: History of a Tradition [1990] 32).

Breslauer & Folter, Bibliography: Its History and Development  (1984) no. 14.

♦ Ironically Gessner, a physician, did not complete the intended medical section of his Bibliotheca universalis (liber xxi) and it was never published.

Besterman, The Beginnings of Systematic Bibliography 2nd ed (1940) 15-18.

Technically, in this project Gessner was preceded by Muhammad ibn Ishaq (Abu al Faraj), called Ibn Abi Al-Nadim, who in 988 CE published the Fihrist, an index of the books of all nations which were extant in the Arabic language and script. It is noticed in this database. Chronologically Al-Nadim's work was the earliest attempt at a universal bibliography, but it did not appear in a printed edition until 1871-72, and it had no influence on the development of bibliography in Europe.

Filed under: Bibliography, Indexing & Searching Information, Organization of Information / Taxonomy, Science

The First General Subject Index 1548 – 1549

Conrad Gessner (Gesner) issues from Zurich Pandectarum sive Partitionum universalium libri XXI.

Pandectarum was the first general subject index, which Gessner intended as a key to his Bibliotheca Universalis (1545).

"Gesner's bibliographical system with its division of knowledge into various categories represents an enlarged version of the medieval system, adapted to his own times" (Breslauer & Folter, Bibliography: Its History and Development [1984] no. 16).

Besterman, The Beginnings of Systematic Bibliography 2nd ed (1940) no. XVII.

Filed under: Bibliography, Indexing & Searching Information, Organization of Information / Taxonomy

1550 – 1600

Index Librorum Prohibitorum 1559

Using the print technology that it hopes to control, the Sacred Congregation of the Inquisition, in charge of censorship for the Catholic Church, begins publication in Rome of the Index Librorum Prohibitorum (List of Prohibited Books). This was updated through 32 editions, the last of which appeared in 1948.

“The various editions also contain the rules of the Church relating to the reading, selling and censorship of books. The aim of the list was to prevent the reading of immoral books or works containing theological errors and to prevent the corruption of the faithful. The list was not simply a reactive work. Catholic authors had the possibility to defend their writings and could prepare a new edition with the necessary corrections or elisions either to avoid or to limit a ban . . . . Pre-publication censorship was encouraged.”

Filed under: Book History, Censorship, Indexing & Searching Information, Religious Texts / Religion

Renaissance Information Retrieval Device 1588

In Le diverse et artificiose machine, elegantly published from his home in Paris, Agostino Ramelli describes and illustrates, among numerous remarkable inventions, a revolving book wheel. It is one of the earliest "information retrieval" devices. Ramelli writes:

"This is a beautiful and ingenious machine, very useful and convenient for anyone who takes pleasure in study, especially those who are indisposed and tormented by gout. For with this machine a man can see and turn through a large number of books without moving from one spot. Moreover, it has another fine convenience in that it occupies very little space in the place where it is set, as anyone of intelligence can clearly see from the drawing.

"This wheel is made in the manner shown, that is, it is constructed so that when the books are laid on its lecterns they never fall or move from the place where they are laid even as the wheel is turned and revolved all the way around. Indeed, they will always remain in the same position and will be displayed to the reader in the same way as they were laid on their small lecterns, without any need to tie or hold them with anything. This wheel may be made as large or small as desired, provided the master craftsman who constructs it observes the proportions of each part of its components. He can do this very easily if he studies carefully all the parts of these small wheels of ours and the other devices in this machine. These parts are made in sizes proportionate to each other. To give fuller understanding and comprehension to anyone who wishes to make and operate this machine, I have shown here separately and uncovered all the devices needed for it, so that anyone may understand them better and make use of them for his needs." (Ramelli, The Various Ingenious Machines of Agostino Ramelli. A classic Sixteenth-Century Illustrated Treatise on Technology. Translated from the Italian and French with a biographical study of the author by Martha Teach Gnudi. Technical annotations and a pictorial glossary by Eugene S. Ferguson [1987] 508-9)

Filed under: Art and Science, Medicine, Technology, Book History, Book Illustration, Indexing & Searching Information, Technology

The First "Books in Print" 1595

Bookseller and bibliographer Andrew Maunsell publishes The First Part [the Seconde Parte] of the Catalogue of English printed Bookes.

Maunsell produced the first trade bibliography of English books, giving author, translator where applicable, a title full enough to ensure definite identification, format, and printer or bookseller and date. It listed those books printed in the preceding fifty to sixty years and which were still available from publishers and booksellers. The first part, consisting of 123 pages, listed theology--excluding anti-Reformation literature. The much shorter second part, consisting of 27 pages, listed "the Sciences Mathematicall, as Arithmetick, Geometrie, Astronomie, Astrologie, Musick, and the Arte of VVarre, and Nauigation; and also of Phisick and Surgerie."

In his Beginnings of Systematic Bibliography (2nd ed 1940) Theodore Besterman characterized Maunsell's work as the one in which "a real technique of book-description is made use of for the first time" (p. 29). The Catalogue is also an alphabetical subject bibliography, with the larger subjects sub-divided and in each section works arranged alphabetically by author's surname, one of the earliest uses of the surname for indexing.

In his dedication to "Worshipfull the Master, Wardens, and Assistants of the Companie of Stationers and to all other Printers and Booke-sellers in generall" Maunsell wrote of learned men that

"have written Latine Catalogues, [Conrad] Gesner, Simler, and our countrman John Bale. They make their Alphabet by the Christen name, I by the Sir name; They mingle Diuinitie, Law Phiscke, &c. together, I set Diuinitie by itselfe; They set downe Printed and not Printed, I onely Printed, and none but such as I have seene. . . Concerning the Books which are without Authors names called Anonymi, I have placed them either upon the Title they bee entiuled by, or else upon the matter they entreate of, and sometimes upon both, for the easier finding of them."

Maunsell then explained his cross-indexing system, and how it should be used throughout the work.

Breslauer & Folter, Bibliography: Its History and Development (1984) no. 36.

Filed under: Bibliography, Book Trade, Indexing & Searching Information, Organization of Information / Taxonomy

1600 – 1650

Depiction of Record Keeping by Pieter Breughel the Younger 1620 – 1640

A painting by Pieter Breughel the Younger perhaps caricatures the way paper accounting or legal records were maintained at the time. One copy, dated 1621 and entitled the Village Lawyer, is in the Museum voor Schone Kunsten, Ghent, Belgium; another, dated 1620-40 and entitled Paying the Tax, is in the Armand Hammer collection at the University of Southern California Fisher Museum of Art. Records are shown in piles of bundles on tables, in bundles on shelves, in what appear to be sacks of bundles hanging on walls, in sheets of paper bundled together that may be tacked up on walls, and in piles on the floor. In short, the methods of organizing and storing information appear sloppy, inefficient, and possibly chaotic.

Filed under: Accounting / Business Machines, Art, Data Storage / Memory, Indexing & Searching Information

1650 – 1700

A Universal Bibliography but Only for "A and B" 1699

Christoph Hendreich publishes only the first volume (A-B) of Pandectae Brandeburgicae Continentes I. Bibliothecam. . . Auctorum inpressorum [!] & Manuscr. partem. . . nomina plurimorum, Anonymorum, Pseudonymorum & c. explicata. . . II. Indicem materiarum praecipuarum.

Named for the Great Elector of Brandenburg, whom Hendreich served as librarian, this was an attempt to produce a universal author bibliography of books and manuscripts. The first volume, covering letters A and B, listed 50,000 works by 15,000 authors, reflective of the significant growth of information by the end of the seventeenth century. The author, who died in 1702, did not live to complete any further volumes.

Breslauer & Folter, Bibliography: Its History and Development (1984) no. 92.

Filed under: Bibliography, Indexing & Searching Information, Organization of Information / Taxonomy

1750 – 1800

The Central Enterprise of the French Enlightenment 1751 – 1780

French philosopher, art critic, and writer Denis Diderot and French mathematician, mechanician, physicist and philosopher Jean le Rond d'Alembert write and edit the Encyclopédie, ou dictionnaire raisonné des sciences, des arts et des métiers, par une société de gens de lettres in 17 folio volumes of text plus 11 folio volumes (i.e., 10 volumes in 11) of plates. The first 7 volumes were published in Paris, but volumes 8 to 17 had to be published under a false Neuchâtel imprint. The main work appeared between 1751 and 1772. A supplement of 4 volumes plus one plate volume was published in Paris and Amsterdam from 1776 to 1777. The Table analytique et raisonnée for the set was published in 2 folio volumes in Paris and Amsterdam in 1780. Altogether there were 35 volumes, with 71,818 articles and 3,129 plates.

The central enterprise of the French Enlightenment, the Encyclopédie embodied that movement's liberal, anti-clerical and scientific spirit, its preoccupation with man as a creature of nature, and its conception of culture and society as mutable products of the evolutionary processes of history. As such, the work challenged the twin authorities of the French monarchy and the Catholic Church, both of which derived their power from the traditional belief in a divinely ordained, unchanging order. Well aware of the dangers of affronting such powerful authorities, the philosophes who contributed to the Encyclopédie relied heavily on irony and subterfuge in their attacks on the established order, but the epistemological basis of these attacks was clearly stated in the Encyclopédie's "Discours préliminaire," written by d'Alembert, who, "although he formally acknowledged the authority of the church, . . . made it clear that knowledge came from the senses and not from Rome or Revelation" (Darnton, The Business of Enlightenment: A Publishing History of the Encyclopédie 1775-1800 [1979] 7).

"The Encyclopédie was an innovative encyclopedia in several respects. Among other things, it was the first encyclopedia to include contributions from many named contributors, and it was the first general encyclopedia to lavish attention on the mechanical arts. Still, the Encyclopédie is famous above all for representing the thought of the Enlightenment. According to Denis Diderot in the article 'Encyclopédie,' the Encyclopédie's aim was 'to change the way people think' " (Wikipedia article on Encyclopédie, accessed 01-26-2010).

The first seven volumes of the Encyclopédie were produced in relative safety, due in part to the support of powerful protectors, notably Madame de Pompadour, but official tolerance came to an end in 1759, when the Encyclopédie was condemned by the Parlement of Paris and placed on the Index librorum prohibitorum by Pope Clement XIII. Diderot was forced to complete the remaining ten volumes in secret and to publish them under a false Neuchâtel imprint.  "In truth, secular authorities did not want to disrupt the commercial enterprise, which employed hundreds of people. To appease the church and other enemies of the project, the authorities had officially banned the enterprise, but they turned a blind eye to its continued existence" (Wikipedia).

A high percentage of the Encyclopédie's 71,818 articles were written by Diderot and d'Alembert themselves, with another large portion, about 400 articles, written by the Baron d'Holbach. Other famous contributors included Jean-Jacques Rousseau and Voltaire. The most prolific contributor was the French scholar Louis de Jaucourt, who wrote 17,266 articles, or about 8 per day between 1759 and 1765.

The Encyclopédie was a considerable commercial success, resulting in a print run of 4250 copies (Wikipedia), much larger than the typical print run of most publications at the time.

Lough, Essays on the Encyclopédie of Diderot and d'Alembert (1968) provides an authoritative bibliographical study and identifies the authors of a significant percentage of the unsigned articles. 

Carter & Muir, Printing and the Mind of Man (1967) no. 200.  Hook & Norman, The Haskell F. Norman Library of Science and Medicine (1991) no. 637.

♦ There are numerous versions of the Encyclopédie online. The ARTFL Encyclopédie Database from the University of Chicago contains "20.8 million words, 400,000 unique forms, 18,000 pages of text, 17 volumes of articles, and 11 volumes of plate legends." There is also the Encyclopedia of Diderot and d'Alembert Collaborative Translation Project at the University of Michigan. The entire searchable French text and all the illustrations are available at http://diderot.alembert.free.fr/ (accessed 04-21-2010).

Filed under: Book Illustration, Indexing & Searching Information, Organization of Information / Taxonomy, Publishing, Science, Technology

Diderot on Information Overload 1755

French writer and philosopher Denis Diderot publishes in the Encyclopédie, ou dictionnaire raisonné des sciences, des arts et des métiers, par une société de gens de lettres an article entitled Encyclopédie. In it he explained that a primary reason for undertaking this enormous writing and publishing project was to manage information overload by giving a rational and comprehensible order to what was already an almost impossibly large and disorganized body of information.

I preface my remarks About the Database with a brief quotation from Diderot's article. Equally relevant is this somewhat longer quotation, which places Diderot's partially self-deprecating thoughts in better context:

"As long as the centuries continue to unfold, the number of books will grow continually, and one can predict that a time will come when it will be almost as difficult to learn anything from books as from the direct study of the whole universe. It will be almost as convenient to search for some bit of truth concealed in nature as it will be to find it hidden away in an immense multitude of bound volumes. When that time comes, a project, until then neglected because the need for it was not felt, will have to be undertaken.

"If you will reflect on the state of literary production in those ages before the introduction of printing, you will form a mental picture of a small number of gifted men who are occupied with composing manuscripts and a very numerous body of workmen who are busy transcribing them. If you look ahead to a future age, and consider the state of literature after the printing press, which never rests, has filled huge buildings with books, you will find again a twofold division of labor. Some will not do very much reading, but will instead devote themselves to investigations which will be new, or which they will believe to be new (for if we are even now ignorant of a part of what is contained in so many volumes published in all sorts of languages, they will know still less of what is contained in those same books, augmented as they will be by a hundred—a thousand—times as many more). The others, day laborers incapable of producing anything of their own, will be busy night and day leafing through these books, taking out of them fragments they consider worthy of being collected and preserved. Has not this prediction already begun to be fulfilled? And are not several of our literary men already engaged in reducing all big books to little ones, among which there are still to be found many that are superfluous. Let us assume that their extracts have been competently made, and that these have been arranged in alphabetical order and published in an orderly series of volumes by men of intelligence—you have an encyclopedia!

"Thus we have now undertaken, in the interests of learning and for the sake of the human race, a task to which our grandsons would have had to devote themselves; but we have done so under more favorable circumstances, before a superabundance of books should have accumulated to make its execution extremely laborious" (translation in Baker (ed) The Old Regime and the French Revolution (1987) 85-86).

Filed under: Book History, Indexing & Searching Information, Organization of Information / Taxonomy, Printing / Typography

166.5 Volumes of Text but No Comprehensive Index! 1782 – 1832

L'Encyclopédie méthodique ou par ordre de matières par une société de gens de lettres, de savants et d'artistes; précédée d'un Vocabulaire universel, servant de Table pour tout l'Ouvrage, ornée des Portraits de MM. Diderot et d'Alembert, premiers Éditeurs de l'Encyclopédie is published in 206 volumes by French publisher and writer Charles-Joseph Panckoucke and his daughter Therese-Charlotte Agasse.

The Encyclopédie méthodique was a revised and expanded version, arranged by subject matter, of the alphabetically-arranged Encyclopédie, ou dictionnaire raisonné des sciences. . . . compiled by Diderot and d'Alembert.

"Two sets of Diderot's Encyclopédie, and its supplements, were cut up into articles. Each subject category was entrusted to an exclusive editor, whose job was to collect all articles relating to his subject, and exclude those belonging to others. Great care was to be taken of those articles that were of a doubtful nature, which were not to be omitted. For certain topics (such as air, which belonged equally to chemistry, physics and medicine), the methodical arrangement had the unexpected effect of breaking up a single article into several parts. Each volume was to have its own introduction, a table of contents, and a history of the Encyclopédie. The whole work was to be linked together by a Vocabulaire Universel (Vol. 1 - 4), with references to all locations where each word appears.

"The prospectus, issued early in 1782, proposed three editions, each with seven volumes of 250 to 300 plates:

84 volumes;

43 volumes, with 3 columns per page; and

53 volumes of about 100 sheets, with 2 columns per page. . . .

"The livraisons (home-deliveries) was to be in two volumes each, the first (Jurisprudence, Vol. 1., Literature, Vol. 1,) to appear in July 1782, and the whole to be finished by 1787. The number of subscribers, 4072, was so great that the subscription list of 672 livres was closed on April 30. Twenty-five printing offices were employed, and in November 1782, the first livraison (Jurisprudence, Vol. 1, and half volume each, of arts et métiers and histoire naturelle) was issued (Wikipedia article on Encyclopédie methodique, accessed 01-21-2010).

"The Encyclopédie méthodique was issued in parts piecemeal, each instalment consisting of a number of half-volumes of different dictionaries. Though initial progress was encouraging, it quickly became apparent that more wholesale revision of Diderot’s original was called for than Panckoucke had envisaged. Not only were there inadequacies in the original work; many of the disciplines had moved on since 1751. In some cases the developments occurred while the Encyclopédie was being published: Chémie reflects the new theories of Lavoisier regarding combustion which were being formulated as the early volumes were published, and the publication of Système anatomique was long delayed, in part because of the way in which the discipline was being restructured (by its editor Vicq-d’Azyr and others) in the 1790s. Several new dictionaries were added to the scheme to cover subjects that had originally been overlooked, such as music, architecture and forestry. By 1788, a year after the dictionary was supposed to have been completed, it had reached 53 volumes, the original projected total, and was obviously less than halfway to completion.  

"As the publication grew more and more unwieldy, Panckoucke resorted to a number of measures to ensure its continued financial viability. He attempted to placate his impatient subscribers with a series of announcements emphasising the unprecedented scale of the undertaking, the great difficulties he was having in bringing it to fruition and the considerable improvements that were being made. He added an Atlas encyclopédique to the original scheme and a series of natural history plates with accompanying text (entitled Tableau encyclopédique et máthodique des trois règnes de la nature) which subscribers could pay for as an optional extra. In 1790 a number of new dictionaries were introduced on lighter subjects with titles such as Amusemens des sciences mathématiques and Dictionnaire des jeux familiers to attract more subscribers. Meanwhile publication of some of the major series was stalled owing to the editors’ other engagements, indispositions or deaths. Subscribers had to stockpile the individual parts of each series in order, sometimes for many years, before having them bound together. The extremely complex publishing history is one reason why sets of the Méthodique are rarely found complete—and why there is widespread disagreement among bibliographers over what a complete set of the Méthodique should actually comprise.

"The outbreak of the Revolution threw more obstacles in Panckoucke’s way. Printing in Paris grew prohibitively expensive as an explosion of new journals and pamphlets took up the printers’ time and bills for wages and paper grew larger. Panckoucke responded by opening a huge print-shop of his own and turning to provincial printers to maintain the momentum of his great project. He started yet another dictionary on the Assemblée nationale constituante, intended as a supplement and successor to the dictionaries of jurisprudence, commerce and economy which had been completed just in time to be rendered obsolete by the fall of the Bastille. This particular series petered out after just one volume. Inevitably the Revolution hit Panckoucke’s customer base; many wealthy subscribers fled into exile or lost their fortune, depriving him of over 2000 subscribers. At the same time his writers, involved in political work or journalism, were finding it harder and harder to produce copy. At least one fell foul of the Revolution: Jean-Marie Roland de la Platière, editor of Man ufactures, arts et métiers, committed suicide in 1793 on hearing of the condemnation of his wife. In 1794, stricken by depression, Panckoucke admitted defeat and signed over the Encyclopédie, along with his entire business, to his son-in-law Henri Agasse.  

"Agasse continued to issue numbers of the encyclopedia until his death in 1816, when it was taken over by his widow, Panckoucke’s daughter Pauline. She finally brought “l’entreprise la plus vaste du dix-huitième siècle” to a close in 1832 with the last volume of Histoire naturelle. It is difficult to imagine that many of the original subscribers were still around to see it completed. By this time it extended to (according to the most generally accepted estimate) 166½ volumes of text" (http://www.lib.cam.ac.uk/deptserv/rarebooks/encmeth.html, accessed 01-21-2010).

"When 'completed', the encyclopedia suffered one great weakness. Many dictionaries have a classed index of articles; that of economie politique, being a very excellent example, giving the contents of each article, so that any passage can be found easily. As the Vocabulaire Universel, the key and index to the entire work, was not published, it was difficult to carry out any research or to find all the articles on any particular subject. The original parts had often been subdivided, and had been so added on to by other dictionaries, supplements and appendices, such that, without going into great detail, an exact account could not be given of the work, which contained 88 alphabets, 83 indexes, 166 introductions, discourses, prefaces, etc.

"Probably no more unmanageable body of dictionaries has ever been published, except Jacques Paul Migne's Encyclopédie théologique, Paris, 1844-1875, with 168 volumes, 101 dictionaries, and 119,059 pages. Encyclopédie méthodique par ordre des matières occupied a thousand workers in production, and 2,250 contributors" (Wikipedia article on Encyclopédie méthodique).

Filed under: Indexing & Searching Information, Organization of Information / Taxonomy, Publishing

1800 – 1850

The First Thematic Index of a Composer's Work, Based on Mozart's Own Index 1805

Composer and music publisher Johann Anton André publishes Thematisches Verzeichniss sämmtlicher Kompositionen von W. A. Mozart.

This was:

"the first thematic index of a composer's works (and probably the first book [on music] produced by lithographic process). André, a composer and, as music publisher, successor to his equally famous father, Johann, had in 1800 acquired Mozart's manuscripts, including his [Mozart's own] 'Verzeichniss aller meiner Werke,' on which this index is based" (Breslauer & Folter, Bibliography: Its History and Development [1984] no. 116).

Filed under: Bibliography, Indexing & Searching Information, Manuscripts & Manuscript Copying, Music, Printing / Typography

Panizzi's 91 Rules for Standardizing the Cataloguing of Books 1841

Antonio Panizzi, Keeper of the Department of Printed Books at the British Museum (now the British Library), publishes 91 Rules for Compilation of the Catalogue. 

These rules represented the first attempt to standardize cataloguing.  They appeared in the Catalogue of Printed Books in the British Museum, Volume 1, pp. v-ix.

Several of the rules reflect social attitudes of the day. For example:

"V. Works of Jewish Rabbis, as well as works of Oriental writers in general, to be entered under their first name."

Concerning the rules and the catalogue Panizzi wrote in his preface to the first volume:

"The rules on which this Catalogue is based were sanctioned by the Trustees on the 13th of July, 1839; and, with the exception of such modifications as have been found necessary in order to accelerate the progress of the work, they have been strictly adhered to. Some additional rules, the want of which was not foreseen at the commencement, are printed in italics.

"The application of the rules was left by the Trustees to the discretion of the Editor, subject to the condition that a Catalogue of the printed books in the library up to the close of the year 1838 be completed within the year 1844. With a view to the fulfillment of this undertaking it was deemed indispensable that the Catalogue should be put to press as soon as any portion of the manuscript could be prepared; consequently the early volumes must present omissions and inaccuracies, which it is hoped, will diminish in number as the work proceeds.

"In giving to the world the first volume of a Catalogue, which promises to be of an unprecedented extent, the Editor thinks that it would be premature to name each gentleman in his department to whose zeal and talents he is indebted for much that will add to its usefulness. He looks forward to a continuation of the same assistance; and he, therefore, reserves till after the conclusion of the work the particular expression of his obligations.

"British Museum, July 15th, 1841

"A. Panizzi"

Filed under: Bibliography, Indexing & Searching Information, Libraries

1850 – 1875

Early Proposal for a National Union Catalogue 1852

Charles C. Jewett, librarian of the Smithsonian Institution, publishes On the Construction of Catalogues of Libraries and Their Publication by Means of Separate Stereotyped Titles With Rules and Examples.

Jewett described a plan for a national union catalogue of public libraries.

"His [Jewett's] intention was to secure general uniformity of bibliographic records through a system of "stereotyping" each title. This plan would have made it possible for libraries to print annual editions of their catalogs, incorporating the titles acquired 'during the previous year in each new edition, and for the Smithsonian to print a general union catalog which would have included' both its own holdings and those of all the public libraries. The uniformity Jewett sought was to be achieved not just through stereotyping but also through use of a single set of general cataloging rules which would be used by all the libraries. In the same year Jewett published a report titled On the Construction of Catalogues of Libraries which, among other things, set forth the first American cataloging rules for establishing headings for author entries. The report contained thirty-nine rules which were based on those of Panizzi. In fact Jewett acknowledged outright that he used some of Panizzi's rules verbatim. And Jewett's stated goal of serving the needs of users also reflected Panizzi's ideas. Though his project never came to final fruition, years later his goal of compiling a union catalog was met in the United States when the National Union Catalog began publication in 1953 and in Germany as early as 1899 when the Prussian Instructions was compiled under Jewett's influence" (Huford, The Pragmatic Basis of Catalog Codes: Has the User Been Ignored? [2007] 29).

Filed under: Bibliography, Indexing & Searching Information, Libraries, Organization of Information / Taxonomy

Roget's Thesaurus April 29, 1852

Peter Mark Roget publishes his Thesaurus of English Words and Phrases Classified so as to Facilitate the Expression of Ideas and to Assist in Literary Composition, the manuscript for which he had originally written in 1805, nearly 50 years before publication. The 15,000 words it contained were arranged conceptually rather than alphabetically, incorporating 1002 concepts, in six classes derived from Aristotelian and Leibnizian principles of classification:

  1. Abstract Relations
  2. Space
  3. Matter
  4. Intellect
  5. Volition
  6. Affections

The Thesaurus contained synonyms, in contrast to a dictionary, which contains definitions and pronunciations.

"Roget's Thesaurus is composed of six primary classes. Each class is composed of multiple divisions and then sections. This may be conceptualized as a tree containing over a thousand branches for individual "meaning clusters" or semantically linked words. These words are not exactly synonyms, but can be viewed as colours or connotations of a meaning or as a spectrum of a concept. One of the most general words is chosen to typify the spectrum as its headword, which labels the whole group.

"Roget's schema of classes and their subdivisions is based on the philosophical work of Leibniz (see Leibniz — Symbolic thought), itself following a long tradition of epistemological work starting with Aristotle. Some of Aristotle's Categories are included in Roget's first class "abstract relations". The Wikipedia "category schemes" are also based on the same principles" (Wikipedia article on Roget's Thesaurus, accessed 11-28-2008).

"In information technology, a thesaurus represents a database or list of semantically orthogonal topical search keys. In the field of Artificial Intelligence, a thesaurus may sometimes be referred to as an ontology.

"Thesaurus databases, created by international standards, are generally arranged hierarchically by themes and topics. Such a thesaurus places each term in context, allowing a user to distinguish between "bureau" the office and "bureau" the furniture. A thesaurus of this type is often used as the basis of an index for online material. The Art and Architecture Thesaurus, for example, is used to index the national databases of museums" (Wikipedia article on Thesaurus, accessed 11-28-2008).

The first edition was printed in 1000 copies. The original manuscript for Roget's Thesaurus is preserved in the Karpeles Manuscript Library Museum.

Kendall, The Man Who Made Lists: Love, Death, Madness, and the Creation of Roget's Thesaurus (2008).

Filed under: Indexing & Searching Information, Organization of Information / Taxonomy

The Basis for a Catalogue Code 1856

Bibliographer Andrea Crestadoro, an acquaintance of Anthony Panizzi, exasperated with delays in production of the British Museum Catalogue of Printed Books, publishes anonymously The Art of Making Catalogues of Libraries, or a Method to Obtain a Most Perfect Complete and Satisfactory Printed Catalogue of the British Museum Library by a Reader Therein.

Crestadoro's booklet served as the basis for a catalogue code. "In it he advocated the idea of the 'inventorial' catalog which would have detailed entries arranged in order of accession. The library patron was to be provided access to the entries through an alphabetical index of names and subjects. The Public Library of Manchester, England adopted this approach for its catalog and hired Crestadoro to implement it there in 1864. Like Panizzi, Crestadoro intended to have his catalog serve the needs of catalog users, but the rules of his code were not based on an empirical investigation of those needs" (Huford, The Pragmatic Basis of Catalog Codes: Has the User Been Ignored? [2007] 29).

At the end of his pamphlet Crestadoro advocated production of a universal catalogue of all publications.

Filed under: Bibliography, Indexing & Searching Information, Libraries

1875 – 1900

The Last Library Cataloguing Code Written by One Person 1876

Charles Ammi Cutter publishes Rules for a Printed Dictionary Catalogue, the last library cataloguing code written by one person.

"In his prefatory note, Cutter claimed to be the first investigator of the 'first principles of cataloguing' and the first to 'set forth the rules in a systematic way.' One of the principles he expostulated was that 'the convenience of the user should be preferred to the ease of the cataloguer.' Cutter urged catalogers to do such things as select the customary use of the names of subjects and the best known form of the author's name so that this goal might be fulfilled. The code's introduction lists objectives and means to bring about this convenience. These objectives and means have been studied for years by students of cataloging code history. Exactly how the 'convenience of the user' would be determined Cutter did not specify; he himself, it would seem, relied upon his own experience rather than any systematic study of user needs or behavior. No one else did such a study during these years either: such things as survey research and transaction log analysis were twentieth century phenomena" (Huford, The Pragmatic Basis of Catalog Codes: Has the User Been Ignored? [2007] 29).

Filed under: Bibliography, Indexing & Searching Information, Libraries

The First Extensively Used Scientific Method of Criminal Identification 1879

Alphonse Bertillon first publishes a description of his method of anthropometry.

He developed this system, which used five measurements (head length, head breadth, length of middle finger, length of left foot, and length of forearm from elbow to extremity of middle finger), as a means for identifying people. It was the first scientific method for the identification of criminals. Until this time, criminals could only be identified based on eyewitness accounts, which were known to be unreliable. Bertillon first employed his method, which was eventually called "Bertillonage," in the successful identification of a criminal in 1883. It became the first extensively used scientific method of criminal identification.

Filed under: Indexing & Searching Information, Organization of Information / Taxonomy, Science

Index Medicus Begins 1879

Under the direction of John Shaw Billings, the Library of the Surgeon General's Office (to be redesignated in 1956 the National Library of Medicine) begins publication of the Index Medicus, an effort to index all of the medical periodical literature.

Index Medicus finally ceased publication in print in 2004.

Filed under: Bibliography, Indexing & Searching Information, Libraries, Medicine, Science

A Landmark in Efforts to Organize Information and Make it Searchable 1880

John Shaw Billings begins publication of The Index-Catalogue of the Library of the Surgeon-General’s Office.

This became a landmark in the history of efforts to organize information and to make it searchable, and a primary general reference for the history of medicine and science. The fifth and final series was issued in 1961. The finished set of printed books contained "over 4.5 million . . . references to over 3.7 million bibliographic items. 2.5 million items are primarily journal articles; 250,000 items are monographs (books, pamphlets, and reports); approximately 300,000 items are dissertations (theses); and 16,000 are journal titles. Series 1 and Series 2 include portraits as separate citations but Series 3, 4, and 5 indicate portraits in descriptive notes for monographs and dissertations."

Filed under: Bibliography, Indexing & Searching Information, Libraries, Medicine, Organization of Information / Taxonomy

Fingerprints as a System of Identification October 8, 1880

In a letter published in the journal Nature under the title "On the Skin-Furrows of the Hand," Henry Faulds, a physician and missionary working in Japan, is the first to propose the use of fingerprints as a system of identification, including the scientific identification of criminals.

Faulds wrote: 

"I am sanguine that the careful study of these patterns may be useful in several ways.

1. We may perhaps be able to extend to other animals the analogies found by me to exist in the monkeys.

2. These analogies may admit of further analysis, and may assist, when better understood, in ethnological classifications.

3. If so, those which are found in ancient pottery may become of immense historical importance.

4. The fingers of mummies, by special preparation, may yield results for comparison. I am very doubtful, however, of this.

5. When bloody finger-marks or impressions of clay, glass, &c., exist, they may lead to the scientific identification of criminals" (http://www.clpex.com/Articles/History/Faulds1880.htm, accessed 03-27-2010).

Filed under: Crimes / Forgeries / Hoaxes, Indexing & Searching Information, Science

Finger Prints as a Means of Identification 1892

Francis Galton publishes a detailed statistical model of fingerprint analysis and identification, and encourages their use in forensic science in his book, Finger Prints.

Filed under: Indexing & Searching Information, Organization of Information / Taxonomy, Science

An Analog Search Engine 1895

Paul Otlet and Henri La Fontaine found the Institut International de Bibliographie. "In 1895, Otlet and La Fontaine also began the creation of a collection of index cards, meant to catalog facts, that came to be known as the "Repertoire Bibliographique Universel" (RBU), or the 'Universal Bibliographic Repertory'. By the end of 1895 it had grown to 400,000 entries; later it would reach a height of over 15 million.

"In 1896, Otlet set up a fee-based service to answer questions by mail, by sending the requesters copies of the relevant index cards for each query; scholar Alex Wright has referred to the service as an 'analog search engine'. By 1912, this service responded to over 1,500 queries a year. Users of this service were even warned if their query was likely to produce more than 50 results per search.

"Otlet envisioned a copy of the RBU in each major city around the world, with Brussels holding the master copy. At various times between 1900 and 1914, attempts were made to send full copies of the RBU to cities such as Paris, Washington, D.C. and Rio de Janeiro; however, difficulties in copying and transportation meant that no city received more than a few hundred thousand cards" (Wikipedia article on Paul Otlet, accessed 03-02-2009).

In 1931 the Institut International de Bibliographie was renamed the Institut International de Documentation (IID), a predecessor of the International Federation for Information and Documentation.

Filed under: Bibliography, Indexing & Searching Information, Libraries, Organization of Information / Taxonomy

The Cumulative Book Index February 1898

Halsey William Wilson publishes the first issue of the Cumulative Book Index.

"As a bookseller, Wilson had to constantly search through publishers' catalogs in order to keep track of currently published books that his customers might want. It was tedious and time-consuming work that prompted him to long for a comprehensive, up-to-date index of published works. He eventually decided to create such an index himself. What made the concept work economically was Wilson's idea to keep the publication current by placing each entry on a printer's "slug," which could then be later sorted with slugs from new entries. It may have been an obvious solution to someone who had experience as a job printer, but it was a revolutionary concept in bibliographical publishing. In February 1898 Wilson first published Cumulative Book Index, a comprehensive alphabetic list of currently published books in English, featuring the key elements of future Wilson indexes: the listing of author, title, and subject. The work sold for $1 to 300 subscribers, who would then receive periodically updated versions."
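The cumulating idea in the passage above, interfiling new slugs among those already set so the whole list stays in order, amounts to merging two sorted sequences. The following is only a minimal sketch under stated assumptions: the entry strings are invented placeholders, and the function name `cumulate` is hypothetical, not anything taken from Wilson's practice.

```python
import heapq

def cumulate(existing, new_entries):
    """Interfile two alphabetically sorted entry lists into one cumulative list.

    Both inputs are assumed to be sorted already (case-insensitively),
    just as standing slugs and newly set slugs would each be in order.
    """
    return list(heapq.merge(existing, new_entries, key=str.casefold))

# Placeholder entries, invented purely for illustration.
first_issue = ["Author A. Title One", "Author M. Title Two"]
new_slugs = ["Author D. Title Three", "Author Z. Title Four"]

cumulative = cumulate(first_issue, new_slugs)
for entry in cumulative:
    print(entry)
```

Because neither list has to be re-sorted from scratch, each new issue costs only the work of interfiling, which is the economy the quotation attributes to reusing the slugs.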

Filed under: Indexing & Searching Information, Libraries, Organization of Information / Taxonomy

1900 – 1910

The Reader's Guide to Periodical Literature 1901

Halsey William Wilson publishes the first issue of the Reader’s Guide to Periodical Literature.

Filed under: Indexing & Searching Information, Libraries, Organization of Information / Taxonomy, Publishing

LC Cards 1901

The Library of Congress begins making printed Library of Congress catalogue cards (LC cards) available to libraries, thus promoting the development of catalogue card systems.

Filed under: Indexing & Searching Information, Libraries, Organization of Information / Taxonomy

1910 – 1920

"Die Brücke" and its Goals for a World Information Clearing House 1911

Karl Wilhelm Bührer and Adolf Saager publish Die Organisierung der geistigen Arbeit durch die Brücke (The Organization of Intellectual Work through the Bridge). This book described the aims of The Bridge, an institution founded on 11 June 1911 with the financial support of Wilhelm Ostwald who donated his Nobel Prize money for the purpose.

Concerning The Bridge, Thomas Hapke wrote:

" 'Die Brücke is planned as a central station, where any question which may be raised with respect to any field of intellectual work whatever finds either direct answer or else indirect, in the sense that the inquirer is advised as to the place where he can obtain sufficient information' (Ostwald, 1913, p. 6, English original).

"The Bridge was supposed to be the information office for the information offices, a 'bridge' between the 'islands' where all other institutions (associations, societies, libraries, museums, companies, and individuals) 'were working for culture and civilization' (Die Brücke, 1910–1911). The organization of intellectual work was intended to occur 'automatically' through the general introduction of standardized means of communication: the monographic principle, standardized formats, and uniform indexing (Registraturvermerke) for all publications. The following facilities were planned: a collection of addresses, a Brückenarchiv as a 'comprehensive, illustrated world encyclopedia on sheets of standardized formats,' which should contain a world dictionary and a world museum catalog; a Brückenmuseum; and a head office and Hochschule (college) for organization. 'Close cooperation' with the Institut Internationale de Bibliographie in Brussels was also planned."

Filed under: Bibliography, Indexing & Searching Information, Libraries, Museums, Organization of Information / Taxonomy, Paper / Papyrus / Parchment / Vellum

1930 – 1940

An Electronic Machine for Searching Through Information December 29, 1931

Emanuel Goldberg of Zeiss Ikon receives U.S. Patent No. 1,838,389 for a "Statistical Machine."

The patent, applied for in 1928, and similar patents obtained in other countries, describe an electronic machine for searching through data encoded on reels of film, using "radiating energy to actuate a recorder when the explored indications upon the search plate and record element are identical, the indications on one of said elements being penetrable by the rays and the indication on the other element being impenetrable by the rays."

Vannevar Bush incorporated technology similar to this in the Rapid Selector machine on which he began development in 1938. The existence of Goldberg's patent prevented Bush from patenting his Rapid Selector. Bush's machine became famous after publication in 1945 of his article, "As We May Think," describing the Memex.

Filed under: Electronic Media, Indexing & Searching Information, Technology

Bradford's Law January 26, 1934

In a paper entitled "Sources of Information on Specific Subjects," (Engineering 137 [1934], 85-6), Samuel C. Bradford publishes Bradford's Law of the "exponentially diminishing returns of extending a library search."

Filed under: Indexing & Searching Information, Libraries

H. G. Wells and the "World Brain" 1938

H. G. Wells publishes a book of essays and speeches entitled World Brain which includes an essay entitled "The Idea of a Permanent World Encyclopaedia."

This essay first appeared in the new Encyclopédie Française, August, 1937. Another essay entitled "The Brain Organization of the Modern World" described Wells' vision for

". . .a sort of mental clearing house for the mind, a depot where knowledge and ideas are received, sorted, summarized, digested, clarified and compared." (p. 49)

Wells believed that technological advances such as microfilm could be utilized towards this end so that

"any student, in any part of the world, would be able to sit with his projector in his own study at his or her convenience to examine any book, any document, in an exact replica" (p. 54).

Filed under: Indexing & Searching Information, Internet & Networking, Libraries, Organization of Information / Taxonomy

Vannevar Bush's "Rapid Selector" 1938

Vannevar Bush begins development of the Rapid Selector machine for information retrieval from rolls of microfilm. He will publish a general description of the aims of this machine in his 1945 article, "As We May Think."

Filed under: Indexing & Searching Information

1945 – 1950

"As We May Think" July 1945

Vannevar Bush publishes an article entitled "As We May Think" in the Atlantic Monthly (Vol. 176, No. 1 [1945] 641-49) describing the Memex, an electromechanical microfilm machine evolved from his "Rapid Selector" project, capable of making permanent associative links in information. This hypothetical machine foreshadowed aspects of the personal computer and hyperlinks on the Internet. (See Reading 13.1.)

Filed under: Computers & the Human Brain, Indexing & Searching Information, Internet & Networking, Organization of Information / Taxonomy

The Illustrated Version of "As We May Think" September 1945

Vannevar Bush publishes a condensed, illustrated version of "As We May Think" in Life magazine, 19, No. 11 (1945) 112-114, 116, 121, 123-24.

Life's editors added the following subtitle: "A Top U.S. Scientist Foresees a Possible Future World in Which Man-Made Machines Will Start to Think." They also replaced the Atlantic Monthly's numbered sections with headings, and added illustrations of the "cyclops camera," the "supersecretary" and the "Memex" in the form of a desk. This was the first published illustration of what the Memex might look like.

In From Memex to Hypertext: Vannevar Bush and the Mind's Machine (1991) James Nyce and Paul Kahn published a version of "As We May Think" that shows the differences between the two 1945 published versions of Bush's essay. Nyce and Kahn also developed a brief animated film showing how the Memex might have operated. You can download it at this link: http://sloan.stanford.edu/MouseSite/Secondary.html

Filed under: Computers & the Human Brain, Indexing & Searching Information, Organization of Information / Taxonomy

Developing Vannevar Bush's Rapid Selector 1949

Ralph R. Shaw, Director of Libraries for the U.S. Department of Agriculture, in collaboration with Engineering Research Associates of St. Paul, Minnesota, using funds provided by the Office of Technical Services of the Department of Commerce, develops the Rapid Selector machine for the electronic searching of information recorded in reels of film.

Shaw's device incorporated technology developed by Emanuel Goldberg in 1928-1931, and by Vannevar Bush starting in 1938. Shaw's Rapid Selector was an attempt to realize goals described in Bush's 1945 publication, "As We May Think."

Filed under: Electronic Media, Indexing & Searching Information, Libraries

The Origins of Humanities Computing 1949

"In 1949, an Italian Jesuit priest, Father Roberto Busa, began what even to this day is a monumental task: to make an index verborum of all the words in the works of St Thomas Aquinas and related authors, totaling some 11 million words of medieval Latin. Father Busa imagined that a machine might be able to help him, and, having heard of computers, went to visit Thomas J. Watson at IBM in the United States in search of support. Some assistance was forthcoming and Busa began his work. The entire texts were gradually transferred to punched cards and a concordance program written for the project. The intention was to produce printed volumes, of which the first was published in 1974" (Busa, R. Index Thomisticus, 1974- ).

"A purely mechanical concordance program, where words are alphabetized according to their graphic forms (sequences of letters), could have produced a result in much less time, but Busa would not be satisfied with this. He wanted to produce a "lemmatized" concordance where words are listed under their dictionary headings, not under their simple forms. His team attempted to write some computer software to deal with this and, eventually, the lemmatization of all 11 million words was completed in a semiautomatic way with human beings dealing with word forms that the program could not handle. Busa set very high standards for his work. His volumes are elegantly typeset and he would not compromise on any levels of scholarship in order to get the work done faster. He has continued to have a profound influence on humanities computing, with a vision and imagination that reach beyond the horizons of many of the current generation of practitioners who have been brought up with the Internet. A CD-ROM of the Aquinas material appeared in 1992 that incorporated some hypertextual features ("cum hypertextibus") and was accompanied by a user guide in Latin, English, and Italian. Father Busa himself was the first recipient of the Busa award in recognition of outstanding achievements in the application of information technology to humanistic research, and in his award lecture in Debrecen, Hungary, in 1998 he reflected on the potential of the World Wide Web to deliver multimedia scholarly material accompanied by sophisticated analysis tools" (Hockey, "The History of Humanities Computing," A Companion to Digital Humanities, Schreibman, Siemens, and Unsworth [eds.] [2004], accessed 03-26-2009).
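The distinction drawn above between a purely mechanical concordance and a lemmatized one can be made concrete with a small sketch. It is an illustration only, not a reconstruction of Busa's software: the `LEMMAS` table is a hypothetical four-entry stand-in for a real Latin lexicon, the sample words are invented, and word forms missing from the table are simply set aside, standing in for the forms that human editors had to resolve.

```python
from collections import defaultdict

# Hypothetical, tiny form-to-headword table (an assumption for illustration);
# a real project would need a full Latin lexicon.
LEMMAS = {"est": "sum", "sunt": "sum", "rosa": "rosa", "rosae": "rosa"}

def concordance(tokens, lemmas=None):
    """Map each heading to the positions where it occurs.

    With lemmas=None the heading is the graphic form itself (the purely
    mechanical case). With a lemma table, forms are filed under their
    dictionary heading, and unknown forms are returned for manual review.
    """
    index = defaultdict(list)
    unresolved = []
    for position, token in enumerate(tokens):
        form = token.lower()
        if lemmas is None:
            index[form].append(position)
        elif form in lemmas:
            index[lemmas[form]].append(position)
        else:
            unresolved.append((position, form))
    # Alphabetize the headings, as a printed concordance would.
    return dict(sorted(index.items())), unresolved

tokens = "rosa est rosae sunt pulchrae".split()
by_form, _ = concordance(tokens)
by_lemma, needs_review = concordance(tokens, LEMMAS)
print(by_form)
print(by_lemma, needs_review)
```

In the mechanical index, "rosa" and "rosae" are separate headings; in the lemmatized one they share the heading "rosa", while "pulchrae", absent from the toy table, is handed back for a person to classify.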

Filed under: Indexing & Searching Information, Organization of Information / Taxonomy, Religious Texts / Religion

One of the Earliest Projects in Library Automation April 1949

Librarian Sanford Larkey publishes The Army Medical Library Research Project at the Welch Medical Library. This was one of the earliest projects in library automation and information retrieval. 

Filed under: Computing & Medicine / Biology, Indexing & Searching Information, Libraries

1950 – 1955

The First Textbook on How to Build an Electronic Computer 1950

Engineering Research Associates publishes High-Speed Computing Devices, the first textbook on how to build an electronic digital computer.

Written in the form of a “cookbook,” the book describes available computer components and how they work. It has extensive bibliographies of the American computing literature and some of the English, and contains a brief reference to Vannevar Bush's Rapid Selector information retrieval device then under development.

Filed under: Computer & Calculator Industry, Data Processing / Computing, Indexing & Searching Information

Coining the Expression, Information Retrieval 1950

American computer scientist Calvin Mooers coins the expression information retrieval in "the Zator Technical Bulletin No. 48 (1950), a publication of the Cambridge, Mass.-based Zator Co., which Mooers founded in 1947, with the following definition: 'The requirements of information retrieval, of finding information whose location or very existence is a priori unknown. . . .'" (http://www.garfield.library.upenn.edu/commentaries/tsv11(06)p09y19970317.pdf, accessed 01-16-2010).

Filed under: Indexing & Searching Information

Applying New Technology to the Searching and Storage of Information 1951

Louis N. Ridenour, Ralph R. Shaw, and Albert G. Hill publish a thin volume entitled Bibliography in an Age of Science.

This book collects three lectures delivered at the University of Illinois the previous year. Though it was preceded by journal articles and technical reports, this may be the first separately published book to address the problems of applying new technologies to the searching and storage of printed information in libraries.

Shaw's article includes illustrations on pp. 60-61 of the Rapid Selector prototype which was in operation at this time. This machine, which applied the ideas of Emanuel Goldberg and the Memex idea of Vannevar Bush, stored 72,000 frames of information on a 2,000 foot reel of film. The prototype could search through the data at the rate of 78,000 "codes per minute." "Improvement of this searching speed to 120,000 codes per minute is now in sight."

Filed under: Bibliography, Data Storage / Memory, Indexing & Searching Information, Libraries

Applying Computer Methods to Library Cataloguing and Research June 24 – June 27, 1952

At a meeting of the Medical Library Association, Sanford Larkey reports on advances in the Welch Medical Library Indexing Project.

This project was probably the earliest effort to apply computer methods, including punched card tabulating, in library cataloguing and information retrieval.

Filed under: Indexing & Searching Information, Libraries

The Uniterm Indexing System 1953

Mortimer Taube proposes the Uniterm Indexing system.

Filed under: Indexing & Searching Information, Libraries

Early Library Information Retrieval System 1954

Harley Tillitt builds perhaps the first operating library information retrieval system on a general-purpose computer (IBM 701) at the Naval Ordnance Test Station (NOTS) at Inyokern, California, later called China Lake.

"Searching started with a file of about 15,000 bibliographic records, indexed only by the Uniterms, and search output was limited to report accession numbers. The task was made even more difficult by the fact that the IBM 701, a scientific calculator, did not have any built-in character representation." (Bourne)
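
As a rough illustration of the kind of search described above, here is a minimal sketch of coordinate searching over Uniterms: each record is posted under every Uniterm assigned to it, and a query returns only the accession numbers posted under all requested terms. The sample records, accession numbers, and function names are hypothetical, and nothing here reflects how the IBM 701 program was actually written.

```python
from collections import defaultdict

# Hypothetical sample file: accession number -> Uniterms assigned by an indexer.
RECORDS = {
    "R-1001": {"rocket", "propellant", "solid"},
    "R-1002": {"rocket", "guidance"},
    "R-1003": {"propellant", "liquid", "rocket"},
    "R-1004": {"guidance", "radar"},
}


def build_postings(records):
    """Invert the file into Uniterm -> set of accession numbers."""
    postings = defaultdict(set)
    for accession, uniterms in records.items():
        for term in uniterms:
            postings[term].add(accession)
    return postings


def coordinate_search(postings, *terms):
    """Return sorted accession numbers posted under every requested Uniterm."""
    if not terms:
        return []
    sets = [postings.get(term, set()) for term in terms]
    return sorted(set.intersection(*sets))


postings = build_postings(RECORDS)
# As in the system described above, the output is only a list of accession numbers.
print(coordinate_search(postings, "rocket", "propellant"))
```

Intersecting posting sets keeps the search output small, which matches the quoted limitation that results were report accession numbers rather than full records.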

Filed under: Bibliography, Data Processing / Computing, Indexing & Searching Information, Libraries

Probably the First Widely-Accepted Controlled Vocabulary 1954

Probably the first widely used controlled vocabulary for searching information was the Subject Heading Authority List issued by the National Library of Medicine.

"The first official list of subject headings published by the National Library of Medicine appeared in 1954 under the title Subject Heading Authority List. It was based on the internal authority list that had been used for publication of Current List of Medical Literature which in turn had incorporated headings from the Library's Index-Catalogue and from the 1940 Quarterly Cumulative Index Medicus Subject Headings. With the inception of Index Medicus in 1960, a new and thoroughly revised Medical Subject Headings appeared.

"With the 1954 Subject Heading Authority List, there appeared a 'Categorical Listing' of standard subheadings. 'Abnormalities,' for instance, was listed as a standard subheading for use with terms for organs, tissues, and regions, and 'anesthesia and analgesia' was to be used under surgical procedure headings. But such subheadings could be used only for subject headings which fell within the category of headings to which they were to be applied. There were over 100 such subheadings, some of which varied only slightly according to the category of main heading with which they were used. For instance, 'therapeutic use' was used under physical agents and drugs and chemicals, and 'therapy' was used with diseases. In the 1960 Medical Subject Headings, the number of subheadings was reduced to sixty-seven. They could be used under any kind of main heading if the combination was not patently foolish or impossible. These sixty-seven subheadings were applied with more generalized meanings. For instance, the subheading "therapy" was used to mean 'therapy of,' 'therapeutic use of' or just 'therapeutic aspects.' Though this solution was simpler, many problems still remained. The use of one subheading might prevent the use of another. For instance, if a paper covered the etiology, pathology, and therapy of a disease, it might occur without further subdivision, or it might occur under the subheading which seemed most appropriate to the indexer. If 'therapy' was chosen, the article would be lost to the searcher looking for the etiology of the disease under the subheading 'etiology.' In addition, if the subheading 'diseases' had been appended to the term for an anatomic part, it would not be possible to subdivide further for the therapy or complications of such diseases. A related problem was the overlap in meaning of the subheadings themselves. It was difficult, for example, to decide whether a paper on chemical biosynthesis fit best under 'chemistry' or 'metabolism.'

"Categorized lists of terms were printed for the first time in the 1963 Medical Subject Headings and contained thirteen main categories and a total of fifty-eight separate groups in subcategories and main categories. These categorized lists made it possible for the user to find many more related terms than were in the former cross-reference structure. In 1963, the second edition of Medical Subject Headings contained 5,700 descriptors, compared with 4,400 in the 1960 edition. Of the headings used in the 1960 list, 113 were withdrawn in favor of newer terms. In contrast, the 2009 edition of MeSH contains 25,186 descriptors.

"In 1960, medical librarianship was on the cusp of a revolution. The first issue of the new Index Medicus series was published. On the horizon was a computerization project undertaken by the National Library of Medicine (NLM) to store and retrieve information. The Medical Literature Analysis and Retrieval System (MEDLARS) would speed the publication process for bibliographies such as Index Medicus, facilitate the expansion of coverage of the literature, and permit searches for individuals upon demand. The new list of subject headings introduced in 1960 was the underpinning of the analysis and retrieval operation. MeSH was a new and thoroughly revised version of lists of subject headings compiled by NLM for its bibliographies and cataloging. Frank B. Rogers, then NLM director, announced several innovations as he introduced MeSH in 1960" (http://www.nlm.nih.gov/mesh/2009/introduction/intro_preface.html#pref_hist, accessed 05-04-2009).

Filed under: Indexing & Searching Information, Libraries, Medicine, Organization of Information / Taxonomy

1955 – 1960

Machine Methods for Information Searching 1955

On the completion of the Welch Medical Library Indexing Project, five authors, including Eugene Garfield, issue the Final Report on Machine Methods for Information Searching.

Filed under: Indexing & Searching Information, Libraries, Organization of Information / Taxonomy

The Foundation of Citation Analysis July 15, 1955

Eugene Garfield publishes "Citation Indexes for Science: A New Dimension in Documentation through Association of Ideas," Science, Vol. 122, No. 3159, 108-11. This paper may be the foundation of "bibliometrics" or citation analysis.

"Eugene Garfield . . . was deeply involved in the research relating to machine generated indexes in the mid-1950's and early 1960's. One of his earliest points of involvement was a project sponsored by the Armed Forces Medical Library (predecessor to our current National Library of Medicine). The Welch Medical Library Indexing project, as it was called, was to investigate the role of automation in the organization and retrieval of medical literature. The hope was that the problems associated with subjective human judgement in selection of descriptors and indexing terms could be eliminated. By removing the human element, one might thereby increase the speed with which information was incorporated in to the indexes. It might also increase the cost-effectiveness of the indexes. Garfield grasped early on that review articles in the journal literature were heavily reliant on the bibliographic citations that referred the reader to the original published source for the notable idea or concept. By capturing those citations, Garfield believed, the researcher could immediately get a view of the approach taken by another scientist to support an idea or methodology based on the sources that the published writer had consulted and cited as pertinent in the bibliography. As retrieval terms, citations could function as well as keywords and descriptors that were thoughtfully assigned by a professional indexer."
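
The idea in the passage above, using citations as retrieval terms, can be sketched as an inverted mapping from each cited work to the works that cite it. This is only a minimal illustration: the paper names and the function name are hypothetical, and it is not a description of how Garfield's indexes were compiled.

```python
from collections import defaultdict

# Hypothetical reference lists: citing paper -> the papers listed in its bibliography.
REFERENCE_LISTS = {
    "Paper C": ["Paper A", "Paper B"],
    "Paper D": ["Paper A"],
    "Paper E": ["Paper B", "Paper D"],
}


def build_citation_index(reference_lists):
    """Invert the bibliographies into cited paper -> sorted list of citing papers."""
    index = defaultdict(list)
    for citing, cited_papers in reference_lists.items():
        for cited in cited_papers:
            index[cited].append(citing)
    return {cited: sorted(citing) for cited, citing in index.items()}


citation_index = build_citation_index(REFERENCE_LISTS)
# Looking up an earlier paper retrieves the later work that builds on it,
# without any subject heading having been assigned by an indexer.
print(citation_index["Paper A"])
```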

Filed under: Indexing & Searching Information, Libraries, Organization of Information / Taxonomy

Mechanized Encoding of Library Information 1957

Hans Peter Luhn of IBM publishes A Statistical Approach to Mechanized Encoding of Library Information.

Filed under: Indexing & Searching Information, Libraries

Automatic Document Indexing Program 1958

Hans Peter Luhn of IBM develops an automatic document indexing program for the production of literature abstracts.

"The complete text of an article in machine-readable form is scanned by an IBM 704 data-processing machine and analyzed in accordance with a standard program. Statistical information derived from word frequency and distribution is used by the machine to compute a relative measure of significance, first for individual words and then for sentences. Sentences scoring highest in significance are extracted and printed out to become the 'auto-abstract.'"
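
The procedure quoted above (score words by frequency, score sentences from their words, extract the top sentences) can be sketched as follows. This is a deliberately simplified illustration, not Luhn's actual significance measure: the stop-word list, the sentence splitter, and the choice to sum word frequencies per sentence are all assumptions made for the example.

```python
import re
from collections import Counter

# Assumed list of common words to ignore; not taken from the source.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "it", "by", "for", "on"}

# Hypothetical sample text.
TEXT = (
    "Machines can index documents. "
    "Word frequency helps machines rank documents and index sentences. "
    "The weather was pleasant."
)


def split_sentences(text):
    """Rough sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lower-cased alphabetic words with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP_WORDS]


def auto_abstract(text, n_sentences=1):
    """Return the n highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    # Word significance: how often each content word occurs in the whole text.
    freq = Counter(w for s in sentences for w in content_words(s))

    # Sentence significance: sum of the significance of its content words.
    def score(sentence):
        return sum(freq[w] for w in content_words(sentence))

    top = sorted(sentences, key=score, reverse=True)[:n_sentences]
    return [s for s in sentences if s in top]


print(auto_abstract(TEXT, 1))
```

Summing frequencies favors long sentences, which is one reason a real system would need a more careful measure than this sketch uses.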

Filed under: Indexing & Searching Information, Libraries, Software

Keyword in Context (KWIC) Indexing November 1958

Computer scientist Hans Peter Luhn of IBM publishes Bibliography and index: Literature on information retrieval and machine translation. Its titles were indexed by the Key Word-in-Context system, or KWIC.

"The International Conference on Scientific Information (ICSI), Washington, DC, in November 1958, where Luhn introduced his new equipment and illustrated the practical results by producing the KWIC indexes for the conference program. Two new Luhn inventions, the 9900 Index Analyzer and the Universal Card Scanner, and the new Luhn Keyword-in-Context (KWIC) indexing technique were introduced. Following the conference, newspapers all over the country carried stories about the auto-abstracting and auto-indexing." (http://www.ischool.utexas.edu/~ssoy/organizing/l391d2c.htm, accessed 04-26-2009).
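
For readers unfamiliar with the technique, a KWIC index lists each significant word of a title alphabetically, with the rest of the title shown around it as context. The sketch below is a minimal illustration under stated assumptions: the stop-word list, the column width, and the function names are invented for the example, and it makes no claim about how Luhn's equipment produced its indexes.

```python
# Assumed list of words that are not indexed; not taken from the source.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}

# Titles drawn from entries in this timeline.
TITLES = [
    "Citation Indexes for Science",
    "A Statistical Approach to Mechanized Encoding of Library Information",
]


def kwic_entries(titles):
    """Return (keyword, left context, keyword plus right context) tuples, sorted by keyword."""
    entries = []
    for title in titles:
        tokens = title.split()
        for i, token in enumerate(tokens):
            keyword = token.strip(".,:;").lower()
            if not keyword or keyword in STOP_WORDS:
                continue
            entries.append((keyword, " ".join(tokens[:i]), " ".join(tokens[i:])))
    return sorted(entries)


def format_entry(entry, width=28):
    """Right-align the left context so every keyword starts in the same column."""
    _, left, right = entry
    return f"{left[-width:]:>{width}}  {right[:width]}"


for entry in kwic_entries(TITLES):
    print(format_entry(entry))
```

Because every significant word becomes an entry point, a title can be found under any of its keywords without an indexer choosing a subject heading.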

Filed under: Data Processing / Computing, Indexing & Searching Information

The Most Voluminous Printed Catalogue of a Single Library 1959 – 1972

The British Museum (now the British Library) publishes its General Catalogue of Printed Books. Photolithographic Edition to 1955 in 263 folio volumes from 1959 to 1966. These volumes reproduced the catalogue cards of 4,350,000 items.

In 1971 and 1972 the BM issued a Ten-Year Supplement, 1956-1970 in 23 volumes. This set of nearly 300 folio volumes was the "most voluminous" printed catalogue of a single library ever published.

Breslauer & Folter, Bibliography: Its History and Development (1984) no. 109.

Filed under: Bibliography, Indexing & Searching Information, Libraries, Organization of Information / Taxonomy

Auto-Encoding of Documents for Information Retrieval 1959

Computer scientist Hans Peter Luhn publishes "Auto-Encoding of Documents for Information Retrieval Systems," in M. Boaz (ed.), Modern Trends in Documentation (1959) 45-58.

"Luhn believed that the growing rate of information and document production necessitated the invention of methods allowing data to be retrieved from stores of documents without expensive human intervention. This paper discusses auto-encoding based on statistical procedures performed by a machine on the original text of a document already in machine-readable form. The prevalent machine-readable form of that time was primarily punched cards or paper tape and less frequently magnetic tape. The auto-encoding method used word frequency rates, a special thesaurus, and the development of multi-dimensional patterns based on word proximity. At the time, application of the method was limited to articles of 500 to 5000 words, but Luhn was confident that the logical capabilities of electronic machines, statistical methods, and "further research into the characteristics of human behavior as manifested in writing" would lead to better information dissemination and retrieval. Earlier articles by this author discuss the automatic creation of abstracts and the development of thesauri" (http://www.ischool.utexas.edu/~ssoy/organizing/l391d2b.htm, accessed 04-26-2009).

Filed under: Data Processing / Computing, Indexing & Searching Information

1960 – 1970

Pioneering Computer-Assisted Legal Research 1960

John Horty at the Health Law Center, University of Pittsburgh, pioneers computer-assisted legal research by having the texts of relevant statutes keyed into punched cards and then transferred to computer tapes where they can be searched and retrieved by “key words in combination” (KWIC).

Filed under: Data Processing / Computing, Indexing & Searching Information, Law / Copyrights / Patents, Software

One of the First Data Publishing and Retrieval Systems 1962

Inforonics develops and maintains "one of the first data publishing and retrieval systems used by organizations such as the U.S. Library of Congress and the Boston Public Library."

Filed under: Indexing & Searching Information, Libraries

First Computerized Encyclopedia 1964

System Development Corporation develops the first computerized encyclopedia.

Filed under: Indexing & Searching Information, Organization of Information / Taxonomy, Publishing

Science Citation Index 1964

Eugene Garfield publishes the first Science Citation Index in five printed volumes, indexing 613 journals and 1.4 million citations, using the method of citation analysis.

Two years later Science Citation Index became available on magnetic tape.

Filed under: Indexing & Searching Information, Libraries, Organization of Information / Taxonomy, Publishing

The First Large Scale Computer-Based Retrospective Search Service Available to the General Public January 1964

The Medical Literature Analysis and Retrieval System (MEDLARS) becomes operational at the National Library of Medicine.

It was the first large scale, computer-based, retrospective search service available to the general public.

Filed under: Computing & Medicine / Biology, Indexing & Searching Information, Libraries, Medicine

"Libraries of the Future" 1965

J.C.R. Licklider publishes Libraries of the Future, a study of what libraries may be at the end of the twentieth century.

Licklider's book reviewed systems for information storage, organization, and retrieval, the use of computers in libraries, and library question-answering systems. In his discussion he was probably the first to raise general questions concerning the transition of the book from a form printed exclusively on paper to electronic form.

Filed under: Book History, Data Storage / Memory, Human-Computer Interaction, Indexing & Searching Information, Libraries

The MARC Cataloguing Standard 1965 – 1968

Programmer and systems analyst Henriette Avram completes the Library of Congress MARC (Machine Readable Cataloging) Pilot Project, creating the foundation for the national and international data standard for bibliographic and holdings information in libraries. The MARC standards consist of the MARC formats, which are standards for the representation and communication of bibliographic and related information in machine-readable form, and related documentation. . . . Its data elements make up the foundation of most library catalogs.

Filed under: Data Processing / Computing, Indexing & Searching Information, Libraries, Organization of Information / Taxonomy

A Computer-Assisted Full-Text Inventory System 1966

Richard Gering's Data Corporation contracts with the U.S. Air Force to develop a computer-assisted, full-text system to keep track of procurement contracts and equipment inventory.

Filed under: Data Processing / Computing, Indexing & Searching Information, Software

Lockheed's DIALOG 1966

Roger K. Summit has the DIALOG online information retrieval system operational for Lockheed Aircraft.

Filed under: Indexing & Searching Information, Software

Full-Text Interactive Search Service 1967

Data Corporation contracts with the Ohio Bar Automated Research Corporation to create a full-text, interactive research service for Ohio statutes.

Filed under: Indexing & Searching Information, Law / Copyrights / Patents

OCLC is Founded 1967

The colleges and universities in the state of Ohio found the Ohio College Library Center (OCLC) to develop a computerized system in which the libraries of Ohio academic institutions can share resources and reduce costs.

After the bibliographical database expanded far beyond the state of Ohio it was renamed Online Computer Library Center, retaining the same initials.

Filed under: Bibliography, Indexing & Searching Information, Libraries

The Museum Computer Network 1967

Directors of fifteen New York-area museums form the Museum Computer Network to create a prototype system for a shared museum data-bank.

The project recruited curators and registrars to develop a data dictionary that accommodated the diverse methods used to describe museum collections. The resulting tagged record format allowed for the description of individual objects with separate records for artist biographical information and reference citations. Jack Heller's GRIPHOS (General Retrieval and Information Processor for Humanities Oriented Studies) system provided the information storage, search, and retrieval infrastructures for the records.

Filed under: Indexing & Searching Information, Museums, Software

Mead Purchases Data Corporation 1968

Mead Corporation purchases Data Corporation.

Filed under: Indexing & Searching Information, Law / Copyrights / Patents

Probably the Largest Printed Bibliography, Complete in 754 Folio Volumes 1968 – 1981

Mansell begins publication of The National Union Catalog, Pre-1956 Imprints: a Cumulative Author List Representing Library of Congress Printed Cards and Titles Reported by other American Libraries. One of the largest sets of printed volumes ever published, and most probably the largest printed bibliography, it was completed in 1981 in 754 folio volumes, containing a total of over 12,000,000 entries on 528,000 pages.

Filed under: Bibliography, Book History, Indexing & Searching Information, Libraries, Organization of Information / Taxonomy, Publishing

1970 – 1980

The Definitive Model for Relational Database Management Systems June 1970

Edgar F. Codd of IBM publishes "A Relational Model of Data for Large Shared Data Banks" in Communications of the ACM, 13 (6):377–387.

Codd’s model became widely accepted as the definitive model for relational database management systems. Codd postulated that data should be stored independently from hardware and that a programmer should use a nonprocedural language for accessing data. The crux of Codd’s solution was that data, rather than being stored in a hierarchical structure, would be stored in simple tables composed of rows and columns in which columns of like data would relate tables to one another. A database user or application, in Codd’s way of thinking, would not need to know the structure of the data in order to query that data.
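
The two ideas in the paragraph above, tables related through columns of like data and a nonprocedural query that says what is wanted rather than how to fetch it, can be illustrated with Python's built-in sqlite3 module. This is a minimal sketch: the table and column names are invented for the example, and SQLite is simply a convenient modern relational system, not anything Codd's paper describes.

```python
import sqlite3

# An in-memory database; the schema below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE journals (journal_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE articles (
        article_id INTEGER PRIMARY KEY,
        journal_id INTEGER,   -- column of like data relating articles to journals
        title TEXT,
        year INTEGER
    );
    INSERT INTO journals VALUES (1, 'Communications of the ACM'), (2, 'Science');
    INSERT INTO articles VALUES
        (10, 1, 'A Relational Model of Data for Large Shared Data Banks', 1970),
        (11, 2, 'Citation Indexes for Science', 1955);
    """
)

# The query names the rows wanted; it says nothing about storage or access paths.
rows = conn.execute(
    """
    SELECT a.title, j.title
    FROM articles AS a
    JOIN journals AS j ON a.journal_id = j.journal_id
    WHERE a.year >= 1960
    ORDER BY a.title
    """
).fetchall()
print(rows)
```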

Filed under: Indexing & Searching Information, Software

Medline is Operational October 1971

Medline (Medical Literature Analysis and Retrieval System Online), a literature database of life sciences and biomedical information, is operational at the National Library of Medicine. It was initially a database counterpart of the printed Index Medicus.

By 2008 Medline contained "more than 18 million" records from approximately 5,000 selected publications covering biomedicine and health from 1950 to the present.

Filed under: Bibliography, Indexing & Searching Information, Libraries, Medicine, Science

Lexis 1973

Mead Data Central introduces Lexis and NAARS services.

"LEXIS provides the full text of Ohio and New York codes and cases, the U.S. code, and some federal case law. NAARS is the National Automated Accounting Research Service, a tax database from the American Institute of Certified Public Accountants."

Filed under: Indexing & Searching Information, Law / Copyrights / Patents

Discovery of Citation Mapping 1973

American information scientist Henry G. Small of the Institute for Scientific Information publishes "Co-Citation in the Scientific Literature: A New Measure of the Relationship between Two Documents," Journal of the American Society for Information Science 24 (1973) 265-9.

Small's paper first described what he called "citation mapping," which enabled the use of citation data to create maps visualizing the structure of scientific activity. Citation mapping was co-discovered by Irina Marshakova in Moscow.
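
The measure named in the title of Small's paper can be sketched briefly: two documents are co-cited each time a later paper lists both of them in its references, and the count of such papers gives the strength of the relationship between the pair. The sample data and function name below are hypothetical, and the sketch covers only the counting step, not the construction of a map from the counts.

```python
from collections import Counter
from itertools import combinations

# Hypothetical citing papers and the documents each one cites.
CITING_PAPERS = {
    "P1": ["A", "B", "C"],
    "P2": ["A", "B"],
    "P3": ["B", "C"],
}


def cocitation_counts(citing_papers):
    """Count, for every pair of documents, how many papers cite both."""
    counts = Counter()
    for cited in citing_papers.values():
        # Sort and de-duplicate so each pair is counted once per citing paper.
        for pair in combinations(sorted(set(cited)), 2):
            counts[pair] += 1
    return counts


counts = cocitation_counts(CITING_PAPERS)
print(counts[("A", "B")], counts[("A", "C")], counts[("B", "C")])
```

Pairs with high counts would sit close together on a map of the literature, which is the intuition behind using citation data to visualize the structure of a field.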

Filed under: Cartography / Geography / Voyages / Travels, Indexing & Searching Information

SQL 1974

Donald D. Chamberlin and Raymond F. Boyce of IBM develop a Structured English Query Language (“SEQUEL”) to apply Edgar F. Codd’s model of relational databases. SEQUEL later became SQL, presumably because trademark conflicts caused IBM to switch from the original name.

Chamberlin & Boyce's original paper on SEQUEL may be downloaded (http://www.almaden.ibm.com/cs/people/chamberlin/sequel-1974.pdf, accessed 02-06-2010).

Filed under: Indexing & Searching Information, Software

"A Sweeping and Controversial Program" 1974

The New York Public Library and Columbia, Harvard, and Yale universities found RLG (Research Libraries Group). The New York Times calls this "a sweeping and controversial program of combined operations."

Filed under: Indexing & Searching Information, Libraries

Ellison Founds Software Development Laboratories 1977

Lawrence Ellison founds Software Development Laboratories. Renamed Relational Software in 1979, the company introduced its first Relational Database Management System (RDBMS), Oracle V2. To give the impression of reliability there was no version 1.

Filed under: Indexing & Searching Information, Software

A Printed Book Entitled Toward Paperless Information Systems 1978

F. W. Lancaster, a professor of information science, publishes a book printed on paper entitled Toward Paperless Information Systems.

Filed under: Book History, Data Storage / Memory, Indexing & Searching Information

dBase 1978

C. Wayne Ratliff, working as a contractor at the Jet Propulsion Laboratory, writes a database program he calls "Vulcan" (after Mr. Spock of Star Trek) to help him win the office football pool.

Ratliff wrote the program for his kit-built IMSAI 8080 microcomputer running PTDOS and based it on JPLDIS (Jet Propulsion Laboratory Display Information System), a database product for the Univac 1108 mainframe.

In early 1980, Ratliff and George Tate entered into a marketing agreement.

"Ratliff had given up trying to sell copies of the software for $50 each. Tate thought the product would sell better at $695, so they made a deal and dBASE II was the result. The program was renamed dBASE II because of a belief that a product called "version one" wouldn't sell. The software originally ran on a CP/M computer and then was ported to the IBM PC. In mid-1983 Ashton-Tate purchased the dBASE II technology and copyright from Ratliff, and he joined Ashton-Tate as vice president of new technology."

dBase II became the first best-selling database program for the PC.

Filed under: Data Processing / Computing, Indexing & Searching Information, Software

1980 – 1990

Nexis 1980

Mead Data Central introduces the NEXIS service, providing online texts of various print publications.

Filed under: Indexing & Searching Information, Law / Copyrights / Patents, Libraries, Publishing

754 Folio Volumes, Obsolete within 25 Years April 21 – June 6, 1981

The Grolier Club of New York holds an exhibition entitled Bibliography: Its History and Development "to mark the completion of The National Union Catalogue: Pre-1956 Imprints," which began publication in 1968 and was completed in 754 folio volumes in 1981.

In 1984 The Grolier Club published an annotated bibliography of the exhibition with the same title by Bernard Breslauer and Roland Folter. In that volume Breslauer and Folter described the NUC, as it came to be known, as "the most extensive general bibliographical compilation of all times" (no. 169, p. 213). With respect to bibliographical compilations printed on paper the statement remains true, though NUC was superseded around 1995 by bibliographical databases such as OCLC on the Internet.

Filed under: Bibliography, Indexing & Searching Information, Organization of Information / Taxonomy

IBM DB2 1982

IBM introduces the IBM DB2 relational database management system for mainframe computers.

Filed under: Indexing & Searching Information, Software

Oracle Corporation 1983

Relational Software renames itself Oracle Corporation to align itself with its flagship relational database management system, Oracle version 3.

Filed under: Computer & Calculator Industry, Indexing & Searching Information, Software

Keyboarding over 350,000,000 Characters 1983

Work begins on computerizing the text of the Oxford English Dictionary, defining "414,825 words backed by five million quotations, of which some two million were actually printed in the dictionary text." This required retyping the entire text into a database.

"And so the New Oxford English Dictionary (NOED) project began. More than 120 keyboarders of International Computaprint Corporation in Tampa, Florida, and Fort Washington, Pennsylvania, USA, started keying in over 350,000,000 characters, their work checked by 55 proof-readers in England. Retyping the text alone was not sufficient; all the information represented by the complex typography of the original dictionary had to be retained, which was done by marking up the content in SGML. A specialized search engine and display software were also needed to access it. Under a 1985 agreement, some of this software work was done at the University of Waterloo, Canada, at the Centre for the New Oxford English Dictionary, led by F.W. Tompa and Gaston Gonnet; this search technology went on to become the basis for the Open Text Corporation. Computer hardware, database and other software, development managers, and programmers for the project were donated by the British subsidiary of IBM; the colour syntax-directed editor for the project, LEXX, was written by Mike Cowlishaw of IBM. The University of Waterloo, in Canada, volunteered to design the database."

The second edition of the OED was published on paper in 1989. 

Filed under: Book History, Indexing & Searching Information, Organization of Information / Taxonomy, Publishing, Software

The Perseus Digital Library Project 1985

The Perseus Digital Library Project begins at Tufts University. Though the project is ostensibly about Greek and Roman literature and culture, it will evolve into an exploration of the ways that digital collections can enhance scholarship with new research tools that take libraries and scholarship beyond the physical book.

"Since planning began in 1985, the Perseus Digital Library Project has explored what happens when libraries move online. Two decades later, as new forms of publication emerge and millions of books become digital, this question is more pressing than ever. Perseus is a practical experiment in which we explore possibilities and challenges of digital collections in a networked world.

"Our flagship collection, under development since 1987, covers the history, literature and culture of the Greco-Roman world. We are applying what we have learned from Classics to other subjects within the humanities and beyond. We have studied many problems over the past two decades, but our current research centers on personalization: organizing what you see to meet your needs.

"We collect texts, images, datasets and other primary materials. We assemble and carefully structure encyclopedias, maps, grammars, dictionaries and other reference works. At present, 1.1 million manually created and 30 million automatically generated links connect the 100 million words and 75,000 images in the core Perseus collections. 850,000 reference articles provide background on 450,000 people, places, organizations, dictionary definitions, grammatical functions and other topics."

Filed under: Electronic Media, Indexing & Searching Information, Linguistics / Translation / Speech, Preservation & Conservation of Information

The First Digital Image Database of Cultural Materials 1987

Fred Mintzer, Henry Gladney, and colleagues at IBM develop a high-resolution digital camera for photographing artworks and a PC-based database system to store and index the images, in order to photograph, store, and organize the work of the painter Andrew Wyeth. The system was used by Wyeth's staff to photograph, store, and organize about 10,000 images. "Pictures were scanned at a spatial resolution of 2500 by 3000 pixels and a color depth of 24 bits-per-pixel, and were color calibrated." This was the first digital image database of cultural materials.

Filed under: Art, Art and Science, Medicine, Technology, Data Storage / Memory, Imaging / Photography, Indexing & Searching Information, Organization of Information / Taxonomy, Preservation & Conservation of Information

OCLC Acquires Publisher of the Dewey Classification System 1988

OCLC acquires Forest Press, publisher of the Dewey Decimal Classification system.

Filed under: Indexing & Searching Information, Libraries, Organization of Information / Taxonomy

International Standard for Computer-to-Computer Information Retrieval 1988

Z39.50 becomes the international standard defining a protocol for computer-to-computer information retrieval.

Z39.50 made it possible for a user to search and retrieve information from other computer systems without knowing the search syntax used by those other systems.

Filed under: Indexing & Searching Information, Internet & Networking

1990 – 2000

The First "Search Engine" but Not a "Web Search Engine" 1990

Alan Emtage, Bill Heelan, and J. Peter Deutsch, students at McGill University, write ARCHIE, a program designed to index FTP archives.

ARCHIE was the first “search engine,” as distinct from a “web search engine.”

Filed under: Human-Computer Interaction, Indexing & Searching Information, Internet & Networking

The WAIS System for Searching Text is Introduced 1991

Brewster Kahle of Thinking Machines invents the Wide Area Information Server, or WAIS, system. It is a client-server text searching system that uses the ANSI Standard Z39.50 "Information Retrieval Service Definition and Protocol Specifications for Library Applications" (Z39.50:1988) to search index databases on remote computers.

Filed under: Indexing & Searching Information, Internet & Networking, Libraries

The Gopher Protocol September 1991

Mark P. McCahill and team at the University of Minnesota develop the Gopher protocol, "a simple way to navigate distributed information resources on the Internet," but without hyperlinks, a significant disadvantage compared with the World Wide Web.

They announced the Internet Gopher on USENET. Its central goals were:

"* A file-like hierarchical arrangement that would be familiar to users

"* A simple syntax

"* A system that can be created quickly and inexpensively

"* Extending the file system metaphor to include things like searches

"The source of the name "Gopher" is claimed to be threefold:

"1. Users instruct it to 'go for' information

"2. It does so through a web of menu items analogous to gopher holes

"3. The sports teams of the University of Minnesota are the Golden Gophers (Wikipedia article on Gopher (protocol), accessed 06-04-2009).

Filed under: Computer / Internet Culture, Indexing & Searching Information, Internet & Networking

The Electronic Dewey 1993

OCLC publishes Electronic Dewey, the first library classification system published in electronic form.

Filed under: Indexing & Searching Information, Libraries, Organization of Information / Taxonomy

First Library of Digital Images on the Internet 1993

Fred Mintzer and colleagues at IBM photograph and develop a database of about 20,000 digital images for the Vatican Library. It is the first library of digital images on the Internet.

Filed under: Imaging / Photography, Indexing & Searching Information, Internet & Networking, Libraries

The First Successful Online Bookseller Service 1993

Richard Weatherford establishes Interloc, "the first successful online bookseller service."

Arguing that "our mission is to help booksellers find books for their own customers," Weatherford opened the database to booksellers only.

Filed under: Book Trade, eCommerce, Indexing & Searching Information

Development of Neural Networks 1993

Psychologist, neural scientist and cognitive scientist James A. Anderson publishes "The BSB Model: A simple non-linear autoassociative network," M. Hassoun (Ed), Associative Neural Memories: Theory and Implementation (1993).

Anderson's neural networks have been applied to models of human concept formation, decision making, speech perception, and models of vision.

Anderson, J. A., Spoehr, K. T. and Bennett, D.J.  "A study in numerical perversity: Teaching arithmetic to a neural network,"  D.S. Levine and M. Aparicio (Eds.) Neural Networks for Knowledge Representation and Inference, (1994).

Filed under: Artificial Intelligence, Computers & the Human Brain, eCommerce, Human-Computer Interaction, Indexing & Searching Information, Linguistics / Translation / Speech

The First Web Search Engine? June 1993

Matthew Gray at MIT develops the web crawler, World Wide Web Wanderer, to measure the size of the web.

Later in the year the World Wide Web Wanderer was used to generate an index called the "Wandex", providing what was probably the first web search engine.

Filed under: Indexing & Searching Information, Internet & Networking

World Wide Web Worm 1994

An early web search engine, the World Wide Web Worm, has an index of 110,000 pages and web-accessible documents. It receives an average of 1500 queries per day.

Filed under: Indexing & Searching Information, Internet & Networking

Yahoo! Founded April 1994

Jerry Yang and David Filo, Electrical Engineering graduate students at Stanford, change the name of "Jerry's Guide to the World Wide Web" to "Yahoo!", for which the official expansion is "Yet Another Hierarchical Officious Oracle". Filo and Yang select the name because they like the word's general definition, which comes from Gulliver's Travels by Jonathan Swift: "rude, unsophisticated, uncouth." Its URL is akebono.stanford.edu/yahoo. They will create the Yahoo! domain on January 18, 1995.

Filed under: eCommerce, Indexing & Searching Information, Internet & Networking, Social Media / Wikis

The First Full Text Web Search Engine April 20, 1994

The first "full text" crawler-based web search engine, WebCrawler, created by Brian Pinkerton at the University of Washington, becomes operational. "Unlike its predecessors, it let users search for any word in any web page, which became the standard for all major search engines since. It was also the first one to be widely known by the public".

Filed under: Indexing & Searching Information, Internet & Networking, Software

Altavista December 15, 1995

Web search engine Altavista is launched. It receives 300,000 hits on its first day.

Filed under: Indexing & Searching Information

The IBM DB2 Universal Database 1996

IBM announces the DB2 Universal Database, the first fully scalable, Web-ready database management system. It is called “universal” because it can sort and query alphanumeric data as well as text documents, images, audio, video and other complex objects. In 1996 IBM databases manage about 70 percent of the world’s business information.

Filed under: Indexing & Searching Information, Software

Over One Billion Documents 1996

LexisNexis online services exceed one billion documents.

Filed under: Indexing & Searching Information, Law / Copyrights / Patents, Libraries, Preservation & Conservation of Information

A Search Engine Initially Called "BackRub" January 1996

Larry Page and Sergey Brin, students of computer science at Stanford, begin collaborating on a search engine called BackRub, named for its unique ability to analyze the "back links" pointing to a given website.

"Larry, who had always enjoyed tinkering with machinery and had gained some notoriety for building a working printer out of Lego™, took on the task of creating a new kind of server environment that used low-end PCs instead of big expensive machines. Afflicted by the perennial shortage of cash common to graduate students everywhere, the pair took to haunting the department's loading docks in hopes of tracking down newly arrived computers that they could borrow for their network."

"Google founders Larry Page and Sergey Brin developed BackRub, the predecessor to the Google search engine, while working on an early library digitization project at Stanford that was funded in part by the National Science Foundation’s Digital Libraries Initiative. And PageRank, Google’s core search algorithm, which orders sites in search results based on the number of other sites that link to them, is simply a computer scientist’s version of citation analysis, long used to rate the influence of articles in scholarly print journals" Roush, "The Infinite Library Does Google's plan to digitize millions of print books spell the death of libraries; or their rebirth?" (Technology Review.com, May 2005, http://www.technologyreview.com/web/14408/, accessed 03-19-2009).

Citation analysis, referenced in this database, was pioneered by Eugene Garfield beginning in 1955.
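The link-counting idea described in the quotation above can be illustrated with a minimal Python sketch. It is not Google's implementation: it is a simplified power-iteration version of PageRank in which the damping value of 0.85, the iteration count, and the four-page toy web are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by incoming links.

    links maps each page to the list of pages it links to.
    The damping factor and iteration count are illustrative assumptions.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with rank spread evenly over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on in equal shares to the pages it cites.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over every page.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: pages a, b and d all link to c; c links back to a.
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
ordered = sorted(ranks, key=ranks.get, reverse=True)
```

In this toy web the page with the most back links, c, ranks first, and a ranks above b and d because its single back link comes from a highly ranked page. That weighting by the rank of the citing page is what distinguishes the approach from a plain count of links.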

Filed under: eCommerce, Indexing & Searching Information, Internet & Networking, Software

Searchenginewatch.com Begins April 1996

Searchenginewatch.com goes online as "A Webmaster's Guide to Search Engines."

Filed under: Indexing & Searching Information, Internet & Networking

Digital Scriptorium November 1997

Digital Scriptorium, an image database of medieval and renaissance manuscripts that unites scattered resources from many institutions into an international tool for teaching and scholarly research, first appears on the web.

Filed under: Art, Indexing & Searching Information, Libraries, Manuscripts & Manuscript Copying

Altavista Claims 20,000,000 Queries Per Day November 1997

Web search engine Altavista claims to handle 20,000,000 queries per day.

Filed under: Indexing & Searching Information, Internet & Networking

W3C Releases XML 1998

W3C releases the eXtensible Markup Language (XML) specification, allowing web pages to be tagged with descriptive labels.

Filed under: Indexing & Searching Information, Internet & Networking, Software

PageRank is Published on Paper January 29, 1998

Larry Page, Sergey Brin, Rajeev Motwani, and Terry Winograd of the Stanford Database Group publish on paper The PageRank Citation Ranking: Bringing Order to the Web. "The worldwide web creates many new challenges for information retrieval. It is very large and heterogeneous. Current estimates are that there are over 150 million web pages with a doubling life of less than one year."

Filed under: Indexing & Searching Information, Internet & Networking, Organization of Information / Taxonomy

The Bibliometrics of Science February 14, 1998

According to his paper, Mapping the World of Science, Eugene Garfield's Science Citation Index, built on the principles of citation analysis, covered nearly 20,000,000 printed source articles and 300 million cited printed references over a 50-year period.

Filed under: Cartography / Geography / Voyages / Travels, Indexing & Searching Information, Organization of Information / Taxonomy

MSN Search Circa September – December 1998

Microsoft launches MSN Search, a search engine, index and web crawler.

Filed under: Indexing & Searching Information

Google is Founded September 7, 1998

Larry Page and Sergey Brin found Google.

They described the technology in a paper entitled "The Anatomy of a Large-Scale Hypertextual Web Search Engine", Computer Networks and ISDN Systems, 30, 107-117.

The first Google index included 26,000,000 web pages.

Filed under: eCommerce, Indexing & Searching Information, Organization of Information / Taxonomy

Where's George? December 23, 1998

Database consultant Hank Eskin creates and makes operational Where's George?, a website that tracks the natural geographic circulation of American paper money.

"A hit is when a bill registered with Where's George? is re-entered into the database. Where's George? does not have specific goals other than tracking currency movements, but many users like to collect interesting patterns of hits, called bingos. The most common bingo involves getting at least one hit in all 50 states (called "50 State Bingo"). Another Bingo, FRB Bingo, is when a user gets hits on bills from all 12 Federal Reserve Banks.

"Most bills do not receive any responses, or hits, but many bills receive two or more hits. The average hit rate is slightly over 11.1%. Double- and triple-hitters are common, and bills with 4 or 5 hits are not unheard of. Almost daily a bill receives its 6th hit. The site record is held by a $1 bill with 15 entries.

"To increase the chance of having a bill reported, users (called "Georgers") may write or stamp text on the bills encouraging bill finders to visit www.wheresgeorge.com and track the bill's travels. Bills that are entered into the database, but not marked, are known as stealths" (Wikipedia article on Where's George, accessed 05-04-2009).

Filed under: Economics, Games / Simulations, Indexing & Searching Information, Social Media / Wikis

Early English Books Online 1999

The Early English Books Online project, a joint effort between the University of Michigan, Oxford University and ProQuest Information and Learning, begins to provide searchable texts of all 125,000 English books printed from 1475 to 1700. This is a development of a project that began in 1938 to microfilm all English books in the timeframe.

Filed under: Indexing & Searching Information, Libraries, Publishing

2000 – 2005

Predecessor of the Wikipedia March 9, 2000 – September 2003

Using money from the dot-com company Bomis, American entrepreneur Jimmy Wales founds the web encyclopedia Nupedia, hiring philosopher Larry Sanger as editor-in-chief.

"Unlike Wikipedia, Nupedia was not a wiki; it was instead characterized by an extensive peer-review process, designed to make its articles of a quality comparable to that of professional encyclopedias. Nupedia wanted scholars to volunteer content for free. Before it ceased operating, Nupedia produced 24 articles that completed its review process (three articles also existed in two versions of different lengths), and 74 more articles were in progress.

"In June 2008, CNET hailed Nupedia as one of the greatest defunct websites in history" (Wikipedia article on Nupedia, accessed 05-23-2009).

Filed under: Indexing & Searching Information, Organization of Information / Taxonomy, Publishing

OED Online March 14, 2000

The Oxford English Dictionary Online (OED Online) becomes available to subscribers.

Filed under: Indexing & Searching Information, Linguistics / Translation / Speech, Organization of Information / Taxonomy, Publishing

Google Launches AdWords October 23, 2000

"Google Launches Self-Service Advertising Program

"Google's AdWords Program Offers Every Business a Fully Automated, Comprehensive and Quick Way to Start an Online Advertising Campaign /

"MOUNTAIN VIEW, Calif. - October 23, 2000 - Google Inc., developer of the award-winning Google search engine, today announced the immediate availability of AdWords(TM), a new program that enables any advertiser to purchase individualized and affordable keyword advertising that appears instantly on the google.com search results page. The AdWords program is an extension of Google's premium sponsorship program announced in August. The expanded service is available on Google's homepage or at the AdWords link at http://adwords.google.com, where users will find all the necessary design and reporting tools to get an online advertising campaign started" (http://www.google.com/press/pressrel/pressrelease39.html, accessed 06-09-2009).

Filed under: eCommerce, Indexing & Searching Information

The Wikipedia Begins January 15, 2001

American entrepreneur Jimmy Wales, American philosopher Larry Sanger, and others found Wikipedia, the Free Encyclopedia, as an English-language project.

"In its first year, Wikipedia generated 20,000 articles, and had acquired 200 regular volunteers working to add more (this compares with the 55,000 articles in the Columbia [Encyclopedia], all subject to rigorous standards of editing and fact-checking, though this in itself was a small-scale enterprise compared to the behemoths of the industry like the Encyclopaedia Britannica, whose 1989 edition covered 400,000 different topics). By the end of 2002, the number of entries on Wikipedia had more than doubled. But it was only in 2003, once it became apparent that there was nothing to stop it continuing to double in size (which is what it did), that Wikipedia started to attract attention outside the small tech-community that had noticed its launch. In early 2004, there were 188,000 articles; by 2006, 895,000. In 2007 there were signs that the pace of growth might start to level off, and only in 2008 did it begin to look like the numbers might be stabilising. The English-language version of Wikipedia currently has more than 2,870,000 entries, a number that has increased by 500,000 over the last 12 months. However, the English-language version is only one of more than 250 different versions in other languages. German, French, Italian, Polish, Dutch and Japanese Wikipedia all have more than half a million entries each, with plenty of room to add. Xhosa Wikipedia currently has 110. Meanwhile, the Encyclopaedia Britannica had managed to increase the number of its entries from 400,000 in 1989 to 700,000 by 2007" (Runciman, "Like Boiling a Frog," Review of "The Wikipedia Revolution" by Andrew Lih, London Review of Books, 28 May 2009, accessed 05-23-2009).

Filed under: Computers & Society, Indexing & Searching Information, Organization of Information / Taxonomy, Publishing, Social Media / Wikis

Google Acquires Deja.com February 21, 2001

Google acquires Deja.com's (formerly Deja News Research Service) Usenet archive dating back to 1995, including 500,000,000 messages.

In its press release announcing the acquisition Google states that it is performing 70,000,000 searches per day.

Filed under: Indexing & Searching Information

OCLC Serves More than 50,000 Libraries, Contains 56 Million Records 2004

At this time OCLC (Online Computer Library Center) serves more than 50,540 libraries of all types in the U.S. and 84 countries and territories around the world. OCLC WorldCat contains 56 million catalogue records, representing 894 million holdings.

Filed under: Bibliography, Indexing & Searching Information, Libraries, Organization of Information / Taxonomy, Preservation & Conservation of Information

18th Century Collections Online 2004

Thomson Gale announces Eighteenth Century Collections Online. Providing fully searchable digital texts for the 150,000 titles published in England during the 18th century, the publishers perhaps over-enthusiastically characterize the project as

"the most ambitious single digitization project ever undertaken. It delivers every significant English-language and foreign-language title printed in Great Britain during the eighteenth century, along with thousands of important works from the Americas." The project is expected to include the searchable texts of 26,000,000 pages.

Filed under: Book History, Indexing & Searching Information, Publishing

The National Digital Newspaper Program March 2004

The National Endowment for the Humanities and the Library of Congress found the National Digital Newspaper Program (NDNP). "Ultimately over a period of approximately 20 years, NDNP will create a national, digital resource of historically significant newspapers from all the states and U.S. territories published between 1836 and 1922. This searchable database will be permanently maintained at the Library of Congress (LC) and be freely accessible via the Internet. An accompanying national newspaper directory of bibliographic and holdings information on the website will direct users to newspaper titles available in all types of formats."

Filed under: Indexing & Searching Information, Libraries, News Media / Journalism, Preservation & Conservation of Information

The Index-Catalogue Goes Online May 1, 2004

The Index-Catalogue of the Surgeon-General's Office, a 61-volume bibliographical resource for the history of medicine and science, which began publication in 1870 under the direction of John Shaw Billings, is made available online by the National Library of Medicine.

This was the culmination of a data conversion project which began in 1996.

Filed under: Bibliography, Indexing & Searching Information, Libraries, Medicine, Organization of Information / Taxonomy

The Google Print Project October 2004

At the Frankfurt Book Fair Google announces the Google Print project to scan and make searchable on the Internet the texts of more than ten million books from the collections of the New York Public Library, and the libraries of Michigan, Stanford, Harvard and Oxford Universities.

The project was renamed Google Books in December 2005.

Filed under: Education / Reading / Literacy, Indexing & Searching Information, Libraries, Organization of Information / Taxonomy, Preservation & Conservation of Information

2005 – 2010

Kosmix.com 2005

"With the vision of connecting people to information that makes a difference in their lives," Venky Harinarayan and Anand Rajaraman found Kosmix.com.

Filed under: Human-Computer Interaction, Indexing & Searching Information, Organization of Information / Taxonomy, Social Media / Wikis

Moratorium on Scanning Books August 11, 2005

In response to copyright problems Google announces a moratorium on the scanning of copyrighted books for its Google Print Library Project.

Filed under: Indexing & Searching Information, Law / Copyrights / Patents, Preservation & Conservation of Information

Universally Accessible Digital Archive October 3, 2005

The Open Content Alliance in association with Yahoo and the Internet Archive announce plans to build a universally accessible digital archive of published information.

Filed under: Archives, Indexing & Searching Information, Preservation & Conservation of Information

300 Years to Index All the World's Information October 8, 2005

Google CEO Eric Schmidt speculates that it may take three hundred years to index all the world's information and make it searchable.

" 'We did a math exercise and the answer was 300 years,' Schmidt said in response to an audience question asking for a projection of how long the company's mission will take. 'The answer is it's going to be a very long time.'

"Of the approximately 5 million terabytes of information out in the world, only about 170 terabytes have been indexed, he said earlier during his speech."

Filed under: Indexing & Searching Information, Internet & Networking, Libraries

Google Books December 2005

The Google Print project morphs into Google Books.

Filed under: Indexing & Searching Information, Libraries, Organization of Information / Taxonomy, Preservation & Conservation of Information, Publishing

The Google Librarian Newsletter December 19, 2005

Google issues their first monthly newsletter for librarians, the Google Librarian Newsletter.

"Librarians and Google share the same mission: to organize the world's information and make it universally accessible and useful. The goal of this newsletter is to highlight ways that we can work together to fulfill that mission, for patrons, students, and users."

Filed under: Indexing & Searching Information, Libraries

Zillow.com February 8, 2006

Rich Barton and Lloyd Frink, former Microsoft executives and founders of Expedia, launch the online real estate service company Zillow.com.

"Zillow allows users to see the value of millions of homes across the United States, not just those up for sale. In addition to giving value estimates of homes, it offers several unique features including value changes of each home in a given time frame (such as 1, 5, or 10 years), aerial views of homes, and prices of homes in the area. Where it can access appropriate data, it also provides basic information on a given home, such as square footage and the number of bedrooms and bathrooms. Users can also get current estimates of homes if there was a significant change made, such as a recently remodeled kitchen. Zillow provides an application programming interface (API) and developer support network.

"As a part of its API, Zillow assigns a numerical integer to each of the 70 million homes in its database, which is plainly visible as CGI parameters to the URLs to individual entries on its website. The identifier is not obfuscated and is assigned in sequence for each house or condo on the side of a street. Zillow reports on individual units, such as providing street address, latitude and longitude. When integrated with the features of a typical online reverse telephone directory and wiki-mapping services such as WikiMapia, it allows for nationwide "seating assignments" of U.S. neighborhoods for each house that has a listed phone number with a real human name" (Wikipedia article on Zillow.com.)

Filed under: eCommerce, Indexing & Searching Information

Making Handwritten Manuscripts Searchable February 9, 2006

Using object detection technology, researchers at the University of Buffalo, the University of Massachusetts at Amherst, and the Adaptive Information Cluster at Dublin City University, in association with Google, develop software for scanning historical manuscripts in a way that recognizes handwriting to make electronic texts of these manuscripts searchable.

Filed under: Indexing & Searching Information, Manuscripts & Manuscript Copying, Software, Writing / Palaeography / Calligraphy

Access to Nearly One Million Archive Collection Descriptions March 2006

RLG opens ArchiveGrid, a new search engine providing access to nearly a million archive collection descriptions in thousands of libraries, museums, and archives.

Filed under: Archives, Indexing & Searching Information, Libraries, Manuscripts & Manuscript Copying, Museums

The Changing Nature of the Catalogue. . . . March 17, 2006

Reflecting the influence of the Internet on physical library access and usage, the Library of Congress publishes The Changing Nature of the Catalogue and its Integration with Other Discovery Tools by Karen Calhoun.

Filed under: Bibliography, Indexing & Searching Information, Libraries, Organization of Information / Taxonomy

A Critical Review at the Library of Congress April 3, 2006

Representing the Library of Congress Professional Guild, Thomas Mann publishes A Critical Review of Karen Calhoun's paper published on March 17. This review rebuts various assertions in the Calhoun report.

Filed under: Bibliography, Indexing & Searching Information, Libraries, Preservation & Conservation of Information

Google's AdWords to Place Ads in Print Newspapers November 6, 2006

Google and various print newspapers, including The New York Times, announce that they will test a modified version of Google's AdWords program to place advertisements in print newspapers.

Filed under: eCommerce, Indexing & Searching Information, News Media / Journalism, Printing / Typography

DROID September 27, 2007

"An innovative tool to analyse and identify computer file formats has won the 2007 Digital Preservation Award. DROID, developed by The National Archives in London, can examine any mystery file and identify its format. The tool works by gathering clues from the internal 'signatures' hidden inside every computer file, as well as more familiar elements such as the filename extension (.jpg, for example), to generate a highly accurate 'guess' about the software that will be needed to read the file. . . .

"Now, by using DROID and its big brother, the unique file format database known as PRONOM, experts at the National Archives are well on their way to cracking the problem. Once DROID has labelled a mystery file, PRONOM's extensive catalogue of software tools can advise curators on how best to preserve the file in a readable format. The database includes crucial information on software and hardware lifecycles, helping to avoid the obsolescence problem. And it will alert users if the program needed to read a file is no longer supported by manufacturers.

"PRONOM's system of identifiers has been adopted by the UK government and is the only nationally-recognised standard in its field."

Filed under: Indexing & Searching Information, Libraries, Preservation & Conservation of Information, Software

21 Billion in Revenue from Google AdWords 2008

The revenue from AdWords, Google's flagship advertising product, is $21,795,550,000 in 2008.

"AdWords offers pay-per-click (PPC) advertising, and site-targeted advertising for both text and banner ads. The AdWords program includes local, national, and international distribution. Google's text advertisements are short, consisting of one title line and two content text lines. Image ads can be one of several different Interactive Advertising Bureau (IAB) standard sizes" (Wikipedia article on AdWords, accessed 06-09-2009).

Filed under: eCommerce, Indexing & Searching Information

Over One Trillion Unique URLs July 2008

Google announces in its blog that it is indexing over one trillion (1,000,000,000,000) unique URLs.

Filed under: Indexing & Searching Information, Internet & Networking

Old Wine in New Bottles? October 24, 2008

The conversion of the old format of From Gutenberg to the Internet Timeline, begun in 2005, to this new interactive database format is complete. Reflecting its coverage of the history of information since the beginning of records, I have renamed the timeline From Cave Paintings to the Internet.

By the end of the conversion there were 1535 timeline entries, nearly all of which had one or more hyperlinks to reference sources. There were also more than sixty themes by which the timeline could be searched. Timeline items were indexed by up to six themes.

In the process of converting from the old list format to the new interactive database I checked all hyperlinks, corrected mistakes, added new hyperlinks, and added numerous new entries.

The timeline remains a work in progress.

JMN

Filed under: Indexing & Searching Information, Organization of Information / Taxonomy

Analysis of Web Search Queries Tracks the Spread of Flu Faster than Traditional Surveillance Methods November 11, 2008

Google.org unveils Google Flu Trends, using aggregated Google search data to estimate flu activity up to two weeks faster than traditional flu surveillance systems.

"Each week, millions of users around the world search for online health information. As you might expect, there are more flu-related searches during flu season, more allergy-related searches during allergy season, and more sunburn-related searches during the summer. You can explore all of these phenomena using Google Trends. But can search query trends provide an accurate, reliable model of real-world phenomena?

"We have found a close relationship between how many people search for flu-related topics and how many people actually have flu symptoms. Of course, not every person who searches for "flu" is actually sick, but a pattern emerges when all the flu-related search queries from each state and region are added together. We compared our query counts with data from a surveillance system managed by the U.S. Centers for Disease Control and Prevention (CDC) and discovered that some search queries tend to be popular exactly when flu season is happening. By counting how often we see these search queries, we can estimate how much flu is circulating in various regions of the United States.

"During the 2007-2008 flu season, an early version of Google Flu Trends was used to share results each week with the Epidemiology and Prevention Branch of the Influenza Division at CDC. Across each of the nine surveillance regions of the United States, we were able to accurately estimate current flu levels one to two weeks faster than published CDC reports" (Google Flu Trends website).

Filed under: Computing & Medicine / Biology, Indexing & Searching Information, Medicine

Higher Resolution Map of Knowledge Than Can be Produced from Citation Analysis March 11, 2009

Johan Bollen of Los Alamos National Laboratory and six co-authors publish "Clickstream Data Yields High Resolution Maps of Science" in the open access online journal PLoS ONE. The map was based on clickstream data collected when online readers switched from one journal to another, allowing the collection of about one billion data points -- a far greater number and presumably more reflective of actual reading patterns than the prior method of citation analysis developed by the Institute for Scientific Information (now Thomson Scientific's Web of Science), which traces the relationship of footnotes in scholarly journals.

"Maps of science derived from citation data visualize the relationships among scholarly publications or disciplines. They are valuable instruments for exploring the structure and evolution of scholarly activity. Much like early world charts, these maps of science provide an overall visual perspective of science as well as a reference system that stimulates further exploration. However, these maps are also significantly biased due to the nature of the citation data from which they are derived: existing citation databases overrepresent the natural sciences; substantial delays typical of journal publication yield insights in science past, not present; and connections between scientific disciplines are tracked in a manner that ignores informal cross-fertilization.

"Scientific publications are now predominantly accessed online. Scholarly web portals provide access to publications in the natural sciences, social sciences and humanities. They routinely log the interactions of users with their collections. The resulting log datasets have a set of attractive characteristics when compared to citation datasets. First, the number of logged interactions now greatly surpasses the volume of all existing citations. This is illustrated by Elsevier's announcement, in 2006, of 1 billion (1×10^9) article downloads since the launch of its Science Direct portal in April 1999. In contrast, around the time of Elsevier's announcement, the total number of citations in Thomson Scientific's Web of Science from the year 1900 to the present does not surpass 600 million (6×10^8). Second, log datasets reflect the activities of a larger community as they record the interactions of all users of scholarly portals, including scientific authors, practitioners of science, and the informed public. In contrast, citation datasets only reflect the activities of scholarly authors. Third, log datasets reflect scholarly dynamics in real-time because web portals record user interactions as soon as an article becomes available at the time of its online publication. In contrast, a published article faces significant delays before it eventually appears in citation datasets: it first needs to be cited in a new article that itself faces publication delays, and subsequently those citations need to be picked up by citation databases.

"Given the aforementioned characteristics of scholarly log data, we investigated a methodological issue: can valid, high resolution maps of science be derived from clickstream data and can clickstream data be leveraged to yield meaningful insights in the structure and dynamics of scholarly behavior? To do this we first aggregated log datasets from a variety of scholarly web portals, created and analyzed a clickstream model of journal relationships from the aggregate log dataset, and finally visualized these journal relationships in a first-ever map of science derived from scholarly log data" (http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0004803#pone.0004803-Brody1, accessed 03-19-2009).
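
As a rough illustration of the clickstream idea described in the passage above, the following minimal Python sketch counts how often readers move from one journal to another within a session. The session data, journal names, and function name are hypothetical and are not taken from the paper, whose actual model is more elaborate.

```python
from collections import Counter


def journal_transitions(sessions):
    """Count journal-to-journal moves across reader sessions.

    Each session is a list of journal names in the order a reader
    visited them. Repeated visits to the same journal are ignored.
    """
    counts = Counter()
    for session in sessions:
        # Pair each visit with the visit that follows it.
        for src, dst in zip(session, session[1:]):
            if src != dst:
                counts[(src, dst)] += 1
    return counts


# Hypothetical sessions, for illustration only.
sessions = [
    ["Journal A", "Journal B", "Journal B", "Journal C"],
    ["Journal A", "Journal B"],
    ["Journal C", "Journal A"],
]
transitions = journal_transitions(sessions)
```

Pairs with high counts would be drawn close together on a map of this kind, while pairs that readers rarely move between would sit far apart.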

Filed under: Cartography / Geography / Voyages / Travels, Graphics / Visualization / Animation, Indexing & Searching Information, Organization of Information / Taxonomy, Science

Wolfram|Alpha May 16, 2009

Stephen Wolfram and Wolfram Research launch Wolfram|Alpha, a computational data engine with a new approach to knowledge extraction, based on natural language processing, a large library of algorithms and an NKS (New Kind of Science) approach to answering queries.

The Wolfram|Alpha engine differs from traditional search engines in that it does not simply return a list of results based on a query, but instead computes an answer.

Filed under: Artificial Intelligence, Data Processing / Computing, Indexing & Searching Information, Linguistics / Translation / Speech, Organization of Information / Taxonomy

Microsoft Launches Bing June 1, 2009

Microsoft launches the Bing search engine.

"By August 2009, Bing had gained 9.3 percent of the United States Internet search market. However, by September, StatCounter stated that Bing's share of the US search market in September had fallen by over one percentage point to 8.51%. Comscore claimed otherwise, stating that Bing's growth had held steady in September 2009, gaining 0.1 percent of the total United States Internet Search Market representing a market share of 9.4 percent" (Wikipedia article on Bing, accessed 11-12-2009).

Filed under: Indexing & Searching Information

Bing Will Power Yahoo! Search July 29, 2009

Microsoft and Yahoo! announce a 10-year deal in which the Yahoo! search engine, currently the second-largest in terms of query volume, will be replaced by Bing. Yahoo! will keep 88% of the revenue from all search ad sales on its site for the first five years of the deal, and will have the right to sell advertisements on some Microsoft sites. Yahoo! Search will maintain its own user interface, but will eventually feature "Powered by Bing" branding.

Filed under: eCommerce, Indexing & Searching Information

Algorithm to Decipher Ancient Texts September 2, 2009

"Researchers in Israel say they have developed a computer program that can decipher previously unreadable ancient texts and possibly lead the way to a Google-like search engine for historical documents.

"The program uses a pattern recognition algorithm similar to those law enforcement agencies have adopted to identify and compare fingerprints.

"But in this case, the program identifies letters, words and even handwriting styles, saving historians and liturgists hours of sitting and studying each manuscript.

"By recognizing such patterns, the computer can recreate with high accuracy portions of texts that faded over time or even those written over by later scribes, said Itay Bar-Yosef, one of the researchers from Ben-Gurion University of the Negev.

" 'The more texts the program analyses, the smarter and more accurate it gets,' Bar-Yosef said.

"The computer works with digital copies of the texts, assigning number values to each pixel of writing depending on how dark it is. It separates the writing from the background and then identifies individual lines, letters and words.

"It also analyses the handwriting and writing style, so it can 'fill in the blanks' of smeared or faded characters that are otherwise indiscernible, Bar-Yosef said.

"The team has focused their work on ancient Hebrew texts, but they say it can be used with other languages, as well. The team published its work, which is being further developed, most recently in the academic journal Pattern Recognition due out in December but already available online. A program for all academics could be ready in two years, Bar-Yosef said. And as libraries across the world move to digitize their collections, they say the program can drive an engine to search instantaneously any digital database of handwritten documents. Uri Ehrlich, an expert in ancient prayer texts who works with Bar-Yosef's team of computer scientists, said that with the help of the program, years of research could be done within a matter of minutes. 'When enough texts have been digitized, it will manage to combine fragments of books that have been scattered all over the world,' Ehrlich said" (http://www.reuters.com/article/newsOne/idUSTRE58141O20090902, accessed 09-02-2009).
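
The first steps the article describes, giving each pixel a number for its darkness, separating writing from background, and then finding lines, can be sketched with a simple global threshold followed by a row scan. This is only an illustrative Python sketch with a made-up image and a hypothetical threshold value; the article does not describe the researchers' actual algorithm.

```python
def binarize(image, threshold=128):
    """Mark pixels darker than the threshold as ink (1), others as background (0).

    `image` is a list of rows of grayscale values (0 = black, 255 = white).
    The threshold of 128 is an arbitrary assumption for this sketch.
    """
    return [[1 if px < threshold else 0 for px in row] for row in image]


def find_text_lines(binary):
    """Return (start_row, end_row) pairs for runs of rows that contain ink."""
    lines, start = [], None
    for i, row in enumerate(binary):
        has_ink = any(row)
        if has_ink and start is None:
            start = i                      # a new line of writing begins
        elif not has_ink and start is not None:
            lines.append((start, i - 1))   # the line ended on the previous row
            start = None
    if start is not None:                  # writing runs to the bottom edge
        lines.append((start, len(binary) - 1))
    return lines


# A tiny made-up "page": low numbers are dark ink, high numbers are parchment.
page = [
    [250, 250, 250, 250],
    [250, 30, 40, 250],
    [250, 35, 250, 250],
    [250, 250, 250, 250],
    [250, 20, 25, 30],
]
binary = binarize(page)
text_lines = find_text_lines(binary)
```

Faded or overwritten manuscripts would need far more than a fixed threshold, which is presumably where the pattern recognition described in the article comes in.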

Filed under: Artificial Intelligence, Graphics / Visualization / Animation, Indexing & Searching Information, Linguistics / Translation / Speech, Manuscripts & Manuscript Copying, Writing / Palaeography / Calligraphy

The First Historical Thesaurus October 2009

Oxford University Press publishes the Historical Thesaurus of the Oxford English Dictionary with Additional Material from A Thesaurus of Old English, edited by Christian Kay, Jane Roberts, Michael Samuels, and Irene Wotherspoon.

Forty years in the making, this 4448-page work is the first historical thesaurus to be compiled for any language, and the first to include almost the entire vocabulary of English, from Old English to the present. It is also the largest thesaurus resource in the world, covering more than 920,000 words and meanings, based on the Oxford English Dictionary.

The Historical Thesaurus lists synonyms with the dates of their first recorded use in English, in chronological order, with the earliest synonyms first. For obsolete words, the Thesaurus also includes the last recorded use of the word.

The work uses a specially devised thematic system of classification. Its comprehensive index enables complete cross-referencing of nearly one million words and meanings. It contains a comprehensive sense inventory of Old English and a fold-out color chart which shows the top levels of the classification structure. 

Filed under: Indexing & Searching Information, Organization of Information / Taxonomy, Publishing

Google Represents 6% of All Internet Traffic October 19, 2009

According to Arbor Networks' 2009 Atlas Internet Observatory report, Google accounts for 6 percent of all Internet traffic of every type.

"And how many would have heard of a company called Carpathia Hosting? Its MegaUpload, MegaErotik, MegaClick and MegaVideo services have turned it into a company that now accounts for 1 percent of all Internet traffic, says Arbor, and this will doubtless grow. The important takeaway is that few of these companies had even been heard of two years ago, and very few of them are big telcos. To put all this into perspective, in 2007 Arbor found that the overwhelming majority of Internet traffic was accounted for by 30,000 entities, with fifty percent of traffic accounted for by around 10,000 companies.

"Only two years later that same fifty percent now runs through only 150 top 'content delivery networks' (CDNs), an astonishing consolidation made more remarkable by the fact that Internet traffic has grown significantly during that time.

" 'Up to 2007, The Internet meant connecting to lots of servers and data centres around the world,' notes Arbor's chief scientist, Craig Labovitz. Now there are barely 100 companies that matter. Traffic patterns tend to be hidden, mainly because the companies losing out - the traditional telcos and ISPs - don't exactly have an interest in advertising their waning status. The reason for their decline in importance is that Internet traffic is being driven by huge providers with access to content such as video.

" 'For 150 years, they [BT and other telcos] have had the same business model. Now everyone is trying to get away from being a dumb pipe.' Arbor's Atlas Internet Observatory report crunched traffic from 100 of the Internet's largest entities, accounting for 12 Terabytes of peak throughput, equivalent to about a quarter of the Internet's total at any one moment, said Labovitz. The importance of this is not simply that a small number of companies will account for a lot of traffic, but that these companies are increasingly what the Internet actually is. The Internet up to around 2007 was dominated by a hierarchy of companies, co-operating with one another to allow traffic to be passed from one to the other, regardless of size. The new Internet superpowers, in stark contrast, bypass a lot of this and use direct connections from one to the other. If a company is not part of this new core, it could find itself increasingly passed to the 'long tail', a polite way of saying they will be shoved to the fringe.

"Video, including video that runs over web/http, now accounts for an estimated 10 percent of all Internet traffic, and is one reason all these direct connections between large data centres are now necessary. IPv6 traffic remains tiny at only 0.03 percent of traffic, but is showing sudden and possibly rapid growth in recent months thanks to deployments by named hosters.

"Interestingly, P2P is in rapid decline, falling from around 3 percent of all traffic in 2007 to only half a percent now. Again, downloaders appear to prefer direct connectivity for downloads, mostly through port 80 and the web" (http://www.thestandard.com/news/2009/10/14/internet-now-dominated-traffic-superpowers)

Filed under: eCommerce, Indexing & Searching Information, Internet & Networking

Bing Will Incorporate Wolfram Alpha Search Information November 12, 2009

Microsoft announces a deal that will bring the Wolfram Alpha search tool to its Bing search engine.

"The company said that the deal will allow users to take advantage of the Wolfram Alpha algorithms and search tools within Bing queries.

"The initial partnership, which is expected to bear fruit within a few days, will focus on providing nutritional information to users as well as certain mathematical tools. When users search for foods or recipes, the engine will display a small tab containing nutritional information.  

"Along with increasing traffic to the Bing service, Microsoft hopes that the features will allow users to better monitor their diet and exercise plans.

" 'This notion of creating and presenting computational knowledge in search results is one of the more exciting things going on in search (and beyond) today, and the team at Bing is incredibly fired up to bring some of this amazing work to our customers,' programme managers Tracey Yao and Pedro Silva said in a blog posting.

"The Wolfram Alpha partnership is one of several campaigns Microsoft has embarked on to drum up traffic for Bing. Other recent additions include visual search results and the ability to search within a user's Hotmail archives" (http://www.v3.co.uk/v3/news/2253013/microsoft-gives-further-updates)

Filed under: Artificial Intelligence, Indexing & Searching Information, Organization of Information / Taxonomy

Google Announces Real-Time Search December 2009

"First, we're introducing new features that bring your search results to life with a dynamic stream of real-time content from across the web. Now, immediately after conducting a search, you can see live updates from people on popular sites like Twitter and FriendFeed, as well as headlines from news and blog posts published just seconds before. When they are relevant, we'll rank these latest results to show the freshest information right on the search results page.  

"Try searching for your favorite TV show, sporting event or the latest development on a recent government bill. Whether it's an eyewitness tweet, a breaking news story or a fresh blog post, you can find it on Google right after it's published on the web. . .

"Our real-time search enables you to discover breaking news the moment it's happening, even if it's not the popular news of the day, and even if you didn't know about it beforehand. For example, in the screen shot, the big story was about GM's stabilizing car sales, which shows under "News results." Nonetheless, thanks to our powerful real-time algorithms, the 'Latest results' feature surfaces another important story breaking just seconds before: GM's CEO stepped down.

"Click on 'Latest results' or select 'Latest' from the search options menu to view a full page of live tweets, blogs, news and other web content scrolling right on Google. You can also filter your results to see only 'Updates' from micro-blogs like Twitter, FriendFeed, Jaiku and others. Latest results and the new search options are also designed for iPhone and Android devices when you need them on the go, be it a quick glance at changing information like ski conditions or opening night chatter about a new movie — right when you're in line to buy tickets.

"And, as part of our launch of real-time on Google search, we've added 'hot topics' to Google Trends to show the most common topics people are publishing to the web in real-time. With this improvement and a series of other interface enhancements, Google Trends is graduating from Labs.

"Our real-time search features are based on more than a dozen new search technologies that enable us to monitor more than a billion documents and process hundreds of millions of real-time changes each day. Of course, none of this would be possible without the support of our new partners that we're announcing today: Facebook, MySpace, FriendFeed, Jaiku and Identi.ca — along with Twitter, which we announced a few weeks ago" (http://googleblog.blogspot.com/2009/12/relevance-meets-real-time-web.html, accessed 05-06-2010).

Filed under: Indexing & Searching Information, Internet & Networking, News Media / Journalism, Social Media / Wikis

Google Living Stories December 8, 2009

Google announces the Living Stories project, an experimental new way to consume news, developed in partnership with The New York Times and The Washington Post.

"The announcement of the 'living stories' project shows Google collaborating with newspapers at a time when some major publishers have characterized the company as a threat. Google has also taken steps recently to project an image of itself as a friend to the industry. 

"Living stories is a much-enhanced version of what some newspaper Web sites already do by grouping material by subject matter. In the case of The Times, the paper’s Web site has thousands of “topic pages.” But those efforts have not yielded heavy reader traffic or much advertising.  

"The Google project, presented without ads, is now at livingstories.googlelabs.com, part of Google Labs, where the company tries out experimental products. If it is judged a success, it would eventually reside on the site of any publisher that wanted to use it. Those publishers could also sell ads on those pages.  

"Google’s dominant search engine sells ads alongside search results that often include news articles, leading some newspaper industry leaders — particularly executives of the News Corporation, led by Rupert Murdoch — to cry foul. Other publishers say that, on the contrary, they owe much of their Internet traffic and revenue to search engines.  

"Google executives argue that the tools their company has developed, including search, make them the papers’ ally, a case made by Eric E. Schmidt, Google’s chairman and chief executive, in an opinion piece published last week in The Wall Street Journal. Also last week, Google announced changes in the way its search function interacts with news sites, giving publishers more flexibility in limiting the material readers can see before encountering demands for payment or registration. The changes were relatively minor, but reinforced the message that the company wanted to help news sites.  

" 'There’s been a series of steps to work with and mollify news publishers, to improve the P.R., and you can see the living page in that same vein,' said Ken Doctor, a media analyst with the analysis firm Outsell. The project is a genuine step forward, he said, because 'on most news sites, site search, looking for a lot on one subject, is awful.'

"Google worked for months on the project with journalists and Web staffs at The Times and The Post. For now, it covers just eight broad topics, like health care reform and the Washington Redskins. At the top of each subject page is a summary, a timeline of major events and pictures, followed by the opening sections of a series of articles, in reverse chronological order. A set of buttons allows the reader to narrow the topic.  'It’s an experiment with a different way of telling stories,' said Martin A. Nisenholtz, senior vice president for digital operations of The New York Times Company. 'I think in it, you can see the germ of something quite interesting.'

"A reader can call up an entire article without navigating away from the subject page, reading one piece after another without using the 'forward' and 'back' buttons. Josh Cohen, business product manager for Google News, said that having all the material appear on a single page would help the page rank higher in Internet searches than newspapers’ subject pages do now.  

"In various ways, the experiment duplicates or improves on what can now be done on publishers’ own sites, through a search engine’s news function or even on Wikipedia. Mr. Cohen said that if it worked well, Google would make the software available free to publishers, much as those publishers now use Google Maps and YouTube functions on their sites" (http://www.nytimes.com/2009/12/09/technology/companies/09google.html?hpw).

Filed under: Indexing & Searching Information, News Media / Journalism, Publishing

Introduction of Google Goggles December 8, 2009

Google introduces Google Goggles image recognition and search technology for the Android mobile device operating system.  

If you photograph certain types of individual objects, the program will recognize them and automatically display links to relevant information on the Internet. If you point your phone at a building, the program will use GPS to locate and identify it. If you then click on the name of the building, it will bring up relevant Internet links.

♦ On May 7, 2010 you could watch a video describing the features of Google Goggles at this link:

http://www.google.com/mobile/goggles/#text

Filed under: Graphics / Visualization / Animation, Imaging / Photography, Indexing & Searching Information

French Alternative to Google Books Formed December 17, 2009

Jean-Pierre Gérault, president of i2S, announces the formation of a French consortium to scan the contents of French libraries. The project is called "Polinum," a French acronym that stands for "Operating Platform for Digital Books."

"French President Nicolas Sarkozy has made catching up on France's digital delay one of the national priorities by earmarking euro750 million of a euro35 billion ($51 billion) spending plan announced earlier this week for digitizing France's libraries, film and music archives and other repositories of the nation's recorded heritage. These funds will mainly go to French libraries, universities and museums, who will use them to develop their own plans for digitizing their holdings.  

"The consortium, meanwhile, intends to be the technological choice for those institutions, Gerault said. He declined to estimate what part of the euro750 million the consortium thinks it can capture. 

"France's culture ministry has been in difficult negotiations with Google, which would like to help digitize France's archives but has met resistance in France over fears of giving the internet search giant too much control over the nation's cultural heritage, as well as over how it would protect the interests of authors and other copyright holders" (http://www.businessweek.com/ap/financialnews/D9CL4M480.htm, accessed 12-17-2009).

Filed under: Education / Reading / Literacy, Indexing & Searching Information, Libraries

2010 – Present

Google's Computers in China Come Under Attack, Initiating a Review of the Company's Operations in China January 12, 2010

"Like many other well-known organizations, we face cyber attacks of varying degrees on a regular basis. In mid-December, we detected a highly sophisticated and targeted attack on our corporate infrastructure originating from China that resulted in the theft of intellectual property from Google. However, it soon became clear that what at first appeared to be solely a security incident--albeit a significant one--was something quite different.

"First, this attack was not just on Google. As part of our investigation we have discovered that at least twenty other large companies from a wide range of businesses--including the Internet, finance, technology, media and chemical sectors--have been similarly targeted. We are currently in the process of notifying those companies, and we are also working with the relevant U.S. authorities.  

"Second, we have evidence to suggest that a primary goal of the attackers was accessing the Gmail accounts of Chinese human rights activists. Based on our investigation to date we believe their attack did not achieve that objective. Only two Gmail accounts appear to have been accessed, and that activity was limited to account information (such as the date the account was created) and subject line, rather than the content of emails themselves.

"Third, as part of this investigation but independent of the attack on Google, we have discovered that the accounts of dozens of U.S.-, China- and Europe-based Gmail users who are advocates of human rights in China appear to have been routinely accessed by third parties. These accounts have not been accessed through any security breach at Google, but most likely via phishing scams or malware placed on the users' computers.

"We have already used information gained from this attack to make infrastructure and architectural improvements that enhance security for Google and for our users. In terms of individual users, we would advise people to deploy reputable anti-virus and anti-spyware programs on their computers, to install patches for their operating systems and to update their web browsers. Always be cautious when clicking on links appearing in instant messages and emails, or when asked to share personal information like passwords online. You can read more here about our cyber-security recommendations. People wanting to learn more about these kinds of attacks can read this Report to Congress (PDF) by the U.S.-China Economic and Security Review Commission (see p. 163-), as well as a related analysis (PDF) prepared for the Commission, Nart Villeneuve's blog and this presentation on the GhostNet spying incident.

"We have taken the unusual step of sharing information about these attacks with a broad audience not just because of the security and human rights implications of what we have unearthed, but also because this information goes to the heart of a much bigger global debate about freedom of speech. In the last two decades, China's economic reform programs and its citizens' entrepreneurial flair have lifted hundreds of millions of Chinese people out of poverty. Indeed, this great nation is at the heart of much economic progress and development in the world today.

"We launched Google.cn in January 2006 in the belief that the benefits of increased access to information for people in China and a more open Internet outweighed our discomfort in agreeing to censor some results. At the time we made clear that 'we will carefully monitor conditions in China, including new laws and other restrictions on our services. If we determine that we are unable to achieve the objectives outlined we will not hesitate to reconsider our approach to China.'

"These attacks and the surveillance they have uncovered--combined with the attempts over the past year to further limit free speech on the web--have led us to conclude that we should review the feasibility of our business operations in China. We have decided we are no longer willing to continue censoring our results on Google.cn, and so over the next few weeks we will be discussing with the Chinese government the basis on which we could operate an unfiltered search engine within the law, if at all. We recognize that this may well mean having to shut down Google.cn, and potentially our offices in China" (http://googleblog.blogspot.com/2010/01/new-approach-to-china.html, accessed 01-16-2010).

Filed under: Censorship, Freedom / Privacy / Security, Indexing & Searching Information, Internet & Networking, Military / Warfare / Cyberwarfare

Google Pulls its Search Engine Out of Mainland China March 22, 2010

Google announces on its blog that it has stopped censoring search services on Google.cn and has moved its Chinese search business from Google.cn to Google.com.hk.

"Users visiting Google.cn are now being redirected to Google.com.hk, where we are offering uncensored search in simplified Chinese, specifically designed for users in mainland China and delivered via our servers in Hong Kong. Users in Hong Kong will continue to receive their existing uncensored, traditional Chinese service, also from Google.com.hk. Due to the increased load on our Hong Kong servers and the complicated nature of these changes, users may see some slowdown in service or find some products temporarily inaccessible as we switch everything over" (http://googleblog.blogspot.com/2010/03/new-approach-to-china-update.html, accessed 03-22-2010)

Filed under: Censorship, Indexing & Searching Information, Internet & Networking

Google Announces "Replay" for Twitter April 14, 2010

"Since we first introduced real-time search last December, we’ve added content from MySpace, Facebook and Buzz, expanded to 40 languages and added a top links feature to help you find the most relevant content shared on updates services like Twitter. Today, we’re introducing a new feature to help you search and explore the public archive of tweets.  

"With the advent of blogs and micro-blogs, there’s a constant online conversation about breaking news, people and places — some famous and some local. Tweets and other short-form updates create a history of commentary that can provide valuable insights into what’s happened and how people have reacted. We want to give you a way to search across this information and make it useful.

"Starting today, you can zoom to any point in time and 'replay' what people were saying publicly about a topic on Twitter. To try it out, click 'Show options' on the search results page, then select 'Updates.' The first page will show you the familiar latest and greatest short-form updates from a comprehensive set of sources, but now there’s a new chart at the top. In that chart, you can select the year, month or day, or click any point to view the tweets from that specific time period. . . ." (http://googleblog.blogspot.com/2010/04/replay-it-google-search-across-twitter.html, accessed 05-06-2010).

Filed under: Indexing & Searching Information, News Media / Journalism, Social Media / Wikis

Using the Twitter Archive for Historical Research April 30, 2010

The New York Times publishes "When History is Compiled 140 Characters at a Time" from which I quote:

“ 'Twitter is tens of millions of active users. There is no archive with tens of millions of diaries,' said Daniel J. Cohen, an associate professor of history at George Mason University and co-author of a 2006 book, 'Digital History.' What’s more, he said, 'Twitter is of the moment; it’s where people are the most honest.'  

"Last month, Twitter announced that it would donate its archive of public messages to the Library of Congress, and supply it with continuous updates.  

"Several historians said the bequest had tremendous potential. 'My initial reaction was, ‘When you look at it Tweet by Tweet, it looks like junk,’ said Amy Murrell Taylor, an associate professor of history at the State University of New York, Albany. 'But it could be really valuable if looked through collectively.' Ms. Taylor is working on a book about slave runaways during the Civil War; the project involves mountains of paper documents. 'I don’t have a search engine to sift through it,' she said.  

"The Twitter archive, which was 'born digital,' as archivists say, will be easily searchable by machine — unlike family letters and diaries gathering dust in attics.  

"As a written record, Tweets are very close to the originating thoughts. 'Most of our sources are written after the fact, mediated by memory — sometimes false memory,' Ms. Taylor said. 'And newspapers are mediated by editors. Tweets take you right into the moment in a way that no other sources do. That’s what is so exciting.'  

"Twitter messages preserve witness accounts of an extraordinary variety of events all over the planet. 'In the past, some people were able on site to write about, or sketch, as a witness to an event like the hanging of John Brown,' said William G. Thomas III, a professor of history at the University of Nebraska-Lincoln. 'But that’s a very rare, exceptional historical record.'  

"Ten billion Twitter messages take up little storage space: about five terabytes of data. (A two-terabyte hard drive can be found for less than $150.) And Twitter says the archive will be a bit smaller when it is sent to the library. Before transferring it, the company will remove the messages of users who opted to designate their account 'protected,' so that only people who obtain their explicit permission can follow them.

"A Twitter user can also elect to use a pseudonym and not share any personally identifying information. Twitter does not add identity tags that match its users to real people.  

"Each message is accompanied by some tidbits of supplemental information, like the number of followers that the author had at the time and how many users the author was following. While Mr. Cohen said it would be useful for a historian to know who the followers and the followed are, this information is not included in the Tweet itself.  

"But there’s nothing private about who follows whom among users of Twitter’s unprotected, public accounts. This information is displayed both at Twitter’s own site and in applications developed by third parties whom Twitter welcomes to tap its database.  

"Alexander Macgillivray, Twitter’s general counsel, said, 'From the beginning, Twitter has been a public and open service.' Twitter’s privacy policy states: 'Our services are primarily designed to help you share information with the world. Most of the information you provide to us is information you are asking us to make public.  

"Mr. Macgillivray added, 'That’s why, when we were revising our privacy policy, we toyed with the idea of calling it our ‘public policy.’ ' He said the company would have done so but California law required that it have a 'privacy policy' labeled as such.  

"Even though public Tweets were always intended for everyone’s eyes, the Library of Congress is skittish about stepping anywhere in the vicinity of a controversy. Martha Anderson, director of the National Digital Information Infrastructure and Preservation Program at the library, said, 'There’s concern about privacy issues in the near term and we’re sensitive to these concerns.'  

"The library will embargo messages for six months after their original transmission. If that is not enough to put privacy issues to rest, she said, 'We may have to filter certain things or wait longer to make them available.' The library plans to dole out its access to its Twitter archive only to those whom Ms. Anderson called “qualified researchers.”  

"BUT the library’ s restrictions on access will not matter. Mr. Macgillivray at Twitter said his company would be turning over copies of its public archive to Google, Yahoo and Microsoft, too. These companies already receive instantaneously the stream of current Twitter messages. When the archive of older Tweets is added to their data storehouses, they will have a complete, constantly updated, set, and users won’t encounter a six-month embargo.  

"Google already offers its users Replay, the option of restricting a keyword search only to Tweets and to particular periods. It’s quickly reached from a search results page. (Click on 'Show options,' then 'Updates,' then a particular place on the timeline.)  

"A tool like Google Replay is helpful in focusing on one topic. But it displays only 10 Tweets at a time. To browse 10 billion — let’s see, figuring six seconds for a quick scan of each screen — would require about 190 sleepless years.  

"Mr. Cohen encourages historians to find new tools and methods for mining the 'staggeringly large historical record' of Tweets. This will require a different approach, he said, one that lets go of straightforward 'anecdotal history.' " (http://www.nytimes.com/2010/05/02/business/02digi.html?scp=1&sq=twitter%20+%20history&st=cse, accessed 05-06-2010).

Filed under: Freedom / Privacy / Security, Indexing & Searching Information, Libraries, News Media / Journalism, Preservation & Conservation of Information, Social Media / Wikis

Google Introduces Translation Feature for Google Goggles May 6, 2010

Google announces a translation feature for Google Goggles, an image recognition and search feature available on Android-based mobile devices.

"Here’s how it works:

"Point your phone at a word or phrase. Use the region of interest button to draw a box around specific words Press the shutter button

"If Goggles recognizes the text, it will give you the option to translate

"Press the translate button to select the source and destination languages."

"Today Goggles can read English, French, Italian, German and Spanish and can translate to many more languages. We are hard at work extending our recognition capabilities to other Latin-based languages. Our goal is to eventually read non-Latin languages (such as Chinese, Hindi and Arabic) as well."

Filed under: Imaging / Photography, Indexing & Searching Information, Linguistics / Translation / Speech