8,000 BCE – 1,000 BCE
One of the Earliest Surviving Examples of Narrative Relief Sculpture and Egyptian Hieroglyphs
Circa 3,200 BCE

The Narmer Palette, one of the earliest surviving examples of narrative relief sculpture, was found during excavations at Hierakonpolis (modern Kawm al-Ahmar) in the 1890s. It is also one of the earliest surviving records of Egyptian hieroglyphs.
The Narmer Palette is preserved in the Museum of Egyptian Antiquities, Cairo.
Filed under: Archaeology, Art, Linguistics / Translation / Speech, Writing / Palaeography / Calligraphy
The Earliest Autograph Signatures
Circa 3,100 BCE

Lexical lists written in Sumerian pictographic script on clay tablets are the earliest known literature, and also the earliest known evidence of schools and learning.
An example preserved in the Schøyen Collection (MS 2429/4) is a lexical list of 41 titles and professions, starting: Nam Gist Sita (Lord of the Mace), signed by the scribe Gar.Ama.
The scribal signatures on this tablet and other lexical lists are the earliest autograph signatures extant.
Filed under: Education / Reading / Literacy, Linguistics / Translation / Speech, Writing / Palaeography / Calligraphy
The Earliest Known Dictionaries
Circa 2,300 BCE

The oldest known dictionaries are cuneiform tablets from the Akkadian empire with bilingual wordlists in Sumerian and Akkadian, discovered at Ebla in modern Syria.
The Urra=hubullu glossary, a major Babylonian glossary or encyclopedia from the second millennium BCE, preserved in the Louvre, is an outstanding example of this early form of wordlist.
"The canonical version extends to 24 tablets. The conventional title is the first gloss, ur5-ra and ḫubullu meaning "interest-bearing debt" in Sumerian and Akkadian, respectively. One bilingual version from Ugarit [RS2.(23)+] is Sumerian/Hurrian rather than Sumerian/Akkadian.
"Tablets 4 and 5 list naval and terrestrial vehicles, respectively. Tablets 13 to 15 contain a systematic enumeration of animal names, tablet 16 lists stones and tablet 17 plants. Tablet 22 lists star names.
"The bulk of the collection was compiled in the Old Babylonian period (early 2nd millennium BC), with pre-canonical forerunner documents extending into the later 3rd millennium" (Wikipedia article on Urra=hubullu, accessed 05-08-2009).
Filed under: Archaeology, Book History, Linguistics / Translation / Speech
"The World's First Typewritten Document" - James Chadwick
Circa 2,000 BCE –
1,700 BCE

The Phaistos Disc, a disc of fired clay from the Minoan Palace of Phaistos on the island of Crete, was discovered in 1908 by the Italian archaeologist Luigi Pernier, and remains the most famous document found in Crete.
"It is about 15 cm (5.9 in) in diameter and covered on both sides with a spiral of stamped symbols. Its purpose and meaning, and even its original geographical place of manufacture, remain disputed, making it one of the most famous mysteries of archaeology. This unique object is now on display at the archaeological museum of Heraklion in Crete" (Wikipedia article on Phaistos Disc, accessed 07-26-2009).
Because of the unique features of the disc, and the mysteries surrounding its origin, many people have doubted its authenticity, but no one has yet been able to prove conclusively that it is a forgery.
"The disk has the distinction of being the world's first typewritten document. It was made by taking a stamp or punch bearing the sign to be written in a raised pattern, and impressing this on the wet clay. The maker therefore needed to have as many stamps as there were signs in the script. It has the advantage that even complicated signs can be quickly written, and every example of the same sign is identical and easy to read. The disadvantage is that a considerable outlay of time and effort is required to make the set of stamps before any document can be produced. It is therefore evident that the system was not created solely for a single document; its maker must have intended to reproduce a large number of documents, though it remains some way from being an anticipation of printing.
"It is therefore all the more remarkable that after more than eighty years of excavation not another single scrap of clay impressed with these stamps had been found at Phaistos, or at any other site in Crete or elsewhere. It would be very surprising if there were not somewhere more examples of the script waiting to be found, but the disk remains so far unique, and the suspicion must arise that it was an isolated object brought from some other area.
"This impression of foreign origin can be supported by two arguments. The work of cutting the stamps, whether made directly or perhaps more likely by making moulds into which metal was poured, is a technique very similar to gem-engraving. We might therefore expect the signs to bear a stylistic resemblance to those engraved on seal-stones. In fact the style of art is noticeably different. Secondly, some of the objects depicted by the signs have a distinctly foreign appearance to those familiar with Minoan art" (Chadwick, Linear B and Related Scripts [1987] 57-58).
Filed under: Archaeology, Art, Crimes / Forgeries / Hoaxes, Linguistics / Translation / Speech, Printing / Typography, Writing / Palaeography / Calligraphy
Archive of Egyptian Diplomatic Correspondence Written in the Diplomatic Language, Akkadian Cuneiform
Circa 1,360 BCE –
1,330 BCE

The Amarna Letters (or Amarna Correspondence), an archive of mostly diplomatic correspondence written on clay tablets between the Egyptian administration and its representatives in Canaan and Amurru during the New Kingdom, were found in Upper Egypt at Amarna, the modern name for Akhetaten (Akhetaton), the capital founded by the pharaoh Akhenaten (Akhnaton) during the Eighteenth Dynasty of Egypt.
"The Amarna letters are unusual in Egyptological research, being mostly written in Akkadian cuneiform, the writing system of ancient Mesopotamia rather than ancient Egypt. The known tablets currently total 382 in number, 24 further tablets having been recovered since the Norwegian Assyriologist Jørgen Alexander Knudtzon's landmark edition of the Amarna correspondence, Die El-Amarna-Tafeln in two volumes (1907 and 1915).
"These letters, consisting of cuneiform tablets mostly written in Akkadian – the regional language of diplomacy for this period – were first discovered by local Egyptians around 1887, who secretly dug most of them from the ruined city (they were originally stored in an ancient building archaeologists have since called the Bureau of Correspondence of Pharaoh) and then sold them on the antiquities market. Once the location where they were found was determined, the ruins were explored for more. The first archaeologist who successfully recovered more tablets was William Flinders Petrie in 1891–92, who found 21 fragments. Émile Chassinat, then director of the French Institute for Oriental Archaeology in Cairo, acquired two more tablets in 1903. Since Knudtzon's edition, some 24 more tablets, or fragments of tablets, have been found, either in Egypt, or identified in the collections of various museums.
"The tablets originally recovered by local Egyptians have been scattered among museums in Cairo, Europe and the United States: 202 or 203 are at the Vorderasiatisches Museum in Berlin; 80 in the British Museum; 49 or 50 at the Egyptian Museum in Cairo; seven at the Louvre; 3 at the Pushkin Museum; and 1 is currently in the collection of the Oriental Institute in Chicago.
"The full archive, which includes correspondence from the preceding reign of Amenhotep III as well, contained over three hundred diplomatic letters; the remainder are a miscellany of literary or educational materials. These tablets shed much light on Egyptian relations with Babylonia, Assyria, the Mitanni, the Hittites, Syria, Canaan, and Alashiya (Cyprus). They are important for establishing both the history and chronology of the period. Letters from the Babylonian king Kadashman-Enlil I anchor the timeframe of Akhenaten's reign to the mid-14th century BC. Here was also found the first mention of a Near Eastern group known as the Habiru, whose possible connection with the Hebrews remains debated. Other rulers include Tushratta of Mittani, Lib'ayu of Shechem, Abdi-Heba of Jerusalem and the quarrelsome king Rib-Hadda of Byblos, who in over 58 letters continuously pleads for Egyptian military help" (Wikipedia article on Amarna letters, accessed 09-01-2009).
Filed under: Archaeology, Archives, Linguistics / Translation / Speech, Social / Political, Survival of Information, Writing / Palaeography / Calligraphy
Possibly the Earliest Hebrew Inscription
Circa 1,000 BCE

An ostracon found in October 2008 at the Elah Fortress in Khirbet Qeiyafa, about 20 miles southwest of Jerusalem, and written in ink in Proto-Canaanite script, could be the earliest known Hebrew inscription, according to biblical archaeologist Yosef Garfinkel. Khirbet Qeiyafa is the earliest known fortified city of the biblical period of Israel. Other scholars urge caution in accepting Garfinkel's interpretation. The ostracon is one of only a dozen or so surviving examples of Proto-Canaanite writing.
"The Israelites were not the only ones using proto-Canaanite characters, and other scholars suggest it is difficult - perhaps impossible - to conclude the text is Hebrew and not a related tongue spoken in the area at the time. Garfinkel bases his identification on a three-letter verb from the inscription meaning to do, a word he said existed only in Hebrew.
" 'That leads us to believe that this is Hebrew, and that this is the oldest Hebrew inscription that has been found,' he said.
"Other prominent Biblical archaeologists warned against jumping to conclusions.
"Hebrew University archaeologist Amihai Mazar said the inscription was very important, as it is the longest proto-Canaanite text ever found. But he suggested that calling the text Hebrew might be going too far" (http://www.haaretz.com/hasen/spages/1032929.html, accessed 08-30-2009).
Filed under: Archaeology, Linguistics / Translation / Speech, Survival of Information, Writing / Palaeography / Calligraphy
1,000 BCE – 300 BCE
The First Olympic Games
776 BCE
According to ancient Greek records, the first Olympic Games were held in 776 BCE. These records also attest to the Greek adoption of the Phoenician alphabet, from which all other Western alphabets are descended.
The date is based on inscriptions found at Olympia recording the winners of a foot race held every four years, beginning in 776 BCE.
Filed under: Archaeology, Games / Simulations, Linguistics / Translation / Speech, Writing / Palaeography / Calligraphy
One of the Oldest Known Examples of Writing in Greek
Circa 740 BCE –
720 BCE

The so-called Cup of Nestor from Pithikoussai, a clay drinking cup (kotyle) found in 1954 in a grave during excavations at the ancient Greek site of Pithikoussai on the island of Ischia, Italy, bears a three-line inscription scratched on its side some time after the cup was made. This inscription and the so-called Dipylon inscription from Athens, also noted in this database, are among the oldest known examples of writing in the Greek alphabet.
Pithikoussai was one of the earliest Greek colonies in the West. The cup is dated to the Geometric Period (c.750-700 BCE) and is believed to have been originally manufactured in Rhodes. It is preserved in the Villa Arbusto museum in the village of Lacco Ameno on the island of Ischia, Italy.
Both the Cup of Nestor and the Dipylon inscription have been linked to early writing on the island of Euboea.
Filed under: Archaeology, Education / Reading / Literacy, Linguistics / Translation / Speech, Survival of Information, Writing / Palaeography / Calligraphy
The Marsiliana Tablet Abecedarium
700 BCE
It is not clear whether the Old Italic, or Etruscan, alphabet was adapted from the Greek alphabet in Italy, at Cumae, the first Greek colony on the Italian mainland, or in Greece or Asia Minor. The Etruscan alphabet was a precursor of the Old Latin alphabet, the basis of the Latin alphabet.
"It was in any case a Western Greek alphabet. In the alphabets of the West, X had the sound value [ks], Ψ stood for [kʰ]; in Etruscan: X = [s], Ψ = [kʰ] or [kχ] (Rix 202-209).
"The earliest Etruscan abecedarium, the Marsiliana d'Albegna (near Grosseto) tablet which dates to c. 700 BCE, lists 26 letters corresponding to contemporary forms of the Greek alphabet which retained san and qoppa but which had not yet developed omega.
"𐌀 𐌁 𐌂 𐌃 𐌄 𐌅 𐌆 𐌇 𐌈 𐌉 𐌊 𐌋 𐌌 𐌍 𐌎 𐌏 𐌐 𐌑 𐌒 𐌓 𐌔 𐌕 𐌖 𐌗 𐌘 𐌙
"in transliteration,
"A B G D E V Z H Θ I K L M N Ξ O P Ś Q R S T Y X Φ Ψ"
"21 of the 26 archaic Etruscan letters were adopted for Old Latin from the 7th century BCE, either directly from the Cumae alphabet, or via archaic Etruscan forms, compared to the classical Etruscan alphabet retaining B, D, K, O, Q, X but dropping Θ, Ś, Φ, Ψ, F (Etruscan U is Latin V, Etruscan V is Latin F).
"𐌀 𐌁 𐌂 𐌃 𐌄 𐌅 𐌆 𐌇 𐌉 𐌊 𐌋 𐌌 𐌍 𐌏 𐌐 𐌒 𐌓 𐌔 𐌕 𐌖 𐌗
"A B C D E F Z H I K L M N O P Q R S T V X"
(Wikipedia article on Old Italic alphabet, accessed 08-02-2009).
Filed under: Linguistics / Translation / Speech, Survival of Information, Writing / Palaeography / Calligraphy
The Taylor Prism and the Sennacherib Prism
691 BCE –
689 BCE
The Taylor Prism, a six-sided baked clay document (or prism), was discovered at the Assyrian capital Nineveh, in an area now known as Nebi Yunus, in present-day Iraq. It was acquired in 1830 by Colonel R. Taylor, British Consul General at Baghdad, after whom it is named. The British Museum purchased it from Taylor's widow in 1855.
One of the first major Assyrian documents discovered, the Taylor Prism played an important part in the decipherment of cuneiform script.
"The prism is a foundation record, intended to preserve King Sennacherib's achievements for posterity and the gods. The record of his account of his third campaign (701 BC) is particularly interesting to scholars. It involved the destruction of forty-six cities of the state of Judah and the deportation of 200,150 people. Hezekiah, king of Judah, is said to have sent tribute to Sennacherib. This event is described from another point of view in the Old Testament books of 2 Kings and Isaiah. Interestingly, the text on the prism makes no mention of the siege of Lachish which took place during the same campaign and is illustrated in a series of panels from Sennacherib's palace at Nineveh" (http://www.britishmuseum.org/explore/highlights/highlight_objects/me/t/the_taylor_prism.aspx, accessed 12-26-2009).
♦ Another version of the same text, produced in the same prism format, and known as the Sennacherib Prism, was purchased by James Henry Breasted from a Baghdad antiques dealer in 1919 for the Oriental Institute of Chicago, where it is preserved. The two known complete examples of Sennacherib's inscription are nearly identical, although the dates on the prisms show that they were written sixteen months apart, the Taylor Prism in 691 BCE and the Oriental Institute prism in 689 BCE. There are also at least eight other fragmentary prisms preserving parts of this text, all in the British Museum, and most of them containing just a few lines.
Filed under: Linguistics / Translation / Speech, Social / Political, Survival of Information, Writing / Palaeography / Calligraphy
The Rosetta Stone of Cuneiform Script
522 BCE –
486 BCE
The Behistun Inscription (also Bisitun or Bisutun, Modern Persian: بیستون ; Old Persian: Bagastana, meaning "the god's place or land"), a multilingual stone inscription approximately 15 meters high and 25 meters wide, located on Mount Behistun in Kermanshah Province, near the city of Kermanshah in western Iran, was written by Darius I the Great sometime between his coronation as Zoroastrian king of kings of the Achaemenid, or Persian, Empire in the summer of 522 BCE and his death in the autumn of 486 BCE.
" . . . the inscription begins with a brief autobiography of Darius I, the Great including his ancestry, lineage etc. Later in the inscription, Darius provides a lengthy sequence of events following the death of Cyrus the Great and Cambyses II in which he fought nineteen battles in a period of one year (ending in December of 521 BC) to put down multiple rebellions throughout the Persian Empire. Darius' inscription states in detail that the rebellions, which had resulted from the deaths of Cyrus the Great and his son Cambyses II, were orchestrated by several impostors and their co-conspirators in various cities throughout the empire, each of whom falsely proclaimed kinghood during the upheaval following Cyrus the Great's death. Darius the Great proclaimed himself victorious in all battles during the period of upheaval, attributing his success to the "grace of Ahuramazda (God)".
"The inscription includes three versions of the same text, written in three different cuneiform script languages: Old Persian, Elamite, and Babylonian. Babylonian was a later form of Akkadian: unlike Old Persian, they are Semitic languages. In effect, then, the inscription is to cuneiform what the Rosetta Stone is to Egyptian hieroglyphs: the document most crucial in the decipherment of a previously lost script.
"Translation of the text was a multi-step and multi-national effort based on earlier work done on the decipherment of the Old Persian script by Georg Friedrich Grotefend in the late 1700's when Grotefend discovered that, unlike Elamite and Babylonian texts, Old Persian text is alphabetic. In the following years, the efforts of [Eugène] Burnouf, [Christian] Lassen, and [Henry] Rawlinson (who had the remainder of the inscription transcribed in two parts, in 1835 and 1843) contributed to translating the Old Persian cuneiform text using the Zoroastrian book Avesta as a key, in addition to cross referencing with modern Persian and Vedic languages. With the Old Persian text deciphered, Rawlinson and others were able to then translate the Elamite and Babylonian texts (both of which were ancient translations of the Old Persian text) after 1843.
"The Inscription is . . . 100 metres up a limestone cliff from an ancient road connecting the capitals of Babylonia and Media (Babylon and Ecbatana, respectively). The mountainside was removed to make the inscription more visible after its completion. The Old Persian text contains 414 lines in five columns; the Elamite text includes 593 lines in eight columns, and the Babylonian text is in 112 lines. The inscription was illustrated by a life-sized bas-relief of Darius I, the Great, holding a bow as a sign of kingship, with his left foot on the chest of a figure lying on his back before him. The prostrate figure is reputed to be the pretender Gaumata. Darius is attended to the left by two servants, and ten one-metre figures stand to the right, with hands tied and rope around their necks, representing conquered peoples. Faravahar floats above, giving his blessing to the king" (Wikipedia article on Behistun Inscription, accessed 12-27-2009).
Filed under: Archaeology, Linguistics / Translation / Speech, Social / Political, Writing / Palaeography / Calligraphy
The Earliest Known Work on Descriptive Linguistics
Circa 501 BCE

Panini, an Indian grammarian from Gandhara, composed the Ashtadhyayi, a formulation of 3,959 rules of Sanskrit morphology. This is the earliest known work on descriptive linguistics. It employs the concepts of the phoneme, the morpheme, and the root, as well as metarules, transformations, and recursion.
Filed under: Linguistics / Translation / Speech
300 BCE – 30 CE
The Earliest Known Examples of Maya Script
Circa 300 BCE
The earliest stone inscription identifiably in Maya script (also called Maya glyphs or Maya hieroglyphs) was found in 2005 at the pre-Columbian archaeological site of San Bartolo in northeastern Guatemala. This vertical column of ten glyphic words, roughly six inches long, "may be related to a nearby painted image of the maize god" (http://www.nytimes.com/2006/01/10/science/10maya.html?_r=1, accessed 03-23-2010). As of 2010 the inscription had not been deciphered.
Filed under: Archaeology, Linguistics / Translation / Speech, Writing / Palaeography / Calligraphy
The Earliest Surviving Monolingual Dictionary
Circa 250 BCE

The earliest surviving monolingual dictionary is the Chinese dictionary called the Erya.
"The Erya has been described as a dictionary, glossary, synonymicon, thesaurus, and encyclopaedia. Karlgren (1931: 46) explains that the book "is not a dictionary in abstracto, it is a collection of direct glosses to concrete passages in ancient texts." The received text contains 2094 entries, covering about 4300 words, and a total of 13,113 characters. It is divided into nineteen sections, the first of which is subdivided into two parts. The title of each chapter combines shi ("explain; elucidate") with a term describing the words under definition. Seven chapters (4, 8, 9, 10, 12, 18, and 19) are organized into taxonomies. For instance, chapter 4 defines terms for: paternal clan (宗族), maternal relatives (母黨), wife's relatives (妻黨), and marriage (婚姻). The text is divided between the first three heterogeneous chapters defining abstract words and the last sixteen semantically-arranged chapters defining concrete words. The last seven – concerning grasses, trees, insects and reptiles, fish, birds, wild animals, and domestic animals – describe more than 590 kinds of flora and fauna. It is a valuable document of natural history and historical biogeography" (Wikipedia article on Erya, accessed 05-08-2008).
Filed under: Indexing & Searching Information, Linguistics / Translation / Speech, Natural History, Organization of Information / Taxonomy
30 CE – 500 CE
The New Testament Was Probably Written over Less than a Century
Circa 65 CE –
150 CE
Unlike the Old Testament, which was written over several hundred years, the New Testament was written in a relatively narrow span of time, probably less than a century.
The 27 books of the New Testament were written by various authors at various times and places, probably in Koine Greek, the vernacular dialect in first-century Roman provinces. "Koine Greek is not only important to the history of the Greeks for being their first common dialect . . ., but it's also important . . . for being the first 'international' form of speech, and eventually the chosen medium for the teaching and spreading of Christianity. Koine Greek was unofficially a first or second language in the Roman Empire."
Filed under: Book History, Linguistics / Translation / Speech, Religious Texts / Religion
500 CE – 600
The Codex Argenteus, Written in Silver and Gold Letters on Purple Vellum
Circa 520

The Codex Argenteus, the "Silver Bible," is written in silver and gold letters on purple vellum in Ravenna, Italy, about this time, probably for Theodoric, the Ostrogothic ruler of Italy.
The Codex Argenteus contains fragments of the Four Gospels in the fourth-century Gothic version of Bishop Ulfilas (Wulfila), and is the primary surviving example of the Gothic language, an extinct Germanic language spoken by the Goths. Of the original 336 leaves, only 188 survive in the Carolina Rediviva library of the University of Uppsala, Sweden; one further leaf was discovered, remarkably, in 1970 in the cathedral of Speyer in Germany.
During the Ostrogothic rule of Italy there was a bilateral Gothic-Latin culture, of which the Codex Brixianus survives as a Latin counterpart to the Codex Argenteus. "With the end of Gothic rule the Gothic manuscripts in Italy were rendered valueless; what remained of them (with the exception of the Codex Argenteus) became part of that waste material which in the seventh and eighth centuries was re-used in Bobbio" (Bischoff, Latin Palaeography: Antiquity and Middle Ages [1990] 186).
The manuscript was discovered in the middle of the 16th century in the library of the Benedictine monastery of Werden in the Ruhr, near Essen in Germany. This abbey, whose abbots were imperial princes with a seat in the imperial diets, was among the richest monasteries of the Holy Roman Empire.
"Later the manuscript became the property of the Emperor Rudolph II, and when, in July 1648, the last year of the Thirty Years' War, the Swedes occupied Prague, it fell into their hands together with the other treasures of the Imperial Castle of Hradcany. It was subsequently deposited in the library of Queen Christina in Stockholm, but on the abdication of the Queen in 1654 it was acquired by one of her librarians, the Dutch scholar Isaac Vossius. He took the manuscript with him to Holland, where, in 1662, the Swedish Count Magnus Gabriel De la Gardie bought the codex from Vossius and, in 1669, presented it to the University of Uppsala. He had previously had it bound in a chased silver binding, made in Stockholm from designs by the painter David Klöcker Ehrenstrahl" (http://www.ub.uu.se/arv/codexeng.cfm, accessed 11-22-2008).
Filed under: Book History, Bookbinding, Libraries, Linguistics / Translation / Speech, Manuscript Illumination, Manuscripts & Manuscript Copying, Religious Texts / Religion, Survival of Information
700 – 800
The Earliest Surviving Document in Italian?
Circa 775 –
825

The growth of a written vernacular allowed the development of a written culture outside the religious orders.
The Indovinello veronese, or Veronese Riddle, is a riddle, apparently half Italian and half Latin, written in the margin of a manuscript, probably in the late eighth or early ninth century, by a monk from Verona, a city in the Veneto region of northern Italy. For some years after Schiaparelli discovered it in 1924, it was considered the first document ever written in the Italian language.
"Many more European documents seem to confirm that the distinctive traits of Romance languages occurred all around the same time (e.g. France's Serments de Strasbourg). Though initially hailed as the earliest document in Italian in the first years following Schiaparelli's discovery, today the record has been disputed by many scholars from Migliorini to Segre and Bruni, who have placed it at the latest stage of Vulgar Latin, though this very term is far from being clear-cut and Migliorini himself considers it dilapidated. At present, however, the Placito Capuano (960 A.D.) (the first in a series of four documents dating 960-963 A.D. issued by a Capuan court) is considered to be the first document ever written in Italian, although Migliorini concedes that since the Placito was put on record as an official court proceeding (and signed by a notary), Italian must have been widely spoken for at least one century" (Wikipedia article on Veronese Riddle, accessed 06-22-2009).
Filed under: Linguistics / Translation / Speech, Survival of Information
900 – 1000
Massive Byzantine Encyclopedic Dictionary
Circa 950
The Suda, or Souda, a massive Byzantine encyclopedic dictionary of the Mediterranean world written in Greek, contains 30,000 entries, many drawn from ancient sources that have since been lost. Little is known about its compilation except that it must have been compiled before the time of the 12th-century writer Eustathius of Thessalonica, who frequently quotes from it.
"The Suda is somewhere between a grammatical dictionary and an encyclopedia in the modern sense. It explains the source, derivation, and meaning of words according to the philology of its period, using such earlier authorities as Harpocration and Helladios. There is nothing especially important about this aspect of the work. It is the articles on literary history that are valuable. These entries supply details and quotations from authors whose works are otherwise lost. They use older scholia to the classics (Homer, Thucydides, Sophocles, etc.), and for later writers, Polybius, Josephus, the Chronicon Paschale, George Syncellus, George Hamartolus, and so on.
"This lexicon represents a convenient work of reference for persons who played a part in political, ecclesiastical, and literary history in the East down to the tenth century. The chief source for this is the encyclopedia of Constantine VII Porphyrogenitus (912-59), and for Roman history the excerpts of John of Antioch (seventh century). Krumbacher (Byzantinische Literatur, 566) counts two main sources of the work: Constantine VII for ancient history, and Hamartolus (Georgios Monachos) for the Byzantine age" (Wikipedia article on Suda, accessed 02-02-2010).
The most significant online edition of the Suda is Suda On Line: Byzantine Lexicography.
"The purpose of the Suda On Line is to open up this stronghold of information by means of a freely accessible, keyword-searchable, XML-encoded database with translations, annotations, bibliography, and automatically generated links to a number of other important electronic resources. To date over 170 scholars have contributed to the project from eighteen countries and four continents. Of the 30,000-odd entries in the lexicon, over 25,000 have been translated as of this date, and more translations are submitted every day."
Filed under: Indexing & Searching Information, Linguistics / Translation / Speech, Organization of Information / Taxonomy
1200 – 1300
The Oldest Surviving Literary Document in Yiddish
1272

The oldest surviving literary document in Yiddish, the language that originated among the Ashkenazi Jews of Central and Eastern Europe, dates from this year. It is a blessing in the Mahzor Worms, a festival prayer book in Hebrew according to the Ashkenazi rite of the Jews of Worms, Germany, for the use of hazanim (cantors) in the synagogue.
The manuscript is preserved in the Jewish National and University Library of the Hebrew University of Jerusalem.
♦ You can download a digital facsimile of the Mahzor Worms from the Jewish National and University Library at this link: http://jnul.huji.ac.il/dl/mss/worms/a_eng.html, accessed 04-04-2010.
Filed under: Linguistics / Translation / Speech, Religious Texts / Religion, Survival of Information
1300 – 1400
The Earliest Surviving Example of Old Polish Literature
Circa 1375

The Psałterz floriański (St. Florian Psalter), an illuminated psalter with parallel Latin, Polish, and German texts created toward the end of the 14th century, is probably the earliest surviving example of literature in the Old Polish language. Sometimes also called the Hedwig Psalter, it takes its name from St. Florian, a village in Austria. The manuscript was discovered in 1827 and first published in print in Vienna in 1834. It was acquired by Poland in 1931 and is preserved in the National Library of Poland in Warsaw.
Filed under: Linguistics / Translation / Speech, Manuscript Illumination, Survival of Information
1400 – 1450
The Earliest Grammar of a Romance Language
1437 –
1441
Italian author, artist, architect, poet, priest, linguist, philosopher, and cryptographer Leon Battista Alberti writes Grammatica della lingua toscana.
This is the earliest grammar of a Romance language. Also called the Grammatichetta vaticana, it is known only from a single surviving manuscript copy, included in the codex Reginense Latino 1370 preserved in the Vatican Library in Rome.
Alberti's Grammatica della lingua toscana was not published in print until 1908.
Filed under: Linguistics / Translation / Speech
Lorenzo Valla Proves that the Donation of Constantine is a Forgery
1440
Italian humanist, rhetorician and orator Lorenzo Valla circulates De falso credita et ementita Constantini Donatione declamatio, proving on historical and linguistic grounds that the Donation of Constantine is a forgery.
Valla showed that the "document could not possibly have been written in the historical era of Constantine I (4th Century), as its vernacular style dated conclusively to a later era (8th Century). One of Valla's reasons was that the document contained the word satrap which he believed Romans such as Constantine I would not have used. The document, though met with great criticism at its introduction, was accepted as legitimate, in part owing to the beneficial nature of the document for the western church. The Donation of Constantine suggests that Constantine I "donated" the whole of the Western Roman Empire to the Roman Catholic Church as an act of gratitude for having been miraculously cured of leprosy by Pope Sylvester I. This would have obviously discounted Pepin the Short's own Donation of Pepin, which gave the Lombards land to the north of Rome.
"Valla was motivated to reveal the Donation of Constantine as a fraud by his employer of the time, Alfonso of Aragon, who was involved in a territorial conflict with the Papal States, then under Pope Eugene IV. The Donation of Constantine had often been cited to support the temporal power of the Papacy, since at least the 11th century.
"[Valla's] essay began circulating in 1440, but was heavily rejected by the Church. It was not formally published until 1517. It became popular among Protestants. An English translation was published for Thomas Cromwell in 1534. Valla's case was so convincingly argued that it still stands today, and the illegitimacy of the Donation of Constantine is generally conceded" (Wikipedia article on Lorenzo Valla, accessed 01-17-2009).
Filed under: Crimes / Forgeries / Hoaxes, Linguistics / Translation / Speech, Religious Texts / Religion
1450 – 1500
An Intermediate Form between a Collection of Prints and a Blockbook
Circa 1460 – 1465
It appears that no blockbooks (block books) in the literal sense were published in France in the 15th century. An example of an intermediate form between a collection of prints and a blockbook printed in France about 1465 was a collection of three woodcuts with text, printed on one side of three sheets, entitled Les neuf preux. This is known from a single copy preserved in the Bibliothèque nationale de France.
"It consists of three sheets of paper, each of which contains an impression from a block containing three figures. They are printed by means of the frotton in light-coloured ink, and have been coloured by hand. The first sheet contains pictures of the three champions of classical times, Hector, Alexander, and Julius Caesar; the second the three champions of the Old Testament, Joshua, David, and Judas Maccabaeus; the third, the three champions of mediaeval history, Arthur, Charlemagne, and Godfrey of Boulogne. Under each picture is a stanza of six lines, all rhyming, cut in a body type.
"These leaves form part of the Armorial of Gilles le Bouvier, who was King-at-Arms to Charles VII of France; and as the manuscript was finished between 9th November 1454 and 22 September 1457, it is reasonable to suppose that the prints were executed in France, probably at Paris, before the latter date. The verses are, at any rate, the oldest printed specimen of the French language" (Duff, Early Printed Books (1893) 17-18).
Les neuf preux is described by Ursula Baurmeister in Catalogue des incunables de la Bibliothèque nationale de France (CIBN), Vol. 1, fascicule 1 (Xylographes) no. NN-1.
The Armorial of Gilles le Bouvier is BnF Ms. fr. 4985.
In "Prints in the Early Printing Shops," Parshall (ed) The Woodcut in Fifteenth-Century Europe (2009) 39-91 Paul Needham discusses publications related to Les neuf preux.
Filed under: Book History, Book Illustration, Linguistics / Translation / Speech, Printing / Typography, Prints and Printmaking, Survival of Information
The First Technical Dictionary
1473 – 1474
Printer Günther Zainer of Augsburg, Germany, issues Vocabularius, with text in both Latin and German. ISTC no. iv00322000.
Vocabularius rerum was the first technical dictionary, and after the Vocabularius ex quo (1467), the first bi-lingual dictionary, of which one copy is recorded (ISTC no. v00361700). The work was "devoted entirely to technical terms, each with its own section, of medicine (four sections), culinary and medicinal herbs and food plants, zoology, mining and mineralogy, navigation, architecture, textiles, tanning and leather work, musical instruments, books and book production, cooking and kitchen utensils, baking, wine and viticulture, gambling, carpentry, horses and carriages, etc.
"Some of the words are highly technical, lexicographical rarities. In the section on scribes and book production we find definitions not only of the traditional scribal tools (calamus, stilus, graphius, pugillaris, etc.), but also of such specialist words as antipira (= the scribe's eye-shade, for protection against the fire or candle-light), corrosorium (= the mill or grinder to reduce chalk to a powder for the preparation of vellum), and epicausterium (= the table-cloth on which the parchment is laid for ease of writing). None of these last words occurs, for example, in Karen Gould's "Terms for Book Production in a Fifteenth-Century Latin-English Nominale", The Papers of the Bibliographical Society of America, 79 (1985), pp. 75-99. There is also an entry on the distinction between the words liber, volumen, and codex; likewise between exemplar and exemplum.' (Nicholas Poole-Wilson). . . ." (W. P. Watson Antiquarian Books, online description, accessed 08-09-2009).
"Possessed of a knowledge of names rather than of things, the mediaeval student had one urgent need - a dictionary. New words began to pour in—in Arabic, Syriac, Hebrew, and Greek—whose meanings he sought to know; and, for the medical student, there were new drugs, the composition and uses of which were essential to his practice. It is not surprising then to find books of the dictionary class among the first to be printed. . . . The Vocabularius . . . has four sections devoted to medicine: (1) De homine et de diversis membris, in which the parts of the body are defined in order, with the German equivalents; brief references to authors are given. (2) De nominibus balneatorum etc., containing all the terms relating to bathing, bleeding, and cupping. (3) De medicis et eorum que pertinent ad medicine artes. The definitions here are most interesting... Siringa is described as a metallic instrument with which a surgeon injects resolving medicines into the Virile member in order to dissolve calculi in the bladder. (4) De nominibus quorundam egritudinum, contains seven and a half folios of definitions of diseases." (Osler, Incunabula medica).
Filed under: Book History, Food / Wine / Cookery / Diet, Linguistics / Translation / Speech, Manuscripts & Manuscript Copying, Medicine, Science, Technology
1500 – 1550
The Transition from Latin to the Vernacular in the 16th Century
Circa 1500 – 1600
"The well defined traditional groups of readers knew Latin, and many read it with ease and better than their own mother tongue. Books in the vernacular languages were for 'every man, as well rude as learned,' and the student of literacy and literary taste must be as much concerned with the 'rude' as with the learned. Latin, the language of the educated, was the international language throughout the Middle Ages; this fact is reflected by the book production. Slightly more than three-fourths of surviving incunables are in Latin, the rest in different vernacular languages. Throughout the XVIth century the percentage of books in the vernacular increased, caused in part by the mounting concern of authors, printers and publishers with the 'rude' (men, women and children who were able or willing to read books in their own tongue, but not in Latin). It is also true that the importance of Latin as the language of communication among the learned declined, in spite of the revival of learning and increased concern with the classics and their style. Already during the first half of the XVIth century books in Latin and those in the vernacular languages were much more evenly distributed, and by the end of the XVIth century the latter accounted probably for more than half of the total production. Latin had lost its international character except among the clergy (of the Catholic Church), a coterie of Neo-Latin writers, and limited groups of scholars and professionals. National languages had won the battle. The favorable reception of books in the mother tongue was only one of several causes. Political and religious ferment of this period involved an ever increasing number of persons. In order to reach the largest possible number, the leaders and the propagandists turned more and more to the vernacular. A third factor was the changing attitude of the educated towards their own native language" (Hirsch, Printing, Selling, Reading 1450-1550 [1967] 132).
Filed under: Book History, Education / Reading / Literacy, Linguistics / Translation / Speech, Publishing
1550 – 1600
The First Book Printed in a Goidelic Language
April 24, 1567
Foirm na n-Urrnuidheadh (The Form of the Prayers), Bishop Séon Carsuel's (John Carswell's) translation into Gaelic of the Book of Common Order or "Knox's Liturgy", is published in Edinburgh at the press of Roibeard (Robert) Lekprevik. This was the first book printed in Scottish Gaelic or Irish, or indeed in any of the Goidelic languages.
"Its language has been characterised as 'exuberant, highly decorated classical common Gaelic', and helped forward the message of Scottish protestantism from the English-speaking south-east of the country into Gaelic-speaking Scotland. It was written in the traditional orthography of Irish Classical Common Gaelic, and Donald Meek has suggested that if it were not for Carsuel's training in this form of literacy and his decision to use it, Scottish Gaelic today may be employing, like the Manx language, a script with orthographic rules more similar to English and French than traditional Irish.
"It was also ground-breaking in its use of prose for non-heroic material, 'the first to use this type of formal Classical [Gaelic] prose'. And Carsuel had indeed complained in his work about earlier Gaelic writings, slamming the
'. . . darkness of sin and ignorance and design of those who teach and write and cultivate Gaelic, that they are more designed, and more accustomed, to compose vain, seductive, lying and worldly tales about the Tuatha De Danann and the sons of Mil and the heroes and Finn MacCoul and his warriors and to cultivate and piece together much else which I will not enumerate or tell here, for the purpose of winning for themselves the vain rewards of the world.'
"In the late 19th century, his skeleton was dug up; the skeleton measured seven feet in length, making Carsuel an extremely tall man by the standard of any era or geographical location" (Wikipedia article on Séon Carsuel, accessed 12-11-2009).
Of the first edition of Foirm na n-Urrnuidheadh, only three copies—all imperfect—are known to exist. One is in Edinburgh University Library.
Filed under: Book History, Linguistics / Translation / Speech, Religious Texts / Religion, Survival of Information
First Complete Slavic Bible
July 20, 1580 – August 12, 1581
Ivan Fyodorov (also Fedorov or Fedorovych; Russian: Иван Фёдоров) prints the first complete Slavic Bible.
It is known as the Ostrog Bible (Ukrainian: Острозька Біблія; Russian: Острожская Библия) because it was printed on the estate of the Ukrainian/Lithuanian prince Konstanty Wasyl Ostrogski (Belarusian: Канстантын Васiль Астрожскi; Lithuanian: Konstantinas Vasilijus Ostrogiškis; Ukrainian: Костянтин-Василь Острозький) at Ostrog (Ostroh), Ukraine.
"The Ostrog Bible is unique among Church Slavonic Bibles in that the Old Testament was translated not from the (Hebrew) Masoretic text, but from the (Greek) Septuagint. This translation, comprising seventy-six books of the Old and New Testaments, was based on the Gennadius Bible and a manuscript of the Codex Alexandrinus. Some parts were based on Francysk Skaryna's translations.
The Ostrog Bibles were printed on two dates: 12 July 1580, and 12 August 1581. The second version differs from the 1580 original in composition, ornamentation, and correction of misprints. In the printing of the Bible delays occurred, as it was necessary to remove mistakes, to search for correct textual resolutions of questions, and to produce a correct translation. The editing of the Bible detained printing. In the meantime, Fyodorov and his company printed other biblical books. The first were those which did not require correcting: the Psalter and the New Testament.
"The Ostrog Bible is a monumental publication of 1,256 pages, lavishly decorated with headpieces and initials, which were prepared especially for it. From the typographical point of view, the Ostrog Bible is irreproachable. This is the first Bible printed in Cyrillic type. It served as the original and model for further Russian publications of the Bible. The importance of the first printed Cyrillic Bible can hardly be overestimated. Prince Ostrogski sent copies to Pope Gregory XIII and tsar Ivan the Terrible, while the latter presented a copy to an English ambassador. When leaving Ostroh, Fyodorov took 400 books with him. Only 300 copies of the Ostrog Bible are extant today" (Wikipedia article on Ostrog Bible, accessed 01-03-2010).
Filed under: Book History, Linguistics / Translation / Speech, Printing / Typography, Religious Texts / Religion
1600 – 1650
The First Bibliography Published in the New World
1606
Franciscan Fray Juan Bautista publishes A Jesu Christo S.N. ofrece este Sermonario en lengua mexicana (To Jesus Christ Our Lord he offers this collection of sermons in the Mexican language) in Mexico, "En casa de Diego Lopez Davalos" (at the house, that is, the press, of Diego López Dávalos).
This was the second collection of sermons published in Nahuatl (Aztec). It is prefaced by a two-page list of works previously published by Bautista, and this list was the first bibliography published in the Western Hemisphere.
"On signature **iii (recto and verso) is a list of 'las obras que hasta agora ha impresso el auctor' ('the works that until now the author has had published'). The list is not in chronological order nor is it alphabetical by title; nonetheless it is a bibliography and supplies us with information now known only because of its inclusion here. Of the 17 items listed, several have failed to survive in any known copy, including the second part of this sermonario: at the time of publication of part one 'de la sequnda parte esta ya impresso gran pedaço' ('of the second part a large piece is already printed')" (Szewczyk & Buffington, 39 Books and Broadsides Printed In America Before the Bay Psalm Book [1989] no. 19).
Filed under: Bibliography, Book History, Linguistics / Translation / Speech, Printing / Typography, Religious Texts / Religion, Survival of Information
Descartes Discusses the Idea of an Artificial Language
1629
In a letter to theologian, philosopher, and mathematician Marin Mersenne, philosopher, mathematician and physicist René Descartes proposes an artificial universal language, with equivalent ideas in different tongues sharing one symbol:
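"Et si quelqu’un avait bien expliqué quelles sont les idées simples qui sont en l’imagination des hommes, desquelles se compose tout ce qu’ils pensent, et que cela fût reçu par tout le monde, j’oserais espérer ensuite une langue universelle, fort aisée à apprendre, à prononcer et à écrire." [And if someone had clearly explained which simple ideas exist in the human imagination, out of which everything people think is composed, and if this were accepted by everyone, I would then dare to hope for a universal language, very easy to learn, to pronounce, and to write.]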
"The notion of a universal language was based upon the idea of precisely cataloging the elements of the human imagination. The great advantage of such a language would be that it would represent everything 'distinctement.' Yet, the great problem faced by someone who wanted to create such a language was the nature of the human imagination itself. Although separate from the mind and reason, which were the foundations of Cartesian thought, the imagination nevertheless played an important role for Descartes. As he wrote elsewhere in the Meditations, the imagination not only conceptualized external things but also considers them, 'as being present by the power and internal application of my mind.' Imagination, in other words, produced the illusion of presence, figures appearing so that the person can 'look upon them as present with the eyes of my mind.' As a result, Descartes remains highly suspicious of the imagination because it can produce appearances that have no corresponding reality. Descartes concluded his letter to Mersenne by dismissing hopes for a universal language or a real character as only being possible in a 'terrestrial paradise' or 'fairyland' because of the confused nature of signification and the variation of human understanding.
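"Mais n’espérez pas de la voir jamais en usage; cela présuppose de grands changements en l’ordre des choses, et il faudrait que tout le Monde ne fût qu’un paradis terrestre, ce qui n’est bon à proposer que dans le pays des romans. [But do not hope ever to see it in use; that presupposes great changes in the order of things, and the whole world would have to be nothing but an earthly paradise, which is fit to propose only in the land of romances.]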
"A universal language that would work at the level of the imagination, describing the actual 'things' of the external world, could only produce uniform results in the perfection of Eden or the ideal of fiction. One should, instead, stick with the institution of geometry as a method of rationalizing nature, a divine language grounded upon the cogito’s transmission of being. Descartes ultimately remains skeptical about any possibility of using alternative language games aside from mathematics in the project of rationalizing the world" (Batchelor, The Republic of Codes: Cryptographic Theory and Scientific Networks in the Seventeenth Century [1999] http://www.stanford.edu/dept/HPS/writingscience/Cryptography.html, accessed 01-22-2010).
Filed under: Artificial Intelligence, Linguistics / Translation / Speech, Mathematics / Logic
1650 – 1700
The Earliest Model for Machine Translation
1661
Physician and alchemist Johann Joachim Becher publishes Character, pro notitia linguarum universali in Frankfurt. This proposal for a universal language in numeric form may, to some extent, anticipate the idea of machine translation.
“Becher constructed a Latin dictionary that was almost ten times more vast (10,000 items). [...] For each item in Becher’s dictionary there is an Arabic number: the city of Zurich, for example, is designated by the number 10283. A second Arabic number refers the user to grammatical tables which supply verbal endings, the endings for the comparative and superlative forms of adjectives, or adverbial endings. A third number refers to case endings. The dedication 'Inventum Eminentissimo Principi' is written 4442. 2770:169:3. 6753:3, that is, '(My) Invention (to the) Eminent + superlative + dative singular, Prince + dative singular'. Unfortunately Becher was afraid that his system might prove difficult for peoples who did not know the Arabic numbers; he therefore thought up a system of his own for the direct visual representation of numbers. The system is atrociously complicated and almost totally illegible. [However, together with Gaspar Schott’s Technica curiosa (1664), Becher’s system has been seen] as tentative models for future practices of computer translation. In fact, it is sufficient to think of Becher’s pseudo-ideograms as instructions for electronic circuits, prescribing to a machine which path to follow through the memory in order to retrieve a given linguistic term, and we have a procedure for a word-for-word translation (with all the obvious inconveniences of such a merely mechanical program)’ (Umberto Eco, The Search for the Perfect Language, pp. 201–3).” See Bernard Quaritch Ltd., Logic and Language, Autumn 2008, number 1.
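The numeric scheme Eco describes works as a lookup: one number selects a dictionary entry, further numbers select grammatical and case endings, and each reader resolves the numbers against a dictionary in his or her own language. The Python sketch below is a hypothetical reconstruction, not Becher's actual tables. It contains only the three words of the dedication quoted above, maps 169 and 3 to the labels Eco gives ("superlative", "dative singular"), and assumes that in each number group the last number after a colon is the case ending and any middle number is a grammatical table. That rule holds for this one example but may not hold for Becher's full system. The Latin headwords `eminens` and `princeps` are dictionary forms chosen for illustration.

```python
from __future__ import annotations

# Hypothetical toy dictionaries holding only the three words of Becher's
# dedication as reported by Eco; the real dictionary had about 10,000 entries.
LEXICON = {
    "la": {4442: "inventum", 2770: "eminens", 6753: "princeps"},
    "en": {4442: "invention", 2770: "eminent", 6753: "prince"},
}
GRAMMAR = {169: "superlative"}  # second number: grammatical-ending table
CASES = {3: "dative singular"}  # last number: case-ending table


def decode(text: str, lang: str) -> list[str]:
    """Translate a Becher-style numeric text word for word into glosses."""
    glosses = []
    for token in text.split("."):  # words are separated by full stops
        token = token.strip()
        if not token:
            continue
        numbers = [int(part) for part in token.split(":")]
        parts = [LEXICON[lang][numbers[0]]]  # first number: the word itself
        if len(numbers) == 3:  # assumed: a middle number is a grammar table
            parts.append(GRAMMAR[numbers[1]])
        if len(numbers) >= 2:  # assumed: the last number is the case ending
            parts.append(CASES[numbers[-1]])
        glosses.append(" + ".join(parts))
    return glosses
```

For example, `decode("4442. 2770:169:3. 6753:3", "en")` gives `["invention", "eminent + superlative + dative singular", "prince + dative singular"]`, and passing `"la"` instead yields the same structure with Latin headwords. The procedure translates word by word only, which is exactly the limitation Eco notes.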
Filed under: Linguistics / Translation / Speech
The First Complete Bible Published in the Western Hemisphere
1661 – 1663
English Puritan clergyman and missionary John Eliot of Roxbury, Massachusetts, and printers Samuel Green and Marmaduke Johnson of Cambridge, Massachusetts, issue The Holy Bible: Containing the Old Testament and the New, Translated into the Indian Language.
This was the first complete edition of the Bible published in the Western Hemisphere, and “the earliest example in history of the translation and printing of the entire Bible in a new language as a means of evangelization” (Darlow and Moule).
On July 27, 1649, the English Parliament enacted an "Ordinance for the Advancement of Civilization and Christianity Among the Indians." This act created The Society for the Propagation of the Gospel in New England, the first Protestant missionary society. Also in 1649 Eliot decided to attempt the translation of the Scriptures into the Algonquin language. Like other Native American languages, Algonquin had no written form, and it was considered one of the world's most difficult languages. Translating the Bible into the Natick dialect of the region's Algonquin tribes took Eliot ten years, with the assistance of John Sassamon, a member of the local tribe, whose ability to speak and write English proved invaluable.
“When the manuscript was ready for publication, the Society for the Propagation of the Gospel in New England not only provided the funds to print it, but they also sent an English printer by the name of Marmaduke Johnson, a printing press, and a supply of paper. Johnson arrived in the New World and set to work with Samuel Green who had already started to print the New Testament. By 1661 they had completed the printing of fifteen hundred copies of the New Testament. One thousand of the New Testaments were reserved for binding with the Old Testament, when completed, to form an entire Bible. The remaining copies of the New Testament were distributed among the Algonquin tribe or sent to England as presentation copies.
"When the task of printing the New Testament was complete, Green and Johnson began printing one thousand copies of the Old Testament, which included a translation of the Metrical Psalms. The work proceeded quickly and by 1663 the printing was finished. The Old Testaments were bound with the reserved copies of the New Testament to produce one thousand copies of the entire Bible” (Samworth, John Eliot and America's First Bible, accessed 12-30-2008).
Filed under: Book History, Linguistics / Translation / Speech, Printing / Typography, Religious Texts / Religion
A Universal Language Based on a Classification Scheme or Ontology
1668
John Wilkins publishes in London An Essay towards a Real Character and a Philosophical Language.
In this work Wilkins attempted to create a universal, artificial language, based upon an innovative classification of knowledge, by which scholars and philosophers, as well as diplomats and merchants, could communicate. Wilkins intended his "universal language" as a supplement to, rather than a replacement for, existing "natural" languages. His scheme has been called ingenious but completely unworkable.
By "real character" Wilkins meant:
"an ingeniously constructed family of symbols corresponding to an elaborate classification scheme developed at great labor by Wilkins and his colleagues, which was intended to provide elementary building blocks from which could be constructed the universe's every possible thing and notion. The Real Character is emphatically not an orthography in that it is not a written representation of oral speech. Instead, each symbol represents a concept directly, without (at least in the early parts of the Essay's presentation) there being any way of vocalizing it at all; each reader might, if he wished, give voice to the text in his or her own tongue. Inspiration for this approach came in part from (partially mistaken) accounts of the Chinese writing system.
"Later in the Essay Wilkins introduces his "Philosophical Language," which assigns phonetic values to the Real Characters, should it be desired to read text aloud without using any of the existing national languages. (The term philosophical language is an ill-defined one, used by various authors over time to mean a variety of things; most of the description found at the article on "philosophical languages" applies to Wilkins' Real Character on its own, even excluding what Wilkins called his "Philosophical Language".)
"For convenience, the following discussion blurs the distinction between Wilkins' Character and his Language. Concepts are divided into forty main Genera, each of which gives the first, two-letter syllable of the word; a Genus is divided into Differences, each of which adds another letter; and Differences are divided into Species, which add a fourth letter. For instance, Zi identifies the Genus of “beasts” (mammals); Zit gives the Difference of “rapacious beasts of the dog kind”; Zitα gives the Species of dogs. (Sometimes the first letter indicates a supercategory— e.g. Z always indicates an animal— but this does not always hold.) The resulting Character, and its vocalization, for a given concept thus captures, to some extent, the concept's semantics.
"The Essay also proposed ideas on weights and measure similar to those later found in the metric system. The botanical section of the essay was contributed by John Ray; . . .
"Jorge Luis Borges wrote a critique of Wilkins' philosophical language in his essay El idioma analítico de John Wilkins (The Analytical Language of John Wilkins). He compares Wilkins’ classification to the fictitious Chinese encyclopedia Celestial Emporium of Benevolent Knowledge, expressing doubts about all attempts at a universal classification. Modern information theory also suggests that it is a bad idea to have words with similar but distinct meanings also sound similar, because mishearings and the resulting confusion would be much more prominent than in real-world languages. In The Search for the Perfect Language, Umberto Eco catches Wilkins himself making this kind of mistake in his text, using Gαde (barley) instead of Gαpe (tulip)" (Wikipedia article on An Essay towards a Real Character and a Philosophical Language, accessed 06-16-2010).
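The Genus / Difference / Species construction quoted above behaves like a prefix tree: each added letter narrows the category named by the letters before it. The Python sketch below illustrates that idea under stated assumptions. The nested-dictionary layout is invented for illustration, only the single path the quotation gives (Zi, Zit, Zitα) is filled in, and the exceptions the quotation mentions are ignored. It is not a reproduction of Wilkins's tables.

```python
from __future__ import annotations

# Hypothetical layout: each node is (meaning, children keyed by the letter
# that level adds). Only the one path quoted from the Essay is included.
TAXONOMY: dict[str, tuple[str, dict]] = {
    "Zi": ("beasts (mammals)", {
        "t": ("rapacious beasts of the dog kind", {
            "α": ("dogs", {}),
        }),
    }),
}


def describe(word: str) -> list[str]:
    """Read a word as Genus (first two letters), then Difference, then Species."""
    meaning, children = TAXONOMY[word[:2]]  # the Genus is the two-letter syllable
    path = [meaning]
    for letter in word[2:]:  # each further letter narrows the category one level
        meaning, children = children[letter]
        path.append(meaning)
    return path
```

Here `describe("Zitα")` walks from "beasts (mammals)" through "rapacious beasts of the dog kind" to "dogs". Because related concepts share prefixes, a single misheard final letter lands on a neighbouring concept, which is the weakness behind Eco's Gαde/Gαpe example.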
Filed under: Linguistics / Translation / Speech, Organization of Information / Taxonomy, Preservation & Conservation of Information, Science
Leibniz on Binary Arithmetic
March 15, 1679 – 1705
A dated manuscript by Gottfried Wilhelm Leibniz, preserved in the Niedersächsische Landesbibliothek, Hannover, “includes a brief discussion of the possibility of designing a mechanical binary calculator which would use moving balls to represent binary digits.”
Though Leibniz thought of the application of binary arithmetic to computing in 1679, the machine he outlined was never built, and he published nothing on the subject until his "Explication de l'arithmétique binaire, qui se sert des seuls caractères 0 & 1; avec des remarques sur son utilité, & sur ce qu'elle donne le sens des anciennes figures Chinoises de Fohy" (Explanation of binary arithmetic, which uses only the characters 0 and 1, with remarks on its usefulness and on how it gives the meaning of the ancient Chinese figures of Fohy), published in Histoire de l'Académie Royale des Sciences année MDCCIII. Avec les mémoires de mathématiques, which appeared in print in 1705.
"The publication of the Explication was prompted by Leibniz's correspondence with Joachim Bouvet, a member of the Jesuit Mission in China. Leibniz had developed an interest in China, and in April 1697 he edited a collection of letters and essays by members of the Mission, entitled Novissima Sinica. A copy of this came into the hands of Bouvet, who wrote to Leibniz on 18 October 1697 expressing his commendation of the work. Thus began an extended correspondence between the two men which proved to be very important for the dissemination of Leibniz's ideas about binary arithmetic. The crucial exchange began on 15 February 1701, when Leibniz wrote to Bouvet describing for his correspondent the principles of his binary arithmetic, including the analogy of the formation of all the numbers from 0 and 1 with the creation of the world by God out of nothing. Bouvet immediately recognised the relationship between the hexagrams of the I Ching and the binary numbers and he communicated his discovery in a letter written in Peking on 4 November 1701. This reached Leibniz, after a detour through England, on 1 April 1703. With this letter, Bouvet enclosed a woodcut of the arrangement of the hexagrams attributed to Fu-Hsi, the mythical founder of Chinese culture, which holds the key to the identification. Within a week of receiving Bouvet's letter, Leibniz had sent to Abbé Bignon for publication in the Mémoires of the Paris Academy his Explication de l'Arithmétique binaire,... & sur ce qu'elle donne le sens des anciennes figures Chinoises de Fohy. Ten days later he sent a brief account to Hans Sloane, the Secretary of the Royal Society. Leibniz viewed binary arithmetic less as a computational tool than as a means of discovering mathematical, philosophical and even theological truths. He remarked to Tschirnhaus in 1682 that he anticipated from the use of binary numbers discoveries in number theory that other progressions could not reveal. It was at the same time a candidate for the characteristica generalis, his long sought-for alphabet of human thought. With base 2 numeration Leibniz witnessed a confluence of several intellectual strands in his world view, including theological and mystical ideas of order, harmony and creation. Fontenelle, secretary of the Paris Academy, wrote the unsigned review of Leibniz's paper for the Mémoires section of the volume. He noted that arithmetic could have different bases besides ten; bases such as 12, and two as in the case of Leibniz's binary system. He also noted that although the binary system was not practical for common use Leibniz thought that it would be of advantage in advanced mathematics" (W.P. Watson, antiquarian book description, http://www.ilabdatabase.com/db/detail.php?booknr=360538539, accessed 01-21-2010).
This manuscript was first published in Herrn von Leibniz' Rechnung mit Null und Eins (1966), issued to commemorate the 250th anniversary of Leibniz's death. The volume also contains facsimiles of Leibniz's "Explication de l'arithmétique binaire" (1705) and of his two letters to Johann Christian Schulenberg on binary arithmetic (March 29 and May 17, 1698) as published in the Opera Omnia of 1768, together with historical articles and German translations.
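Leibniz's observation that every number can be formed from 0 and 1 corresponds to repeated division by 2, keeping the remainders as digits. The Python sketch below shows that procedure and, to illustrate the correspondence Bouvet noticed, draws the numbers 0 to 63 as six-line figures. The drawing convention (solid line for 1, broken line for 0, top line as the most significant digit) is an assumption made for this sketch, not necessarily the orientation Leibniz or Bouvet used.

```python
from __future__ import annotations


def to_binary(n: int) -> str:
    """Write a non-negative integer using only the characters 0 and 1."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # the remainder is the next binary digit
        n //= 2
    return "".join(reversed(digits))  # remainders come out lowest digit first


def hexagram(n: int) -> list[str]:
    """Draw 0..63 as six lines, top line first (assumed orientation)."""
    if not 0 <= n < 64:
        raise ValueError("six lines can encode only the numbers 0 to 63")
    bits = to_binary(n).zfill(6)  # pad to exactly six lines
    return ["-----" if bit == "1" else "-- --" for bit in bits]
```

For instance, `to_binary(6)` returns `"110"`, `hexagram(0)` is six broken lines and `hexagram(63)` is six solid lines, the two ends of the range of 64 figures.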
Filed under: Computer & Calculator Design / Architecture, Computing Theory, Data Processing / Computing, Linguistics / Translation / Speech, Mathematics / Logic
1750 – 1800
Foundation of Comparative Linguistics
February 2, 1786
Philologist William Jones delivers The third anniversary discourse . . . [On the Hindus].
This was first published in 1788 in Volume One of Asiatick Researches: Or, Transactions of the Society Instituted in Bengal, for Inquiring into the History and Antiquities, the Arts, Sciences and Literature, of Asia. In his paper, printed in India in the English language, Jones announced his discovery of the relationship between the Sanskrit, Greek, Latin, Gothic and Celtic languages, marking the foundation of comparative philology and historical linguistics. Jones’s “clear understanding of the basic principles of scientific linguistics provided the foundations on which Rask, Bopp and Grimm built the imposing structure of comparative Indo-European studies” (Carter & Muir, Printing and the Mind of Man [1967] no. 235).
Filed under: Linguistics / Translation / Speech
The First Successful Speech Synthesizer
1791
Austro-Hungarian author and inventor Wolfgang von Kempelen publishes in Vienna Mechanismus der menschlichen Sprache nebst Beschreibung seiner sprechenden Maschine (The Mechanism of Human Speech, with a Description of His Speaking Machine), in which he discusses the origins and development of languages, and describes the first successful speech synthesizer.
Unlike von Kempelen’s fraudulent chess-playing Turk automaton (1769, noticed in this database), his speech synthesizer actually worked. It was the first synthesizer that produced not only individual speech sounds but also whole words and short sentences. Von Kempelen believed that it was possible to acquire skill in using the machine within three weeks, especially if one chose to synthesize sentences in Latin, French, or Italian. He considered German much more difficult to synthesize because of its many closed syllables and consonant clusters.
"The machine consisted of a bellows that simulated the lungs and was to be operated with the right forearm (uppermost drawing). A counterweight provided for inhalation. The middle and lower drawings show the 'wind box' that was provided with some levers to be actuated with the fingers of the right hand, the 'mouth', made of rubber, and the 'nose' of the machine. The two nostrils had to be covered with two fingers unless a nasal was to be produced. The whole speech production mechanism was enclosed in a box with holes for the hands and additional holes in its cover.
"The air flow was conducted into the mouth not only by way of an oscillating reed, but also through a narrow shunting tube. This allowed the air pressure in the mouth cavity to increase when its opening was covered tightly in order to produce unvoiced speech sounds. Driven by a spring, a small auxiliary bellows would then deliver an extra puff of air at the release.
"With the left hand, it was also possible to control the resonance properties of the mouth by varied covering of its opening. In this way, some vowels and consonants could be simulated in sufficient approximation. This was not really a simulation of natural articulation, since the shape of the mouth of the machine in itself remained constant. Some vowels and, especially, the consonants [d t g k] could not be simulated in this way, but only feigned, at best. An [l] could be produced by putting the thumb into the mouth.
"The function of the vocal cords was simulated by a slamming reed made of ivory (leftmost drawing). Although the effective length of the reed could be varied, this could not be done during speech production, so that the machine spoke on a monotone.
"Two of the levers to be actuated with the right hand served the production of the fricatives [s] and . . . as well as [z] and . . . by means of separate, hissing whistles (right drawing). A third one effectuated the production of a rattling [R] by dropping a wire on the vibrating reed (middle drawing)." (http://www.ling.su.se/staff/hartmut/kemplne.htm, accessed 12-14-2008).
Kempelen's final version of the machine, which differs slightly from the version shown in the book, is preserved in the Deutsches Museum, in the department of musical instruments.
Because Kempelen's speech synthesizer required a human operator, it was not literally an automaton, but it may be thought of as a forerunner of robotic or computer speech synthesizers.
Filed under: Games / Simulations, Linguistics / Translation / Speech, Music, Robotics / Automata, Science, Technology
The Rosetta Stone
July 15, 1799
Captain Pierre-François Bouchard, serving with Napoleon in Egypt, discovers near the city of Rosetta a dark granite stone carved with a decree passed by a council of priests in 196 BCE, during the Ptolemaic period. The decree is one of a series that affirm the royal cult of the 13-year-old Ptolemy V on the first anniversary of his coronation. It is written in Egyptian Demotic script (the native script used for daily purposes), classical Greek (the language of the administration), and Egyptian hieroglyphs (suitable for a priestly decree).
Known as the Rosetta Stone, it was surrendered to the British in 1801 under the terms of the Treaty of Alexandria. In 1802 it was placed in the British Museum, where it remains.
Filed under: Archaeology, Cryptography / Cryptanalysis, Linguistics / Translation / Speech, Survival of Information
1800 – 1850
Phasing Out Latin as the International Language
1800
Around this time the publication of scientific and medical books in Latin, the international language of scholarship, religion, and science since the Roman Empire, largely ceased. From the nineteenth century onward most scientific and medical books were published in their authors' vernacular languages, or in French, German or English.
Filed under: Book History, Linguistics / Translation / Speech, Medicine, Publishing, Science
Deciphering the Hieroglyphs
1822
Having examined texts brought back from Egypt, Jean-François Champollion publishes Lettre à M. Dacier relative à l'alphabet des hiéroglyphes phonétiques, in which he begins to identify the relationship between hieroglyphic and non-hieroglyphic scripts, a key step toward deciphering Egyptian hieroglyphs, the meaning of which had been lost for more than 1,500 years.
Filed under: Archaeology, Cryptography / Cryptanalysis, Linguistics / Translation / Speech
Deciphering the Hieroglyphs
1823
English physician, scientist and polymath Thomas Young publishes An Account of Some Recent Discoveries in Hieroglyphical Literature, and Egyptian Antiquities.
"Young was also one of the first who tried to decipher Egyptian hieroglyphs, with the help of a demotic alphabet of 29 letters built up by Johan David Åkerblad in 1802 (15 turned out to be correct), but Åkerblad wrongly believed that demotic was entirely alphabetic. 'Dr Young however showed that neither the alphabet of Akerblad, nor any modification of it which could be proposed, was applicable to any considerable part of the enchorial portion of the Rosetta inscription beyond the proper names.' By 1814 Young had completely translated the "enchorial" (demotic, in modern terms) text of the Rosetta Stone (he had a list with 86 demotic words), and then studied the hieroglyphic alphabet but initially failed to recognise that the demotic and hieroglyphic texts were paraphrases and not simple translations. Some of Young's conclusions appeared in the famous article "Egypt" he wrote for the 1818 edition of the Encyclopædia Britannica.
"When the French linguist Jean-François Champollion in 1822 published a translation of the hieroglyphs and the key to the grammatical system, Young (and many others) praised his work. In 1823 Young published an Account of the Recent Discoveries in Hieroglyphic Literature and Egyptian Antiquities, in order to have his own work recognised as the basis for Champollion's system. In this he made it clear that many of his findings had been published and sent to Paris in 1816. Young had correctly found the sound value of six signs, but had not deduced the grammar of the language. Champollion was unwilling to share the credit. In the ensuing schism, strongly motivated by the political tensions of that time, the British championed Young, while the French supported Champollion. Champollion maintained that he alone had deciphered the hieroglyphs, although his understanding of the hieroglyphic grammar showed the same mistakes made by Young. However, after 1826, when Champollion was a curator in the Louvre he did offer Young access to demotic manuscripts" (Wikipedia article on Thomas Young, accessed 07-28-2009).
Filed under: Archaeology, Linguistics / Translation / Speech
Decipherment of the Mayan System of Counting
1832
From a reproduction of just five pages of the Dresden Codex, a pre-Columbian book of the Yucatecan Maya of Chichén Itzá dating from the eleventh or twelfth century, the European-American autodidact polymath Constantine Samuel Rafinesque, a mathematician, botanist, zoologist and malacologist, deciphers the Maya system of numerals.
"In 1832, Rafinesque declared in his newsletter, the Atlantic Journal and Friend of Knowledge, that the dots and bars seen in Maya glyphs represented simple numbers—a dot equaled one and a bar five. Later findings proved him right and also revealed that the Maya even had a symbol for zero, which appeared on Mesoamerican carvings as early as 36 B.C. (Zero didn't appear in Western Europe until the 12th century)" (http://www.pbs.org/wgbh/nova/mayacode/time-flash.html, accessed 10-10-2009).
Filed under: Archaeology, Linguistics / Translation / Speech, Mathematics / Logic
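The dot-and-bar rule Rafinesque identified (a dot for one, a bar for five) is simple enough to sketch in code. This version assumes a purely vigesimal (base-20) place-value system, writing each place as bars and dots and using a stand-in symbol for zero; the calendrical Long Count, which used a multiplier of 18 rather than 20 for its third place, is deliberately ignored. The characters chosen for dot, bar and zero are arbitrary placeholders, not the Maya glyphs, and the function names are our own.

```python
DOT, BAR, ZERO = "•", "▬", "◎"  # placeholder characters, not the Maya glyphs


def maya_digit(d: int) -> str:
    """One base-20 place: bars worth five and dots worth one."""
    if not 0 <= d < 20:
        raise ValueError("a place value must be between 0 and 19")
    if d == 0:
        return ZERO
    bars, dots = divmod(d, 5)
    return DOT * dots + BAR * bars  # dots sit above the bars in vertical writing


def to_maya(n: int) -> list[str]:
    """Places of n in base 20, most significant first."""
    if n < 0:
        raise ValueError("n must be non-negative")
    places = []
    while True:
        n, d = divmod(n, 20)
        places.append(maya_digit(d))
        if n == 0:
            break
    return places[::-1]


for n in (7, 19, 20, 365):
    print(n, " | ".join(to_maya(n)))
```

The last case reads as 18 twenties plus 5, that is 365.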
The First Book on a Secular Subject Printed in Arabic by a Press in the Arab World
1836
A pocket-sized Arabic grammar, the first book on a secular (non-religious) subject printed in Arabic by a press in the Arab world, is issued from the American Press in Beirut, Lebanon, in an edition of 1,000 copies.
The work by Nasif al-Yaziji, Kitab fasl al-khitab fi usul lughat al-a'rab (The Conclusive Discourse of the Rules of the Arabs' Language)
". . . was printed by the Protestant missionaries of the 'American Board of Commissioners for Foreign Missions' (ABCFM) who had opened a printing shop in Beirut two years earlier in 1834. The author of the concise treatise on Arabic grammar was Nasif al-Yaziji (1800-1871) a local Greek Catholic scholar from a little village south of Beirut who later became one of the most celebrated Christian Arab authors of the nineteenth century. With his numerous philological works, but moreover with his poetry and rhyming prose he influenced a whole generation of Arab intellectuals and thus became a pioneer and outstanding protagonist of the so-called Nahda, the renaissance of Arabic language and literature" (Lehrstuhl für Türkische Sprache, Geschichte und Kultur, Universität Bamberg, The Beginnings of Printing in the Near and Middle East: Jews, Christians and Muslims [2001] no. 5).
Filed under: Book History, Linguistics / Translation / Speech, Printing / Typography
1850 – 1875
The Largest Dictionary in Book Form
1863
The first fascicule (A-Aanhaling) of the Woordenboek der Nederlandsche Taal (English: "Dictionary of the Dutch language") is published during this year.
This became the largest printed dictionary in the world, eventually containing over 430,000 entries of Dutch words from 1500 to 1921 in 43 volumes and close to 50,000 pages. The last fascicule (Zuid-Zythum) was published in 1998. Three supplements containing modern Dutch words were published in 2001.
Since 27 January 2007, the dictionary has been available online. There is no charge for access but registration is required.
Filed under: Linguistics / Translation / Speech, Organization of Information / Taxonomy, Publishing
1875 – 1900
3,500,000 Quotations on Individual Slips of Paper
1882
James Murray, working in a corrugated-iron outbuilding called "The Scriptorium," lined with bookshelves and 1,029 pigeonholes for quotation slips, is receiving 1,000 quotation slips each day from contributors to A New English Dictionary on Historical Principles.
By this year Murray had accumulated 3,500,000 quotations sent in by contributors, each on an individual slip of paper.
Filed under: Book History, Linguistics / Translation / Speech, Organization of Information / Taxonomy, Publishing
The OED Finally Begins Publication
February 1, 1884
Twenty-three years after the project began, the first fascicule of A New English Dictionary on Historical Principles; Founded Mainly on the Materials Collected by The Philological Society is published, under the editorship of James Murray.
The 352-page volume, covering words from A to Ant, cost 12s.6d or U.S.$3.25. The total sales of this fascicule were 4000 copies. The dictionary was complete in 125 fascicules, the last of which was published on April 19, 1928. The name Oxford English Dictionary was first used for the work in 1895.
Filed under: Book History, Linguistics / Translation / Speech, Publishing
1930 – 1940
The First Electronic Speech Synthesizer
1936 – 1939
Homer Dudley and a team of engineers at Bell Labs produce the first electronic speech synthesizer, called the Voder.
The Voder was demonstrated at the 1939 World's Fair by experts who used a keyboard and foot pedals to play the machine and emit speech.
Filed under: Communication, Electronic Media, Games / Simulations, Linguistics / Translation / Speech, Technology
1940 – 1945
Does Language Influence Thought?
April 1940
American chemist, anthropologist and linguist Benjamin Lee Whorf publishes "Science and Linguistics," MIT's Technology Review 42, no. 6 (April 1940): 229-231, 247-248, in which he develops controversial ideas concerning linguistic relativity, the hypothesis that language influences thought.
Filed under: Linguistics / Translation / Speech
1945 – 1950
Earliest Work Leading toward Machine Translation
1947
Working at the Institute for Advanced Study in Princeton, Andrew D. Booth and Kathleen Britten write a program for implementing a translation dictionary on an electronic computing machine, provided that the necessary storage capacity is available. This may be the earliest work leading toward machine or computer translation.
Filed under: Linguistics / Translation / Speech, Software
Nineteen Eighty-Four
1949
Eric Arthur Blair, under his pseudonym George Orwell, publishes the dystopian novel Nineteen Eighty-Four. "The story follows the life of one seemingly insignificant man, Winston Smith, a civil servant assigned the task of falsifying records and political literature, thus effectively perpetuating propaganda, who grows disillusioned with his meagre existence and so begins an ultimately futile rebellion against the system.
"The novel has become famous for its satirical portrayal of surveillance and society's increasing encroachment on the rights of the individual. Since its publication the terms Big Brother and Orwellian have entered the popular vernacular."
"Nineteen Eighty-Four's impact upon the English language is extensive; many of its concepts: Big Brother, Room 101 (the worst place in the world), the Thought Police, the memory hole (oblivion), doublethink (simultaneously holding and believing two contradictory beliefs), and Newspeak (ideological language), are common usages for denoting and connoting overarching, totalitarian authority; Doublespeak is an elaboration of doublethink; the adjective "Orwellian" denotes that which is characteristic and reminiscent of George Orwell's writings, specifically 1984. The practice of appending the suffixes "-speak" and "-think" (groupthink, mediaspeak) to denote unthinking conformity. Many other works, in various forms of media, have taken themes from Nineteen Eighty-four" (Wikipedia article on Nineteen Eighty-Four).
Filed under: Censorship, Destruction / Looting of Information, Fiction, Science Fiction, Drama, Poetry, Freedom / Privacy / Security, Linguistics / Translation / Speech, Popular Culture
The Origin of Statistical Machine Translation
July 15, 1949
Mathematician Warren Weaver, a student of Claude Shannon's information theory, circulates a memorandum entitled Translation, suggesting that language translation by computer might be possible.
Weaver's memorandum has been called the origin of statistical machine translation.
(See Reading 10.1.)
Filed under: Linguistics / Translation / Speech
1950 – 1955
Decipherment of Linear B
1952 – 1953
English architect and classical scholar Michael Ventris and John Chadwick, an English linguist and classical scholar, decipher Linear B, proving that the Mycenaean language written in this script is an early form of Greek.
Ventris & Chadwick, Documents in Mycenaean Greek (1956), chapters 1-2.
Chadwick, The Decipherment of Linear B (1958).
Filed under: Archaeology, Cryptography / Cryptanalysis, Linguistics / Translation / Speech
The Georgetown-IBM Experiment in Machine Translation
January 7, 1954
Developed jointly by Georgetown University and IBM, the Georgetown-IBM experiment in computational linguistics involved completely automatic translation of more than sixty Russian sentences into English.
"Conceived and performed primarily in order to attract governmental and public interest and funding by showing the possibilities of machine translation, it was by no means a fully-featured system: It had only six grammar rules and 250 items in its vocabulary. Apart from general topics, the system was specialised in the domain of organic chemistry. The translation was done using an IBM 701 mainframe computer.
"Well publicized by journalists and perceived as a success, the experiment did encourage governments to invest in computational linguistics. The authors claimed that within three or five years, machine translation would be a solved problem."
Filed under: Linguistics / Translation / Speech
1955 – 1960
Chomsky's Hierarchy of Syntactic Forms
September 1956
Noam Chomsky publishes "Three Models for the Description of Language," IRE Transactions on Information Theory IT-2 (1956): 113-124.
In this work, read at a symposium on information theory held at MIT a few months before the publication of his Syntactic Structures (1957), Chomsky introduced two key concepts: the 'Chomsky hierarchy' of syntactic forms, and transformational-generative grammar theory. The latter attempts to define rules that can generate the infinite number of grammatical (well-formed) sentences possible in a language, and seeks to identify rules (transformations) that govern relations between parts of a sentence, on the assumption that beneath such surface features as word order lies a fundamental deep structure.
Hook & Norman, Origins of Cyberspace (2002) no. 531.
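The generative idea, that a finite set of rules can describe an unbounded number of sentences, can be illustrated with a toy context-free grammar, the kind of phrase-structure grammar classed as Type 2 in what is now called the Chomsky hierarchy. The grammar and vocabulary below are invented for illustration, and the sketch shows only phrase-structure generation, not the transformations Chomsky built on top of it. Because the rule NP → the N that VP is recursive, the set of sentences is infinite, so the generator caps sentence length.

```python
# Toy grammar (invented): each nonterminal maps to its alternative right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["the", "N"], ["the", "N", "that", "VP"]],  # second rule recurses via VP
    "VP": [["sleeps"], ["sees", "NP"]],
    "N": [["cat"], ["dog"]],
}


def generate(symbols: list[str], max_words: int):
    """Yield every sentence derivable from `symbols` with at most max_words words."""
    if len(symbols) > max_words:  # every symbol yields at least one word, so prune
        return
    for i, sym in enumerate(symbols):
        if sym in GRAMMAR:  # expand the leftmost nonterminal in every possible way
            for rhs in GRAMMAR[sym]:
                yield from generate(symbols[:i] + rhs + symbols[i + 1:], max_words)
            return
    yield " ".join(symbols)  # only words remain: a finished sentence


sentences = sorted(set(generate(["S"], 5)))
for s in sentences:
    print(s)
```

Raising the word limit keeps producing new, longer sentences such as "the cat that sees the dog sleeps", which is the sense in which four short rules generate an infinite language.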
Filed under: Artificial Intelligence, Communication / Information Theory, Linguistics / Translation / Speech
Human Versus Machine Intelligence and Communication
1959
"Somewhat the same problem arises in communicating with a machine entity that would arise in communicating with a person of an entirely different language background than your own. A system of logical definition and translation would have to be available. In order that meanings should not be lost, such a system of translation would also need to be precise. We are all familiar with the unhappy results of language translations which are either lacking in precision or where suitable words of equivalent meaning cannot be found. Likewise, translating into a machine language cannot be anything but an exact operation. Machines even more than people must be addressed with clarity and unambiguity, for machines cannot improvise on their own or imagine that about which they have not been specifically informed, as a human might do within reasonable limits of error. . . .
"We must now ascertain how concepts are formulated within the framework of computer language. For analogy, let us first consider the manner in which instructions are usually given to a non-mechanical entity. When we instruct, for example, a human being, we are aided by the fact that the human is usually able to fill in gaps in our instructions through acumen acquired from his own past experiences. It is seldom necessary that instructions be either detailed or literal, although we may have lost sight of this fact.
"The computer in a correlate example is a mechanical 'being' which must be instructed at each and every step. But it can be given a very long list of instructions upon which it can be expected to subsequently act with great speed and accuracy and with untiring repetition. Machine traits are: low comprehension, high retention, extreme reliability, and tremendous speed. The use of superlatives here to describe these traits is not exaggerative. Since speed becomes in practice the equivalent of number, the machine might be, and has sometimes been, equated to legions – an army, if you will – of lowgrade morons whose conceptualization is entirely literal, who remember as long as is necessary or as you desire them to, whose loyalty and subservience is complete, who require no holidays, no spurious incentives, no morale programs, pensions, not even gratitude for past service, and who seemingly never tire of doing elementary repetitive tasks such as typing, accounting, bookkeeping, arithmetic, filling in forms, and the like. In about all these respects the machine may be seen to be the exact opposite of nature's loftiest creature, the intelligent human being, who becomes bored with the petty and repetitious, who is unreliable, who wanders from the task for the most trivial reasons, who gets out of humor, who forgets, who requires constant incentives and rewards, who improvises on his own even when to do so is impertinent to the objectives being undertaken, and who in summary (let's face it) is unsuitable to most forms of industry as the latter are ideally and practically conceived in our times. It becomes apparent in retrospect that the only excuse we might ever have had for employing him to do many of civilization's more literal and repetitious tasks was the absence of something more efficient with which to replace him!
"It is not the purpose of this volume to explore further the ramifications of the above statements of fact. . . ." (Nett & Hetzler, An Introduction to Electronic Data Processing [1959] 86-88).
Filed under: Communication, Computers & the Human Brain, Human-Computer Interaction, Linguistics / Translation / Speech
Origins of Corpus Linguistics
1959
Randolph Quirk founds the Survey of English Usage, the first research center in Europe to carry out research in corpus linguistics.
"The original Survey Corpus predated modern computing. It was recorded on reel-to-reel tapes, transcribed on paper, filed in filing cabinets, and indexed on paper cards. Transcriptions were annotated with a detailed prosodic and paralinguistic annotation developed by Crystal and Quirk (1964). Sets of paper cards were manually annotated for grammatical structures and filed, so, for example, all noun phrases could be found in the noun phrase filing cabinet in the Survey. Naturally, corpus searches required a visit to the Survey.
"This corpus is now known more widely as the London-Lund Corpus (LLC), as it was the responsibility of co-workers in Lund, Sweden, to computerise the corpus" (Wikipedia article on Survey of English Usage, accessed 06-07-2010).
Filed under: Linguistics / Translation / Speech
First Formal Definition of Hacker
June 1959
Peter R. Samson, of the Public Relations Committee of the MIT Tech Model Railroad Club, defines the term "hack" in the Tech Model Railroad Club Dictionary as:
"1) an article or project without constructive end
"2) a project undertaken on bad self-advice
"3) an entropy booster
"4) to produce, or attempt to produce, a hack(3)."
Samson defined a hacker as "one who hacks, or makes them."
Much of the Tech Model Railroad Club jargon was later incorporated into early computer culture. In 2005 Samson commented:
"I saw this as a term for an unconventional or unorthodox application of technology, typically deprecated for engineering reasons. There was no specific suggestion of malicious intent (or of benevolence, either). Indeed, the era of this dictionary saw some 'good hacks:' using a room-sized computer to play music, for instance; or, some would say, writing the dictionary itself" (http://www.gricer.com/tmrc/dictionary1959.html, accessed 06-01-2009).
Filed under: Computer / Internet Culture, Linguistics / Translation / Speech
1960 – 1970
The Viterbi Algorithm
1967
Italian-American electrical engineer and businessman Andrew Viterbi develops the Viterbi algorithm, "as an error-correction scheme for noisy digital communication links, finding universal application in decoding the convolutional codes used in both CDMA and GSM digital cellular, dial-up modems, satellite, deep-space communications, and 802.11 wireless LANs. It is now also commonly used in speech recognition, keyword spotting, computational linguistics, and bioinformatics. For example, in speech-to-text (speech recognition), the acoustic signal is treated as the observed sequence of events, and a string of text is considered to be the "hidden cause" of the acoustic signal. The Viterbi algorithm finds the most likely string of text given the acoustic signal" (Wikipedia article on Viterbi algorithm, accessed 12-29-2009).
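The algorithm described in the quotation is a form of dynamic programming: for each time step and each hidden state it keeps only the most probable path ending there, plus a back-pointer, and at the end it traces back from the most probable final state. The sketch below is a minimal illustration that uses a small, made-up hidden Markov model (a familiar textbook "Healthy/Fever" toy, not a speech model). It multiplies raw probabilities for readability; practical decoders usually add log-probabilities instead to avoid numerical underflow on long sequences.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) for the most likely hidden-state sequence."""
    # best[t][s]: probability of the best path that ends in state s at step t
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]  # back[t][s]: the predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # keep only the best way of arriving in s; discard the rest
            prob, prev = max((best[t - 1][p] * trans_p[p][s], p) for p in states)
            best[t][s] = prob * emit_p[s][observations[t]]
            back[t][s] = prev
    # choose the most probable final state, then follow the back-pointers
    prob, last = max((best[-1][s], s) for s in states)
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return prob, path[::-1]


# Toy model with invented numbers: hidden health states, observed symptoms.
states = ("Healthy", "Fever")
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
           "Fever": {"Healthy": 0.4, "Fever": 0.6}}
emit_p = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
          "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}

prob, path = viterbi(("normal", "cold", "dizzy"), states, start_p, trans_p, emit_p)
print(path, round(prob, 5))  # ['Healthy', 'Healthy', 'Fever'] 0.01512
```

In the speech-recognition setting the quotation describes, the hidden states would stand for units of text and the observations for features of the acoustic signal; the decoding step is the same.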
Filed under: Linguistics / Translation / Speech, Mathematics / Logic, Software, Telecommunications, Telephone
"Computational Analysis of Present-Day American English"
1967
Henry Kucera (born Jindřich Kučera) and Nelson Francis publish Computational Analysis of Present-Day American English.
A founding work on corpus linguistics, this book "provided basic statistics on what is known today simply as the Brown Corpus. The Brown Corpus was a carefully compiled selection of current American English, totaling about a million words drawn from a wide variety of sources. Kucera and Francis subjected it to a variety of computational analyses, from which they compiled a rich and variegated opus, combining elements of linguistics, psychology, statistics, and sociology" (Wikipedia article on Brown Corpus, accessed 06-07-2010).
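The most basic statistic in a corpus study of this kind is a word-frequency count. The sketch below counts lower-cased word tokens with Python's `collections.Counter`; the tokenizing rule (runs of the letters a to z and apostrophes), the function name `word_frequencies` and the sample sentence are simplifying assumptions for illustration, not the procedure Kucera and Francis applied to the Brown Corpus.

```python
import re
from collections import Counter


def word_frequencies(text: str) -> Counter:
    """Count lower-cased tokens, where a token is a run of letters a-z or apostrophes."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


sample = "The cat saw the dog, and the dog saw the cat."
freq = word_frequencies(sample)
total = sum(freq.values())  # number of running words (tokens) in the sample

for word, count in freq.most_common(3):
    print(f"{word}: {count} of {total} tokens ({count / total:.1%})")
```

On a million-word corpus the same counts, sorted by frequency, give the kind of frequency list that corpus-based dictionaries later drew on.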
Filed under: Linguistics / Translation / Speech, Social / Political, Statistics / Demography
The First Dictionary Based on Corpus Linguistics
1969
Houghton Mifflin publishes The American Heritage Dictionary of the English Language.
"The AHD broke ground among dictionaries by using corpus linguistics for compiling word-frequencies and other information. It took the innovative step of combining prescriptive information (how language should be used) and descriptive information (how it actually is used). The descriptive information was derived from actual texts. Citations were based on a million-word, three-line citation database [the Brown Corpus] prepared by Brown University linguist Henry Kucera" (Wikipedia article on The American Heritage Dictionary of the English Language, accessed 06-07-2010).
Filed under: Linguistics / Translation / Speech, Publishing
1980 – 1990
The Perseus Digital Library Project
1985
The Perseus Digital Library Project begins at Tufts University. Though the project is ostensibly about Greek and Roman literature and culture, it will evolve into an exploration of the ways that digital collections can enhance scholarship with new research tools that take libraries and scholarship beyond the physical book.
"Since planning began in 1985, the Perseus Digital Library Project has explored what happens when libraries move online. Two decades later, as new forms of publication emerge and millions of books become digital, this question is more pressing than ever. Perseus is a practical experiment in which we explore possibilities and challenges of digital collections in a networked world.
"Our flagship collection, under development since 1987, covers the history, literature and culture of the Greco-Roman world. We are applying what we have learned from Classics to other subjects within the humanities and beyond. We have studied many problems over the past two decades, but our current research centers on personalization: organizing what you see to meet your needs.
"We collect texts, images, datasets and other primary materials. We assemble and carefully structure encyclopedias, maps, grammars, dictionaries and other reference works. At present, 1.1 million manually created and 30 million automatically generated links connect the 100 million words and 75,000 images in the core Perseus collections. 850,000 reference articles provide background on 450,000 people, places, organizations, dictionary definitions, grammatical functions and other topics."
Filed under: Electronic Media, Indexing & Searching Information, Linguistics / Translation / Speech, Preservation & Conservation of Information
WordNet
1985
Psychologist and cognitive scientist George A. Miller and team begin development of WordNet, a lexical database for the English language.
WordNet "groups English words into sets of synonyms called synsets, provides short, general definitions, and records the various semantic relations between these synonym sets. The purpose is twofold: to produce a combination of dictionary and thesaurus that is more intuitively usable, and to support automatic text analysis and artificial intelligence applications" (Wikipedia article on WordNet). You can browse WordNet at http://wordnet.princeton.edu/.
WordNet has been used for a number of different purposes in information systems, including word sense disambiguation, information retrieval, automatic text classification, automatic text summarization, and even automatic crossword puzzle generation.
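The synset idea can be pictured as a small data structure: each synset holds a list of synonymous words (lemmas), a short definition, and links to more general synsets. The sketch below is a toy model with three invented entries; the names `Synset`, `hypernym_chain` and `synonyms` are hypothetical, it does not reproduce WordNet's actual file format, identifiers or definitions, and it models only one of the several relations WordNet records (the "is a kind of", or hypernym, link).

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """A toy synset: synonymous lemmas sharing one short definition (gloss)."""
    name: str
    lemmas: list[str]
    gloss: str
    hypernyms: list["Synset"] = field(default_factory=list)  # "is a kind of" links


def hypernym_chain(synset: Synset) -> list[str]:
    """Follow the first hypernym link upward until a synset has none."""
    chain = [synset.name]
    while synset.hypernyms:
        synset = synset.hypernyms[0]
        chain.append(synset.name)
    return chain


def synonyms(word: str, synsets: list[Synset]) -> list[str]:
    """Every other lemma that shares at least one synset with `word`."""
    return sorted({lemma for s in synsets if word in s.lemmas for lemma in s.lemmas} - {word})


# Three invented entries; real WordNet data is far richer.
animal = Synset("animal.n", ["animal", "beast", "creature"], "a living organism that can move about")
canine = Synset("canine.n", ["canine", "canid"], "a dog-like carnivorous mammal", [animal])
dog = Synset("dog.n", ["dog", "domestic_dog"], "a domesticated canine", [canine])

print(hypernym_chain(dog))                       # ['dog.n', 'canine.n', 'animal.n']
print(synonyms("beast", [animal, canine, dog]))  # ['animal', 'creature']
```

Walking hypernym links like this is one simple way such a database can support tasks like the word sense disambiguation and text classification mentioned above; libraries such as NLTK provide access to the real WordNet data.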
Filed under: Artificial Intelligence, Computers & the Human Brain, Linguistics / Translation / Speech, Organization of Information / Taxonomy
Critique of Computational Linguistics
1987
Integrational linguist Roy Harris publishes The Language Machine.
"This volume completes the trilogy which began with The Language-Makers (1980) and The Language Myth (1981). The Language Machine examines the impact of the electronic computer on modern conceptions of language and communication. When Swift wrote Gulliver’s Travels the notion that a machine could handle language was an absurdity to be satirized. Descartes regarded it as foolish to suppose that a robot could ever be built that would answer questions. But today it is widely assumed that mechanical speech recognition and automatic translation will be commonplace in tomorrow’s technology. Underlying these assumptions is a subtle shift in popular and academic conceptions of what a language is. Understanding a sentence is treated as a computational process. This in turn contributes powerfully to accepting a mechanistic view of human intelligence, and to the insulation of language from moral values" (http://www.royharrisonline.com/linguistic_publications/The_Language-machine.html, accessed 07-23-2010).
Filed under: Communication, Linguistics / Translation / Speech
The Unicode Universal Character Set
August 29, 1988
Joseph D. Becker of Xerox Corporation, Lee Collins (also at Xerox) and Mark Davis of Apple develop a universal character set.
Becker coined the word "Unicode" to cover the project in his report, Unicode 88:
"1.1. Abstract
"This document is a draft proposal for the design of an international/multilingual text character coding system, tentatively called Unicode.
"Unicode is intended to address the need for a workable, reliable world text encoding. Unicode could be roughly described as 'wide-body ASCII' that has been stretched to 16 bits to encompass the characters of all the world's living languages. In a properly engineered design, 16 bits per character are more than sufficient for this purpose.
"In the Unicode system, a simple unambiguous fixed-length character encoding is integrated into a coherent overall architecture of text processing. The design aims to be flexible enough to support many disparate (vendor-specific) implementations of text processing software.
"A general scheme for character code allocations is proposed (and materials for making specific individual character code assignments are well at hand), but specific code assignments are not proposed here. Rather, it is hoped that this document will evoke interest from many organizations, which could cooperate in perfecting the design and in determining the final character code assignments" (http://www.unicode.org/history/unicode88.pdf, accessed 01-29-2010).
Filed under: Cryptography / Cryptanalysis, Internet & Networking, Linguistics / Translation / Speech, Printing / Typography
1990 – 2000
The Unicode Standard: Now 107,000 Characters in 90 Scripts
October 1991
The first volume of the Unicode standard is published by the Unicode Consortium.
"Unicode is a computing industry standard allowing computers to consistently represent and manipulate text expressed in most of the world's writing systems. Developed in tandem with the Universal Character Set standard and published in book form as The Unicode Standard, the latest version [5.2, 2009] of Unicode consists of a repertoire of more than 107,000 characters covering 90 scripts [including Egyptian hieroglyphs], a set of code charts for visual reference, an encoding methodology and set of standard character encodings, an enumeration of character properties such as upper and lower case, a set of reference data computer files, and a number of related items, such as character properties, rules for normalization, decomposition, collation, rendering, and bidirectional display order (for the correct display of text containing both right-to-left scripts, such as Arabic or Hebrew, and left-to-right scripts)" (Wikipedia article on Unicode, accessed 01-29-2010).
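Several points in this description can be checked with Python's standard `unicodedata` module. The sketch below is our illustration, not part of the Unicode Standard itself. It shows three things. First, 107,000 characters no longer fit in the 65,536 values of a 16-bit code. Second, Egyptian hieroglyphs were encoded beyond that range, so UTF-16 needs a surrogate pair for them. Third, normalization reconciles a precomposed "é" with "e" plus a combining accent.

```python
import unicodedata

# 16 bits give 2**16 code points, fewer than the 107,000+ characters of Unicode 5.2
print(2 ** 16)  # 65536

# Egyptian hieroglyphs sit outside the original 16-bit range
glyph = "\U00013000"
print(unicodedata.name(glyph))         # EGYPTIAN HIEROGLYPH A001
print(len(glyph.encode("utf-16-be")))  # 4 bytes: a surrogate pair, not one 16-bit unit

# Normalization: one precomposed code point vs. base letter + combining acute accent
composed = "\u00e9"
decomposed = unicodedata.normalize("NFD", composed)
print(len(composed), len(decomposed))                        # 1 2
print(unicodedata.normalize("NFC", decomposed) == composed)  # True

# A character property: case mapping
print("é".upper())  # É
```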
Filed under: Data Processing / Computing, Linguistics / Translation / Speech, Printing / Typography, Writing / Palaeography / Calligraphy
Development of Neural Networks
1993
Psychologist, neural scientist and cognitive scientist James A. Anderson publishes "The BSB Model: A simple non-linear autoassociative network," in M. Hassoun (Ed.), Associative Neural Memories: Theory and Implementation (1993).
Anderson's neural networks have been applied to models of human concept formation, decision making, speech perception, and models of vision.
Anderson also co-authored, with K. T. Spoehr and D. J. Bennett, "A study in numerical perversity: Teaching arithmetic to a neural network," in D. S. Levine and M. Aparicio (Eds.), Neural Networks for Knowledge Representation and Inference (1994).
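BSB stands for "Brain-State-in-a-Box." In this autoassociative network, a state vector is fed back repeatedly through a learned weight matrix and clipped to the box [-1, 1] in every dimension. As a result, a noisy input drifts toward a stored pattern at a corner of the box. The Python sketch below is a simplified version under our own assumptions. It uses Hebbian (outer-product) weights, a feedback gain `alpha = 0.5`, no decay or input-persistence terms, and two made-up 8-element patterns. It is not taken from the 1993 chapter.

```python
def outer_product_weights(patterns):
    """Hebbian autoassociative weights: W = (1/n) * sum over patterns of p p^T."""
    n = len(patterns[0])
    return [[sum(p[i] * p[j] for p in patterns) / n for j in range(n)] for i in range(n)]

def bsb_recall(weights, x, alpha=0.5, max_steps=50):
    """Iterate x <- clip(x + alpha * W x) to [-1, 1] until the state stops changing.
    The clipping is the 'box'; stored patterns are stable corners of it."""
    x = list(x)
    for _ in range(max_steps):
        wx = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in weights]
        new_x = [max(-1.0, min(1.0, xi + alpha * wxi)) for xi, wxi in zip(x, wx)]
        if new_x == x:
            break
        x = new_x
    return x

# Two illustrative, mutually orthogonal +/-1 patterns (assumed, not from the source)
stored = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
W = outer_product_weights(stored)
noisy = [0.6, 0.4, 0.5, -0.2, -0.5, -0.3, -0.6, -0.4]  # resembles stored[0], fourth sign wrong
print(bsb_recall(W, noisy))
```

Starting from the noisy vector, the state saturates within a few steps at the corner matching the first stored pattern, which corrects the wrong fourth element.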
Filed under: Artificial Intelligence, Computers & the Human Brain, eCommerce, Human-Computer Interaction, Indexing & Searching Information, Linguistics / Translation / Speech
Statistical Machine Translation
1993
Peter F. Brown and colleagues at IBM's Thomas J. Watson Research Center publish "The Mathematics of Statistical Machine Translation: Parameter Estimation," Computational Linguistics 19(2): 263-311:
"We describe a series of five statistical models of the translation process and give algorithms for estimating the parameters of these models given a set of pairs of sentences that are translations of one another. We define a concept of word-by-word alignment between such pairs of sentences. For any given pair of such sentences each of our models assigns a probability to each of the possible word-by-word alignments. We give an algorithm for seeking the most probable of these alignments. Although the algorithm is suboptimal, the alignment thus obtained accounts well for the word-by-word relationships in the pair of sentences. We have a great deal of data in French and English from the proceedings of the Canadian Parliament. Accordingly, we have restricted our work to these two languages; but we feel that because our algorithms have minimal linguistic content they would work well on other pairs of languages. We also feel, again because of the minimal linguistic content of our algorithms, that it is reasonable to argue that word-by-word alignments are inherent in any sufficiently large bilingual corpus."
"The first ideas of statistical machine translation were introduced by Warren Weaver in 1949, including the ideas of applying Claude Shannon's information theory. Statistical machine translation was re-introduced in 1991 by researchers at IBM's Thomas J. Watson Research Center and has contributed to the significant resurgence in interest in machine translation in recent years. Nowadays it is by far the most widely-studied machine translation method" (Wikipedia article on Statistical machine translation, accessed 05-14-2010).
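The simplest of the paper's five models, usually called IBM Model 1, can be sketched briefly. It learns word-translation probabilities t(e | f) from sentence pairs alone, using expectation-maximization (EM). The Python sketch below is a minimal illustration under stated assumptions. The three-sentence French/English corpus is invented, the special NULL word of the full model is omitted for brevity, and the "best alignment" simply links each English word to its most probable French word.

```python
from collections import defaultdict

def train_ibm_model1(corpus, iterations=20):
    """EM for IBM Model 1 lexical probabilities t[(e, f)] = P(e | f).
    corpus: list of (french_tokens, english_tokens). Simplification: no NULL word."""
    english_vocab = {e for _, es in corpus for e in es}
    t = defaultdict(lambda: 1.0 / len(english_vocab))  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of e aligned to f
        total = defaultdict(float)  # expected count of f aligned to anything
        for fs, es in corpus:
            for e in es:
                # E-step: distribute e's alignment over the French words in proportion to t
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    share = t[(e, f)] / norm
                    count[(e, f)] += share
                    total[f] += share
        # M-step: renormalize expected counts into probabilities for each French word
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t

def best_alignment(t, fs, es):
    """Link each English word to the French word with the highest t(e | f)."""
    return [(e, max(fs, key=lambda f: t[(e, f)])) for e in es]

# Invented toy corpus: no sentence pair alone reveals which word translates which
corpus = [
    ("la maison".split(), "the house".split()),
    ("la fleur".split(), "the flower".split()),
    ("une fleur".split(), "a flower".split()),
]
t = train_ibm_model1(corpus)
for fs, es in corpus:
    print(best_alignment(t, fs, es))
```

No single sentence pair fixes the alignments here. Because "la" co-occurs with "the" in two sentences, and "fleur" with "flower" in two, EM gradually concentrates probability on the consistent pairings. This is a small-scale version of the paper's argument that alignments are "inherent in any sufficiently large bilingual corpus."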
Filed under: Linguistics / Translation / Speech
Mandarin Speech Recognition Drawing on 6,700 Chinese Characters
1996
IBM introduces continuous speech recognition technology for Mandarin Chinese. In developing the product, researchers identified and classified thousands of vocal tones and homonyms, created an algorithm that deconstructs syllables into parts, and developed a new language model to transform spoken words into the right combination drawn from 6,700 Chinese characters.
IBM also announces software that gives people a hands-free way to dictate text and navigate the desktop with the power of natural speech.
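The entry does not describe how IBM's algorithm split syllables. A common way to break a Mandarin syllable into parts is to separate an initial consonant, a final, and a tone. The Python sketch below shows that standard decomposition on tone-numbered pinyin. It is an assumption-labeled illustration, not IBM's method, and `split_syllable` is a hypothetical name. Because a single syllable such as shi4 is shared by many different characters, a language model is still needed to choose the right character, as the entry describes.

```python
import re

# Pinyin initials, longest first so "zh" is tried before "z". "y" and "w" are
# spelling conventions rather than true initials; treating them as initials is a simplification.
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]

def split_syllable(syllable: str):
    """Split a tone-numbered pinyin syllable such as 'zhong1' into (initial, final, tone).
    A missing tone digit is read as the neutral tone, numbered 5 here by convention."""
    match = re.fullmatch(r"([a-zü]+)([1-5]?)", syllable.lower())
    if not match:
        raise ValueError(f"not a tone-numbered pinyin syllable: {syllable!r}")
    letters, tone = match.groups()
    # Take the first initial that matches and still leaves a non-empty final
    initial = next((i for i in INITIALS if letters.startswith(i) and len(letters) > len(i)), "")
    return initial, letters[len(initial):], int(tone) if tone else 5

for s in ["zhong1", "guo2", "an4", "ma"]:
    print(s, split_syllable(s))
```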
Filed under: Linguistics / Translation / Speech, Software
Using Neural Networks for Word Sense Disambiguation
1998
Cognitive scientist and entrepreneur Jeffrey Stibel, physicist, psychologist and neural scientist James A. Anderson, and others create a word sense disambiguator using George A. Miller's WordNet lexical database.
Stibel and others applied this technology in Simpli, "an early search engine that offered disambiguation to search terms. A user could enter in a search term that was ambiguous (e.g., Java) and the search engine would return a list of alternatives (coffee, programming language, island in the South Seas)."
"The technology was rooted in brain science and built by academics to model the way in which the mind stored and utilized language."
"Simpli was sold in 2000 to NetZero. Another company that leveraged the Simpli WordNet technology was purchased by Google and they continue to use the technology for search and advertising under the brand Google AdSense.
"In 2001, there was a buyout of the company and it was merged with another company called Search123. Most of the original members joined the new company. The company was later sold in 2004 to ValueClick, which continues to use the technology and search engine to this day" (Wikipedia article on Simpli, accessed 05-10-2009).
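Simpli's behavior with an ambiguous term like "Java" (list the senses, and let context pick one) can be illustrated with a deliberately simpler technique than the neural approach described above. The Python sketch below is a simplified Lesk algorithm, which ranks senses by word overlap between each sense's gloss and the query context. The sense inventory and glosses are hypothetical stand-ins, not WordNet data. The function names are also illustrative.

```python
import re

# Hypothetical sense inventory with short glosses (illustrative only, not WordNet data)
SENSES = {
    "java": {
        "coffee": "a beverage brewed from roasted coffee beans",
        "programming language": "an object-oriented programming language for writing software code",
        "island": "a volcanic island of Indonesia",
    },
}
STOPWORDS = {"a", "an", "the", "of", "for", "from", "to", "in", "i", "do", "how"}

def content_words(text: str) -> set[str]:
    """Lowercased word tokens minus a few function words."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS

def rank_senses(term: str, context: str = "") -> list[str]:
    """Simplified Lesk: order senses by overlap between gloss words and context words.
    With no context every score ties, and the stable sort returns all alternatives in inventory order."""
    senses = SENSES.get(term.lower(), {})
    ctx = content_words(context) - {term.lower()}
    return sorted(senses, key=lambda s: len(ctx & content_words(senses[s])), reverse=True)

print(rank_senses("java"))  # ['coffee', 'programming language', 'island']
print(rank_senses("java", "how do I compile java code")[0])  # programming language
```

With no context, the function simply returns every alternative, much like the list Simpli offered. With context, the overlap score moves the most plausible sense to the top.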
Filed under: Artificial Intelligence, Computers & the Human Brain, eCommerce, Linguistics / Translation / Speech, Organization of Information / Taxonomy
2000 – 2005
2005 – 2010
IBM's Watson Question Answering System Challenges Humans at Jeopardy
April 27, 2009
IBM announces that its Watson Question Answering (QA) system will challenge humans on the television quiz show Jeopardy!
"IBM is working to build a computing system that can understand and answer complex questions with enough precision and speed to compete against some of the best Jeopardy! contestants out there.
"This challenge is much more than a game. Jeopardy! demands knowledge of a broad range of topics including history, literature, politics, film, pop culture and science. What's more, Jeopardy! clues involve irony, riddles, analyzing subtle meaning and other complexities at which humans excel and computers traditionally do not. This, along with the speed at which contestants have to answer, makes Jeopardy! an enormous challenge for computing systems. Code-named "Watson" after IBM founder Thomas J. Watson, the IBM computing system is designed to rival the human mind's ability to understand the actual meaning behind words, distinguish between relevant and irrelevant content, and ultimately, demonstrate confidence to deliver precise final answers.
"Known as a Question Answering (QA) system among computer scientists, Watson has been under development for more than three years. According to Dr. David Ferrucci, leader of the project team, 'The confidence processing ability is key to winning at Jeopardy! and is critical to implementing useful business applications of Question Answering.'
"Watson will also incorporate massively parallel analytical capabilities and, just like human competitors, Watson will not be connected to the Internet, or have any other outside assistance.
"If we can teach a computer to play Jeopardy!, what could it mean for science, finance, healthcare and business? By drastically advancing the field of automatic question answering, the Watson project's ultimate success will be measured not by daily doubles, but by what it means for society" (http://www.research.ibm.com/deepqa/index.shtml, accessed 06-16-2010).
On June 16, 2010, The New York Times Magazine published a long article by Clive Thompson on Watson's Jeopardy! challenge, titled, in the answer-in-the-form-of-a-question style of Jeopardy!, "What is I.B.M.'s Watson?"
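IBM's description stresses "confidence processing": the system should answer only when it is sure enough. The source does not say how Watson computed confidence. The Python sketch below therefore uses a common stand-in, under our own assumptions. It turns raw scores for candidate answers into probabilities with a softmax, then "buzzes in" only when the top candidate clears a threshold. The candidate names, scores, and thresholds are invented for illustration.

```python
import math

def confidences(scores: dict[str, float]) -> dict[str, float]:
    """Convert raw evidence scores into probabilities with a softmax (assumed scoring scheme)."""
    top = max(scores.values())
    exps = {a: math.exp(s - top) for a, s in scores.items()}  # subtract max for numerical stability
    total = sum(exps.values())
    return {a: v / total for a, v in exps.items()}

def answer_or_pass(scores: dict[str, float], threshold: float):
    """Return the best candidate if its confidence reaches the threshold, else None (don't buzz)."""
    conf = confidences(scores)
    best = max(conf, key=conf.get)
    return best if conf[best] >= threshold else None

scores = {"Paris": 3.0, "Lyon": 1.0}  # invented candidate scores
print(answer_or_pass(scores, threshold=0.5))  # Paris (confidence about 0.88)
print(answer_or_pass(scores, threshold=0.9))  # None: not confident enough to answer
```

The threshold captures the trade-off in the game. Answering wrongly costs money, so a well-calibrated confidence estimate matters as much as picking the top candidate.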
Filed under: Artificial Intelligence, Games / Simulations, Linguistics / Translation / Speech, Television
Wolfram|Alpha
May 16, 2009
Stephen Wolfram and Wolfram Research launch Wolfram|Alpha, a computational data engine with a new approach to knowledge extraction, based on natural language processing, a large library of algorithms and an NKS (A New Kind of Science) approach to answering queries.
The Wolfram|Alpha engine differs from traditional search engines in that it does not simply return a list of results based on a query, but instead computes an answer.
Filed under: Artificial Intelligence, Data Processing / Computing, Indexing & Searching Information, Linguistics / Translation / Speech, Organization of Information / Taxonomy
Algorithm to Decipher Ancient Texts
September 2, 2009
"Researchers in Israel say they have developed a computer program that can decipher previously unreadable ancient texts and possibly lead the way to a Google-like search engine for historical documents.
"The program uses a pattern recognition algorithm similar to those law enforcement agencies have adopted to identify and compare fingerprints.
"But in this case, the program identifies letters, words and even handwriting styles, saving historians and liturgists hours of sitting and studying each manuscript.
"By recognizing such patterns, the computer can recreate with high accuracy portions of texts that faded over time or even those written over by later scribes, said Itay Bar-Yosef, one of the researchers from Ben-Gurion University of the Negev.
" 'The more texts the program analyses, the smarter and more accurate it gets,' Bar-Yosef said.
"The computer works with digital copies of the texts, assigning number values to each pixel of writing depending on how dark it is. It separates the writing from the background and then identifies individual lines, letters and words.
"It also analyses the handwriting and writing style, so it can 'fill in the blanks' of smeared or faded characters that are otherwise indiscernible, Bar-Yosef said.
"The team has focused their work on ancient Hebrew texts, but they say it can be used with other languages, as well. The team published its work, which is being further developed, most recently in the academic journal Pattern Recognition due out in December but already available online. A program for all academics could be ready in two years, Bar-Yosef said. And as libraries across the world move to digitize their collections, they say the program can drive an engine to search instantaneously any digital database of handwritten documents. Uri Ehrlich, an expert in ancient prayer texts who works with Bar-Yosef's team of computer scientists, said that with the help of the program, years of research could be done within a matter of minutes. 'When enough texts have been digitized, it will manage to combine fragments of books that have been scattered all over the world,' Ehrlich said" (http://www.reuters.com/article/newsOne/idUSTRE58141O20090902, accessed 09-02-2009).
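The first processing steps described in the report can be illustrated with a toy example. These steps are: give each pixel a number according to how dark it is, separate writing from background, and find the individual lines. The Python sketch below is not the researchers' published algorithm. It uses two standard, deliberately simple techniques: a global threshold at the image's mean intensity, and a horizontal projection that groups consecutive rows containing ink into text lines. The tiny 7-by-6 "page" is invented.

```python
def ink_mask(gray, threshold=None):
    """Mark pixels darker than the threshold as ink (True).
    gray: rows of intensities, 0 = black, 255 = white. Default threshold: mean intensity."""
    if threshold is None:
        flat = [v for row in gray for v in row]
        threshold = sum(flat) / len(flat)
    return [[v < threshold for v in row] for row in gray]

def find_text_lines(mask):
    """Horizontal projection: each run of consecutive rows containing ink becomes
    one text line, reported as (first_row, last_row)."""
    lines, start = [], None
    for r, row in enumerate(mask):
        has_ink = any(row)
        if has_ink and start is None:
            start = r
        elif not has_ink and start is not None:
            lines.append((start, r - 1))
            start = None
    if start is not None:
        lines.append((start, len(mask) - 1))
    return lines

page = [
    [255, 255, 255, 255, 255, 255],
    [255,  30,  40, 255,  20, 255],
    [255,  25, 255,  35,  30, 255],
    [250, 250, 250, 250, 250, 250],  # faint background discoloration, not ink
    [255, 255, 255, 255, 255, 255],
    [ 40,  30, 255, 255,  20, 255],
    [255, 255, 255, 255, 255, 255],
]
print(find_text_lines(ink_mask(page)))  # [(1, 2), (5, 5)]
```

Real manuscripts need adaptive thresholds and must handle skewed or touching lines. The sketch only shows the shape of the pipeline: intensities, then an ink/background split, then lines.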
Filed under: Artificial Intelligence, Graphics / Visualization / Animation, Indexing & Searching Information, Linguistics / Translation / Speech, Manuscripts & Manuscript Copying, Writing / Palaeography / Calligraphy
ICANN Will Allow Web Addresses in Non-Latin Alphabets
October 30, 2009
The Internet Corporation for Assigned Names and Numbers (ICANN) votes to allow Web addresses written entirely in Chinese, Arabic, Korean and other non-Latin scripts.
"The decision is a 'historic move toward the internationalization of the Internet,' said Rod Beckstrom, Icann’s president and chief executive. 'We just made the Internet much more accessible to millions of people in regions such as Asia, the Middle East and Russia.'
"This change affects domain names — anything that comes after the dot, including .com, .cn or .jp. Domain names have been limited to 37 characters — 26 Latin letters, 10 digits and a hyphen. But starting next year, domain names can consist of characters in any language. In some Web addresses, non-Latin scripts are already used in the portion before the dot. Thus, Icann’s decision Friday makes it possible, for the first time, to write an entire Internet address in a non-Latin alphabet.
"Initially, the new naming system will affect only Web addresses with 'country codes,' the designators at the end of an address name, like .kr (for Korea) or .ru (for Russia). But eventually, it will be expanded to all types of Internet address names, Icann said.
"Some security experts have warned that allowing internationalized domain names in languages like Arabic, Russian and Chinese could make it more difficult to fight cyberattacks, including malicious redirects and hacking. But Icann said it was ready for the challenge. 'I do not believe that there would be any appreciable difference,' Mr. Beckstrom said in an interview. 'Yes, maybe some additional potential but at the same time, some new security benefits may come too. If you look at the global set of cybersecurity issues, I don’t see this as any significant new threat if you look at it on an isolated basis.'
"The decision, reached after years of testing and debate, clears the way for Icann to begin accepting applications for non-Latin domain names Nov. 16. People will start seeing them in use around mid-2010, particularly in Arabic, Chinese and other scripts in which demand for the new 'internationalized' domain name system has been among the strongest, Icann officials say. Internet addresses in non-Latin scripts could lead to a sharp increase in the number of global Internet users, eventually allowing people around the globe to navigate much of the online world using their native language scripts, they said.
"This is a boon especially for users who find it cumbersome to type in Latin characters to access Web pages. Of the 1.6 billion Internet users worldwide, more than half use languages that have scripts that are not based on the Latin alphabet." (http://www.nytimes.com/2009/10/31/technology/31net.html?hp)
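Non-Latin domain names can coexist with the 37-character limit described above because each Unicode label is converted to an ASCII-compatible form, beginning with "xn--", that uses only letters, digits and hyphens. The Python sketch below shows this with the standard library's built-in `idna` codec, which implements the older IDNA 2003 rules. The third-party `idna` package implements the newer IDNA 2008. The helper names and the example domain are ours.

```python
import string

LDH = set(string.ascii_lowercase + string.digits + "-")  # 26 letters + 10 digits + hyphen = 37

def to_ascii_domain(name: str) -> str:
    """Convert a Unicode domain name to its ASCII form with Python's built-in IDNA 2003 codec."""
    return name.encode("idna").decode("ascii")

def is_ldh_label(label: str) -> bool:
    """True if a label uses only the 37 traditional characters."""
    return bool(label) and set(label) <= LDH

ascii_name = to_ascii_domain("bücher.example")
print(ascii_name)                                                # xn--bcher-kva.example
print(all(is_ldh_label(label) for label in ascii_name.split(".")))  # True
print(ascii_name.encode("ascii").decode("idna"))                 # bücher.example
```

The same conversion applied to an Arabic label such as مصر (Egypt) also yields an ASCII label starting with "xn--". Users see the native script, while the DNS underneath still handles only the 37 traditional characters.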
Filed under: Internet & Networking, Linguistics / Translation / Speech
The Film Avatar and Our Vision of Virtual Reality
December 10, 2009
Avatar, an American science fiction epic film written and directed by James Cameron and starring Sam Worthington, Zoe Saldana, Sigourney Weaver, Michelle Rodriguez and Stephen Lang, has its world premiere in London. Cameron is also a film producer, editor, and inventor.
"The film is set in the year 2154 on Pandora, a moon in the Alpha Centauri star system. Humans are engaged in mining Pandora's reserves of a precious mineral, while the Na'vi—a race of indigenous humanoids—resist the colonists' expansion, which threatens the continued existence of the Na'vi and the Pandoran ecosystem. The film's title refers to the genetically engineered bodies used by the film's characters to interact with the Na'vi.
"Avatar had been in development since 1994 by Cameron, who wrote an 80-page scriptment for the film. Filming was supposed to take place after the completion of Titanic, and the film would have been released in 1999, but according to Cameron, 'technology needed to catch up' with his vision of the film. In early 2006, Cameron developed the script, as well as the language and culture of the Na'vi. He said sequels would be possible if Avatar was successful, and in response to the film's success, confirmed that there will be another two.
"The film was released in traditional 2-D, as well as 3-D, RealD 3D, Dolby 3D, and IMAX 3D formats. Avatar is officially budgeted at $237 million; other estimates put the cost at $280–310 million to produce and $150 million for marketing. The film is being touted as a breakthrough in terms of filmmaking technology, for its development of 3D viewing and stereoscopic filmmaking with cameras that were specially designed for the film's production.
"Avatar premiered in London, UK on December 10, 2009, and was released on December 18, 2009 in the US and Canada to critical acclaim and commercial success. It grossed $27 million on its opening day domestically (in the United States and Canada) and $77 million domestically on its opening weekend. It opened two days earlier internationally and grossed $232 million worldwide in its first five days of international release. Within three weeks of its release, with a worldwide gross of over $1 billion, Avatar became the second highest-grossing film of all time worldwide, exceeded only by Cameron's previous film, Titanic" (Wikipedia article on Avatar (2009 film), accessed 01-16-2010).
♦ From my perspective the most significant aspect of Avatar, apart from its breathtaking computer graphic animation, and the fascinating artificial culture and language of the Na'vi, was the convincing portrayal of a total virtual reality experience. The film presented a vision of a reality that I could not have imagined before viewing. In its presentation of a new view of reality it is reminiscent of the 1982 film, Blade Runner, directed by Ridley Scott.
Another especially timely aspect of the film is its depiction of the conflict between destructive exploitation of natural resources and living in harmony with nature.
Filed under: Cinematography / Films / Video, Ecology / Conservation / Planning, Fiction, Science Fiction, Drama, Poetry, Linguistics / Translation / Speech, Virtual Reality
2010 – Present
Google Introduces Translation Feature for Google Goggles
May 6, 2010
Google announces a translation feature for Google Goggles, its image recognition and search application for Android-based mobile devices.
"Here’s how it works:
"Point your phone at a word or phrase. Use the region of interest button to draw a box around specific words.
"Press the shutter button.
"If Goggles recognizes the text, it will give you the option to translate.
"Press the translate button to select the source and destination languages."
"Today Goggles can read English, French, Italian, German and Spanish and can translate to many more languages. We are hard at work extending our recognition capabilities to other Latin-based languages. Our goal is to eventually read non-Latin languages (such as Chinese, Hindi and Arabic) as well."
Filed under: Imaging / Photography, Indexing & Searching Information, Linguistics / Translation / Speech
The First Internet Addresses in Non-Latin Characters
May 6, 2010
"Three Mideast countries have become the first to get Internet addresses entirely in non-Latin characters.
"Domain names in Arabic for Egypt, Saudi Arabia and the United Arab Emirates were added to the Internet's master directories on Wednesday, following final approval last month by the Internet Corporation for Assigned Names and Numbers, or ICANN. It's the first major change to the Internet domain name system since its creation in the 1980s.
"Registrations for websites to use those names are to begin soon. On Thursday, Egypt granted three companies approval to register names using the country's new Arabic suffix" (http://hosted.ap.org/dynamic/stories/M/ML_EGYPT_ARAB_DOMAIN_NAMES?SITE=AP&SECTION=HOME&TEMPLATE=DEFAULT, accessed 05-16-2010).
Filed under: Internet & Networking, Linguistics / Translation / Speech