The solution of Stilgherrian’s script challenge

Here is a link to the challenge.

Here is my solution:

[Image: Stilgherrian's script challenge: solved!] The solution

Now I can also show a chart of the vowel glyphs:

[Image: A chart of vowel signs with the words where they appear] Vowel chart

Under each glyph I put the words where it appears, in ordinary spelling, with the corresponding letters underlined. Where both the short and the long glyph for a vowel appear in the Document, I grouped them together, because I believe that they are variants of the same character, even if I don’t know the rules for choosing between them.

As you can verify by pronouncing those words yourself, the vowel orthography is not strictly phonetic: let’s say that Danny made some concessions to ordinary spelling. In any case, at least six RP vowels don’t appear in the Document and I have no idea how to represent them: /ɑː/, /ʌ/, /ɔɪ/, /eə/, /ʊ/ and /ʊə/ (assuming that Danny did not distinguish between /i/ and /iː/, or between /u/ and /uː/; otherwise /iː/ and /u/ are also missing).

There are many uncertainties; I’ve already mentioned some of them. For now, I’ll add only that perhaps the /ɜː/ glyph in “personage” is just the extended form of the /e/ glyph in “citizen”. A hint that there could be some phonetic meaning to short and long forms?


What follows is a first draft of my report of how I arrived at the solution:

When I learned of the challenge at the end of August 2011, all hints pointed to a phonetic writing system for English (I did mention Deseret and Shavian in my first comment), so I started from there, working by the book: I broke up the challenge text (henceforth: the Document) into words, the words into characters, produced figure 1,

[Image: A colored version of the challenge text, with green boxes to separate different words, odd characters in red, even characters in blue] Fig. 1

and, based on known facts about word frequency in English, I immediately identified three words: “the” and “of”, which I mentioned, and the single-letter L4W2, which had to be the article “a”, the most frequent English single-sound word. In cryptography jargon, such clues are called “cribs”. Then, let me quote myself:

3. The most prominent feature of the script is the line (code named “blue line”, painted in blue in my figure) which appears in every word and is sometimes interrupted. I’ll call “ascenders” the strokes above it and “descenders” those below it. Even if a long slash like the one at the beginning of L1W2 really is only one pen stroke, I will analyse it as two strokes, an ascender and a descender. These are standard typography terms. A unique feature of this script is that, while descenders can live either with or without a blue segment above them, an ascender always requires one. The reverse is not true: in two instances, L1W3 and L6W4, blue segments start without being triggered by ascenders.

4. In almost every case where an ascender and a descender are drawn one above the other, possibly in one pen stroke and probably as part of the same letter, one of them is “more complicated” than the other. This prompts me to classify the candidate letters of this script into four groups:
(i) strictly negative: consisting of descenders only. These are the only letters which can live without a blue segment. All other categories require one, because they involve ascenders.
(ii) extended negative: descenders, with an additional ascender (a simple vertical segment)
(iii) strictly positive: ascenders only (a rich inventory of hooks, loops, dotted variants…)
(iv) extended positive, the same as (iii), with an additional descender, which is a vertical segment (sometimes with a dot)
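
The four groups can be expressed as a small decision rule. The following is only a minimal Python sketch of that classification: the `Glyph` type and the stroke labels `"none"`, `"simple"` and `"complex"` are hypothetical names introduced here for illustration, and the encoding of each stroke as one of those three labels is an assumption, not something taken from the Document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    # Hypothetical encoding: each stroke is "none", "simple" (a plain
    # vertical segment, possibly dotted) or "complex" (hook, loop, ...).
    ascender: str
    descender: str


def classify(glyph: Glyph) -> str:
    """Assign a glyph to one of the four groups (i)-(iv)."""
    if glyph.ascender == "none" and glyph.descender != "none":
        return "strictly negative"
    if glyph.descender == "none" and glyph.ascender != "none":
        return "strictly positive"
    if glyph.ascender == "simple" and glyph.descender == "complex":
        return "extended negative"
    if glyph.ascender == "complex" and glyph.descender == "simple":
        return "extended positive"
    # "Almost every case": anything else is left undecided.
    return "unclassified"


def needs_blue_segment(glyph: Glyph) -> bool:
    """An ascender always requires a segment of the blue line."""
    return glyph.ascender != "none"


if __name__ == "__main__":
    print(classify(Glyph(ascender="simple", descender="complex")))
```

A glyph whose two strokes are equally simple (or equally complex) ends up as "unclassified" here, which matches the "almost" in point 4: the rule is a first sorting device, not a complete grammar of the script.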

When I wrote this I wanted to state facts. I didn’t want to share my wild guesses, but I already had an idea in mind: all vowels of my cribs were extended positives, the two consonants were strict negatives. What if the positives represented vowels and the negatives represented consonants? In that case the odd behavior of those glyphs with respect to the midline would have a fascinating explanation: the midline represented voice! Vowels, i.e. positives, are always voiced, as well as sonorant consonants (the extended negatives!), while plosives, fricatives and affricates, the consonants that in English come in voiceless/voiced pairs, had to be represented by the strict negatives, which appeared in the Document both with and without a midline segment above them. If this was true, it wasn’t simply like Shavian, where the glyphs representing voiced consonants are flipped versions of the ones representing voiceless ones. In this script, voice was really written down with separate penstrokes of its own, as in some kind of spectrogram!

(In the following, I write IPA symbols between slashes, as /hɪə/, both to represent phonemes and to represent their corresponding script letters as I decipher them. I know this is against common IPA usage and I hope this causes no confusion.)

The hypothesis had to be verified. Of the two crib consonants, L1W4.2, the voiced final /v/ of “of” had indeed a “blue” segment above it, and this would imply that L3W4.1 represented /f/, its voiceless counterpart, but the other crib consonant, L1W1.1, the initial /ð/ of “the”, was also voiced, but had no segment above it! The whole construction, however, was too beautiful to be dismissed by that simple dash. Many writing systems have special exceptions for common words, so I didn’t consider my idea disproved, but I badly needed real data to see if actual English phoneme frequencies matched what I thought I was seeing in the Document. The ETANOISH sequence mentioned by Bob Bain is well known, but it holds for conventional spelling, and I considered it of little use here. Fortunately, one of the most authoritative living English phoneticians, Prof. John C. Wells, whose blog is in my RSS feed, had posted a piece completely written in IPA in June. It was probably long enough to extract significant phoneme occurrence statistics from it. I preferred starting from scratch and counting the phonemes myself, because Wells uses a standard transcription system I’m completely familiar with, while many articles that could be found on the web used somewhat different systems, different phoneme counts, were based on different varieties of English and would require more adaptation work (I was assuming that the Document represented an Australian variety rather similar to Wells’ British English, at least in phoneme distribution if not in realization… I hope I’m upsetting nobody with this sentence).

In any case the timeframe I could dedicate to this matter had expired. The challenge went into my TODO list with the lowest possible priority, and it stayed there for months. Last weekend, I pushed it to the top.

Not surprisingly, 12.38% of my sample consisted of the single phoneme /ə/. It was clear that no character in the Document was that frequent, but Wells’ is a radical transcription, where, for instance, “the” is transcribed either /ðə/ or /ði/ according to its pronunciation. If Danny, the inventor of the script I was deciphering, wanted to keep the same spelling for the same word in all positions, he might have used /ði/ throughout, reducing the frequency of /ə/. Such an approach might also have explained the disturbing fact that the single vowel of L4W2, a candidate for the indefinite article, far from being the commonest, appeared only there. I still don’t know the reason: now, I think that that letter means “indefinite article, sometimes /ə/, sometimes /eɪ/”. In any case, the positives made up 43.09% of the Document, while 39.44% of the sample consisted of vowels. Not close, but not far enough apart to disprove the theory. Maybe there were some positives which weren’t vowels. I know now that that was indeed the case: L1W2.1 appears three times, 2.73% of the Document, and represents /h/, not a vowel and not even a voiced sound. But it is a simple slash, and it is somewhat outside the system, just as /h/ is a somewhat special sound, so it’s OK.
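
For readers who want to repeat this kind of count on an IPA text of their own, here is a minimal Python sketch. It is not the procedure actually used for the figures above: the two-symbol phoneme inventory in `DIGRAPHS`, the list of ignored characters and the function names are all assumptions made for illustration, and the greedy pairing will occasionally mis-split a /t/ followed by /ʃ/ across a morpheme boundary.

```python
from collections import Counter

# Assumed RP-style inventory of phonemes written with two symbols:
# long vowels, diphthongs and affricates.
DIGRAPHS = {
    "iː", "uː", "ɑː", "ɔː", "ɜː",
    "eɪ", "aɪ", "ɔɪ", "aʊ", "əʊ", "ɪə", "eə", "ʊə",
    "tʃ", "dʒ",
}

# Spaces, punctuation, slashes and stress marks are not phonemes.
IGNORED = set(" \n\t.,;:!?/ˈˌ")


def tokenize(ipa: str) -> list[str]:
    """Split an IPA string into phonemes, preferring two-symbol units."""
    tokens = []
    i = 0
    while i < len(ipa):
        if ipa[i] in IGNORED:
            i += 1
        elif ipa[i:i + 2] in DIGRAPHS:
            tokens.append(ipa[i:i + 2])
            i += 2
        else:
            tokens.append(ipa[i])
            i += 1
    return tokens


def phoneme_percentages(ipa: str) -> dict[str, float]:
    """Return each phoneme's share of the text, most frequent first."""
    counts = Counter(tokenize(ipa))
    total = sum(counts.values())
    return {p: 100 * n / total for p, n in counts.most_common()}


if __name__ == "__main__":
    # A made-up sample, far too short to be statistically meaningful.
    for phoneme, share in phoneme_percentages("ðə haɪ skraɪbz ɒv ðə").items():
        print(f"/{phoneme}/ {share:.2f}%")
```

The same function, applied to a list of glyph labels instead of IPA symbols, would give the character frequencies of the Document to compare against.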

After /ə/, the commonest phonemes in the sample are, in order, /ntɪslkr/. I had to go for consonants, that is, in my hypothesis, negatives. The extended ones had to be sonorant consonants. In English there are seven of them: /m/, /n/, /ŋ/, /w/, /l/, /r/, /j/, and indeed I counted seven extended negatives, all scythe-shaped, one of them dotted, either with a sharp or a rounded angle where the “handle” met the “blade”, at three possible depth levels below the midline: 3 depths × 2 angle types + 1 dotted = 7! Some strict negatives, on the other hand, were the handleless counterparts of those scythes (let them be “sickles”), while the ones I had already identified as /v,f/ and /ð/ had a completely different shape. Wait! The latter were all fricatives… could the former be plosives? In that case, could the three depths correspond to the three places of articulation of English plosives? If so, the scythe representing /n/ would be at the same depth as the sickle representing /d,t/ (with or without midline), and similarly /m/ with /b,p/ and /ŋ/ with /g,k/! (If you feel confused, this chart might help.) Frequencies showed where /n/ and /t/ are: at middepth. Labials tend to prefer initial positions, so they had to be the shallow scythes (/m/ and /w/) and sickles (/b,p/), which also showed this preference. The velars were at maximum depth, with a very conveniently final /ŋ/ at L3W4.6, which also showed that the nasals were the rounded scythes, so that /w/, /l/, /r/, /j/ had to be the sharp-angled ones, identifying the dotted one (L4W6.1) with /j/ (I think you all know that /j/ is the initial glide of “you” /juː/, and not the “j” of “Jew” /dʒuː/).
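
The resulting consonant grid is small enough to write down as a lookup table. The Python sketch below only restates the assignments argued for above; the labels `"shallow"`, `"mid"`, `"deep"`, `"rounded"` and `"sharp"` and the function names are hypothetical, and the fricatives are left out because their shapes do not fit the scythe and sickle pattern.

```python
# Scythes (extended negatives): sonorants, always under a midline segment.
SCYTHES = {
    ("shallow", "rounded"): "m",
    ("shallow", "sharp"): "w",
    ("mid", "rounded"): "n",
    ("mid", "sharp"): "l",
    ("deep", "rounded"): "ŋ",
    ("deep", "sharp"): "r",
}
DOTTED_SCYTHE = "j"  # the single dotted scythe

# Sickles (strict negatives): plosives as (voiced, voiceless) pairs.
SICKLES = {
    "shallow": ("b", "p"),  # labial
    "mid": ("d", "t"),      # alveolar
    "deep": ("g", "k"),     # velar
}


def read_scythe(depth: str, angle: str, dotted: bool = False) -> str:
    """Read a scythe-shaped glyph as a sonorant consonant."""
    if dotted:
        return DOTTED_SCYTHE
    return SCYTHES[(depth, angle)]


def read_sickle(depth: str, midline: bool) -> str:
    """Read a sickle: the midline segment above it marks voicing."""
    voiced, voiceless = SICKLES[depth]
    return voiced if midline else voiceless


if __name__ == "__main__":
    # 3 depths x 2 angle types + 1 dotted = 7 sonorants
    print(len(SCYTHES) + 1)
    print(read_sickle("mid", midline=False))
```

Keeping voicing as a separate boolean rather than as six separate plosive entries mirrors the hypothesis itself: the glyph gives the place of articulation, and the midline adds voice.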

There was also a spatial metaphor in this: the closer to the lips a sound is articulated, the closer to the midline its glyph is written. How elegant!

At this point I had most consonants, and I thought I had understood why vowels came in strictly positive or in extended form. My idea was that this was a way to tell the many similar consonant characters apart: some of them cut the stem of the following vowel, some don’t, and this helps to avoid confusing them, along with sharpness and depth. I don't think this is correct, and I actually still don't know how the different forms of a vowel are chosen. However, an example of the effect of the preceding consonant would be this: the boomerang-shaped vowel L3W4.4 isn’t cut by the preceding /l/ (middepth sharp scythe) but a preceding /r/ (deep sharp scythe) cuts it at L5W1.2. There are three possibilities: cutting, overstriking and joining, as in L6W4, where a shallow sickle (a /b/) joins the vowel of the article /ði/. Hey, this is the verb to /bi/!

Some fricatives were still missing, notably /s/, the commonest of them. A natural candidate was the commonest of the still unidentified glyphs, L1W3.4. It also appeared in a ligature with /k/ at the beginning of L1W7, which then could be read as /skr?b?/. Hmmm. “scribes” /skraɪbz/ perhaps? Tempting. This would identify L1W2 as “high” /haɪ/, solving the problem of L1W2.1, and would let us read the initial sequence of L2W1 as /kh/. (/k/ is usually a cutter, as in L1W3, but probably /h/ can’t be cut at all, or /kh/ is a special case. Also, /j/ cuts L4W6.2 but doesn’t cut L6W6.2. Maybe cutting is optional for such an easily identified letter, maybe there are rules we cannot derive from such a short text. Never mind.)

A small problem with reading L1W2 L1W3 as “nine scribes”, however, was that /z/ was not represented as in L4W1 and as it should be, as an /s/ below a midline segment, but with a somewhat abbreviated form, easily confused with a final /v/ (the difference is that in /v/ the glyph hovers below the midline, while in the abbreviated final /z/ it dangles from it). I still don’t know if such an abbreviated form is always optional or is restricted to the cases where /z/ is obviously a suffix (plural, third person, genitive…), but again, it’s not a big problem.

If you have followed me to this point, you are surely able to find out the vowels for yourself. I’ll list a couple of final remarks here:

1. L3W5 is “personage”. I’d pronounce that word /pɜːsənɪdʒ/, but the vowel values I found correspond to /pɜːsɒnædʒ/. This confirms what we had already observed, that vowel characters in this script are not precise phonetic representations. In particular, reduced vowel sounds are (often) written as the full vowel they etymologically come from, just like in conventional English spelling. This word is also the only occurrence of /dʒ,tʃ/. We don’t know what /ʒ,ʃ/ looks like, since there are no occurrences in the Document, but /dʒ,tʃ/ is composed of the middepth sickle /d,t/ and a final curl. Maybe that final curl alone represents /ʒ,ʃ/.

2. The whole Document is obviously written in an r-dropping variety of English, as shown by L3W2 /rekɔːdz/ “records” (for /re-/ instead of /ri-/ see the vowel comment above). However, L3W1 is “hereby”, and after a unique first vowel that I interpret as /ɪə/ (and is not in extended form, for an unknown reason), there is an /r/ character: /hɪərbaɪ/. The /r/ would be read only before vowels, but Danny decided to write it always, so that the same word is always spelled the same. Also, I don’t know why /aɪ/ is dotted here (and in L5W1, “Rothmile”). Maybe because there is another stressed vowel in those words. I don’t know.

3. The final vowel of L4W6 is a bit strange. It could be unique, but I think it is an /ɔː/ as in L3W2 /rekɔːdz/ “records” (strictly positive) or in L6W7 /ɔːlwəz/ “always” (extended positive), so I read that word as /jəʊsentrɔː/ and transliterate it as “Yocentro” or “Yocentror” but I am really in doubt here.
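
The spelling convention in remark 2, where /r/ is always written but only read before a vowel, can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the `VOWELS` set is an assumed RP-style inventory, the function name is hypothetical, and linking /r/ across word boundaries is ignored.

```python
# Assumed RP-style vowel inventory (monophthongs and diphthongs).
VOWELS = {
    "ɪ", "e", "æ", "ɒ", "ʌ", "ʊ", "ə", "i", "u",
    "iː", "uː", "ɑː", "ɔː", "ɜː",
    "eɪ", "aɪ", "ɔɪ", "aʊ", "əʊ", "ɪə", "eə", "ʊə",
}


def read_aloud(spelled: list[str]) -> list[str]:
    """Drop every written /r/ that is not followed by a vowel."""
    spoken = []
    for i, phoneme in enumerate(spelled):
        following = spelled[i + 1] if i + 1 < len(spelled) else None
        if phoneme == "r" and following not in VOWELS:
            continue  # written, but silent in an r-dropping variety
        spoken.append(phoneme)
    return spoken


if __name__ == "__main__":
    # "hereby" as spelled in the Document: /hɪərbaɪ/
    print("".join(read_aloud(["h", "ɪə", "r", "b", "aɪ"])))
```

With this rule the written /hɪərbaɪ/ is read without its /r/, while the initial /r/ of /rekɔːdz/ is kept because a vowel follows it.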

Thank you very much for keeping up with me for such a long post, and may the Sands be with you always.

Dario
(an Italian mathematician by study, sysadmin by trade, amateur linguist by passion)