Foundations of Amateur Radio

If you've ever looked at Morse Code, you might be forgiven for concluding that it appears to be a less than ideal way of getting information from point A to point B. The idea is simple: based on a set of rules, you translate characters, one at a time, into a series of dits and dahs, with a defined amount of spacing between each element, each character and each word.
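To make the spacing rules concrete, here is a minimal sketch that adds up how long a message takes, measured in dit lengths. It assumes the usual International Morse proportions: a dah is three dits, the gap inside a character is one dit, between characters three, and between words seven. The tiny lookup table and the function name are made up for illustration, and a real table would cover the whole alphabet.

```python
# Timing in dit lengths, assuming the usual International Morse proportions.
DIT, DAH = 1, 3
ELEMENT_GAP, CHARACTER_GAP, WORD_GAP = 1, 3, 7

# A deliberately tiny, hypothetical lookup table.
MORSE = {"e": ".", "t": "-", "a": ".-", "n": "-."}


def duration(message: str) -> int:
    """Return the length of a message in dit units, excluding any trailing gap."""
    total = 0
    for w, word in enumerate(message.lower().split()):
        if w > 0:
            total += WORD_GAP  # gap between words
        for c, char in enumerate(word):
            if c > 0:
                total += CHARACTER_GAP  # gap between characters
            elements = MORSE[char]
            total += sum(DIT if e == "." else DAH for e in elements)
            total += ELEMENT_GAP * (len(elements) - 1)  # gaps inside a character
    return total


print(duration("e"))        # 1
print(duration("tea"))      # 15
print(duration("eat ten"))  # 37
```

Note that the spacing is part of the code: "tea" spends six of its fifteen units on silence between characters.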

The other day I came across a statement that asserted that you could send Morse faster than binary encoded ASCII letters. If you're not sure what that means, consider that there are many different ways to encode information. In Morse, the letter "e" is the first character, represented by "dit", the letter "t" is the second character, represented by "dah". In ASCII, the American Standard Code for Information Interchange, the capital letter "E" is character number 69, represented by 100 0101. The capital letter "T" is number 84 on the list, represented by 101 0100.
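If you want to check ASCII numbers yourself, a few lines of Python will do it. This is only a sketch, and the helper name is made up for illustration. Note that codes 69 and 84 belong to the capital letters "E" and "T"; the lowercase versions have their own numbers.

```python
def ascii_bits(character: str) -> str:
    """Return the 7-bit binary form of a single ASCII character."""
    return format(ord(character), "07b")


for letter in "ETet":
    print(letter, ord(letter), ascii_bits(letter))

# E 69 1000101
# T 84 1010100
# e 101 1100101
# t 116 1110100
```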

A couple of things to observe. The order of the characters in Morse and ASCII is not the same. That doesn't really matter, as long as both the sender and receiver agree that they're using the same list. Another thing to notice is that in Morse, letters are encoded using dits and dahs and appropriate spacing. In ASCII, or technically, binary coded ASCII, the letters are encoded using zeros and ones.

I'll also mention that there are plenty of other ways to encode information. EBCDIC, or Extended Binary Coded Decimal Interchange Code, was defined by IBM for its mainframe and mid-range computers. It's still in use today. In EBCDIC, the lowercase letter "e" is 133 and the lowercase letter "t" is 163. It was based around punched cards to ensure that hole punches were not too close together. It was designed for global use and can, for example, support Chinese, Japanese, Korean and Greek. Another encoding you might have heard of is UTF-16, which supports over a million different characters including all the emojis in use today.
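Python ships with several EBCDIC code pages, so the numbers are easy to verify. The sketch below assumes code page 037, one EBCDIC variant among many, and the helper names are made up for illustration. It also shows that UTF-16 needs two 16-bit units to represent an emoji.

```python
def ebcdic_code(character: str) -> int:
    """Return the numeric code of a character in EBCDIC code page 037."""
    return character.encode("cp037")[0]


def utf16_units(character: str) -> int:
    """Return how many 16-bit units UTF-16 needs for one character."""
    return len(character.encode("utf-16-be")) // 2


print(ebcdic_code("e"))                   # 133
print(ebcdic_code("t"))                   # 163
print(utf16_units("e"))                   # 1
print(utf16_units("\N{GRINNING FACE}"))   # 2
```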

Before I continue, I must make a detour past the ITU or the International Telecommunication Union. The ITU has a standard, called "Recommendation M.1677-1", approved on the 3rd of October 2009, which defines International Morse code. I'm making that point because I'm going to dig deeper into Morse and it helps if we're talking about the same version of Morse. I have talked about many versions of Morse before, so I'll leave that alone, but I will point out a couple of things.

The ITU defines 56 unique Morse sequences or characters. The obvious ones are the letters of the alphabet, the digits and several other characters like parentheses, quotes, question mark, full-stop, and comma. It also includes the symbol in the middle of an email address, which it calls the "commercial at symbol", with a footnote telling us that the French General Committee on Terminology approved the term "arobase" in December 2002. It seems that seven years wasn't enough time to convince the ITU to update its own standard. Mind you, the rest of the world, well, the English speaking part, calls it "at", the letter "a" with a circle around it, as in my email address, cq@vk6flab.com.

Another thing to note is that this standard is only available in English, Arabic, Chinese, French and Russian, so I'm not sure what the Spanish, Hindi, Portuguese, Bengali and Japanese communities, who represent a similar population size, do for their Morse definitions. It's interesting to note that as part of its commitment to multilingualism, the ITU actually defines six official languages. Of those six, it's the Spanish version of the standard that appears to be missing.

There are other curious things. For example, the standard defines a special character called "accented e", though it doesn't describe which accent, given that there are four variants in French alone. I found at least seven versions. It also completely ignores accents on the i, the c, the o, and special character combinations like "sz" in German and "ij" in Dutch. This isn't to throw shade on Morse, it's to point out that it's an approximation of a language with odd variations. I'm also going to ignore capitalisation. In Morse there is none, while ASCII has definitions for both, capitalised and not.

In addition to things you write in a message, there are also control codes. The ITU defines six specific Morse control codes. Things like "Understood", "Wait", and "Error". ASCII has those too. The first 32 codes in ASCII, numbered 0 to 31, are reserved for controls like "linefeed", "carriage return", and "escape".
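The ASCII control codes are easy to list. This sketch leans on Python's unicodedata module, which files control characters under the category "Cc". In the ASCII range that covers codes 0 to 31, which is 32 codes in all, plus "delete" sitting on its own at 127.

```python
import unicodedata

# Every ASCII code that Unicode classifies as a control character.
controls = [code for code in range(128)
            if unicodedata.category(chr(code)) == "Cc"]

print(controls[:32] == list(range(32)))  # True
print(controls[-1])                      # 127

# The three controls mentioned above: linefeed, carriage return, escape.
print(ord("\n"), ord("\r"), ord("\x1b"))  # 10 13 27
```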

There are other oddities. The ITU specifies that the control code "Invitation to transmit" is symbolised by dah-dit-dah. If you're familiar with Morse, you'll know that this is the same as the letter "k". The specification says that multiplication is dah-dit-dit-dah, which is the same as "x". There are also rules on how to signify percentages and fractions using dah-dit-dit-dit-dit-dah, the hyphen, as a separator.
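One way to spot these doubled-up meanings is to group the names by their sequence. The sketch below uses only the handful of sequences mentioned here, written with "." for dit and "-" for dah. The labels and the function name are my own for illustration, not the wording of the standard.

```python
from collections import defaultdict

# (meaning, sequence) pairs, limited to the examples discussed above.
SEQUENCES = [
    ("letter k", "-.-"),
    ("invitation to transmit", "-.-"),
    ("letter x", "-..-"),
    ("multiplication sign", "-..-"),
    ("hyphen", "-....-"),
]


def shared(sequences):
    """Return the sequences that carry more than one meaning."""
    by_code = defaultdict(list)
    for meaning, code in sequences:
        by_code[code].append(meaning)
    return {code: names for code, names in by_code.items() if len(names) > 1}


for code, names in shared(SEQUENCES).items():
    print(code, "means", " and ".join(names))

# -.- means letter k and invitation to transmit
# -..- means letter x and multiplication sign
```

The receiver has to work out which meaning applies from context, something a binary code like ASCII avoids by giving every symbol its own number.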

At this point I haven't even gotten close to exploring efficiency, but my curiosity is in overdrive. Is Morse really optimised for English, or are there other forces at work? I'm already digging.

I'm Onno VK6FLAB