Unraveling Message Encoding in SMS: A Deeper Insight

Understanding the intricacies of text message encoding is pivotal in the world of SMS communication. One common question that arises is why the segment count might unexpectedly double. This phenomenon is intricately tied to the encoding types used when sending SMS messages.

SMS messages are sent in GSM-7 encoding by default across the industry. GSM-7 packs the most commonly used characters into a single, compact encoding. When your script includes a special character (e.g., curly quotes or symbols like '»') or an emoji that is outside the GSM-7 character set, we have to switch the entire message to another encoding (Unicode) to be able to send it. GSM-7 allows 160 characters in a single-segment message (153 characters per segment once a message is split into multiple segments), while Unicode allows only 70 (67 per segment once split). This means that adding a single curly apostrophe or quote can double, or more than double, the number of segments needed to send a message.
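The segment arithmetic described above can be sketched in Python. This is a minimal illustration, not how ThruText or any carrier actually counts segments. The function names `detect_encoding` and `count_segments` are hypothetical. The character tables follow the GSM 03.38 default alphabet and its extension table, which is an assumption beyond what this article states. The sketch also assumes that extension characters such as '€' cost two GSM-7 characters and that emoji cost two Unicode characters.

```python
import math

# GSM 03.38 default alphabet (assumed here; check your provider's exact table).
GSM7_BASIC = frozenset(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table: each of these is assumed to take two GSM-7 characters.
GSM7_EXTENDED = frozenset("^{}\\[~]|€\f")


def detect_encoding(text):
    """Return "GSM-7" if every character fits GSM-7, otherwise "Unicode"."""
    if all(ch in GSM7_BASIC or ch in GSM7_EXTENDED for ch in text):
        return "GSM-7"
    return "Unicode"


def count_segments(text):
    """Estimate how many SMS segments the text needs."""
    if detect_encoding(text) == "GSM-7":
        units = sum(2 if ch in GSM7_EXTENDED else 1 for ch in text)
        single_limit, multi_limit = 160, 153
    else:
        # Count UTF-16 code units, so an emoji counts as two.
        units = len(text.encode("utf-16-be")) // 2
        single_limit, multi_limit = 70, 67
    if units <= single_limit:
        return 1
    return math.ceil(units / multi_limit)


plain = "a" * 99 + "'"        # straight apostrophe, stays GSM-7
curly = "a" * 99 + "\u2019"   # right curly apostrophe, forces Unicode
print(detect_encoding(plain), count_segments(plain))
print(detect_encoding(curly), count_segments(curly))
```

Under these assumptions, a 100-character message with a straight apostrophe fits in one segment, while the same message with a curly apostrophe needs two.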

How do I know how my message will be encoded?

The message encoding type is shown beneath the initial message script editor as "Character Set":

When a character outside GSM-7 (7-bit) encoding is added to your initial message, you will see the Character Set, Total Segments, and the Characters Remaining change:

MMS messages are sent as 1 segment, regardless of the characters entered in the script.

How do I check whether a Unicode character is in my script?

If the Character Set unexpectedly changes or the number of segments is larger than expected, you can copy and paste your message into an SMS character counter such as Messente to check the encoding of the message.
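If you prefer to check a script locally, the same idea can be sketched in a few lines of Python. This is an illustrative sketch only: the helper name `find_non_gsm7` is hypothetical, and the character set is assumed to be the GSM 03.38 default alphabet plus its extension table, which may differ slightly from what a given tool or carrier accepts.

```python
# GSM 03.38 default alphabet plus extension table (assumed; verify with your provider).
GSM7_CHARS = frozenset(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
    "^{}\\[~]|€\f"
)


def find_non_gsm7(text):
    """Return (index, character, code point) for each character outside GSM-7."""
    return [
        (index, ch, f"U+{ord(ch):04X}")
        for index, ch in enumerate(text)
        if ch not in GSM7_CHARS
    ]


# A right curly apostrophe hides at index 3 of this script.
for index, ch, code_point in find_non_gsm7("Don\u2019t forget to vote!"):
    print(index, code_point)
```

Any character the helper reports is one that would switch the whole message to Unicode.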

This is what a Unicode text looks like in Messente:

Note the "Encoding" type in the top right corner and the two "SMS parts" (segments). The message is labeled "Unicode" within the tool, and the entire message is green. The dark green characters at the bottom of the image represent the emoji in the script. The emoji changes the encoding of the whole message.

By removing the emoji, the entire message has changed to 7bit (GSM-7):

Note that the number of SMS parts (segments) has been cut in half.

Some organizations pay per segment, so ensuring that messages are encoded correctly will save you money!

Steps GetThru is taking to prevent unintended encoding changes

To reduce surprise encoding changes when a message is drafted (an initial, follow-up, or reply message), ThruText now automatically changes the following characters:

Character Input                      Symbol   Resulting Character             Symbol
U+2018 - "left curly apostrophe"     ‘        0x27 - "straight apostrophe"    '
U+2019 - "right curly apostrophe"    ’        0x27 - "straight apostrophe"    '
U+201C - "left curly quote"          “        0x22 - "straight quote"         "
U+201D - "right curly quote"         ”        0x22 - "straight quote"         "
U+2014 - "long dash"                 —        0x2D - "hyphen"                 -
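The substitutions listed above can be approximated with a simple translation table. The sketch below is not ThruText's implementation. It is a hypothetical helper, `normalize_punctuation`, that assumes each listed character is replaced one-for-one, with the long dash mapped to the ASCII hyphen-minus (U+002D).

```python
# Map each "smart" punctuation character to a GSM-7 friendly equivalent.
SMART_PUNCTUATION = str.maketrans({
    "\u2018": "'",   # left curly apostrophe  -> straight apostrophe (0x27)
    "\u2019": "'",   # right curly apostrophe -> straight apostrophe (0x27)
    "\u201C": '"',   # left curly quote       -> straight quote (0x22)
    "\u201D": '"',   # right curly quote      -> straight quote (0x22)
    "\u2014": "-",   # long dash              -> hyphen-minus (0x2D)
})


def normalize_punctuation(text):
    """Replace curly quotes, curly apostrophes, and long dashes in a script."""
    return text.translate(SMART_PUNCTUATION)


print(normalize_punctuation("\u201CDon\u2019t wait\u201D \u2014 vote today"))
```

Other characters outside GSM-7, such as emoji, pass through this helper unchanged and would still switch the message to Unicode.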