Unraveling Message Encoding in SMS: A Deeper Insight

Understanding how text messages are encoded is key to working with SMS. A common question is why the segment count of a message can unexpectedly double. The answer lies in the encoding used to send the message.

Across the industry, SMS messages are sent in GSM-7 encoding by default. GSM-7 packs the most commonly used characters into a compact 7-bit format. When your script includes a special character outside the GSM-7 character set (e.g., curly quotes or symbols like '»') or an emoji, the whole message has to be sent in a different encoding (Unicode) instead. A single GSM-7 segment holds up to 160 characters; once a message needs more than one segment, each segment holds 153 characters, because part of every segment is reserved for the header that lets the phone reassemble the message. A single Unicode segment holds only 70 characters, or 67 per segment for longer messages. This means that adding a single curly apostrophe or quotation mark can double the number of segments needed to send a message.
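The segment arithmetic above can be sketched in a few lines of Python. This is a minimal sketch, assuming the GSM 03.38 default alphabet (where characters from its extension table, such as € and {, take two 7-bit slots) and assuming Unicode length is counted in UTF-16 code units, so an emoji outside the Basic Multilingual Plane counts as two characters. `count_segments` is a hypothetical helper, not part of ThruText, and carriers may count slightly differently (for example, by never splitting a two-slot character across segments).

```python
import math

# GSM 03.38 basic character set: each character costs one 7-bit septet.
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# GSM 03.38 extension table: each character costs two septets (escape + char).
GSM7_EXTENDED = set("^{}\\[~]|€\f")


def septet_length(text):
    """Return the GSM-7 septet count, or None if any character needs Unicode."""
    total = 0
    for ch in text:
        if ch in GSM7_BASIC:
            total += 1
        elif ch in GSM7_EXTENDED:
            total += 2
        else:
            return None  # one non-GSM-7 character forces Unicode for the whole message
    return total


def count_segments(text):
    """Return (encoding, segments) using the 160/153 and 70/67 limits."""
    septets = septet_length(text)
    if septets is not None:
        return "GSM-7", 1 if septets <= 160 else math.ceil(septets / 153)
    units = len(text.encode("utf-16-le")) // 2  # UTF-16 code units
    return "UCS-2", 1 if units <= 70 else math.ceil(units / 67)
```

For example, a 100-character message of plain letters fits in one GSM-7 segment, but swapping one straight apostrophe for a curly one makes `count_segments` report two UCS-2 segments.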

How do I know how my message will be encoded?

The message encoding type is shown beneath the initial message script editor as "Character Set":

Screenshot: the Initial Message editor, with the Character Set (7bit) circled in green.

When a character outside GSM-7 (7-bit) encoding is added to your initial message, you will see the Character Set, Total Segments, and the Characters Remaining change:

Screenshot: the Initial Message editor, with the Character Set changed from 7bit to Unicode.

MMS messages are sent as 1 segment, regardless of the characters entered in the script.

How do I check whether a Unicode character is in my script?

If the Character Set changes unexpectedly or the number of segments is larger than expected, copy and paste your message into an SMS character counter, such as the external Messaging Segment Calculator, to check how the message is encoded.
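To pinpoint exactly which characters force Unicode, a short script can scan the message and report every character outside GSM-7. This is a minimal sketch assuming the GSM 03.38 default alphabet and extension table; `find_unicode_characters` is a hypothetical helper, not a feature of ThruText or the Messaging Segment Calculator.

```python
import unicodedata

# GSM 03.38 basic character set and extension table.
GSM7_BASIC = set(
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
GSM7_EXTENDED = set("^{}\\[~]|€\f")


def find_unicode_characters(text):
    """List (position, character, Unicode name) for each character outside GSM-7."""
    found = []
    for i, ch in enumerate(text):
        if ch not in GSM7_BASIC and ch not in GSM7_EXTENDED:
            found.append((i, ch, unicodedata.name(ch, "UNKNOWN")))
    return found
```

An empty result means the message can stay in GSM-7; anything listed is a candidate to delete or replace with a plain equivalent.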

This is how a Unicode message appears in the external Messaging Segment Calculator:

Screenshot: the Messaging Segment Calculator showing a message encoded as Unicode, with each character in a light green box.

Note that "Encoding Used" shows "UCS-2" and "Number of Segments" shows "2". In other words, the message is encoded as Unicode. A single emoji changes the encoding of the whole message.

After the emoji is removed, the entire message is encoded as 7bit (GSM-7):

Screenshot: the Messaging Segment Calculator showing the same message encoded as 7bit, with each character in a grey box.

Note that the number of segments has been cut in half.

Some organizations pay per segment, so avoiding unintended Unicode encoding can save you money!

Steps GetThru is taking to prevent unintended encoding changes

To reduce unexpected encoding changes when an outgoing message (an initial, FollowUp, or reply message) is drafted, ThruText now automatically replaces the characters listed on GitHub with their 7bit equivalents. For example, if the character " — " is added to an outgoing message, it is changed to " - " and the encoding remains 7bit (GSM-7). These automatic character swaps save your organization segments and money!
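A substitution like the one described above can be sketched with Python's built-in `str.translate`. The replacement table below is a hypothetical, illustrative subset chosen for this example; it is not ThruText's implementation or its actual list, which is the one published on GitHub.

```python
# Hypothetical replacement table (illustrative only, not ThruText's list):
# maps common Unicode punctuation to GSM-7 characters.
REPLACEMENTS = str.maketrans({
    "\u2014": "-",    # em dash
    "\u2013": "-",    # en dash
    "\u2018": "'",    # left single curly quote
    "\u2019": "'",    # right single curly quote / apostrophe
    "\u201c": '"',    # left double curly quote
    "\u201d": '"',    # right double curly quote
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # non-breaking space
})


def to_gsm7_friendly(text):
    """Swap listed Unicode characters for GSM-7 equivalents; leave others unchanged."""
    return text.translate(REPLACEMENTS)
```

Characters not in the table (such as emoji) pass through unchanged, so a message can still end up in Unicode if it contains something the table does not cover.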