If the pages of your website do not declare the correct character encoding, you may find non-ASCII characters displaying incorrectly.
Sometimes the browser will render the character but get it badly wrong (we’ve seen a pound sterling symbol (£) appear as Â£), whereas at other times the character will simply be displayed as a �. Even everyday characters such as curly single and double quotes have been known to appear as question marks when the page’s encoding is not correct.
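The three symptoms described above can be reproduced in a few lines of Python. This is only a sketch of the likely cause, on the assumption that the page bytes were written in one encoding and read back in another; the variable names are illustrative.

```python
# Sketch: reproduce the three symptoms by deliberately mixing up encodings.

pound = "£"
utf8_bytes = pound.encode("utf-8")          # two bytes: 0xC2 0xA3

# 1. UTF-8 bytes read as ISO-8859-1: each byte becomes its own character.
mojibake = utf8_bytes.decode("iso-8859-1")
print(mojibake)                             # Â£

# 2. ISO-8859-1 bytes read as UTF-8: the lone byte 0xA3 is not valid UTF-8,
#    so a lenient decoder substitutes U+FFFD, the replacement character.
latin1_bytes = pound.encode("iso-8859-1")   # one byte: 0xA3
replaced = latin1_bytes.decode("utf-8", errors="replace")
print(replaced)                             # �

# 3. A curly quote forced into ASCII, which has no such character.
curly_quote = "\u2019"                      # right single quotation mark
as_ascii = curly_quote.encode("ascii", errors="replace")
print(as_ascii)                             # b'?'
```

In each case the stored bytes are fine; what goes wrong is the interpretation applied to them.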
So which character encoding should you choose to ensure this doesn’t happen? The answer is UTF-8, and the reasoning is set out below.
In the beginning, text on the web was limited to the 128 characters of ASCII (unaccented Latin letters, digits and basic punctuation), and from a developer’s point of view all was good.
Over time, however, a need for more characters arose. A series of 8-bit extensions was therefore introduced, which extended the standard 7-bit ASCII set to include special symbols and letters such as é and æ. As each extension had room for only 128 additional characters, separate sets were developed to handle the characters used in specific parts of the world (Western Europe’s being ‘ISO-8859-1’).
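To make the limitation concrete, the sketch below decodes the same single byte with three of these 8-bit sets. The choice of byte and of the three sets is ours, purely for illustration.

```python
# Sketch: one byte value, three different 8-bit extensions of ASCII.

# An 8-bit set has 2**8 slots; ASCII already occupies the first 2**7.
extra_slots = 2**8 - 2**7                   # 128

raw = bytes([0xE9])                         # a byte from the upper half

# Western European, Cyrillic and Greek sets respectively.
charsets = ["iso-8859-1", "iso-8859-5", "iso-8859-7"]
decoded = {name: raw.decode(name) for name in charsets}

for name, char in decoded.items():
    print(f"{name}: {char!r}")

# Bytes below 0x80 are plain ASCII and mean the same thing in every set.
ascii_byte = bytes([0x41])
ascii_views = {ascii_byte.decode(name) for name in charsets}
print(ascii_views)                          # {'A'}
```

The byte is é under ISO-8859-1 but a different letter under each of the other two sets, which is why a page has to say which set it was written in.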
As the number of multinational websites and users grew, developers faced the escalating task of ensuring their pages were set up to recognise characters from each of the 8-bit extensions. As this task grew harder and harder, the need emerged for a single character set that would support all languages, and Unicode was born.
Unicode went far beyond the 8-bit extensions that had been applied to ASCII, supporting characters from almost every language in the world. With Unicode, everything just seemed to work.
UTF-8, an encoding of Unicode, has steadily become arguably the most dominant character encoding on the web. Whenever SWS build a website or piece of software that needs to support multiple languages and character sets, we make sure to use the UTF-8 charset within our pages.
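As a rough illustration of what using UTF-8 within a page involves, the sketch below encodes a handful of characters and then builds a small HTML page whose declared charset matches the bytes actually produced. The helper build_page and the sample text are hypothetical and not taken from any SWS codebase.

```python
# Sketch: UTF-8 uses one to four bytes per character and leaves ASCII untouched.
samples = ["A", "é", "€", "日", "😀"]
byte_lengths = {ch: len(ch.encode("utf-8")) for ch in samples}
print(byte_lengths)


def build_page(body_text: str) -> bytes:
    """Hypothetical helper: return a minimal HTML page as UTF-8 bytes."""
    html = (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="utf-8">\n'        # declaration read by the browser
        "<title>Encoding test</title>\n"
        "</head>\n"
        f"<body><p>{body_text}</p></body>\n"
        "</html>\n"
    )
    # The declaration only helps if the bytes really are UTF-8.
    return html.encode("utf-8")


page_bytes = build_page("Price: £5, café, 日本語")

# Decoding with the declared charset returns the original text intact.
round_trip = page_bytes.decode("utf-8")
print("£5" in round_trip)
```

The declaration and the stored bytes have to agree. If the server also sends a charset in its Content-Type header, that should say UTF-8 as well.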