About this

This site exists because NFO files use outdated encodings. You can read below how this came about.

Source code

This website is based on an application I wrote recently. You can find it on GitHub.

ASCII Art

If you try to open an NFO file in a regular text editor, the text itself will be readable, but it will be surrounded by weird symbols. These symbols are meant to form ASCII art, but they are stored in an outdated encoding that the editor interprets incorrectly.

Encodings

Computers store everything in a binary representation, including text. To know which letter/digit/symbol is which number, a so-called character table is used, which essentially maps these glyphs to numbers. Computers were designed so that the smallest useful unit is a byte, which contains 8 bits. 8 bits allow you to encode 256 symbols in a single byte. The first commonly used encoding was ASCII, which only used 7 bits, giving you 128 symbols. If you take away the digits once and the alphabet twice (upper- and lowercase), you are not left with many free slots, and most of those went to punctuation and control characters (like tab or line feed).
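
The numbers above can be checked with a few lines of Python, whose built-in `ord` and `chr` functions translate between a character and its number. This is only a small illustration of the character table idea, not part of the site's code:

```python
# Size of the symbol space for a 7-bit and an 8-bit encoding.
ascii_slots = 2 ** 7   # 128
byte_slots = 2 ** 8    # 256

# A character table maps characters to numbers and back.
letter_a = ord("A")     # 65 in ASCII
from_number = chr(97)   # "a"

# The digits once plus the alphabet twice already take 62 of the 128 slots.
letters_and_digits = 10 + 26 + 26

# ASCII has no slot for a block character, so encoding it fails.
try:
    "█".encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False
```
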
When 8-bit encodings arrived, the symbol space doubled, and the new space was also used for symbols that do not represent text. Included were shaded blocks (█▓▒░) and box drawing characters (┌┐└┘╔╗╚╝...). Now people had the ability to draw crude user interfaces and progress bars in their text terminal.

╔════════[crypt]═════════╗
║                        ║▒
║     Enter Password     ║▒
║                        ║▒
║  ****░░░░░░░░░░░░░░░░  ║▒
║                        ║▒
╚════════════════════════╝▒
 ▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒

When the IBM PC was developed, it shipped with a built-in character table that contained these box drawing characters. This table was essentially identical to Codepage 437, which is the reason why NFO files are usually stored in this encoding.
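
To see what this looks like at the byte level, here is a minimal Python sketch. It relies on the `cp437` codec that ships with Python; the byte values are just a hand-picked example of a box border, not taken from a real NFO file:

```python
# Raw bytes as they could appear at the top of a box in a CP437 file.
raw = bytes([0xC9, 0xCD, 0xCD, 0xBB])

# Decoding with the right character table gives the intended symbols.
top_border = raw.decode("cp437")    # "╔══╗"

# The shaded blocks sit in the upper half of the table (values above 127),
# which is exactly the half that plain 7-bit ASCII does not have.
shades = "░▒▓█".encode("cp437")     # b"\xb0\xb1\xb2\xdb"
```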

Languages

The problem with an 8-bit encoding is that it can't fit the language-specific characters of many languages at once. People who live in Europe need additional characters like "äöüàéèñ". In the regional codepages, these replace some of the box drawing characters, usually the "hybrid" ones that connect double and single lines. If you are an older computer user, you might remember seeing wrong characters at box junctions.
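
The effect can be reproduced in Python by decoding the same bytes with two different tables. Codepage 850 is my choice of example for a Western European table here; the three byte values are hand-picked hybrid junctions:

```python
# Three "hybrid" junction characters as stored in a CP437 file.
junctions = bytes([0xB5, 0xB6, 0xB7])

# Read with the original table, they are box drawing characters.
as_cp437 = junctions.decode("cp437")   # "╡╢╖"

# Read with Codepage 850, the same bytes are accented letters.
as_cp850 = junctions.decode("cp850")   # "ÁÂÀ"
```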

Windows

When graphical operating systems arrived, there was no need for box drawing characters anymore, and they were replaced with more language-specific characters. This step is the reason ASCII art looks terrible in text editors. On the other hand, fewer codepages were needed in total.
You can check this out yourself. Search for "character map" in Windows and you can have a look at different codepages. You need to check the "Advanced view" box to select different codepages.
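
The "terrible" look is easy to reproduce. The sketch below assumes the editor falls back to Windows-1252, a common Western European Windows codepage; other systems may use a different one:

```python
# A box corner, a horizontal line and another corner, stored as CP437 bytes.
box = "╔═╗".encode("cp437")

# A Windows text editor that assumes Windows-1252 shows letters instead.
seen_in_editor = box.decode("cp1252")   # "ÉÍ»"
```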

Unicode to the rescue

Unicode assigns a number (a "code point") to every character, and these numbers no longer fit into a single byte. The most straightforward way to store them uses 4 bytes for each character instead of only one. 4 bytes would give you a whopping 4'294'967'296 values, but Unicode limits itself to 1'114'112 code points, of which roughly 150'000 are assigned to characters so far. Unicode contains enough space for all human languages and their quirks. There are even Egyptian Hieroglyphs in there. Unicode also brings back our beloved block drawing chars and box drawing chars.
→ Unicode is the reason you can mix Chinese with English and emoji in the same document.
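
Python strings are Unicode, so you can look up the number and the official name of any character yourself. A small sketch using the standard `unicodedata` module; the sample characters are arbitrary:

```python
import unicodedata

# Print the code point and official name of a few mixed characters.
for ch in "A█╔中😀":
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

block_code_point = ord("█")            # 0x2588
block_name = unicodedata.name("█")     # "FULL BLOCK"

# The code space ends at U+10FFFF; anything beyond is rejected.
try:
    chr(0x110000)
    beyond_allowed = True
except ValueError:
    beyond_allowed = False
```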

UTF-8

UTF-8 is not a codepage itself but a way to encode Unicode. The problem with a 4-byte encoding is that everything just quadruples in size. UTF-8 avoids that by using a variable number of bytes per character. The first 128 characters (0 to 127) are identical to the "ancient" ASCII encoding and take a single byte each. This is important, because it allows applications that don't understand Unicode and UTF to read most text files without any issues. Above these first 128 characters, UTF-8 becomes a multi-byte encoding. I am not going to explain how it works, but you can read up on it on Wikipedia.
→ UTF-8 is now the most widely used encoding for websites.
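
The size difference is easy to measure. This sketch compares UTF-8 with the fixed 4-byte encoding (called UTF-32, using the `utf-32-le` codec so that no byte order mark is added); the sample characters are arbitrary:

```python
samples = ["A", "ä", "█", "😀"]

# UTF-8 needs between 1 and 4 bytes depending on the character.
utf8_sizes = [len(s.encode("utf-8")) for s in samples]        # [1, 2, 3, 4]

# The fixed-width encoding always needs 4 bytes.
utf32_sizes = [len(s.encode("utf-32-le")) for s in samples]   # [4, 4, 4, 4]

# Pure ASCII text is byte-for-byte identical in UTF-8.
ascii_compatible = "NFO".encode("utf-8") == "NFO".encode("ascii")
```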

About `ï»¿` and `∩╗┐`

The UTF encodings work in "units". If you use UTF-16, every unit is 2 bytes long instead of 1 byte as in UTF-8. Two bytes can be stored the way we write regular numbers, with the highest digit on the left, or they can be stored with the lowest digit on the left. This is called endianness and depends on your computer's architecture. To tell an application in which way the text was encoded, a so-called BOM (Byte Order Mark) is inserted at the beginning of a document. If your application is not aware of UTF, it will render the UTF-8 BOM as shown in the heading. The first variant is what you commonly see in Windows, the second one is the same three bytes rendered in the DOS codepage. With single-byte units, the endianness of a system doesn't matter, and thus the BOM is useless as a byte order indicator in UTF-8. However, it can be used to detect a document as being encoded in UTF-8.
→ If you see the BOM, your application is not Unicode-aware and you should update it if possible.
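
Both renderings from the heading can be reproduced from the three BOM bytes. The sketch below uses the constant from Python's `codecs` module and assumes Windows-1252 as the Windows codepage and Codepage 437 as the DOS one:

```python
import codecs

bom = codecs.BOM_UTF8                   # b"\xef\xbb\xbf"

# The same three bytes, misread with two legacy codepages.
windows_view = bom.decode("cp1252")     # "ï»¿"
dos_view = bom.decode("cp437")          # "∩╗┐"

# Byte order of a single 2-byte UTF-16 unit, both ways around.
little = "A".encode("utf-16-le")        # b"A\x00"
big = "A".encode("utf-16-be")           # b"\x00A"

# The "utf-8-sig" codec removes a leading BOM if there is one.
text = (bom + b"hello").decode("utf-8-sig")   # "hello"
```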

The future

NFO files are probably going to switch to UTF-8 in the future. The problem here is that they are usually made with applications that allow the user to "paint" chars rather than typing them. These applications need to switch to UTF-8 by default. I have already seen some files that were encoded this way.

Converting to UTF-8

Converting from an old codepage to UTF-8 works flawlessly for languages that use the Latin alphabet. In fact, this is what this website does: it reads the file in the encoding you specified, converts it to UTF-8 and then draws it. There are applications that can do this rather easily. If you have Notepad++, you can select "Convert to UTF-8" in the "Encoding" menu and save the document again. Done.
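
If you prefer a script, the same conversion takes only a few lines of Python. This is a minimal sketch and not the code this website runs; the function name `convert_to_utf8` and the default of Codepage 437 are my own choices for the example:

```python
from pathlib import Path


def convert_to_utf8(source: Path, target: Path, source_encoding: str = "cp437") -> None:
    """Read `source` in the given legacy encoding and write it as UTF-8."""
    # Work on raw bytes so that line endings are kept exactly as they were.
    text = source.read_bytes().decode(source_encoding)
    target.write_bytes(text.encode("utf-8"))
```

Calling `convert_to_utf8(Path("old.nfo"), Path("new.nfo"))` would leave the original file untouched and write the converted copy next to it. The file names are placeholders.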