Ensuring text encoding - UTF-8

gunhilltrain (Multi-unit control; joined Mar 1, 2018; 8,521 posts)
I'll try this question on AA rather than Tech Support. Recently the issue came up (not on Lit) of having submissions (text files in Word) encoded as UTF-8. I have some idea of what that means, but I don't know how to make sure it applies to a particular document. Any help would be appreciated. So far I haven't found anything online that I can understand.
 
Here you go - quick and dirty.

Saving a Word Document in UTF-8

  • Save As
  • Name the Document
  • Select Plain Text (.txt)
  • Click Save
  • The File Conversion Window Opens
  • Select Other Encoding
  • Pick Unicode (UTF-8) from the window on the right.
  • Click Okay
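
If the file has already been saved as plain text with the "Windows Default" encoding, the same conversion can be sketched outside Word. This is only a rough illustration: it assumes "Windows Default" means Windows-1252 (`cp1252`), which is typical on Western-language systems but not guaranteed, and `convert_to_utf8` is a made-up helper name, not anything Word provides.

```python
from pathlib import Path


def convert_to_utf8(src, dst, source_encoding="cp1252"):
    """Re-encode a plain text file as UTF-8.

    source_encoding is an assumption about how the file was saved;
    cp1252 is the usual "Windows Default" on Western systems.
    """
    raw = Path(src).read_bytes()
    # Decoding raises UnicodeDecodeError if the bytes do not fit the
    # assumed encoding, which is a useful warning sign in itself.
    text = raw.decode(source_encoding)
    # Working with bytes keeps the original line endings untouched.
    Path(dst).write_bytes(text.encode("utf-8"))
    return text
```

Calling `convert_to_utf8("story.txt", "story-utf8.txt")` (both file names are placeholders) would leave the original alone and write a UTF-8 copy next to it.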
 
Okay, thanks, I saw that. I used to use the "Windows Default" button on the left. Am I correct that this will only work when saving it as plain text? I've been using that format for a while now. The issue is to get quotation marks and apostrophes to appear correctly. So the text encoding will not work if merely saving it as a regular Word document? I need to get up to speed a bit on the different formats that Word documents can be saved as and their characteristics.
 
Correct - to choose UTF-8 this way, it has to be saved as a plain text (.txt) document. Then you will have to go through it and clean up/convert any .doc or .docx formatting to .txt formatting - which will involve any quotation marks, dashes, etc. Word's typographic characters (curly quotes, long dashes) are not plain ASCII, so they are the ones that get garbled when a .txt file is saved or read with the wrong encoding.

If you write straight into Notepad on a recent version of Windows, it saves the file as UTF-8 by default.
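
For the clean-up step, here is a minimal sketch of swapping Word's typographic characters for plain keyboard ones. The list of replacements is my own guess at the usual culprits, not a complete inventory, and `to_plain_punctuation` is just an illustrative name.

```python
# Typographic characters Word commonly inserts, mapped to plain ASCII.
# Written as escapes so this script itself is safe in any encoding.
REPLACEMENTS = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
}

# str.maketrans accepts single-character keys mapped to strings of any length.
TABLE = str.maketrans(REPLACEMENTS)


def to_plain_punctuation(text):
    """Return text with the characters above replaced by ASCII stand-ins."""
    return text.translate(TABLE)
```

This is only needed if the receiving site cannot handle UTF-8 at all; if it can, the curly quotes can stay as they are.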
 
In over four years of publishing online on three different sites, I never had to be concerned with this. It only came up on another site with extremely old software, and even there it took two years before it became an issue after many stories had been published there. (They fixed it for me when I told them about it.) I usually start writing in plain text anyway, so I'm not sure how there could be any .doc or .docx formatting in there. Doesn't Notepad have to be converted to Word before it can be submitted to Lit? Anyway, thank you for the information you have been providing.

By the way, I have seen sites where various characters (e.g., quotation marks) have not been converted properly. I could check to see if I can find one now as an example.
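
For anyone wondering what "not converted properly" looks like, a small sketch follows. It assumes the common case where a site receives UTF-8 text but reads it as Windows-1252; other mismatches produce different garbage. Both function names are made up for the illustration.

```python
def misread_as_cp1252(text):
    """Show how UTF-8 text looks when software wrongly assumes Windows-1252."""
    # errors="replace" is needed because a few byte values (0x9D, for one)
    # have no Windows-1252 meaning and would otherwise raise an error.
    return text.encode("utf-8").decode("cp1252", errors="replace")


def repair(garbled):
    """Undo the misreading, provided no bytes were lost to replacement."""
    return garbled.encode("cp1252").decode("utf-8")


# A curly apostrophe (U+2019) is three bytes in UTF-8, so it turns into
# three unrelated characters when each byte is read on its own.
garbled = misread_as_cp1252("It\u2019s")   # "Itâ€™s"
restored = repair(garbled)                 # back to "It’s"
```

Seeing a cluster like â€™ where an apostrophe should be is a fairly reliable sign of this particular mix-up.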
 
No, they accept *.TXT files in UTF-8.

[Attached image: 1667529129830.png]
 
Thanks, I have never actually used Notepad. Anyway, this is the first time I've ever been concerned with the text encoding (UTF-8). I don't upload files, but rather cut and paste the plain text directly into the submission box. I've been using plain text for a long time, but I don't remember what I did at the very beginning at Lit, which was over four years ago. I know I've never uploaded story text. I also have never had any problems with submitting stories on this site. For italics and bold, I add HTML tags.

The site that had problems does require files to be uploaded, but it seems to have very old software.
 