Problem scanning text files into Word

DevilishTexan

I use an HP v40i multifunction printer with a load-in scanner. The problem I run into is that when I open the scanned page in Word, many words are askew. And it's quite a bit, so I don't want to edit it by hand because that would take some time. It has built-in OCR software, although I suppose it isn't very good. Is this the problem? I have to scan business docs and convert them to PDF, so I need to figure this out. Thanks.
 
I haven't seen any OCR software that can scan printed text error-free, so you're always going to have to do some editing.

If your OCR software allows it, try scanning a page in sections -- select as close to the edges of the text as possible and no more than about half a page, and OCR just the selected portion.
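
To make the "sections" idea concrete, here is a minimal sketch in Python that splits a tight text bounding box into stacked selections, each no taller than half the page. The function name, the pixel coordinates, and the example page size are all assumptions for illustration; they don't come from any particular OCR package, and a real selection should also avoid cutting through a line of text.

```python
def section_boxes(text_box, page_height, max_fraction=0.5):
    """Split a text bounding box into stacked sections for separate OCR passes.

    text_box is (left, top, right, bottom) in pixels, drawn as close to the
    edges of the text as possible. Each returned box is at most
    max_fraction of the page height tall.
    """
    left, top, right, bottom = text_box
    max_height = int(page_height * max_fraction)
    if max_height <= 0:
        raise ValueError("page_height * max_fraction must be at least 1 pixel")

    boxes = []
    y = top
    while y < bottom:
        next_y = min(y + max_height, bottom)  # never run past the text area
        boxes.append((left, y, right, next_y))
        y = next_y
    return boxes


# Hypothetical example: a 3300 px tall page with margins trimmed off.
for box in section_boxes((150, 150, 2400, 3150), page_height=3300):
    print(box)
```

Each box would then be selected and OCRed on its own, which is the manual procedure described above done with numbers instead of a mouse.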

My OCR software will also allow me to OCR a TIFF image. Scanning a document at a high resolution into a TIFF image and running the OCR on that usually produces better results for me -- although it does mess up the font size.

If you can increase the OCR scan resolution above the 300 DPI default, you'll get better results -- but I haven't found any way to do that with my OCR software (Omnipage 5.0 from Caere Software, (c) 1996).
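
A rough way to see why resolution matters: the sketch below estimates how many pixels tall a line of type is at different scan resolutions. The only hard fact used is that one typographic point is 1/72 inch; the 10 pt font size is an assumed example, and the result is the height of the font's em box, so actual letters come out somewhat smaller.

```python
POINTS_PER_INCH = 72  # 1 typographic point = 1/72 inch


def em_height_px(font_size_pt, dpi):
    """Approximate pixel height of the font's em box at a given scan DPI."""
    return font_size_pt / POINTS_PER_INCH * dpi


# Assumed example: ordinary 10 pt body text at three scan resolutions.
for dpi in (300, 600, 1200):
    print(f"{dpi:>4} DPI: 10 pt text is about {em_height_px(10, dpi):.0f} px tall")
```

Doubling the DPI doubles the pixels available for each character, which gives the OCR engine more detail to work with on small or poorly printed text.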

A quick Google search reveals Omnipage is up to version 15 and is now by Nuance software -- and is a very spendy $149.99 for the standard version. http://www.nuance.com/omnipage/standard/

If Nuance has maintained anything close to the functionality of the version 5.0 Limited Edition that was bundled with my first scanner, and you're using it for business, it's probably worth it.
 
Thanks WH. I can always depend on you. Yeah, it will allow me to scan a TIFF, but this is very limited software because it's a multi-function printer. I have a standalone scanner but it's not compatible with XP.
I don't think I can increase the scan DPI. But I'll mess with it and see.
 
I checked and I can increase the DPI to 1200, but only when I scan as an image instead of a text file. We'll see how it comes out. It just might work. Thanks bro.
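
One thing worth knowing before scanning everything at 1200 DPI is how large the images get. This is a back-of-the-envelope sketch, assuming a US Letter page (8.5 x 11 inches) and an uncompressed 24-bit color bitmap; both are assumptions, and a compressed or black-and-white scan will be much smaller.

```python
def scan_size(width_in, height_in, dpi, bits_per_pixel=24):
    """Return (width_px, height_px, raw_bytes) for an uncompressed scan."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    raw_bytes = width_px * height_px * bits_per_pixel // 8
    return width_px, height_px, raw_bytes


# Assumed page size: US Letter, 8.5 x 11 inches.
for dpi in (300, 1200):
    w, h, size = scan_size(8.5, 11, dpi)
    print(f"{dpi:>4} DPI: {w} x {h} px, about {size / 2**20:.0f} MiB uncompressed")
```

Going from 300 to 1200 DPI multiplies the pixel count by 16, so multi-page business documents may be easier to handle at an in-between setting if the OCR results are already good enough there.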
 
Eureka!!! I was able to scan to a bitmap image and then convert it to PDF perfectly. You rock, Harold. :D
 
Glad I could help. It's usually just a matter of outsmarting the software and overcoming the unrealistic assumptions of the programmers. Whoever decided that 300 DPI was sufficient for accurate OCR apparently never tested it except under ideal conditions -- like perfectly printed documents on high-contrast paper.
 