TqB Posted June 7, 2019: I'm trying to translate some scanned Russian papers into English. What I'm looking for is good OCR (optical character recognition) software that will produce a translatable PDF I can use with one of the standard online translation sites. It needs to work on a Mac, and I don't mind paying if it does the job well. Any suggestions? My searches just come up with loads of companies advertising their own...
Brett Breakin' Rocks Posted June 7, 2019: Doesn't Google Docs do this natively? I thought it did. Try it. If it doesn't work, they give you the code and API to write the app yourself, haha.
A quick search: https://support.google.com/drive/answer/176692?co=GENIE.Platform%3DDesktop&hl=en
Convert PDF and photo files to text
You can convert image files to text with Google Drive.
Prepare the file
These tips will give you the best results:
Format: You can convert .JPEG, .PNG, .GIF, or PDF (multipage documents) files.
File size: The file should be 2 MB or less.
Resolution: Text should be at least 10 pixels high.
Orientation: Documents must be right-side up. If your image is facing the wrong way, rotate it before uploading it to Google Drive.
Languages: Google Drive will detect the language of the document.
Font and character set: For best results, use common fonts such as Arial or Times New Roman.
Image quality: Sharp images with even lighting and clear contrasts work best.
Convert an image file
1. On your computer, go to drive.google.com.
2. Right-click on the desired file.
3. Click Open with Google Docs.
The image file will be converted to a Google Doc, but some formatting might not transfer:
Bold, italics, font size, font type, and line breaks are most likely to be retained.
Lists, tables, columns, footnotes, and endnotes are likely not to be detected.
Edit: Here is the tutorial so you can write your own app using Google's Cloud OCR. https://cloud.google.com/functions/docs/tutorials/ocr
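For anyone with a pile of scans, the format and file-size tips quoted above can be checked before uploading. This is only a rough sketch: the function name `drive_ocr_problems` is made up for illustration, reading "2 MB" as 2 * 1024 * 1024 bytes is an assumption, and so is accepting `.jpg` alongside the listed `.JPEG`. It does not check resolution, orientation, or image quality.

```python
import os

# Limit taken from the quoted help page. Treating "2 MB" as 2 * 1024 * 1024
# bytes is an assumption; the page does not say which definition it uses.
MAX_BYTES = 2 * 1024 * 1024
# ".jpg" is added as an assumption; the page itself lists only .JPEG.
ALLOWED_EXTENSIONS = {".jpeg", ".jpg", ".png", ".gif", ".pdf"}


def drive_ocr_problems(path):
    """Return a list of reasons the file may not convert (empty if none found)."""
    problems = []
    extension = os.path.splitext(path)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or '(none)'}")
    size = os.path.getsize(path)
    if size > MAX_BYTES:
        problems.append(f"file is {size} bytes, over the {MAX_BYTES}-byte limit")
    return problems
```

Calling `drive_ocr_problems("paper.pdf")` (a hypothetical file name) returns an empty list when both checks pass, so a multi-page scan that comes back flagged for size could be split into smaller PDFs first.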
TqB Posted June 7, 2019: 1 hour ago, Brett Breakin' Rocks said: "Doesn't Google Docs do this natively? I thought it did. Try it. If it doesn't work, they give you the code and API to write the app yourself, haha." Thanks, I'm trying that - I can't make it do Russian - opening a scanned Russian text I put on Google Drive just comes out as English characters, like this:
DPS Ammonite Posted June 7, 2019: Could you post your scanned document? Maybe someone could get a translation program to work and send the product back to you.
Brett Breakin' Rocks Posted June 7, 2019: 44 minutes ago, TqB said: "Thanks, I'm trying that - I can't make it do Russian - opening a scanned Russian text I put on Google Drive just comes out as English characters, like this:" Darn, that stinks. It says it can recognize languages. I guess only English? Sorry 'bout that. Cheers, Brett
TqB Posted June 7, 2019: 18 minutes ago, Brett Breakin' Rocks said: "Darn, that stinks. It says it can recognize languages. I guess only English? Sorry 'bout that. Cheers, Brett" Not at all, I'm learning as I go! There's probably a way to make it work...
Brett Breakin' Rocks Posted June 7, 2019: Just now, TqB said: "Not at all, I'm learning as I go! There's probably a way to make it work..." Yeah, sounds like a UI bug. I wonder if the recognizer has to be primed for the incoming non-standard characters. You'd think someone would have written a free, open-source tool specifically for Russian text. Good luck!
piranha Posted June 7, 2019: Send me the PDFs in a PM and I will OCR them for you in Russian.
TqB Posted June 7, 2019: A couple of knowledgeable friends have independently suggested ABBYY FineReader as the best to buy for Cyrillic recognition (and it handles all the other scripts that most people are likely to need). A little expensive, but not when you consider what it does.
TqB Posted June 7, 2019: Just now, piranha said: "Send me the PDFs in a PM and I will OCR them for you in Russian." That's very kind, thank you! I'll try a free trial of ABBYY first and shout if I have problems. I have quite a few to do - other languages as well. Wish this had been around when I did my Ph.D. decades ago!
TqB Posted June 7, 2019: @piranha Once converted, do you have a preferred (free) document translator?
piranha Posted June 7, 2019: 5 minutes ago, TqB said: "@piranha Once converted, do you have a preferred (free) document translator?" Google Translate does a good job up to 5K characters at a time. The best bet for OCR is Adobe, especially for reliable Russian and Chinese.
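If the OCR output runs well past that 5K figure, pasting it in pieces gets tedious, so here is a minimal sketch of splitting the text first. The function name `split_for_translation` is hypothetical, and the default limit of 5000 simply takes the "5K characters" mentioned above at face value rather than any documented limit. It prefers to cut at a paragraph break, then a line break, then a space, and only cuts mid-word when none of those exist in the window.

```python
def split_for_translation(text, limit=5000):
    """Split text into pieces of at most `limit` characters.

    The pieces join back into the original text exactly, so nothing is lost.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    chunks = []
    remaining = text
    while len(remaining) > limit:
        window = remaining[:limit]
        cut = -1
        # Prefer the last paragraph break, then line break, then space.
        for separator in ("\n\n", "\n", " "):
            cut = window.rfind(separator)
            if cut > 0:
                cut += len(separator)  # keep the separator with this piece
                break
        if cut <= 0:
            cut = limit  # no usable boundary in the window: hard cut
        chunks.append(remaining[:cut])
        remaining = remaining[cut:]
    if remaining:
        chunks.append(remaining)
    return chunks
```

Each piece can then be pasted into the translator in turn. Cutting at paragraph breaks where possible should keep sentences whole, which tends to matter for translation quality.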
TqB Posted June 7, 2019: 40 minutes ago, piranha said: "Google Translate does a good job up to 5K characters at a time. The best bet for OCR is Adobe, especially for reliable Russian and Chinese." Thank you!