Translating pdf papers (Russian etc.)


I'm trying to translate some scanned Russian papers into English.

What I'm looking for is good OCR (optical character recognition) software that will produce a translatable PDF for use with one of the standard online translation sites.

It needs to work on a Mac, and I don't mind paying if it does the job well.

Any suggestions? My searches just come up with loads of companies advertising their own...

Brett Breakin' Rocks

Google Docs doesn't do this natively? I thought they did. Try it? If it doesn't work, they give you the code and API to write the app yourself... haha

A quick search: https://support.google.com/drive/answer/176692?co=GENIE.Platform%3DDesktop&hl=en

Convert PDF and photo files to text

You can convert image files to text with Google Drive.

Prepare the file

These tips will give you the best results:

  • Format: You can convert .JPEG, .PNG, .GIF, or PDF (multipage documents) files.
  • File size: The file should be 2 MB or less.
  • Resolution: Text should be at least 10 pixels high.
  • Orientation: Documents must be right-side up. If your image is facing the wrong way, rotate it before uploading it to Google Drive.
  • Languages: Google Drive will detect the language of the document.
  • Font and character set: For best results, use common fonts such as Arial or Times New Roman.
  • Image quality: Sharp images with even lighting and clear contrasts work best.
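
(Not part of the help page.) The two tips above that can be checked mechanically, format and file size, could be tested before uploading with something like the sketch below. The function names are made up for illustration, and it assumes "2 MB" means 2 × 1024 × 1024 bytes and that `.jpg` counts as `.jpeg`; neither detail is stated in the help text.

```python
from pathlib import Path

# Limits taken from the tips above. Assumptions: 1 MB = 1024 * 1024 bytes,
# and ".jpg" is accepted as an alias of ".jpeg".
MAX_BYTES = 2 * 1024 * 1024
ALLOWED_SUFFIXES = {".jpeg", ".jpg", ".png", ".gif", ".pdf"}


def drive_ocr_problems(filename, size_bytes):
    """Return a list of reasons the file may not convert (empty if none found)."""
    problems = []
    suffix = Path(filename).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        problems.append(f"unsupported format: {suffix or '(no extension)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes} bytes, over the 2 MB limit")
    return problems


def check_file(path):
    """Convenience wrapper (hypothetical helper): read the size from disk."""
    p = Path(path)
    return drive_ocr_problems(p.name, p.stat().st_size)
```

Calling `check_file("paper.pdf")` on a real scan returns an empty list when both checks pass. Resolution, orientation and image quality still have to be judged by eye.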

Convert an image file

  1. On your computer, go to drive.google.com.
  2. Right-click on the desired file.
  3. Click Open with and then Google Docs.
  4. The image file will be converted to a Google Doc, but some formatting might not transfer:
    • Bold, italics, font size, font type, and line breaks are most likely to be retained.
    • Lists, tables, columns, footnotes, and endnotes are likely not to be detected.

Edit: Here is the tutorial so you can write your own app using Google's Cloud OCR.

https://cloud.google.com/functions/docs/tutorials/ocr

1 hour ago, Brett Breakin' Rocks said:

Google Docs doesn't do this natively? I thought they did. Try it? If it doesn't work, they give you the code and API to write the app yourself... haha

Thanks, I'm trying that, but I can't make it do Russian. A scanned Russian text that I put on Google Drive and opened just comes out as English (Latin) characters, like this:

[Image: screenshot of the converted text, 2019-06-07]

DPS Ammonite

Could you post your scanned document? Maybe someone could get a translation program to work and send the product back to you.

Brett Breakin' Rocks
44 minutes ago, TqB said:

Thanks, I'm trying that, but I can't make it do Russian. A scanned Russian text that I put on Google Drive and opened just comes out as English (Latin) characters, like this:

Darn, that stinks. It says it can recognize languages. I guess only English? Sorry about that.

Cheers,

Brett

18 minutes ago, Brett Breakin' Rocks said:

Darn, that stinks. It says it can recognize languages. I guess only English? Sorry about that.

Cheers,

Brett

Not at all, I'm learning as I go! There's probably a way to make it work...

Brett Breakin' Rocks
Just now, TqB said:

Not at all, I'm learning as I go! There's probably a way to make it work...

Yeah, sounds like a UI bug. I wonder if the recognizer has to be primed for the incoming non-standard characters. You'd think someone would have written a free, open-source tool specifically for Russian text.

Good luck!
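
On the open-source question: one option nobody in the thread has named is Tesseract, an open-source OCR engine with trained data for Russian (language code `rus`). Its command line takes an image, an output base name, a `-l` language option, and the `pdf` config to write a searchable PDF. A minimal sketch, assuming Tesseract and its Russian language data are already installed; the helper names here are hypothetical:

```python
import shutil
import subprocess


def tesseract_command(image_path, output_base, languages=("rus",)):
    """Build the argv for: tesseract <image> <output_base> -l <langs> pdf"""
    return ["tesseract", str(image_path), str(output_base),
            "-l", "+".join(languages), "pdf"]


def ocr_to_searchable_pdf(image_path, output_base, languages=("rus",)):
    """Run Tesseract if it is on PATH; it writes <output_base>.pdf."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH")
    subprocess.run(tesseract_command(image_path, output_base, languages),
                   check=True)
```

For example, `ocr_to_searchable_pdf("page1.png", "page1", ("rus", "eng"))` would run `tesseract page1.png page1 -l rus+eng pdf`. Tesseract reads images rather than PDFs, so a scanned PDF would need its pages exported as images first. How well it copes with any particular scan is untested here.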


Send me the pdfs in a PM and I will OCR them for you in Russian. :fistbump:


A couple of knowledgeable friends have independently suggested ABBYY FineReader as the best to buy for Cyrillic recognition (and it handles all the other scripts that most people are likely to need). A little expensive, but not when you consider what it does.

Just now, piranha said:

Send me the pdfs in a PM and I will OCR them for you in Russian. :fistbump:

That's very kind, thank you! I'll try a free trial of ABBYY first and shout if I have problems. I have quite a few to do, in other languages as well. Wish this had been around when I did my Ph.D. decades ago!


@piranha Once converted, do you have a preferred (free) document translator?

5 minutes ago, TqB said:

@piranha Once converted, do you have a preferred (free) document translator?

Google Translate does a good job with up to 5K characters at a time. The best bet for OCR is Adobe, especially for reliable Russian and Chinese.
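
For a long paper, the 5K limit means pasting the OCR text in pieces. A rough sketch of one way to cut it up without breaking sentences, assuming a 5000-character limit; the function name and the break rules are illustrative choices, not anything Google Translate prescribes:

```python
def split_for_translation(text, limit=5000):
    """Split text into pieces of at most `limit` characters.

    Within each window of `limit` characters, cut at the last paragraph
    break if there is one, else the last sentence end, else the last
    space, else exactly at the limit.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    chunks = []
    rest = text.strip()
    while len(rest) > limit:
        window = rest[:limit]
        cut = limit  # fallback: hard cut when no separator is found
        for sep in ("\n\n", ". ", " "):
            pos = window.rfind(sep)
            if pos > 0:
                cut = pos + len(sep)
                break
        chunks.append(rest[:cut].strip())
        rest = rest[cut:].strip()
    if rest:
        chunks.append(rest)
    return chunks
```

Each returned piece can then be pasted into the translator in order. Because paragraph breaks are preferred over sentence ends, some pieces may come out well under the limit.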

40 minutes ago, piranha said:

Google Translate does a good job with up to 5K characters at a time. The best bet for OCR is Adobe, especially for reliable Russian and Chinese.

Thank you!
