
Translating pdf papers (Russian etc.)


TqB


I'm trying to translate some scanned Russian papers into English.

What I'm looking for is good OCR (optical character recognition) software that will produce a translatable PDF for use with one of the standard online translation sites.

Needs to work with a Mac and I don't mind paying if it does the job well.

 

Any suggestions? My searches just come up with loads of companies advertising their own...

Tarquin


Doesn't Google Docs do this natively? I thought it did. Try it? If it doesn't work, they give you the code and API to write the app yourself, haha.

 

A quick search: https://support.google.com/drive/answer/176692?co=GENIE.Platform%3DDesktop&hl=en

 

Convert PDF and photo files to text

You can convert image files to text with Google Drive.

Prepare the file

These tips will give you the best results:

  • Format: You can convert .JPEG, .PNG, .GIF, or PDF (multipage documents) files.
  • File size: The file should be 2 MB or less.
  • Resolution: Text should be at least 10 pixels high.
  • Orientation: Documents must be right-side up. If your image is facing the wrong way, rotate it before uploading it to Google Drive.
  • Languages: Google Drive will detect the language of the document.
  • Font and character set: For best results, use common fonts such as Arial or Times New Roman.
  • Image quality: Sharp images with even lighting and clear contrasts work best.

Convert an image file

  1. On your computer, go to drive.google.com.
  2. Right-click on the desired file.
  3. Click Open with and then Google Docs.
  4. The image file will be converted to a Google Doc, but some formatting might not transfer:
    • Bold, italics, font size, font type, and line breaks are most likely to be retained.
    • Lists, tables, columns, footnotes, and endnotes are likely not to be detected.
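The steps above rely on Drive detecting the language itself. The Drive API also accepts an OCR language hint at upload time, which might help where the automatic detection falls back to Latin letters (untested here). A minimal sketch, assuming the `google-api-python-client` package and an already-authorized Drive v3 client (OAuth setup not shown); `ocr_via_drive`, `upload_metadata` and `source_mimetype` are hypothetical names.

```python
import mimetypes
import os

# Drive converts an upload to a Google Doc (running OCR) when the target type is this.
GOOGLE_DOC = "application/vnd.google-apps.document"
# Source types listed on the help page quoted above.
ACCEPTED = {"application/pdf", "image/jpeg", "image/png", "image/gif"}


def upload_metadata(path):
    """File metadata requesting conversion of the upload into a Google Doc."""
    return {"name": os.path.basename(path), "mimeType": GOOGLE_DOC}


def source_mimetype(path):
    """Guess the upload's MIME type from its extension; reject unsupported types."""
    mime, _ = mimetypes.guess_type(path)
    if mime not in ACCEPTED:
        raise ValueError(f"not a PDF/JPEG/PNG/GIF file: {path}")
    return mime


def ocr_via_drive(service, path, language="ru"):
    """Upload `path` with an OCR language hint and return the recognized text.

    `service` is assumed to be a Drive v3 client built with
    googleapiclient.discovery.build("drive", "v3", credentials=...).
    """
    # Imported here so the helpers above work without the client library installed.
    from googleapiclient.http import MediaFileUpload

    media = MediaFileUpload(path, mimetype=source_mimetype(path))
    created = service.files().create(
        body=upload_metadata(path),
        media_body=media,
        ocrLanguage=language,  # ISO 639-1 language hint, e.g. "ru" for Russian
        fields="id",
    ).execute()
    # Export the converted Google Doc as plain text (returned as bytes).
    data = service.files().export(fileId=created["id"], mimeType="text/plain").execute()
    return data.decode("utf-8")
```

The 2 MB size limit from the tips above still applies to whatever you upload this way.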

 

 

Edit: Here is the tutorial so you can write your own app using Google's Cloud OCR. 

 

https://cloud.google.com/functions/docs/tutorials/ocr
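The tutorial above builds a whole Cloud Functions pipeline. For the Russian problem specifically, the Cloud Vision API used for OCR accepts language hints. A minimal sketch, assuming the `google-cloud-vision` package is installed and credentials are configured; it handles page images (PNG/JPEG) only, since PDF input goes through a separate asynchronous call that reads from Cloud Storage. `ocr_page_image` and `ocr_document` are hypothetical names.

```python
def ocr_page_image(path, hints=("ru",)):
    """OCR one page image with Cloud Vision, hinting the expected language(s)."""
    # Imported here so ocr_document() can be used with another OCR function.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    with open(path, "rb") as f:
        image = vision.Image(content=f.read())
    response = client.document_text_detection(
        image=image,
        image_context=vision.ImageContext(language_hints=list(hints)),
    )
    if response.error.message:
        raise RuntimeError(response.error.message)
    return response.full_text_annotation.text


def ocr_document(page_paths, ocr=ocr_page_image, hints=("ru",)):
    """OCR each page in order and join the pages with blank lines."""
    return "\n\n".join(ocr(p, hints).strip() for p in page_paths)
```

Passing the OCR function in as a parameter keeps the page-joining logic independent of the service, so the same loop could wrap a different engine.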


1 hour ago, Brett Breakin' Rocks said:

Doesn't Google Docs do this natively? I thought it did. Try it? If it doesn't work, they give you the code and API to write the app yourself, haha.

 

 

 

Thanks, I'm trying that, but I can't make it do Russian: opening a scanned Russian text I put on Google Drive just comes out as English characters, like this:

[Screenshot of the converted output, 2019-06-07]

Tarquin
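One quick way to tell whether an OCR pass actually recognized Cyrillic, rather than mapping the letters onto Latin look-alikes as happened here, is to check which script the recognized letters belong to. A minimal sketch using only the standard library; `cyrillic_share` is a hypothetical helper, not part of any OCR tool.

```python
import unicodedata


def cyrillic_share(text):
    """Fraction of alphabetic characters whose Unicode name marks them as Cyrillic."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    cyrillic = sum(1 for ch in letters if unicodedata.name(ch, "").startswith("CYRILLIC"))
    return cyrillic / len(letters)
```

A share near zero on a page you know is Russian means the recognizer fell back to Latin letters. Papers that mix in Latin taxon names or Western references will land somewhere in between rather than at 1.0.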


Could you post your scanned document? Maybe someone could get a translation program to work and send the result back to you.

My goal is to leave no stone or fossil unturned.   

See my Arizona Paleontology Guide, the best single resource for Arizona paleontology anywhere.


44 minutes ago, TqB said:

Thanks, I'm trying that, but I can't make it do Russian: opening a scanned Russian text I put on Google Drive just comes out as English characters, like this:

Darn, that stinks. It says it can recognize languages; I guess only English? Sorry about that.

 

Cheers,

Brett


18 minutes ago, Brett Breakin' Rocks said:

Darn, that stinks. It says it can recognize languages; I guess only English? Sorry about that.

 

Cheers,

Brett

Not at all, I'm learning as I go! There's probably a way to make it work...

Tarquin


Just now, TqB said:

Not at all, I'm learning as I go! There's probably a way to make it work...

Yeah, sounds like a UI bug. I wonder if the recognizer has to be primed for the incoming non-standard characters. You'd think, with it being free, someone would have written an open-source tool specifically for Russian text.

 

Good luck!
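Tesseract is a free, open-source OCR engine with trained data for Russian (`rus`), and "priming" it is done by naming the language(s) up front. A minimal sketch, assuming the `tesseract` program with its Russian language data is installed (on a Mac, for example via Homebrew's `tesseract` and `tesseract-lang` packages) along with the `pytesseract` and Pillow Python packages. It reads page images, so PDFs would need converting to images first. `tesseract_lang` and `ocr_with_tesseract` are hypothetical names.

```python
def tesseract_lang(codes):
    """Tesseract takes several languages joined with '+', e.g. 'rus+eng'."""
    if not codes:
        raise ValueError("need at least one language code")
    return "+".join(codes)


def ocr_with_tesseract(image_path, codes=("rus", "eng")):
    """OCR one page image; 'eng' is included for Latin-script names mixed into the text."""
    # Imported here so tesseract_lang() works without these packages installed.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=tesseract_lang(codes))
```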


A couple of knowledgeable friends have independently suggested ABBYY FineReader as the best one to buy for Cyrillic recognition (and it handles all the other texts most people are likely to need). It's a little expensive, but not considering what it does.

Tarquin


Just now, piranha said:

Send me the PDFs in a PM and I will OCR the Russian for you.

That's very kind, thank you! I'll try a free trial of ABBYY first and shout if I have problems. I have quite a few to do, in other languages as well. I wish this had been around when I did my PhD decades ago!

Tarquin


5 minutes ago, TqB said:

@piranha Once converted, do you have a preferred (free) document translator?

 

 

Google Translate does a good job up to 5K characters at a time. The best bet for OCR is Adobe, especially for reliable Russian and Chinese.
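With a 5K-character limit per request (as reported here; the current limit may differ), a whole paper has to go in pieces. A minimal sketch of splitting OCR output into pieces under a limit, breaking at blank-line paragraph boundaries where possible and otherwise at the last space before the limit; `chunk_text` is a hypothetical helper, not part of any translation service.

```python
def chunk_text(text, limit=5000):
    """Split text into pieces of at most `limit` characters, preferring paragraph breaks."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        # A single paragraph longer than the limit is cut at the last space before it
        # (or hard-cut at the limit if there is no space).
        while len(para) > limit:
            cut = para.rfind(" ", 0, limit)
            if cut <= 0:
                cut = limit
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:cut])
            para = para[cut:].lstrip()
        # Otherwise pack whole paragraphs together while they fit.
        candidate = para if not current else current + "\n\n" + para
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Each piece can then be pasted or sent to the translator in order; keeping paragraphs whole tends to give the translator more context than cutting at a fixed character count.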


40 minutes ago, piranha said:

 

 

Google Translate does a good job up to 5K characters at a time. The best bet for OCR is Adobe, especially for reliable Russian and Chinese.

Thank you!

Tarquin
