عبقرية الإرادة (Genius of Will): December 2007

Tuesday, December 18, 2007

OCR for Linux, a mini-howto

This is a very light tutorial on doing OCR on Linux, using ImageMagick to optimize the images and the trial version of a commercial OCR package, OCR Shop XTR, which is really powerful and does the job very well.

Note: I have no relationship with www.vividata.com and I'm not advertising their product. It's just a product that I've tried and come to respect.

Requirements:

1- ImageMagick. You can find an ImageMagick package on any well-known Linux distro, whether desktop- or server-oriented. If you can't find one, you can download and install it from its website: http://www.imagemagick.org/

2- OCR Shop XTR for Linux. You can download a trial version from the Vividata website (www.vividata.com). You'll have to provide your machine's hostname and your network card's MAC address to get the key. Installation is really easy if you follow the vendor's instructions.
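A quick way to check both requirements and to collect the two values needed for the trial key is sketched below. It assumes a Linux system that exposes network interfaces under /sys/class/net, and it uses "uname -n" for the hostname; the check_tool helper is a hypothetical name introduced only for this example.

```shell
#!/bin/sh
# Report whether a tool is available, either on PATH or as an executable path.
# check_tool is a hypothetical helper name, not part of either package.
check_tool() {
    if command -v "$1" >/dev/null 2>&1 || [ -x "$1" ]; then
        echo "found: $1"
    else
        echo "missing: $1"
    fi
}

check_tool convert                     # ImageMagick's command-line converter
check_tool /opt/Vividata/bin/ocrxtr    # install path used later in this post

# Values requested when asking for a trial key
echo "hostname: $(uname -n)"

# Linux lists each interface's MAC address under /sys/class/net (assumption:
# this directory exists; the loop prints nothing if it does not)
for iface in /sys/class/net/*; do
    [ -r "$iface/address" ] || continue
    echo "${iface##*/}: $(cat "$iface/address")"
done
```

The loop prints every interface, including "lo", so pick the line that belongs to your real network card.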

Note: Image processing can take a very long time when you process hundreds of images, so be patient and test your options on a few sample images first before you apply them to many images.

Steps:

1- We need to optimize the images so that ocrxtr recognizes them well, so we'll use the "convert" command, which is bundled with the ImageMagick package, to do the job. You can skip this step if your images already have good resolution and clarity.

convert sourceimage.ext -resize 200% -fill white -tint 60 -level 0%,80% -sharpen 3 -compress none -monochrome destinationimage.tif

You can fine-tune those options to suit your needs, but these are the ones that worked for me after many attempts.
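If you have many images, the same command can be wrapped in a loop. The sketch below is only one way to do it: the "scans" and "ocr-ready" directory names, the tif_name and optimize_images helpers, and the DRY_RUN switch are all hypothetical names made up for this example. It starts in dry-run mode so you can check the file names before any real processing.

```shell
#!/bin/sh
# Map an input image path to the output TIFF name: scans/page1.jpg -> page1.tif
tif_name() {
    base=${1##*/}            # strip the directory part
    echo "${base%.*}.tif"    # swap the last extension for .tif
}

# Run the step 1 command on every file given, writing into $OUT_DIR.
# With DRY_RUN=1 the planned conversions are only printed.
optimize_images() {
    for src in "$@"; do
        [ -f "$src" ] || continue        # skip unmatched globs and directories
        dst="$OUT_DIR/$(tif_name "$src")"
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "convert $src ... $dst"
        else
            mkdir -p "$OUT_DIR"
            convert "$src" -resize 200% -fill white -tint 60 -level 0%,80% \
                -sharpen 3 -compress none -monochrome "$dst"
        fi
    done
}

OUT_DIR=${OUT_DIR:-ocr-ready}   # hypothetical output directory
DRY_RUN=1                       # set to 0 once the printed names look right
optimize_images scans/*.jpg     # hypothetical layout: JPEG scans in ./scans
```

Running it on two or three sample files first, with DRY_RUN=0, is a cheap way to judge the options before committing to a long batch.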

2- If you skipped step 1, you need to do this so that the image can be used successfully with ocrxtr:

convert sourceimage.ext -compress none -monochrome destinationimage.tif

3- Now let's run the OCR, assuming that you want PDF files that contain the recognized text hidden transparently under the image, that the images should be kept as they are, and that overwriting the destination file is acceptable.

/opt/Vividata/bin/ocrxtr -overwrite=y -in_res=150 -out_text_format=pdf -out_text_name="%s.pdf" destinationimage.tif

You can read the OCR Shop XTR documentation if you want to play with other options.
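To run step 3 over a whole directory of prepared TIFF files, a loop along these lines may help. It is a sketch under assumptions: the "ocr-ready" directory, the ocr_images helper, and the DRY_RUN switch are hypothetical, and the ocrxtr options are copied unchanged from the command above rather than checked against the product documentation.

```shell
#!/bin/sh
OCRXTR=${OCRXTR:-/opt/Vividata/bin/ocrxtr}   # install path used in step 3

# Run the step 3 command on each TIFF given; names that are not files are skipped.
# With DRY_RUN=1 the files are only listed, nothing is recognized.
ocr_images() {
    count=0
    for tif in "$@"; do
        [ -f "$tif" ] || continue
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "would OCR: $tif"
        else
            # Report a failure and keep going with the remaining files
            "$OCRXTR" -overwrite=y -in_res=150 -out_text_format=pdf \
                -out_text_name="%s.pdf" "$tif" || echo "failed: $tif" >&2
        fi
        count=$((count + 1))
    done
    echo "processed $count file(s)"
}

DRY_RUN=1                     # set to 0 to really call ocrxtr
ocr_images ocr-ready/*.tif    # hypothetical directory holding the prepared TIFFs
```

The count includes files whose OCR failed; the failures themselves are reported on standard error.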

Sunday, December 16, 2007

emovix-modified, a live mini media player

I've modified the emovix CD (http://movix.sourceforge.net/), that is, the one generated by the emovix script, not the movix or movix2 versions, as both have higher hardware requirements and it was easier to work this way.

You can use this CD to turn your old PC with less than 32 MB of RAM into a console-based media player that can play Ogg, MP3, Windows Media formats, RealMedia files and some others. It doesn't support very new codec versions, and it doesn't support QuickTime or similar formats.

Its size is quite small: only 16 MB.

This is an alpha 0 version, so it may contain bugs, and it can't be installed to a hard disk unless you hack it.

I expect the next release to have a simple installation script so that you don't have to insert the CD. There may be a USB version too.

Mainly, I've modified the startup scripts and the movix.pl file. I've disabled the movix menu, so simply use mplayer as a command.

To play any file, first mount the media where the file is, then simply run

mplayer /path/to/media/file.ext
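As a rough illustration of the mount-then-play sequence, here is a small sketch. The device name, mount point, and file name are hypothetical and depend on your hardware; the run and play_from helpers are made up for this example, and it assumes the CD's shell provides mkdir, mount and umount. By default it only prints the commands, since mounting needs root.

```shell
#!/bin/sh
# Print the commands by default; set DRY_RUN=0 to really run them (needs root).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Mount a device on a mount point, play one file from it, then unmount.
play_from() {
    dev=$1
    mnt=$2
    file=$3
    run mkdir -p "$mnt"
    run mount "$dev" "$mnt"
    run mplayer "$mnt/$file"
    run umount "$mnt"
}

# Hypothetical device and file names: adjust them to your hardware and media
play_from /dev/hda1 /mnt/media music/song.ogg
```

In dry-run mode this prints four lines, one per command, each prefixed with "+".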

You can run this CD on a PC with 32 MB of RAM and a 2 MB graphics card. I could play some video files on a test machine with that configuration.

Here is the download link: http://fun.sharnoby.net/emovix-modified-alpha0.iso

I hope you enjoy it, and please report any bugs or suggestions you have. :)