How to detect file is binary or ascii?


Re: How to detect file is binary or ascii?

Post by joedf » 12 Oct 2013, 19:23

Thanks ;)

Re: How to detect file is binary or ascii?

Post by panofish » 12 Oct 2013, 15:15

Holy smokes Joe ... you're a freakin' genius!!! WELL DONE SIR!

Re: How to detect file is binary or ascii?

Post by joedf » 10 Oct 2013, 20:33

with the help of this C implementation:
Spoiler
here is a rough copy of the implementation:
Spoiler
Here is the resulting function:

Code:



Note: ASCII Extended char-set support has not been added yet.

cheers! ;)

Re: How to detect file is binary or ascii?

Post by MilesAhead » 10 Oct 2013, 15:11

It's no biggie for me. Just a matter of curiosity. I know I looked through the 'file' script. But it was years ago. I was probably running Mandrake 9.1 then. But I bet it does fall through to text as last resort. The frustration is just generally dealing with these super restricted library computers. Not anything to do with this thread. :)

Re: How to detect file is binary or ascii?

Post by joedf » 10 Oct 2013, 14:02

Well, I'll be working on it when I arrive home. Don't worry, I can't detect utf-8 with BOM
Just need to get home :P

Re: How to detect file is binary or ascii?

Post by MilesAhead » 10 Oct 2013, 13:58

joedf wrote:the file command actually only checks for the "signature"... like for exe it's "MZ", bmp it's something like "BM"


I assumed it did so on stuff like image files, printer format files like pdf postscript etc.. but I thought for text it might be able to detect ascii/unicode types. But you're saying "text" is the fall through if nothing else is found?

Dang! I wish I could just look at the script. Hopefully soon I'll have a machine instead of using a library loaner. :)

Re: How to detect file is binary or ascii?

Post by joedf » 10 Oct 2013, 13:44

the file command actually only checks for the "signature"... like for exe it's "MZ", bmp it's something like "BM"
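
A minimal Python sketch of the signature idea, using only the two signatures named in this post. The table `SIGNATURES` and the function `sniff_signature` are hypothetical names for illustration, not anything taken from the `file` source:

```python
# Leading "magic" bytes mapped to a file kind.
# Only the two examples from the post are listed (assumption: a real
# tool would carry a far larger table).
SIGNATURES = {
    b"MZ": "exe",
    b"BM": "bmp",
}


def sniff_signature(data):
    """Return the kind whose signature starts `data`, or None if none match."""
    for magic, kind in SIGNATURES.items():
        if data.startswith(magic):
            return kind
    return None
```

Something like `sniff_signature(open(path, "rb").read(2))` would apply it to a file. A plain text file that happens to start with "MZ" or "BM" would be misidentified, so a two-byte match on its own is only a hint.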

Re: How to detect file is binary or ascii?

Post by MilesAhead » 09 Oct 2013, 13:07

Hmmm, I'm curious how "file" does it. But I can't look at tar.gz files on this library Windows PC. If anyone is curious, here's the link to the source archive: ftp://ftp.astron.com/pub/file/

I believe it's a bash shell script.

Re: How to detect file is binary or ascii?

Post by joedf » 09 Oct 2013, 09:20

I know of that, hmm but I didn't think that they would be needed...
Hmm Oh Well! I'll add support for that too! Thanks for your feedback ;)

Re: How to detect file is binary or ascii?

Post by just me » 09 Oct 2013, 00:49

Hi joedf,

you might consider that extended ASCII codes like "Ü" (154) are valid in some European languages.

A file without a BOM might be considered to be binary if you find a NULL byte within the first nnn bytes, though it's still a guess.
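
The NULL-byte guess could be sketched in Python roughly as below. The name `has_nul_byte` and the default of 512 bytes are assumptions (the post leaves "nnn" open), and the caller is assumed to have already ruled out a BOM:

```python
def has_nul_byte(data, limit=512):
    """Return True if a zero byte occurs within the first `limit` bytes.

    `limit` stands in for the unspecified "nnn" from the post; 512 is an
    arbitrary choice. As the post says, the result is still only a guess.
    """
    return b"\x00" in data[:limit]
```

A NULL byte that appears only after `limit` is not seen, so a larger `limit` trades speed for fewer missed binary files.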

Re: How to detect file is binary or ascii?

Post by joedf » 08 Oct 2013, 22:01

HotKeyIt wrote:Probably IsBOM() will help?

Yes thank you, it has helped as an example :)
I have done some research on Unicode at Wikipedia, the official website, Unicode tables, etc.
here is what i have. seems to work well ;)

Code:



UTF-8 without BOM is a known flaw... working on it :P
When this flaw is fixed, I will add it to the functions topic.
Don't worry, I know how to fix it, just need to sleep first :lol:

cheers! ;)
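
For readers without the original code, here is a rough Python sketch of BOM detection. It is not joedf's function and not the IsBOM() mentioned in the quote; `BOMS` and `detect_bom` are hypothetical names, and the BOM constants come from Python's standard `codecs` module:

```python
import codecs

# UTF-32 BOMs are listed first because the UTF-32 LE BOM (FF FE 00 00)
# begins with the UTF-16 LE BOM (FF FE).
BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def detect_bom(data):
    """Return the encoding name for a leading BOM in `data`, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None
```

Reading the first four bytes of a file is enough for this check. A file with no BOM returns `None`, which is exactly the UTF-8-without-BOM case described as a known flaw.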

Re: How to detect file is binary or ascii?

Post by HotKeyIt » 08 Oct 2013, 12:44

Probably IsBOM() will help?

Re: How to detect file is binary or ascii?

Post by panofish » 07 Oct 2013, 16:49

Sorry about that joedf. You are correct. What you created works great for what I need. I just thought I'd point that out for anyone else that may need this. THANKS!

Re: How to detect file is binary or ascii?

Post by joedf » 07 Oct 2013, 16:17

panofish wrote:If the ascii file is not encoded as ANSI, such as UCS-2 Big Endian... then isBinFile will show Binary because of the 0 bytes.

I knew about that... but what you're asking about is actually Unicode.
ASCII and Unicode are 2 different character sets.
The original question was specifically about ASCII.

I will update it so that it works with Unicode as well.
Will post it soon!

Cheers! ;)

Re: How to detect file is binary or ascii?

Post by panofish » 07 Oct 2013, 16:07

If the ascii file is not encoded as ANSI, such as UCS-2 Big Endian... then isBinFile will show Binary because of the 0 bytes.
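
A small Python illustration of where those 0 bytes come from. Python has no codec named "ucs-2", so `utf-16-be` is used as a stand-in (assumption: the text stays within the Basic Multilingual Plane, where the two encodings produce the same bytes). The helper `contains_nul` is a hypothetical name:

```python
text = "Hello"

# One byte per character.
ansi_bytes = text.encode("ascii")

# Two bytes per character, high byte first; for plain ASCII text the
# high byte is always zero.
ucs2_be_bytes = text.encode("utf-16-be")


def contains_nul(data):
    """Return True if the byte string contains at least one zero byte."""
    return 0 in data


print(ansi_bytes)     # b'Hello'
print(ucs2_be_bytes)  # b'\x00H\x00e\x00l\x00l\x00o'
print(contains_nul(ansi_bytes), contains_nul(ucs2_be_bytes))  # False True
```

Any check that treats byte 0 as non-text will therefore flag the second form as binary, even though both hold the same five letters.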

Re: How to detect file is binary or ascii?

Post by joedf » 06 Oct 2013, 09:01

Thanks :) If at some point it does not work, increase the "tolerance", and if it still doesn't work for a certain file...
report it here, and I'll fix it. ;)

Re: How to detect file is binary or ascii?

Post by Guest10 » 06 Oct 2013, 04:11

tested and works great. i'll be sure to find some applications for this in my scripts! :lol:
joedf wrote:I have made a function: isBinFile
It reads the first few bytes (default: 5) and determines whether each byte is within the ASCII printable chars range (9-13, 32-126)
seems to work well...

Code:



cheers!

Re: How to detect file is binary or ascii?

Post by joedf » 04 Oct 2013, 18:18

I have made a function: isBinFile
It reads the first few bytes (default: 5) and determines whether each byte is within the ASCII printable chars range (9-13, 32-126)
seems to work well...

Code:



cheers!
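
Since the posted code is not shown here, the following is a minimal Python sketch of the check as described, not the original isBinFile. The name `is_bin_file`, the parameter `num_bytes`, and the handling of empty files are assumptions:

```python
# Byte values treated as text: tab through carriage return (9-13) and
# printable ASCII (32-126), the ranges quoted in the post.
TEXT_BYTES = frozenset(range(9, 14)) | frozenset(range(32, 127))


def is_bin_file(path, num_bytes=5):
    """Return True if any of the first `num_bytes` bytes is outside TEXT_BYTES.

    Assumption: an empty file has no offending bytes and is reported as text.
    """
    with open(path, "rb") as f:
        head = f.read(num_bytes)
    return any(b not in TEXT_BYTES for b in head)
```

With the default of 5 bytes, a binary file whose header happens to be printable (for example one starting with a long ASCII tag) passes as text, so raising `num_bytes` makes the guess more reliable at the cost of reading more.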

Re: How to detect file is binary or ascii?

Post by joedf » 04 Oct 2013, 14:27

I have something better, but I'm not at home right now, so I don't have my "setup"

Re: How to detect file is binary or ascii?

Post by MilesAhead » 04 Oct 2013, 13:21

I would look for the source of the Linux "file" command. It's pretty good at catching text files and some printer format file types. I suspect that for executables it uses the file attribute info that's not built into NTFS but is in Linux file systems.
