
How to properly view ASCII files?



Flako000
06-Sep-2013, 22:16
Hello,
I'm migrating COBOL source files from SCO Unix to SLES11. These files have double line boxes (ascii code 200 to 206, as http://www.elcodigoascii.com.ar/), the source files seem to be encoded with ISO-8859.

I need to insert these codes/symbols and have them display correctly.
I tried vim with the options 'setglobal fenc' and 'set fileencodings', and 'export TERM=ansi', but I'm failing to get it set up properly.

In vim, ':e ++enc=CP437' makes the file look fine, but I cannot insert the characters with Control+V; it seems that the character map is still non-ASCII, as if it were of type utf8 (I see this from ':digraphs').

That's what I should set? vim or the variable TERM?

thanks
(I used translate.google.com :( )

jmozdzen
09-Sep-2013, 12:01
Hi flako000,

code pages can be a ***** :)

> These files have double line boxes (ascii code 200 to 206, as http://www.elcodigoascii.com.ar/)

Formally these are *not* ASCII characters, which covers only 0 to 127... (http://en.wikipedia.org/wiki/Ascii). Your description sounds more like CP437.

> ISO-8859

*which* ISO 8859? There are quite a number of them

> That's what I should set? vim or the variable TERM?

While you could change your system from using UTF-8 to e.g. some ISO 8859 code page, I wouldn't suggest doing that. I'd rather find a proper tool that is able to handle the non-system encoding and convert it properly for display.

When I run "vim" on my UTF-8 system to edit an ISO-8859-15 encoded file, it will open the file with all characters displayed properly, and give a "[converted]" message after opening. I can then edit the file as I like and all characters are stored ISO-8859-15 encoded. But of course, there are no "box characters" in ISO 8859-1(5), especially not at code points 200 to 206 (see http://en.wikipedia.org/wiki/Iso_8859-1#Codepage_layout)

> These files have double line boxes (ascii code 200 to 206, as http://www.elcodigoascii.com.ar/), the source files seem to be encoded with ISO-8859

If they contain the "box characters", then they cannot be ISO-8859-*-encoded, they are most probably "CP437"-encoded (http://en.wikipedia.org/wiki/Code_page_437). Unfortunately, Linux seems not to be prepared to handle that CP easily. I found http://forums.opensuse.org/english/get-technical-help-here/64-bit/475296-installing-cp437.html, but didn't actually try to follow those instructions, YMMV...
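
To illustrate the point about code points 200 to 206, here is a minimal Python sketch (Python is only a convenient way to inspect the bytes, it is not part of the original setup). It decodes the same seven byte values once as CP437 and once as ISO 8859-1 and prints the Unicode name of each result:

```python
import unicodedata


def describe(byte_value: int, encoding: str) -> str:
    """Return the Unicode name of a single byte decoded with the given encoding."""
    char = bytes([byte_value]).decode(encoding)
    return unicodedata.name(char)


# Byte values 200..206, the range mentioned in the original question.
for byte_value in range(200, 207):
    print(byte_value,
          "| CP437:", describe(byte_value, "cp437"),
          "| ISO 8859-1:", describe(byte_value, "iso-8859-1"))
```

Under CP437 every value comes out as a "BOX DRAWINGS DOUBLE ..." character, while under ISO 8859-1 the same bytes are accented capital letters.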

Regards,
Jens

Flako000
10-Sep-2013, 22:53
Hello jmozdzen,
Based on what you pointed out I made some progress, but I don't have everything I need yet. For now it is 'YMMV' :)
Here is what I did:

With vim (now that I understand it better) the problem was with editing. With the option ':e ++enc=CP437' the file is displayed correctly, and characters can be inserted with Ctrl+K 'xx' (not just the range 200-206).

With gnome-terminal setting 'Hebrew (IBM862).' half working. 'cat source.cob' displays correctly, but the running program does not look right.

I tried all the charsets of luit, but it does not seem to work (some are not available in SLES).

I share your opinion about not reconfiguring the whole Linux system.
So I'm looking for the set of variables, and the values they should be changed to, for this to work properly.

If you have any other suggestions I would appreciate them; meanwhile I'll keep reading ...
Thanks again,

bimbim0302
15-Sep-2013, 08:29
I am not sure about this problem, I just hope you will soon overcome it

jmozdzen
16-Sep-2013, 15:37
Hi Flako000,

as I've had a similar (but much easier) case last Friday, here's a quick summary of what to look out for (this is all about *command line* stuff, not native X11 applications):

- you have a shell (e.g. "bash") with some $LANG setting, telling applications which code page is to be used when displaying things
- you have a program that will read the shell's output stream, interpret it according to *the code page settings of this program* and display it on your display ("program" may e.g. be "konsole" inside an X11 KDE session, but just as well "putty" under MS Windows or anything else)
- you have (in your case) a file which contains text that is encoded in a specific codepage, that differs from your general system setup
- there's an application to display the contents of the file (in your case "vim")

More than one of the above can and will alter the character stream and code page!
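
As a rough way to look at the first layer in that list, the following Python sketch reports which encoding the current locale settings ($LANG and the LC_* variables) imply for a command-line program. It says nothing about what the terminal emulator itself is set to, which has to be checked in that program's own menus:

```python
import codecs
import locale
import sys

try:
    # Adopt whatever LANG / LC_* say, as most command-line tools do at startup.
    locale.setlocale(locale.LC_ALL, "")
except locale.Error:
    # The environment names a locale this system does not provide.
    print("locale from environment is not available, using the default")

# Encoding implied by the locale settings of this session.
session_encoding = locale.getpreferredencoding(False)

print("locale encoding:", codecs.lookup(session_encoding).name)
print("stdout encoding:", sys.stdout.encoding)
```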

Let's say you're using bash with some UTF-8 setting (according to $LANG), and an ISO 8859-1 file that contains "ä" (an umlaut character not in ASCII, used e.g. in the German language, HTML: &auml;). You are working within a KDE session and have opened "konsole" set to UTF-8, where your "bash" is running, and you're calling "vim" to open that file.

You will probably notice that vim reports a "converted" file, but will display the character correctly. Internally (inside vim's code), the file is treated as ISO 8859-1, but vim "prints out" UTF-8 sequences, which are then taken by "konsole" and treated correctly.

If you, instead of using "vim", use "cat" to output the contents, then you'll see a "funny character", as the ISO 8859-1 byte for the umlaut character doesn't match its UTF-8 encoding - but UTF-8 is how "konsole" will interpret the output of "cat" (which does no conversion, unlike "vim").

"konsole" is capable of changing its code page handling - via "view" -> "set encoding", you could change to ISO 8859-1. Then, after *another invocation of "cat"*, you'll see the file content correctly (the old output will not be "re-parsed", but remains displayed as is). Since "konsole" is now using ISO 8859-1, invoking "vim" will show you what looks like *two* characters for the same unchanged file content - which is nothing else than the two bytes that are used to UTF-8 encode the umlaut (remember: "vim" will still believe your terminal to run on UTF-8, since $LANG wasn't changed, and thus internally converts the file content to UTF-8 for display).

Were you to compile some program from that ISO 8859-1 file, the strings in that "program" would most probably be output as ISO 8859-1. So, if you'd run that program on some "konsole" that is set to UTF-8, the output will look garbled. If the konsole is on ISO 8859-1 though, everything will look fine. Again, this is because the program does no conversion of the bytes it outputs for your strings, and those bytes will be interpreted by "konsole" according to its own setting.
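
The combinations described above can be played through in a few lines of Python. This is only a simulation under simplifying assumptions: the hypothetical helper `shown_on_terminal` stands in for the terminal emulator, and a real terminal may render invalid input differently from the replacement character used here.

```python
UMLAUT = "\u00e4"  # the umlaut character from the example


def shown_on_terminal(emitted: bytes, terminal_encoding: str) -> str:
    """Decode emitted bytes the way a terminal set to terminal_encoding would.

    Bytes that are invalid in that encoding become U+FFFD.
    """
    return emitted.decode(terminal_encoding, errors="replace")


# "cat" and the compiled program pass the file's ISO 8859-1 bytes through unchanged.
raw_latin1 = UMLAUT.encode("iso-8859-1")
# "vim" converts for display because it assumes a UTF-8 terminal.
converted = UMLAUT.encode("utf-8")

for tool, emitted in (("cat/program", raw_latin1), ("vim", converted)):
    for terminal in ("utf-8", "iso-8859-1"):
        shown = shown_on_terminal(emitted, terminal)
        # ascii() keeps the output readable whatever this session's own encoding is.
        print(f"{tool:12} on {terminal:10} terminal -> {ascii(shown)}")
```

Only two of the four combinations show the umlaut intact: raw bytes on the ISO 8859-1 terminal and converted bytes on the UTF-8 terminal. The converted bytes on the ISO 8859-1 terminal give the two-character result.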

So if you are using "vim" or some other auto-display-converting tool for string handling, you need to watch especially carefully for those "converted" messages when opening files. New files will be created by "vim" in the default character set of your session. So it's really easy to end up with files in different character sets, within one and the same program. Not what you really want. ;)

What I don't understand is why you start playing with totally different character pages though:
> With gnome-terminal setting 'Hebrew (IBM862).' half working.

Either that file is in IBM862, or it's CP437. Make sure your editor tools do support that target code page and convert the *display* of the file's content to the character set your session runs in (e.g. UTF-8). I understand that this is not easy, and I have no solution for working with CP437. But mixing CPs wildly will only make things worse, not better.
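
One possible way to do such a conversion outside the editor, sketched in Python under the assumption that the file really is CP437: the helper names `cp437_to_utf8` and `utf8_to_cp437` are made up for this example, and the file name `source.cob` is just the one mentioned earlier in the thread. This is a sketch, not a tested migration procedure.

```python
import tempfile
from pathlib import Path


def cp437_to_utf8(src: Path, dst: Path) -> None:
    """Re-encode a CP437 file as UTF-8; working on bytes leaves line endings untouched."""
    dst.write_bytes(src.read_bytes().decode("cp437").encode("utf-8"))


def utf8_to_cp437(src: Path, dst: Path) -> None:
    """Inverse step; raises UnicodeEncodeError for characters CP437 does not have."""
    dst.write_bytes(src.read_bytes().decode("utf-8").encode("cp437"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "source.cob"
        as_utf8 = Path(tmp) / "source.utf8.cob"
        restored = Path(tmp) / "source.cp437.cob"

        # Bytes 201, 205, 187 form the top edge of a double-line box in CP437.
        original.write_bytes(bytes([201, 205, 187]) + b"\n")

        cp437_to_utf8(original, as_utf8)
        utf8_to_cp437(as_utf8, restored)

        print(ascii(as_utf8.read_text(encoding="utf-8")))
        print("round trip intact:", restored.read_bytes() == original.read_bytes())
```

Note that this only helps with viewing and editing on a UTF-8 session. A program compiled from the UTF-8 copy would presumably emit UTF-8 bytes instead of CP437 bytes, so which copy to compile depends on what the terminal running the program expects.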

Regards,
Jens