iconv and custom charactersets



swadm
19-May-2014, 16:39
Hi there,

Does anybody know whether it is possible to compile custom character sets for use with iconv?

We have a non-standard custom character set developed by a company, and would like to easily recode the output of their application to, say, Unicode.

Thanks in advance, Thomas

jmozdzen
19-May-2014, 17:37
Hi Thomas,

It should be possible, as "all you need" seems to be a shared object carrying your implementation in /usr/lib(64)/gconv and a matching entry in the mappings file in that directory.

I've never done this before, so all I can give you is the pointer to http://git.savannah.gnu.org/cgit/libiconv.git/tree/HACKING, where you will find the following comment:

Adding new encodings
====================

For an indication which encodings are acceptable in the official version of
GNU libiconv, take a look at NOTES.

For an indication which files need to be modified when adding a new encoding,
look for example at the 2007-05-25 ChangeLog entry for RK1048. The lib/*.h
file for an encoding is usually generated by one of the tools in the tools/
directory. All you need to provide is the conversion table in the format of
the many *.TXT files.


With regards,
Jens

swadm
20-May-2014, 08:46
Jens, thanks, this is a very promising hint.

http://git.savannah.gnu.org/cgit/libiconv.git/tree/NOTES has more details:


Q: How do I add a new character set?
A: 1. Explain the "why" in this file, above.
   2. You need to have a conversion table from/to Unicode. Transform it into
      the format used by the mapping tables found on ftp.unicode.org: each line
      contains the character code, in hex, with 0x prefix, then whitespace,
      then the Unicode code point, in hex, 4 hex digits, with 0x prefix. '#'
      counts as a comment delimiter until end of line.
      Please also send your table to Mark Leisher <mleisher@crl.nmsu.edu> so he
      can include it in his collection.
   3. If it's an 8-bit character set, use the '8bit_tab_to_h' program in the
      tools directory to generate the C code for the conversion. You may tweak
      the resulting C code if you are not satisfied with its quality, but this
      is rarely needed.
      If it's a two-dimensional character set (with rows and columns), use the
      'cjk_tab_to_h' program in the tools directory to generate the C code for
      the conversion. You will need to modify the main() function to recognize
      the new character set name, with the proper dimensions, but that shouldn't
      be too hard. This yields the CCS. The CES you have to write by hand.
   4. Store the resulting C code file in the lib directory. Add a #include
      directive to converters.h, and add an entry to the encodings.def file.
   5. Compile the package, and test your new encoding using a program like
      iconv(1) or clisp(1).
   6. Augment the testsuite: Add a line to tests/Makefile.in. For a stateless
      encoding, create the complete table as a TXT file. For a stateful encoding,
      provide a text snippet encoded using your new encoding and its UTF-8
      equivalent.
   7. Update the README and man/iconv_open.3, to mention the new encoding.
      Add a note in the NEWS file.
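
The table format described in step 2 is simple enough to check mechanically before handing the file to 8bit_tab_to_h. Below is a minimal Python sketch, assuming the format exactly as quoted above (0x-prefixed character code, whitespace, 0x-prefixed 4-digit code point, '#' comments). The function name parse_mapping_table and the sample table are made up for illustration and are not part of libiconv.

```python
import re

# One mapping line: character code, whitespace, 4-hex-digit Unicode code
# point, both with 0x prefix (as described in the quoted NOTES text).
_LINE = re.compile(r"^(0x[0-9A-Fa-f]+)\s+(0x[0-9A-Fa-f]{4})$")


def parse_mapping_table(text):
    """Parse a mapping table into {character code: Unicode code point}.

    Hypothetical helper for illustration; it is not part of libiconv.
    Raises ValueError for lines that do not match the described format.
    """
    mapping = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # '#' starts a comment
        if not line:
            continue  # blank or comment-only line
        match = _LINE.match(line)
        if match is None:
            raise ValueError(f"line {lineno}: unexpected format: {raw!r}")
        code = int(match.group(1), 16)
        if code in mapping:
            raise ValueError(f"line {lineno}: duplicate code 0x{code:02X}")
        mapping[code] = int(match.group(2), 16)
    return mapping


# Made-up three-entry sample, not the real MYCHARSET table.
SAMPLE = """\
# code  unicode
0x41    0x0041  # LATIN CAPITAL LETTER A
0x80    0x20AC  # EURO SIGN
0xE4    0x00E4  # LATIN SMALL LETTER A WITH DIAERESIS
"""

table = parse_mapping_table(SAMPLE)
print(len(table), hex(table[0x80]))
```

The sketch is deliberately strict: any line with a missing second column or a code point without the 0x prefix is reported with its line number, which is usually easier to act on than a failure later in the build.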


The translation table is no problem; we already have one.

As SLES 11 does not include 8bit_tab_to_h, we need to compile libiconv ourselves.

Somehow I cannot get libiconv compiled: libiconv-1.14 does not include a ./configure file,
and the ./configure present in libiconv-1.11 does not complete successfully:


checking for strerror... yes
checking for readlink... yes
checking for ssize_t... yes
configure: creating ./config.status
config.status: creating Makefile
config.status: creating lib/Makefile
config.status: creating srclib/Makefile
config.status: creating src/Makefile
config.status: creating po/Makefile.in
config.status: creating man/Makefile
config.status: creating tests/Makefile
config.status: error: cannot find input file: include/iconv.h.build.in


Hm, should I try more versions of libiconv?

Tx, Thomas

jmozdzen
20-May-2014, 12:30
Hi Thomas,


> Somehow I cannot get libiconv compiled: version libiconv-1.14 does not have a ./configure file included,
> and ./configure present in libiconv-1.11 will not complete successfully:

I just pulled libiconv 1.14 from the web site (http://ftp.gnu.org/pub/gnu/libiconv/libiconv-1.14.tar.gz) and clearly see a "configure" script in the top-most directory:


:/tmp/iconv/libiconv-1.14> l configure
-rwxr-xr-x 1 jmozdzen users 696987 7. Aug 2011 configure*
:/tmp/iconv/libiconv-1.14>


Maybe a bogus download?

Regards,
Jens

swadm
20-May-2014, 14:00
Jens,

thanks!

I had pulled from
http://git.savannah.gnu.org/cgit/libiconv.git/snapshot/libiconv-1.14.tar.gz
instead of http://ftp.gnu.org/pub/gnu/libiconv/libiconv-1.14.tar.gz;
the former appears not to be ready to be configured and compiled.

With the tarball from ftp.gnu.org I was able to configure, compile and install.

The small utility 8bit_tab_to_h.c seems not to be in the Makefiles, so I had to compile it manually:


gcc -o 8bit_tab_to_h 8bit_tab_to_h.c

Then I was successful in invoking 8bit_tab_to_h:


./8bit_tab_to_h MYCHARSET MYCHARSET < MYCHARSET.TXT
Creating MYCHARSET.h


I copied MYCHARSET.h to the lib directory, and did a "make clean" and
"make" again.

Trying to invoke the iconv utility, I get a "conversion from MYCHARSET unsupported".

I fear that I missed


"You will need to modify the main() function to recognize the new
character set name, with the proper dimensions, but that shouldn't be
too hard"


I looked at the main() function in iconv.c, but have no idea where I
should make modifications to recognize my new character set: too hard for me after all.

Thanks for further help, if you can afford the time.

Thomas

jmozdzen
20-May-2014, 14:30
Hi Thomas,

while I've only had a glimpse at iconv.c, the key seems to be the file /usr/lib64/gconv/gconv-module - the charset names that iconv_open() accepts are defined there (see man iconv_open for a general description of that call; it is not too helpful in your case, though).

Please note that all entries in gconv-module need to be in UPPERCASE. And don't forget to run iconvconfig after each change... or to at least remove the gconv-module.cache file before starting test runs.
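
The upper-case requirement stated above can be checked with a few lines of Python. This is only a sketch: it assumes the gconv-modules layout documented in the glibc manual ('#' comments, "alias FROM TO" and "module FROM TO FILENAME [COST]" entries), the function name lowercase_names is hypothetical, and the sample entries for MYCHARSET are invented.

```python
def lowercase_names(gconv_modules_text):
    """Return (line number, name) pairs for charset names not in upper case.

    Assumed layout: '#' starts a comment; entries are
    'alias FROM TO' or 'module FROM TO FILENAME [COST]'.
    Only the FROM and TO fields are checked; FILENAME is left alone.
    """
    problems = []
    for lineno, raw in enumerate(gconv_modules_text.splitlines(), start=1):
        fields = raw.split("#", 1)[0].split()
        if len(fields) < 3 or fields[0] not in ("alias", "module"):
            continue  # comment, blank line, or something this sketch ignores
        for name in fields[1:3]:
            if name != name.upper():
                problems.append((lineno, name))
    return problems


# Invented entries for illustration; not taken from a real gconv-modules file.
SAMPLE = """\
# from          to            module      cost
alias   MYCS//          MYCHARSET//
module  MYCHARSET//     INTERNAL      MYCHARSET   1
module  INTERNAL        mycharset//   MYCHARSET   1
"""

print(lowercase_names(SAMPLE))
```

For the sample this reports the lower-case name on line 4. It does not replace running iconvconfig after an edit.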

The effective code for iconv_open is in lib/iconv.c and the two includes iconv_open1.h and iconv_open2.h. Did your work give you a shared object file that you can put in /usr/lib64/gconv and reference in gconv-module?

Regards,
Jens

swadm
20-May-2014, 14:59
Jens, thanks for your hints.

I have the impression that the file /usr/lib64/gconv/gconv-modules (you missed the trailing "s") is part of the glibc iconv implementation:
http://www.gnu.org/software/libc/manual/html_node/glibc-iconv-Implementation.html

I'm not really a programmer (as you will have noticed), but I think that /usr/lib64/gconv/gconv-modules will not be used by my compilation of the iconv standalone program.

Also I wonder why we should list all possible conversions in gconv-modules.

Thanks and regards, Thomas

swadm
20-May-2014, 15:03
And no, no .so files came out of what I've done (which might support my theory that /usr/lib64/gconv and gconv-modules are not relevant to what I'm planning).

jmozdzen
20-May-2014, 15:37
Hi Thomas,

as I understand it, there is no stand-alone "iconv" program? "iconv" simply invokes the library functions from glibc (i.e. iconv_open() ).

An easy test (which I ran before answering here) is to modify the modules list and then call "iconv --list". I inserted a simple alias statement with a distinct name into the modules file, and the entry appeared in the list of available conversions. So I believe I have shown that gconv-modules is indeed relevant.
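
This check can be scripted. Below is a rough Python sketch, assuming an iconv binary on the PATH that accepts -l; the helper names and the name MYCHARSET are placeholders. The parsing is deliberately tolerant (whitespace or comma separators, optional trailing "//") because the listing layout differs between iconv implementations.

```python
import re
import subprocess


def names_in_listing(listing):
    """Return the set of encoding names found in `iconv -l` style output.

    Names may be separated by whitespace or commas and may carry trailing
    slashes; names are upper-cased for comparison.
    """
    tokens = re.split(r"[\s,]+", listing)
    return {tok.rstrip("/").upper() for tok in tokens if tok.strip("/")}


def iconv_knows(name, iconv_binary="iconv"):
    """Ask the given iconv binary whether it lists `name` (runs `iconv -l`)."""
    result = subprocess.run(
        [iconv_binary, "-l"], capture_output=True, text=True, check=True
    )
    return name.upper() in names_in_listing(result.stdout)


# Illustrative listing text only, not real output from any system.
EXAMPLE = "ANSI_X3.4-1968// ASCII//\nUTF-8//, MYCHARSET//\n"
print("MYCHARSET" in names_in_listing(EXAMPLE))
```

Passing a full path as iconv_binary, for example to a self-compiled libiconv build, makes it easy to see which of two installed iconv programs actually knows the new name.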

> Also I wonder why we should list all possible conversions in gconv-modules:

Because libiconv is a flexible and modular system - the conversion functions are placed in shared objects and the library needs to know which one to load. That's the only way you can extend this functionality at all without risking your server installation by replacing glibc...

> no, there were no .so files that emanated from what I've done

unfortunately, that leaves a task for a programmer - to provide a way to create that module for you.

There are plenty of other gconv shared objects in /usr/lib64/gconv, I wonder where these come from. I had hoped that these were created during the libiconv build... but I don't have the time right now to deep-dive into this subject :(

Regards,
Jens

jmozdzen
20-May-2014, 16:52
Hi Thomas,

I tried to follow my own advice: "If everything else fails, read the documentation"... http://www.gnu.org/software/libc/manual/html_node/glibc-iconv-Implementation.html#glibc-iconv-Implementation, especially "6.5.4.4 iconv module interfaces", has it all - but you'll probably need to find a programmer to get the job done.

Regards,
Jens