Want to understand the Unicode standard? Start here!, 2003-04-02
The book has three main parts:
(1) Unicode in essence: an architectural overview of the Unicode standard (six chapters) where you also get bits of terminology and history.
(2) Unicode in depth: a guided tour of the character repertoire (six chapters) where you get a lot about the writing systems that can be represented in Unicode, and less about the Unicode characters themselves.
(3) Unicode in action: implementing and using the Unicode standard (five chapters) where you get information aimed at computer programmers who wish to implement parts of the standard or write applications dealing with multilingual text.
Though this book is very long (~800 pages), it is still shorter and much clearer than the Unicode standard itself (over 1000 pages).
Code examples are in Java, but they are not meant to be complete solutions, so there is no accompanying website or CD.
Professional programmers are the target audience of this book. The reader is faced with many topics in linguistics, history and data structures. Readers with a computer science background will probably appreciate how classic algorithms were adapted and how data structures are used for character sets with significantly more than 256 characters.
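To give a flavor of the data-structure problem, here is a minimal sketch of my own (it is not taken from the book). With 256 characters a flat array is enough to look up a per-character property. With more than a million code points, one common approach is a two-stage table that stores identical blocks only once. The class name TwoStageTable, the block size of 256 and the hebrewLetters() helper are assumptions made for illustration only.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntUnaryOperator;

/** Illustrative two-stage lookup table: code point -> small property value. */
final class TwoStageTable {
    private static final int SHIFT = 8;            // assumed block size: 256 code points
    private static final int BLOCK = 1 << SHIFT;
    private static final int MASK = BLOCK - 1;
    private static final int LIMIT = 0x110000;     // one past the last Unicode code point

    private final int[] index = new int[LIMIT >> SHIFT];   // stage 1: block number -> block id
    private final List<byte[]> blocks = new ArrayList<>(); // stage 2: distinct blocks only

    TwoStageTable(IntUnaryOperator classify) {
        // ByteBuffer compares by content, so identical blocks map to the same key.
        Map<ByteBuffer, Integer> seen = new HashMap<>();
        for (int b = 0; b < index.length; b++) {
            byte[] block = new byte[BLOCK];
            for (int i = 0; i < BLOCK; i++) {
                block[i] = (byte) classify.applyAsInt((b << SHIFT) | i);
            }
            ByteBuffer key = ByteBuffer.wrap(block);
            Integer id = seen.get(key);
            if (id == null) {              // first time this block content appears
                id = blocks.size();
                blocks.add(block);
                seen.put(key, id);
            }
            index[b] = id;
        }
    }

    /** Two array reads per lookup, for any valid code point. */
    int get(int codePoint) {
        return blocks.get(index[codePoint >> SHIFT])[codePoint & MASK];
    }

    /** Number of distinct blocks actually stored. */
    int storedBlocks() {
        return blocks.size();
    }

    /** Hypothetical example property: 1 if the code point belongs to the Hebrew script. */
    static TwoStageTable hebrewLetters() {
        return new TwoStageTable(cp ->
            Character.UnicodeScript.of(cp) == Character.UnicodeScript.HEBREW ? 1 : 0);
    }
}
```

Calling TwoStageTable.hebrewLetters().get(0x05D0) looks up the letter alef with two array reads, while storedBlocks() shows that only a handful of the 4352 possible blocks need to be kept, because most blocks are identical.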
The author states that the book is about "representing written language in a computer", which may be misleading to some readers. The book is about the Unicode standard. Obviously, there are many ways to represent written language other than the methods described in the book. As chapter 2 teaches, there are always more ways (sometimes better ways) to represent your data.
Part 2 of the book does not cover every writing system of the world. A better book for that would be "The World's Writing Systems".
Part 3 is probably the most interesting and useful part for programmers (though the first part is important, in my opinion, to those who want to UNDERSTAND Unicode).
You can learn about a lot of things and skip many too (depending on your interest and need). I believe that most readers will skip most of the topics.
This is not a light read, but it is a hell of a lot easier and more fun to read than the Unicode standard itself. It seems that once you have read this book and got what you want from it, you will go back to the Unicode standard only to check for updates and, hopefully, not for clarifications.
I work in Natural Language Processing, and being a Hebrew speaker I also have a lot of text in Hebrew (almost always Hebrew mixed with other languages, e.g. documents that contain Hebrew with some English). This book helps you understand the difficulties and the current implementations, and gives you solid ground to start thinking about how you can make things better. Current infrastructure for Hebrew is either poor or far from perfect, and in most cases the better solutions are proprietary. There always seem to be problems representing 'plain' text in more than one language without falling into the soup of different ways to do it. Unicode is one way to do it (arguably not the best, yet it is alive and growing). I hope this book can help more people understand what they are up against, clear the fog and build better implementations.