Regarding getting specific font data from file
I am developing a font converter application in Java (Swing). The user uploads a .doc file, and if that file contains both Marathi and English text, I want to convert the text in the Marathi font to Unicode. My converter program is complete. The problem now is: how do I get the text that is in the Marathi font (or any other non-Unicode font) out of the file so that I can convert it to Unicode?
Thanks in advance.
So after some very short wiki research, can I assume you are using the Devanāgarī script? If so, can I assume the docs are using ISCII encoding? With a little Google help I found this: http://office.microsoft.com/en-us/word/HP030745551033.aspx
I think I did not state my question clearly.
What I want to do is this:
1. Open a .doc file in my Java program (currently I am using text files).
2. Read that .doc file, which contains text in different fonts.
3. Extract the text that is in the dvbttyogesh font and convert it to Unicode.
At this stage my program can convert a whole file that is in the dvbttyogesh font to Unicode.
A font is to Unicode as apples are to oranges.
Unicode assigns a number to a character.
A charset or encoding is what is used to represent the character as bytes.
A font is used to draw the character to the screen.
You are reading in bytes, and you want to assign Unicode values to those bytes. You keep using the word 'font' when a font really isn't related to converting text to Unicode. What encoding are these docs in? UTF-8? UTF-16? ISCII?
If this were Chinese, you might have a Big5 charset, but you could have any number of fonts to draw those characters to the screen. You are missing an important step in this conversion.
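To make the distinction concrete, here is a small sketch of one character, its Unicode number, and the different bytes it becomes under two encodings. The class name EncodingDemo and the helper hex are made up for this illustration. No font is involved at any point.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Encodes the text with the given charset and returns the bytes as lowercase hex.
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String ma = "\u092E"; // DEVANAGARI LETTER MA

        // The Unicode number (code point) assigned to the character.
        System.out.println(Integer.toHexString(ma.codePointAt(0))); // 92e

        // The same character as bytes in two different encodings.
        System.out.println(hex(ma, StandardCharsets.UTF_8));    // e0 a4 ae
        System.out.println(hex(ma, StandardCharsets.UTF_16BE)); // 09 2e
    }
}
```

The code point stays the same while the bytes change with the charset. Which font later draws the character does not affect either one.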
I am sorry. What I want to do is this: parse a doc/odt file and extract the text with a particular font tag, convert this text using our converter, and then put the converted Unicode text back into the output file in place of the extracted text, while the rest of the file remains intact.
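A rough sketch of that workflow, assuming the document has already been parsed into runs of text that each carry a font name. The Run class, the convertRuns method and the two-entry mapping table are hypothetical and only stand in for the real parser and the existing converter. As far as I know, a library such as Apache POI (HWPF for .doc) can report character runs together with their font name, but that part is not shown here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FontRunConverter {
    // One stretch of text in a single font, as a document parser might report it.
    static final class Run {
        final String fontName;
        final String text;

        Run(String fontName, String text) {
            this.fontName = fontName;
            this.text = text;
        }
    }

    // Hypothetical mapping for illustration only; NOT the real dvbttyogesh table.
    static final Map<Character, String> LEGACY_TO_UNICODE = new HashMap<>();
    static {
        LEGACY_TO_UNICODE.put('k', "\u0915"); // DEVANAGARI LETTER KA
        LEGACY_TO_UNICODE.put('m', "\u092E"); // DEVANAGARI LETTER MA
    }

    // Stand-in for the existing converter: maps each legacy character, keeps unknown ones.
    static String convertText(String legacyText) {
        StringBuilder sb = new StringBuilder();
        for (char c : legacyText.toCharArray()) {
            sb.append(LEGACY_TO_UNICODE.getOrDefault(c, String.valueOf(c)));
        }
        return sb.toString();
    }

    // Converts only the runs in the target font; every other run is passed through as-is.
    static List<Run> convertRuns(List<Run> runs, String targetFont) {
        List<Run> result = new ArrayList<>();
        for (Run run : runs) {
            if (run.fontName.equalsIgnoreCase(targetFont)) {
                // The font name is kept here; a real document would probably also
                // need the run switched to a Unicode-capable font.
                result.add(new Run(run.fontName, convertText(run.text)));
            } else {
                result.add(run);
            }
        }
        return result;
    }
}
```

Because the runs stay in their original order and only the matching ones are rewritten, the English text and everything else in the file is left intact when the runs are written back.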
I clearly understand what you are saying. Do you clearly understand what I am saying? You are just dismissing what I say as if I don't know what you are doing.
If you are talking about a Microsoft Word Doc, then it is probably encoded in ISCII as stated here http://office.microsoft.com/en-us/word/HP052584541033.aspx
The other link I posted is a conversion tool to do exactly what you are asking. What you are asking is like asking me to convert 'Times New Roman' to Unicode. They aren't the same type of thing, so there is no real conversion.