Friday, November 21, 2008

Wednesday, August 27, 2008

Digression analysis in text?

People drift away from the topic from time to time. Quite often they are adding information to the topic or expanding on it, to help the listeners think (or to keep them from thinking clearly). That is, what they say doesn't always fit into the logical flow of the utterance. This could be an important kind of information for a text miner. So what are the cues for digression? As a lecturer, I may say "Well, now, let's stop here for a second", "That reminds me ...", "By the way", etc. And when do we get back to the main topic? "Now, let's return to ...", "Anyway, ...", "Getting back to the subject ...". For full comprehension it would be better not to ignore these, but how can we let a machine learn this?
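As a starting point before any learning, the cue phrases themselves can be matched by hand. The sketch below is only a rule-based baseline, not a learned model: the two cue lists hold just the examples quoted above, and the function name `label_sentences` and the rule "a digression lasts until the next return cue" are my own assumptions.

```python
import re

# Cue phrases copied from the examples in the post; a real list would be much longer.
DIGRESSION_START = ["let's stop here for a second", "that reminds me", "by the way"]
DIGRESSION_END = ["let's return to", "anyway", "getting back to the subject"]


def _compile(cues):
    # Match any cue as a whole phrase, ignoring case.
    pattern = r"\b(?:" + "|".join(re.escape(c) for c in cues) + r")\b"
    return re.compile(pattern, re.IGNORECASE)


START_RE = _compile(DIGRESSION_START)
END_RE = _compile(DIGRESSION_END)


def label_sentences(sentences):
    """Label each sentence 'main' or 'digression' using cue phrases only.

    Assumption: a digression opens at a start cue and stays open
    until a sentence containing a return cue is seen.
    """
    labels = []
    in_digression = False
    for sentence in sentences:
        if END_RE.search(sentence):
            in_digression = False
        elif START_RE.search(sentence):
            in_digression = True
        labels.append("digression" if in_digression else "main")
    return labels


lecture = [
    "Today we cover parsing.",
    "By the way, I met a famous linguist once.",
    "He was very kind.",
    "Anyway, parsing starts with tokens.",
]
labels = label_sentences(lecture)
```

Labels produced this way could serve as noisy training data for a learner, which would then have to pick up the digressions that carry no explicit cue at all.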

Saturday, August 23, 2008

Hanzi, Hanzi, Hanzi!

How do we deal with the Chinese character-word mystery in terms of (Western) linguistic morphology?

1. In principle, each character stands for a morpheme (morpho-semantically). Exceptions are 葡萄、踟躕、... where two characters are bound together from the start.
In these cases, a single morpheme such as 葡萄 is represented by two characters, 葡 and 萄.

2. The types of morphemes that characters stand for vary. E.g., 今: bound root morpheme; 者: bound suffix; 打: free root morpheme; 了: inflectional morpheme (?).

3. A Chinese word is composed of one of several possible combinations, such as (1) morpheme + word 辛苦 (2) word + morpheme 打字機 (3) morpheme 喝 (4) morpheme + morpheme 前進

4. A Chinese compound word is composed of (at least) two words/compound words. Formally, (1) w1+w2+.....電腦 (2) cw1+w2 ...電腦螢幕

5. Criteria for judging Chinese wordhood:
5.1 Can stand alone in a similar semantic context
5.2 Psychological reality
5.3 Proper names and fixed expressions

6. Operation:

7. Dubious/counter-intuitive cases: is 腳踏車 a word or a compound word?
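
Items 1 and 2 above can be written down as a tiny lookup, just to make the character-morpheme mapping concrete. This is a rough sketch under my own assumptions: the names `MULTI_CHAR_MORPHEMES`, `MORPHEME_TYPES` and `split_morphemes` are made up for illustration, and the tables hold only the examples listed in the notes, so it says nothing about the harder wordhood questions in items 5 to 7.

```python
# Multi-character morphemes: the exceptions named in item 1.
MULTI_CHAR_MORPHEMES = {"葡萄", "踟躕"}

# Morpheme types copied from item 2; the "(?)" on 了 keeps the original doubt.
MORPHEME_TYPES = {
    "今": "bound root",
    "者": "bound suffix",
    "打": "free root",
    "了": "inflectional (?)",
}


def split_morphemes(text):
    """Split text into morphemes.

    Default rule (item 1): one character = one morpheme.
    Exception: a listed multi-character morpheme is kept whole.
    """
    morphemes = []
    max_len = max(len(m) for m in MULTI_CHAR_MORPHEMES)
    i = 0
    while i < len(text):
        # Try the longest listed exception first, then fall back to one character.
        for size in range(max_len, 1, -1):
            chunk = text[i:i + size]
            if chunk in MULTI_CHAR_MORPHEMES:
                morphemes.append(chunk)
                i += size
                break
        else:
            morphemes.append(text[i])
            i += 1
    return morphemes


example = split_morphemes("打葡萄")
example_types = [MORPHEME_TYPES.get(m, "unknown") for m in example]
```

Anything not in the table comes back as "unknown", which is deliberate: deciding whether a given character is free or bound is exactly the open question here.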