Showing posts with label Text Processing Technology. Show all posts

Saturday, July 23, 2011

[Grammatical Inference: Colin de la Higuera]: Notes

In Grammar Induction, what really matters is the data and the relationship between the data and the induced grammar. In Grammatical Inference, by contrast, the learning process itself is central: it is the process that is examined and measured, not just its result.


GraIn has emerged as an independent field connecting bioinformatics, computational linguistics, formal language theory, machine learning, and pattern recognition. Its goal is to learn or infer a grammar (or some other device that can generate, recognize, or describe strings) for a language from all sorts of information about that language.
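To make "some device that can recognize strings" concrete, here is a minimal sketch of a prefix tree acceptor built from positive example strings. This is a common starting point in the field, not necessarily the construction the book has in mind at this point; the names `build_pta` and `accepts` are made up for illustration.

```python
def build_pta(positive_samples):
    """Build a prefix tree acceptor. States are ints; state 0 is the root."""
    transitions = {}    # (state, symbol) -> next state
    accepting = set()   # states where some sample string ends
    next_state = 1
    for word in positive_samples:
        state = 0
        for symbol in word:
            key = (state, symbol)
            if key not in transitions:
                # First time this prefix is seen: create a fresh state.
                transitions[key] = next_state
                next_state += 1
            state = transitions[key]
        accepting.add(state)
    return transitions, accepting


def accepts(pta, word):
    """Return True if the acceptor recognizes the given string."""
    transitions, accepting = pta
    state = 0
    for symbol in word:
        state = transitions.get((state, symbol))
        if state is None:
            return False
    return state in accepting


if __name__ == "__main__":
    pta = build_pta(["ab", "abb", "b"])
    for w in ["ab", "abb", "a", "ba"]:
        print(w, accepts(pta, w))
```

Such an acceptor recognizes exactly the strings it was given, so it memorizes rather than generalizes. Any real inference method has to go further, for example by merging states.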



Learning a language

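What does "learning a language" mean? To answer this, we inevitably have to define "language" first, or, more conveniently, look at how "language" is represented. Seen this way, one important kind of material we have to look at is linguistic data.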

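One formal way of looking at linguistic data is to treat it as the combinatorics of strings (stringology). From there, we can go on to ask what mechanisms, or algorithms, for "language learning" are possible.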

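From a machine learning point of view, I think the ways and experience of human language learning are certainly worth consulting, but whether they are the best way to learn a language is hard to say. Of course, we first need a reasonable baseline, or definition, for the evaluation of learning. (As an aside, I feel that the vast majority of discussions hinge on definitions and standpoints.)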

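The Chinese character system is a marvelous thing. I have been fascinated by it from the very start, and I tried to tackle it in my doctoral dissertation. Later I realized I had overreached: with something of such historical depth, working without a correspondingly deep grounding in the humanities and in computation only brings noise and unease to the field of Chinese character studies and its social-psychological effects. This is much like the Buddha-dharma: it points directly at the deepest basic truths of the universe and of human life, but when it is expounded and spread by people whose practice is not mature enough, it ends up being "stigmatized". Actually, that is reasonable too. If it were that easy to grasp, what would we still be doing muddling around in piles of books, right?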

It is said that ten thousand hours make an expert. I should get started.

Wednesday, August 27, 2008

Digression analysis in text?

People drift away from the topic from time to time. Quite often they are adding information to the topic or expanding on it, to help listeners think (or to prevent them from thinking clearly). That is, what they say doesn't always fit into the logical flow of the utterance. This may be an important kind of information for a text miner. So what are the cues for digression? As a lecturer, I may say "Well, now, let's stop here for a second", "That reminds me ...", "By the way", etc. And when do we get back to the main topic? "Now, let's return to ...", "Anyway, ...", "Getting back to the subject ...". For full comprehension, it would be better not to ignore these, but how can we let a machine learn this?
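Before any learning, a hand-written baseline can at least show what the task looks like. The sketch below is only that, a rule-based toggle rather than a learned model: it uses the cue phrases quoted above and assumes the text is already split into sentences. The function name `tag_sentences` and the two labels are made up for illustration.

```python
import re

# Cue phrases taken from the examples above; illustrative, not exhaustive.
DIGRESSION_CUES = ["let's stop here for a second", "that reminds me", "by the way"]
RETURN_CUES = ["let's return to", "anyway", "getting back to the subject"]


def _compile(cues):
    # Word boundaries keep a cue from matching inside a longer word.
    pattern = r"\b(?:" + "|".join(re.escape(c) for c in cues) + r")\b"
    return re.compile(pattern, re.IGNORECASE)


DIGRESSION_RE = _compile(DIGRESSION_CUES)
RETURN_RE = _compile(RETURN_CUES)


def tag_sentences(sentences):
    """Label each sentence 'main' or 'digression' using a two-state toggle."""
    labels = []
    in_digression = False
    for sentence in sentences:
        if RETURN_RE.search(sentence):
            # A return cue ends the digression; the sentence itself is main.
            in_digression = False
        elif DIGRESSION_RE.search(sentence):
            in_digression = True
        labels.append("digression" if in_digression else "main")
    return labels


if __name__ == "__main__":
    talk = [
        "Grammars generate strings.",
        "By the way, I like tea.",
        "Tea is nice.",
        "Anyway, back to grammars.",
    ]
    for sentence, label in zip(talk, tag_sentences(talk)):
        print(label, "|", sentence)
```

A fixed cue list misses digressions that have no explicit marker and misfires on cues used in other senses. Its output could still serve as weak labels for training a classifier, which is closer to the question of how a machine might learn this.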