Sunday, May 24, 2009

US Presidential Inaugural Addresses

Available in NLTK under the name inaugural, this corpus contains 55 presidential inaugural addresses, from the first, 'Washington', to the most recent one in 2009, 'Obama'.
Here is how to list the files in it:
>>> from nltk.corpus import inaugural
>>> inaugural.fileids()
['1789-Washington.txt', '1793-Washington.txt', '1797-Adams.txt', ...]
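Each file id starts with the year of the address, so slicing off the first four characters with fileid[:4] extracts that year. This works because every year in the corpus has exactly four digits:
>>> [fileid[:4] for fileid in inaugural.fileids()]
['1789', '1793', '1797', '1801', '1805', '1809', '1813', '1817', '1821', ...]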
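
The same file ids also carry the president's name after the hyphen. As a minimal sketch, assuming every id follows the 'YYYY-Name.txt' pattern seen in the listing above, the year and name can be pulled apart without loading the corpus. The helper parse_fileid is a hypothetical name used only for illustration, and the list holds just the three ids shown earlier.

```python
# File ids copied from the listing above; the real corpus has many more.
fileids = ['1789-Washington.txt', '1793-Washington.txt', '1797-Adams.txt']

def parse_fileid(fileid):
    """Split 'YYYY-Name.txt' into (year, name). Hypothetical helper."""
    stem = fileid.rsplit('.', 1)[0]   # drop the '.txt' extension
    year, name = stem.split('-', 1)   # split at the first hyphen only
    return int(year), name

parsed = [parse_fileid(f) for f in fileids]
for year, name in parsed:
    print(year, name)
```

Converting the year to an int makes it easy to sort or filter addresses by date later on.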

Plot of a Conditional Frequency Distribution: all words in the Inaugural Address Corpus that begin with america or citizen are counted; separate counts are kept for each address; these are plotted so that trends in usage over time can be observed; counts are not normalized for document length.

>>> import nltk
>>> cfd = nltk.ConditionalFreqDist(
... (target, fileid[:4])
... for fileid in inaugural.fileids()
... for w in inaugural.words(fileid)
... for target in ['america', 'citizen']
... if w.lower().startswith(target))
>>> cfd.plot()
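
To see what is being counted, here is a rough sketch of the same tallying logic using only the standard library. It is not how NLTK implements ConditionalFreqDist; the dictionary toy_corpus is made-up sample data standing in for inaugural.words(fileid), and the words in it are not quotes from the real addresses.

```python
from collections import Counter, defaultdict

# Hypothetical toy data standing in for inaugural.words(fileid).
toy_corpus = {
    '1789-Washington.txt': ['Fellow', 'Citizens', 'of', 'the', 'Senate'],
    '2009-Obama.txt': ['My', 'fellow', 'citizens', 'America', 'Americans'],
}
targets = ['america', 'citizen']

# One Counter per condition (target prefix), keyed by year.
counts = defaultdict(Counter)
for fileid, words in toy_corpus.items():
    year = fileid[:4]                        # first four characters = year
    for w in words:
        for target in targets:
            if w.lower().startswith(target):  # case-insensitive prefix match
                counts[target][year] += 1

for target in targets:
    print(target, dict(counts[target]))
```

Each condition ('america' or 'citizen') gets its own tally per year, which is the data that cfd.plot() draws as one line per condition.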
