The corpus accessed through the name inaugural contains presidential inaugural addresses: 55 texts, one per address, starting with the first by Washington and ending with the most recent at the time of writing, Obama's in 2009.
Here is how to list the texts it contains:
>>> from nltk.corpus import inaugural
>>> inaugural.fileids()
['1789-Washington.txt', '1793-Washington.txt', '1797-Adams.txt', ...]
Each file ID begins with the year of the address, so taking the first four characters of the name gives the year. This works as long as every year has exactly four digits.
>>> [fileid[:4] for fileid in inaugural.fileids()]
['1789', '1793', '1797', '1801', '1805', '1809', '1813', '1817', '1821', ...]
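If the four-character slice feels fragile, the file ID can also be split at the hyphen. The following is only a sketch: parse_fileid is a hypothetical helper (it is not part of NLTK), and the list of file IDs is a small sample copied from the listing above rather than the full corpus.

```python
# Sample file IDs copied from the listing above (not the full corpus).
fileids = ['1789-Washington.txt', '1793-Washington.txt', '1797-Adams.txt']

def parse_fileid(fileid):
    """Split 'YYYY-Name.txt' into (year, name).

    Hypothetical helper for illustration; assumes the 'YYYY-Name.txt' pattern.
    """
    stem = fileid.rsplit('.', 1)[0]   # drop the '.txt' extension
    year, name = stem.split('-', 1)   # split at the first hyphen only
    return int(year), name

parsed = [parse_fileid(f) for f in fileids]
print(parsed)
# [(1789, 'Washington'), (1793, 'Washington'), (1797, 'Adams')]
```

Splitting at the hyphen does not depend on the year having four digits, and it returns the president's name as well.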
Plot of a Conditional Frequency Distribution: all words in the Inaugural Address Corpus that begin with america or citizen are counted; separate counts are kept for each address; these are plotted so that trends in usage over time can be observed; counts are not normalized for document length.
>>> import nltk
>>> cfd = nltk.ConditionalFreqDist(
...     (target, fileid[:4])
...     for fileid in inaugural.fileids()
...     for w in inaugural.words(fileid)
...     for target in ['america', 'citizen']
...     if w.lower().startswith(target))
>>> cfd.plot()
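To make clear what is being counted, here is a minimal sketch of the same counting logic using only the standard library. The word lists are made up for illustration and are not real corpus data; the names sample, targets and counts are likewise assumptions and not part of NLTK.

```python
from collections import Counter, defaultdict

# Made-up miniature "corpus": year -> list of words (NOT real corpus data).
sample = {
    '1789': ['Citizens', 'of', 'America', 'and', 'fellow', 'citizens'],
    '1793': ['American', 'people'],
}
targets = ['america', 'citizen']

# One Counter per target word (the condition), keyed by year (the event).
counts = defaultdict(Counter)
for year, words in sample.items():
    for w in words:
        for target in targets:
            # Lowercase first so 'America' and 'american' both match 'america'.
            if w.lower().startswith(target):
                counts[target][year] += 1

for target in targets:
    print(target, dict(counts[target]))
# america {'1789': 1, '1793': 1}
# citizen {'1789': 2}
```

As in the plot, these are raw counts and are not normalized for the length of each address.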