Viewing the results takes a little work. Each row of the features matrix holds one feature, with a weight for every word indicating how strongly that word applies to the feature, so you can display the top five or ten words in each feature to see which words matter most. The corresponding column in the weights matrix tells you how strongly that feature applies to each of the articles, so it’s also interesting to show the top three articles for each feature and see how strongly the feature applies to each of them.
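To make the row and column reading concrete, here is a minimal sketch on made-up data. The helper names top_words and top_articles, the toy titles and word list, and the matrix values are all assumptions for illustration and are not part of newsfeatures.py. It assumes, as in the rest of this section, that h is indexed as (feature, word) and w as (article, feature).

```python
import numpy as np

# Hypothetical toy data: 3 articles, 4 words, 2 features (values are made up)
wordvec = ['market', 'stocks', 'election', 'vote']
titles = ['Stocks rally', 'Election night', 'Markets and the vote']

# h is the features matrix: one row per feature, one column per word
h = np.array([[0.9, 0.8, 0.0, 0.1],
              [0.0, 0.1, 0.7, 0.9]])

# w is the weights matrix: one row per article, one column per feature
w = np.array([[1.2, 0.0],
              [0.1, 1.5],
              [0.6, 0.5]])

def top_words(h, wordvec, feature, n=2):
    # Sort the word weights in this feature's row, largest first
    order = np.argsort(h[feature])[::-1][:n]
    return [wordvec[j] for j in order]

def top_articles(w, titles, feature, n=3):
    # Sort the article weights in this feature's column, largest first
    order = np.argsort(w[:, feature])[::-1][:n]
    return [(float(w[j, feature]), titles[j]) for j in order]

for i in range(h.shape[0]):
    print(top_words(h, wordvec, i))
    print(top_articles(w, titles, i))
```

The first helper reads along a row of h and the second reads down a column of w, which is the same pairing the showfeatures function below relies on.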
Add a new function called showfeatures to newsfeatures.py:
from numpy import *

def showfeatures(w,h,titles,wordvec,out='features.txt'):
    # open() works in both Python 2 and Python 3; file() is Python 2 only
    outfile=open(out,'w')
    pc,wc=shape(h)
    toppatterns=[[] for i in range(len(titles))]
    patternnames=[]

    # Loop over all the features
    for i in range(pc):
        slist=[]
        # Create a list of words and their weights
        for j in range(wc):
            slist.append((h[i,j],wordvec[j]))
        # Reverse sort the word list
        slist.sort()
        slist.reverse()

        # Write the first six words to the output file
        n=[s[1] for s in slist[0:6]]
        outfile.write(str(n)+'\n')
        patternnames.append(n)

        # Create a list of articles for this feature
        flist=[]
        for j in range(len(titles)):
            # Add the article with its weight
            flist.append((w[j,i],titles[j]))
toppat....