Footnote 1 -- "... sources":
This research was partially supported by NSF/DARPA/NASA under grant number IRI94-11330.
Footnote 2 -- "... source,":
The newsgroups were drawn from the following hierarchies, listed alphabetically: alt.politics, comp, misc, rec, sci, and soc. Only non-empty newsgroups were used, taken from a two-week period in July 1997.
Footnote 3 -- "... of it.":
Although the Library of Congress publishes the complete LCC on CD-ROM, the CD-ROM does not provide a programming interface.
Footnote 4 -- "... records.":
The distribution resembled a Zipf rank-frequency distribution [14] more than a Gaussian one.
Footnote 5 -- "... content.":
We strip the headers to avoid using the names of the newsgroup and of any cross-posted groups, which appear in the headers, as an aid to classification. This keeps the experiment as unbiased as we reasonably can, since its purpose is to classify by content only. We exclude articles that contain no terms matching our list from the MARC records; these are rare aberrations, fewer than 0.1% of the articles, such as one article whose subject was "s" and whose entire content was "1".
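The preprocessing described in this footnote can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used in the experiment: the function names (`strip_headers`, `matching_terms`, `prepare`) and the tokenization rule are hypothetical, the header block is assumed to end at the first blank line as in standard news article format, and all headers (including Subject) are dropped, which the footnote does not confirm.

```python
import re


def strip_headers(article: str) -> str:
    """Return the article body, i.e. everything after the first blank line.

    Assumption: headers and body are separated by one blank line.
    An article with no blank line is treated as having no body.
    """
    normalized = article.replace("\r\n", "\n")
    _headers, separator, body = normalized.partition("\n\n")
    return body if separator else ""


def matching_terms(body: str, vocabulary: set) -> list:
    """Keep only the body tokens that appear in the term list.

    Assumption: tokens are lowercase alphabetic runs; the actual
    term extraction used with the MARC records may differ.
    """
    tokens = re.findall(r"[a-z]+", body.lower())
    return [token for token in tokens if token in vocabulary]


def prepare(article: str, vocabulary: set):
    """Return the matching terms of an article, or None to exclude it."""
    terms = matching_terms(strip_headers(article), vocabulary)
    return terms or None
```

With this sketch, a newsgroup name that appears only in the headers never reaches the classifier, and an article whose body is just "1" yields no matching terms and is excluded.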
Footnote 6 -- "... nodes.":
These values range from 0 to 1 because we are using cosine weighting.
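To illustrate why such values stay between 0 and 1, here is a minimal sketch of the standard cosine measure between two term-weight vectors. It assumes that "cosine weighting" refers to this measure and that all term weights are non-negative (as with term frequencies); the function name `cosine` and the dictionary representation are hypothetical choices for the example.

```python
import math


def cosine(u: dict, v: dict) -> float:
    """Cosine of the angle between two sparse term-weight vectors.

    Assumption: all weights are non-negative, so the dot product is
    non-negative and the result lies in [0, 1].
    """
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for an empty vector
    return dot / (norm_u * norm_v)
```

Two vectors with no terms in common score 0, and two vectors that point in the same direction score 1, regardless of their lengths.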
Ron Dolin
January 15, 1998