- Footnote 1 -- "... sources":
- This research was partially supported by NSF/DARPA/NASA under grant number IRI94-11330.
- Footnote 2 -- "... source,":
- The newsgroups included the following hierarchies, in alphabetical order:
alt.politics, comp, misc, rec, sci, and soc. Only non-empty newsgroups were
used, taken from a two-week period in July 1997.
- Footnote 3 -- "... of it.":
- Although the Library of Congress publishes the complete LCC on CD-ROM,
the CD-ROM does not provide a programming interface.
- Footnote 4 -- "... records.":
- The distribution resembled a Zipf rank-frequency distribution [14] more
closely than a Gaussian one.
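For reference, a Zipf rank-frequency distribution is one in which the frequency of the item of rank r (rank 1 being the most frequent) falls off as a power of r. The general form is given below; C is a normalizing constant and s is the exponent, which is close to 1 in the classic form of the law. The footnote does not report a fitted exponent, so s here is left symbolic.

```latex
f(r) \propto \frac{1}{r^{s}}
\qquad\Longleftrightarrow\qquad
\log f(r) = \log C - s \log r
```

On log-log axes such a distribution plots as an approximately straight line with slope -s, whereas a Gaussian has no such power-law tail.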
- Footnote 5 -- "... content.":
- We strip the headers to avoid using the name of the newsgroup and the
cross-posted groups, both of which appear in the headers, as aids to classification.
In this way, we try to be as unbiased as we reasonably can,
since the purpose of the experiment is to classify by content only.
We exclude articles that contain no terms matching our list from the MARC
records; these articles are rare aberrations, fewer than 0.1% of the articles,
such as one article whose subject was "s" and whose entire content was "1".
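The two preprocessing steps in this footnote (stripping headers, then excluding articles with no matching term) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes each article is raw Usenet text in the usual layout of header lines, one empty line, then the body; it uses a hypothetical lowercase regex tokenizer; and it substitutes a small hand-made term set for the list derived from the MARC records. The helper names `strip_headers`, `tokenize`, and `keep_matching` are hypothetical.

```python
import re


def strip_headers(raw_article: str) -> str:
    """Return only the body of a Usenet article.

    Assumes the usual layout: header lines (From, Newsgroups, Subject, ...)
    followed by one empty line, then the body. Dropping the headers removes
    the newsgroup name and any cross-posted groups from the input.
    """
    text = raw_article.replace("\r\n", "\n")
    _, separator, body = text.partition("\n\n")
    return body if separator else ""  # no blank line: headers only


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer (an assumption): lowercase alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())


def keep_matching(raw_articles: list[str], marc_terms: set[str]) -> list[str]:
    """Strip headers, then drop articles containing no term from marc_terms."""
    kept = []
    for raw in raw_articles:
        body = strip_headers(raw)
        if any(token in marc_terms for token in tokenize(body)):
            kept.append(body)
    return kept


# Illustrative data only; the paper's term list comes from MARC records.
articles = [
    "From: a@example.com\nNewsgroups: sci.astro\nSubject: comets\n\n"
    "The comet passed near the planet.\n",
    "From: b@example.com\nNewsgroups: misc.test\nSubject: s\n\n1\n",
]
terms = {"comet", "planet", "orbit"}
kept = keep_matching(articles, terms)
print(kept)  # ['The comet passed near the planet.\n']
```

The second sample article mirrors the aberration described in the footnote: once its headers are removed, its body "1" yields no tokens at all, so it is excluded.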
- Footnote 6 -- "... nodes.":
- These values range from 0 to 1 because we use cosine weighting.
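A small sketch of why cosine values fall in this range, assuming (this is not stated in the footnote) that each value compares an article's term-weight vector with a node's term-weight vector and that all term weights are non-negative, as raw term frequencies are. The dot product of non-negative vectors is at least 0, and by the Cauchy-Schwarz inequality it is at most the product of the two norms, so the ratio lies in [0, 1]: 0 when the vectors share no terms, 1 when one is a positive multiple of the other. The sparse-dictionary representation and the weights below are hypothetical.

```python
import math


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for an empty vector
    return dot / (norm_u * norm_v)


# Hypothetical non-negative weights for an article and a classification node.
article = {"comet": 2.0, "planet": 1.0}
node = {"comet": 1.0, "orbit": 1.0}
score = cosine(article, node)
print(round(score, 3))  # 0.632
```

Here the only shared term is "comet", so the dot product is 2, the norms are sqrt(5) and sqrt(2), and the score is 2/sqrt(10). If negative weights were allowed, the same formula could return values down to -1.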