Journal of the American Society for Information Science and Technology
(JASIST) -- Table of Contents
Contributed by Richard Hill
American Society for Information Science and Technology
Silver Spring, Maryland, USA
Fax: (301) 495-0810
Phone: (301) 495-0900
[email protected]
VOLUME 52, NUMBER 8
[Note: The contents of Bert Boyce's "In This Issue" column have been merged
into the Table of Contents below.]
CONTENTS
Editorial
In this issue
Bert R. Boyce
Page 605
Research
- Mooers' Law: In and out of Context
Brice Austin
Page 607, Published online 26 April 2001
In this issue we begin with "Mooers' Law: In and Out of Context." Austin
points out that Mooers meant that having information was not always
considered a good thing by a user, since it required the expenditure of
effort to make use of it, not that a system might go unused because the use
itself was an expenditure of extra effort. While the latter may be a
principle of retrieval usage, it is not the one stated by Mooers. This leads
to the suggestion that system use depends upon the user's environmental
level of desire for information: if that desire is high, any IR system will
be used; if it is low, none will be.
- Author Inflation Leads to a Breakdown of Lotka's Law
Hildrun Kretschmer and Ronald Rousseau
Page 610, Published online 27 April 2001
Fractional counting of the authors of multi-authored papers has been shown
to lead to a breakdown of Lotka's Law, despite its robust character under
most circumstances. Kretschmer and Rousseau use the normal count method of
full credit for each author on two five-year bibliographies from each of 13
Dutch physics institutes where high co-authorship is a common occurrence.
Kolmogorov-Smirnov tests were performed to see whether the Lotka
distribution fit the data. All bibliographies with papers of up to 40
authors fit acceptably; no bibliography containing a paper with over 100
authors fits the distribution. The underlying traditional "success breeds
success" mechanism assumes new items arrive on a one-by-one basis, but
Egghe's generalized model would still account for the process. It seems
unlikely that Lotka's Law will hold in a high co-authorship environment.
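As an informal illustration of the kind of goodness-of-fit test described
above (not the authors' actual data or procedure), the sketch below fits the
classical Lotka distribution, in which the fraction of authors with n papers
is proportional to 1/n^2, to a hypothetical set of author productivity
counts and applies a Kolmogorov-Smirnov test.

# Hedged sketch: illustrative only; the alpha value, the productivity
# counts, and the truncation at n_max are assumptions.
import numpy as np
from scipy import stats

def lotka_cdf(n, alpha=2.0, n_max=1000):
    """Cumulative distribution of author productivity under Lotka's Law."""
    ks = np.arange(1, n_max + 1)
    pmf = ks.astype(float) ** -alpha
    pmf /= pmf.sum()                      # normalize so probabilities sum to 1
    cdf = np.cumsum(pmf)
    n = np.atleast_1d(n).astype(int)
    return cdf[np.clip(n, 1, n_max) - 1]

# One productivity count per author in a hypothetical bibliography,
# standing in for the institute bibliographies discussed above.
papers_per_author = np.array([1] * 60 + [2] * 15 + [3] * 7 +
                             [4] * 4 + [5] * 2 + [9])

# Kolmogorov-Smirnov test of the observed counts against the Lotka CDF.
statistic, p_value = stats.kstest(papers_per_author, lotka_cdf)
print(f"K-S statistic = {statistic:.3f}, p = {p_value:.3f}")

A large K-S statistic with a small p-value would indicate, as in the
bibliographies containing very highly multi-authored papers, that the Lotka
distribution does not fit.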
- Visualization of Term Discrimination Analysis
Jin Zhang and Dietmar Wolfram
Page 615, Published online 26 April 2001
Zhang and Wolfram compute the discrimination value for a term as the
difference between the centroid-based density of the space computed over all
terms in the corpus and that value computed without the term in question,
and suggest that term selection be made by comparing density changes with a
visualization tool. The Distance Angle Retrieval Environment (DARE) visually
projects a document or term space by presenting distance similarity on the X
axis and angular similarity on the Y axis. Thus a document icon appearing
close to the X axis would be relevant to reference points in terms of a
distance similarity measure, while those close to the Y axis are relevant to
reference points in terms of an angle-based measure. Using 450 Associated
Press news reports indexed by 44 distinct terms, the removal of the term
"Yeltsin" causes the cluster to fall on the Y axis, indicating a good
discriminator. For an angular measure, say cosine, movement along the X axis
to the left signals good discrimination, while movement to the right signals
poor discrimination. A term density space could also be used. Most terms are
shown to be indifferent discriminators. Different measures result in
different choices of good and poor discriminators, as does the use of a term
space rather than a document space. The visualization approach is clearly
feasible, and provides some additional insights not found in the computation
of a discrimination value alone.
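For readers unfamiliar with discrimination values, the sketch below shows
one classical way to compute them in a vector space model: space density as
the mean cosine similarity of documents to their centroid, and the
discrimination value of a term as the change in density when that term is
removed. The matrix, similarity measure, and sign convention are
illustrative assumptions, not the DARE system itself.

# Hedged sketch of a term discrimination value computation; the toy
# document-term matrix below is an assumption for illustration.
import numpy as np

def space_density(doc_term):
    """Average cosine similarity of each document to the centroid."""
    centroid = doc_term.mean(axis=0)
    norms = np.linalg.norm(doc_term, axis=1) * np.linalg.norm(centroid)
    norms = np.where(norms == 0, 1.0, norms)   # guard against all-zero rows
    sims = (doc_term @ centroid) / norms
    return float(sims.mean())

def discrimination_values(doc_term):
    """DV_k = density without term k minus density with all terms.
    Positive values mark good discriminators: removing the term packs the
    space more tightly, so its presence spreads documents apart."""
    base = space_density(doc_term)
    dvs = np.empty(doc_term.shape[1])
    for k in range(doc_term.shape[1]):
        reduced = np.delete(doc_term, k, axis=1)
        dvs[k] = space_density(reduced) - base
    return dvs

# Tiny illustrative corpus: 4 documents x 3 terms (term frequencies).
docs = np.array([[2, 0, 1],
                 [0, 3, 1],
                 [1, 1, 1],
                 [0, 0, 2]], dtype=float)
print(discrimination_values(docs))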
- Scholarly Use of Internet-Based Electronic Resources
Yin Zhang
Page 628, Published online 11 April 2001
By Internet resources Zhang means any electronic file accessible by any
Internet protocol. Their usage is determined by an examination of the
citations to such sources in a nine-year sample of four print and four
electronic LIS journals, by a survey of the editors of these journals, and
by a survey of scholars with "in press" papers in those journals. Citations
were gathered from the Social Sciences Citation Index and manually classed
as e-sources by the format used. All authors with "in press" papers were
asked about their use and opinion of Internet sources and for any
suggestions for improvement. Use of electronic sources is heavy and access
is very high. Access and ability explain most usage, while satisfaction was
not significant. Citation of e-journals increases over the eight years.
Authors report under-citation of e-journals in favor of print equivalents.
Traditional reasons are given for citing and not citing, but additional
reasons are also present for e-journals.
- FEATURES: Real-Time Adaptive Feature and Document Learning for Web Search
Zhixiang Chen, Xiannong Meng, Richard H. Fowler, and Binhai Zhu
Page 655, Published online 27 April 2001
Chen et al. report on the design of FEATURES, a web search engine with
adaptive features based on minimal relevance feedback. Rather than
developing user profiles from previous searcher activity at either the
server or the client, or updating indexes after search completion, FEATURES
allows index and user characterization files to be updated during query
modification on retrieval from a general-purpose search engine. Indexing
terms relevant to a query are defined as the union of all terms assigned to
documents retrieved by the initial search run, and are used to build a
vector space model on this retrieved set. The top ten weighted terms are
presented to the user for a relevant/non-relevant choice, which is used to
modify the term weights. Documents are chosen if their summed term weights
are greater than some threshold. A user evaluation of the top ten ranked
documents as non-relevant will decrease these term weights, and a positive
judgement will increase them. A new ordering of the retrieved set will
generate new display lists of terms and documents. Precision is improved in
a test on AltaVista searches.
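The feedback loop just described can be pictured with a short sketch.
Everything below (the TF-IDF-style weighting, the multiplicative update, and
the threshold value) is an illustrative assumption rather than the published
FEATURES algorithm, but it shows the shape of the process: weight the union
of retrieved terms, adjust weights from the user's relevant/non-relevant
judgements, and keep documents whose summed weights exceed a threshold.

# Hedged sketch of a minimal relevance-feedback loop; not the authors' code.
from collections import Counter
from math import log

def build_term_weights(retrieved_docs):
    """TF-IDF-like weights over the union of terms in the retrieved set."""
    df = Counter(term for doc in retrieved_docs for term in set(doc))
    n = len(retrieved_docs)
    return {term: log(1 + n / dfreq) for term, dfreq in df.items()}

def apply_feedback(weights, judged_terms, step=0.5):
    """Raise weights of terms the user marked relevant, lower the rest."""
    for term, relevant in judged_terms.items():
        if term in weights:
            weights[term] *= (1 + step) if relevant else (1 - step)
    return weights

def select_documents(retrieved_docs, weights, threshold):
    """Keep documents whose summed term weights exceed the threshold."""
    scored = [(sum(weights.get(t, 0.0) for t in doc), doc)
              for doc in retrieved_docs]
    return [doc for score, doc in sorted(scored, reverse=True)
            if score > threshold]

# Toy retrieved set: each document is just a list of index terms.
docs = [["adaptive", "search", "web"],
        ["relevance", "feedback", "search"],
        ["web", "crawler"]]
w = build_term_weights(docs)
w = apply_feedback(w, {"search": True, "crawler": False})
print(select_documents(docs, w, threshold=1.0))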
- An Empirical Comparison of Visualization Tools to Assist Information Retrieval on the Web
Misook Heo and Stephen C. Hirtle
Page 666, Published online 26 April 2001
If maximum use of a hypertext document in a web environment is to be
obtained, the reader must visualize the overall structure of the paths
through the document as well as the document space. Graphic visualization
displays of this space, produced to assist in navigation, are classified
into four groups, and Heo and Hirtle compare three of these classes as to
their effectiveness. Distortion displays expand regions of interest while
relatively diminishing the detail of the remaining regions; this technique
shows both local detail and global structure. Zoom techniques use a series
of increasingly focused displays of smaller and smaller areas, and can
reduce cognitive overload, but do not provide easy movement to other parts
of the total space. Expanding outline displays use a tree structure to allow
movement through a hierarchy of documents, but if the organization has a
wide horizontal structure, or is not particularly hierarchical in nature,
such displays can break down. Three-dimensional layouts, which are not
evaluated here, place objects by location in three-dimensional space,
providing more information and freedom; however, the space must be rendered
in two dimensions, resulting in difficulty in visually judging depth, size,
and positioning.
Ten students were assigned to each of eight groups composed of viewers of
the three techniques and an unassisted control group, using either a large
(583 selected pages) or a small (50 selected pages) web space. Sets of 10
questions, designed to elicit the use of a visualization tool, were provided
for each space. Accuracy and time spent were extracted from a log file, and
users' views were surveyed after completion. ANOVA shows significant
differences in accuracy and time based upon the visualization tool in use. A
Tukey test shows zoom accuracy to be significantly lower than that of the
expanding outline, and zoom time to be significantly greater than that of
both the outline and control groups. Size significantly affected accuracy
and time, but had no interaction with tool type. While the expanding outline
class outperformed zoom and distortion, its performance was not
significantly different from that of the control group.
- Use of Relevance Criteria across Stages of Document Evaluation: On the Complementarity of Experimental and Naturalistic Studies
Rong Tang and Paul Solomon
Page 676, Published online 26 April 2001
Tang and Solomon, based upon their review of the history of topical and
other-than-topical criteria in relevance evaluation, look at a two-stage
model in which judgements are first made on surrogate records and then on
full document text, to determine whether a shift in criteria takes place
and, if so, in what manner and to what degree. Both a controlled experiment
and a naturalistic study were used to study the staging of relevance
judgement criteria. In the controlled environment, 90 undergraduate
psychology students were instructed to choose papers that would help them
meet an assignment, from 20 preselected papers on a broader topic that
included the one assigned. They first selected on the basis of citation and
abstract, then read the papers, and at each stage of the two-stage process
filled out a questionnaire on the importance of each of 15 criteria. In the
naturalistic study, 9 Ph.D. psychology students conducted literature
searches to support their own research; they were asked to think aloud while
making their decisions from retrieved surrogates, later filled out a
questionnaire while reading the materials they had selected, and were
interviewed at the end of the process. Understandability appears to be
important at both stages, and its importance increased at stage two.
Cognitive criteria do not all follow the same pattern across stages. The
controlled group thought quality of information was most important in stage
one and topicality most important in stage two. In the naturalistic study,
topicality was most frequent at stage one and research structure at stage
two. A classification of criteria by their functionality is suggested as a
better approach: first, a division as to whether a criterion is objectively
associated with the document as opposed to being subjectively associated
with a person's expectations; then a division based on primary (essential)
or secondary (for assistance) status.
- Multimedia Exploratory Data Analysis for Geospatial Data Mining: The Case for Augmented Seriation
Myke Gluck
Page 686, Published online 1 May 2001
To prevent Type I error, statisticians tend to accept the possibility of
Type II error, which leads to the rejection of hypotheses later shown to be
true. In both Exploratory Data Analysis (EDA) and data mining the emphasis
is more appropriately on the elimination of Type II error, so EDA methods,
including its visualization tools, may be appropriate for data mining.
Seriation creates a matrix of observations and variables, where each cell
contains an icon whose size represents its value, and permits the movement
of rows and columns in order to visually discern patterns. Augmented
Seriation, a method of data mining, adds computer graphics, sound, color,
and extra dimensions to the matrix so that the analyst has different
modalities for pattern observation. Gluck has developed software for such
analysis.
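As a point of reference, the sketch below performs plain (unaugmented)
seriation: reordering the rows and columns of an observations-by-variables
matrix so that block patterns become easier to see. The ordering heuristic
(sorting by row and column totals) and the toy data are illustrative
assumptions, not Gluck's software.

# Hedged sketch of simple matrix seriation; the heuristic is an assumption.
import numpy as np

def seriate(matrix):
    """Reorder rows and columns by their sums, a simple way to pull large
    values toward one corner so block patterns become visible."""
    row_order = np.argsort(matrix.sum(axis=1))[::-1]
    col_order = np.argsort(matrix.sum(axis=0))[::-1]
    return matrix[np.ix_(row_order, col_order)], row_order, col_order

# Toy observations-by-variables matrix.
data = np.array([[1, 5, 0],
                 [4, 9, 2],
                 [0, 3, 1]])
reordered, rows, cols = seriate(data)
print(reordered)
print("row order:", rows, "column order:", cols)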