Journal of the American Society for Information Science (JASIS) -- Table of Contents
Contributed by Richard Hill, American Society for Information Science, Silver Spring, Maryland, USA
[email protected]
VOLUME 51, NUMBER 13
[Note: This Table of Contents includes a new feature.
You will note that entries include a new item, "Published online 6 July
2000" after the page number. This refers to Wiley's "Early View" feature,
posting articles as soon as they are approved. ASIS members who wish
electronic access but didn't so elect can contact <[email protected]>.]
CONTENTS
In this issue
Bert R. Boyce
Page 1157
Research
- Web-Based Analyses of E-journal Impact: Approaches, Problems, and
Issues
Stephen P. Harter and Charlotte E. Ford
Page 1159. Published online 5 September 2000
We begin with a look by Harter and Ford at the similarities and
differences between citation in scholarly papers and hyper-linking in
scholarly electronic journal articles. Using the 39 e-journals of
Harter's previous study of impact of e-journals, less those that required
subscription or were defunct, impact was measured through back-links.
E-journals exist at more than one location, in multiple formats, and have
multiple URLs. There is no clear way to gather all possibilities. Thus
one link search per e-journal was conducted, using only the http:// format and
choosing the URL listed in the most directories. Because of the normal
hierarchical directory structure of the sites, a single truncated URL
brings in all links to the home page, the articles, and perhaps
associated files. In the cases where a hierarchical directory structure did not
occur, a second search at the chief articles site was carried out. Three engines
provided the link search capability: AltaVista, which was very
inconsistent in day-to-day figures; HotBot, which did not provide the needed
truncation capability; and Infoseek, which produced only half the hits of the other
two. However, the three ranked the journals in a very similar fashion,
with high correlation, and so Infoseek was chosen for its consistency. Saved
search results were concatenated into a file for each of the 39
e-journals with up to 500 URLs in each. Using Grab-a-Site, the web pages associated
with these URLs were collected. Perl programs computed the total number
of back-links, the number to different parts of the e-journal site, and the
numbers generated internally and externally. Self links are quite high at
about 50%; only one in 20 links is to external e-journal articles. Total
external back-links correlate strongly with back-links to external
articles. There appears to be no correlation between citation ranking of
e-journals and back-link ranking. File types linked to e-journals are very
diverse.
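The internal/external split described above can be illustrated with a minimal Python sketch. It is not the authors' Perl code: the function name `classify_backlinks`, the example URLs, and the rule that a link is "internal" when the linking page shares the e-journal's host name are all assumptions made for illustration.

```python
from urllib.parse import urlparse


def classify_backlinks(journal_url, linking_pages):
    """Count back-links as internal (self links) or external.

    Assumption: a linking page is "internal" when its host name equals
    the host name of the e-journal URL; the study may have used a
    different rule.
    """
    journal_host = urlparse(journal_url).netloc.lower()
    counts = {"internal": 0, "external": 0}
    for page in linking_pages:
        host = urlparse(page).netloc.lower()
        kind = "internal" if host == journal_host else "external"
        counts[kind] += 1
    return counts


# Hypothetical URLs, for illustration only.
example = classify_backlinks(
    "http://ejournal.example.org/",
    [
        "http://ejournal.example.org/v1/article1.html",
        "http://other.example.com/links.html",
    ],
)
```

Comparing host names is only one plausible way to separate self links from external ones; an e-journal mirrored at several locations would need a list of hosts instead.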
- Predicting the Effectiveness of Naive Data Fusion on the Basis of
System Characteristics
Kwong Bor Ng and Paul B. Kantor
Page 1177. Published online 5 September 2000
In system level data fusion, the retrieval status values assigned
by multiple systems are combined to improve overall performance. Ng and
Kantor test fusion against the standard of an "oracle" choice of system made
before search. The measure used, r, is based upon p100, which is the
cumulated number of relevant documents retrieved prior to reaching the
one-hundred-and-first position in a ranked list, divided by 100. The
measure r is the p100 of the poorer scheme over the p100 of the better
scheme. Retrieval scheme similarities are characterized by a measure z
based on the number of pairs of documents placed in different order by
each of two schemes. Measuring the effectiveness of a procedure for predicting
the effectiveness of data fusion requires the use of the "Receiver
Operating Characteristic", ROC, a plot of the correctly predicted
effective cases as a function of ineffective cases predicted to be effective.
Output lists for TREC4 were used for training and TREC5 for
testing. The ordering of the fused list is determined by the sum of the
normalized relevance scores. When fusion gives better performance the
cases are generally above the z + r = 1 line and concentrated on the
right side indicating that dissimilar outputs with comparable
performance lead to effective fusion. Curves
generated by logistic regression were used to generate classification
scores to create ROC curves. With a detection rate below 75%, predictive
power is far better than random. A non-parametric method ranking the data
after splitting it into 100 bins yields a more powerful ROC curve on the
training data, but has less power on the test data.
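The measures p100 and r and the sum-of-normalized-scores fusion described above can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, and min-max normalization is an assumption, since the abstract does not say how scores were normalized.

```python
def p100(ranked_ids, relevant_ids, cutoff=100):
    """Relevant documents among the first `cutoff` ranks, divided by `cutoff`."""
    hits = sum(1 for doc in ranked_ids[:cutoff] if doc in relevant_ids)
    return hits / cutoff


def ratio_r(p_a, p_b):
    """p100 of the poorer scheme over p100 of the better scheme."""
    better = max(p_a, p_b)
    return min(p_a, p_b) / better if better > 0 else 0.0


def normalize(scores):
    """Min-max normalization to [0, 1] (an assumed normalization)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse(scores_a, scores_b):
    """Order documents by the sum of their normalized scores.

    A document missing from one scheme contributes 0 for that scheme;
    ties are broken by document id so the output is deterministic.
    """
    na, nb = normalize(scores_a), normalize(scores_b)
    total = {d: na.get(d, 0.0) + nb.get(d, 0.0) for d in set(na) | set(nb)}
    return sorted(total, key=lambda d: (-total[d], d))


fused = fuse({"a": 1, "b": 3, "c": 2}, {"a": 0, "b": 10, "c": 9})
```

The similarity measure z is left out because the abstract says only that it is based on the number of document pairs ordered differently by the two schemes, without giving its exact form.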
- Bibliometric Information Retrieval System (BIRS): A Web Search
Interface Utilizing Bibliometric Research Results
Ying Ding, Gobinda G. Chowdhury, Schubert Foo, and Weizhong Qian
Page 1190. Published online 8 September 2000
BIRS (Bibliometric Information Retrieval System) provides Web-based
co-author, co-citation, and similar keyword maps which can be used to
generate query terms for ten search engines accessible through a common
interface. The maps, created by Ding et alia, are structured from a ten
year database of library and information science literature and layered
as to level of detail. Thirty-five students chose one of six topics provided
and searched in their choice of search engine. The top 20 hits were then
classed as relevant or not relevant. The subjects then used BIRS to
expand their query information and searched the same engine again. They were
then asked to compare the results and comment on BIRS. Eighty percent reported
an improved understanding of the subject area, and seventy-seven percent
agreed that BIRS was a help in query construction, with 91% using the keyword
facility. Actual variations in relevant and retrieved documents are not
reported.
- Shape Recovery: A Visual Method for Evaluation of Information
Retrieval Experiments
Mark Rorvig and Steven Fitzpatrick
Page 1205. Published online 7 September 2000
Rorvig and Fitzpatrick form a document similarity matrix and
use multidimensional scaling to create a set of Cartesian points for visual
evaluation of retrieval performance. The distance from the centroid
document in each cluster to each document, up to one standard deviation
of the mean of all these distances, is then computed, for correlation with
control clusters, and the test and control clusters are displayed. Using
full text from five topic document sets from NIST TREC as control, and 50
and 200 term vectors from a local dictionary with and without stemming as
the four treatments, both visual and correlation comparisons are made.
High apparent shape distortion agrees with low correlation and vice versa.
Stemming has the biggest positive effect when the most distortion is
apparent. The application of categories moves far more non-relevant
documents to the extremities of the visual field than it does relevant
documents. Stemming brings the visual display back closer to the control
but brings back many non-relevant documents.
- Empirical Studies of End-User Information Searching
A.G. Sutcliffe, M. Ennis, and S.J. Watkinson
Page 1211. Published online 8 September 2000
Sutcliffe et alia, using 17 medical students as subjects,
searched 4 topics on MEDLINE using WinSPIRS. Subjects' notes, search strategies, and
search history were recorded, and their actions and spoken thoughts
were captured on video and audio. Recall made use of a standard relevant set
chosen by experts from a union of subject outputs; precision was defined
as both subject relevant and independent judge relevant over subject
relevant documents. Average recall was 14%. Novices significantly out-performed
more experienced searchers on one question but other differences were not
significant. More experienced searchers had significantly similar ranking
orders of the queries for recall; novices seemed to find all questions
equally difficult. No differences were apparent for precision. There were
no significant differences in retrieval times or evaluation times overall
but some questions indicated differences. Evaluation time was positively
correlated with query complexity. More experienced searchers used more
query iterations and used broadening and narrowing strategies while
novices favored trial and error. Novice searchers used only the AND operator.
These results are seen as indicating the failure of current user interfaces to
assist the searcher.
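The recall and precision definitions given above can be written out as a short Python sketch. The function names and document identifiers are hypothetical, and treating a subject's retrieved set as the numerator basis for recall is an assumption drawn from the abstract's wording.

```python
def recall(retrieved, standard_relevant):
    """Share of the expert-chosen standard relevant set that was retrieved."""
    standard = set(standard_relevant)
    if not standard:
        return 0.0
    return len(set(retrieved) & standard) / len(standard)


def precision(subject_relevant, judge_relevant):
    """Documents judged relevant by both the subject and the independent
    judge, over the documents the subject judged relevant."""
    subject = set(subject_relevant)
    if not subject:
        return 0.0
    return len(subject & set(judge_relevant)) / len(subject)


# Hypothetical document identifiers, for illustration only.
r_value = recall(["a", "b"], ["a", "c", "d", "e"])
p_value = precision(["a", "b", "c", "d"], ["a", "b", "x"])
```

Under this definition, precision measures agreement between the subject and the judge rather than the share of all retrieved documents that are relevant.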
- Success, a Structured Search Strategy: Rationale, Principles, and
Implications
Chaim Zins
Page 1232. Published online 11 September 2000
Zins evaluates a procedure which he has given the name
"Success," and which involves determining the problem, locating the
resources to search, defining the search terms, and executing the
search. Three rounds of
structured questionnaires were sent to 15 information specialists in a
typical Delphi approach in an attempt to analyze the strategy's
principles and rationale, review its guidelines, forms and tables, and discuss its
implications for user instruction. There was disagreement on the need for
subject expertise but agreement that both systematic thinking and creativity
were required. A need for a fifth phase, evaluation, came forward, as did
the need for a methodology selection guideline and a post evaluation
reiteration guideline. The five phases were considered indispensable, but
sometimes performed using remembered information and thus not observable.
Book Review
- Books, Bytes, and Bridges: Libraries and Computer Centers in Academic
Institutions, edited by Larry Hardesty
P. Scott Lapinski
Page 1248. Published online 11 September 2000