Boolean Searching in freeWAIS-sf


Table of Contents:-

  1. Introduction and Credits
  2. Summary
  3. Upper/Lowercase for Search Terms & Boolean Operators
    a. Search terms
    b. Search terms and Boolean operators
  4. Hierarchy of Boolean Operators
  5. Truncation of Search Terms
  6. Stemming

1. Introduction and Credits

Boolean searching capabilities were not part of the original WAIS release, but were soon introduced into WAIS indexing packages due to user demand. All WAIS packages but the earliest versions support AND, OR, NOT, right-hand truncation using the asterisk (*), and the use of parentheses in search statements. How these features are implemented differs among the various WAIS versions.
This document covers freeWAIS-sf. More information about this package can be obtained from the University of Dortmund, where the package was developed.

This document is freely adapted from four documents prepared by:
Natalie Oakes Sturr
Systems Librarian, Penfield Library,
SUNY Oswego, Oswego, NY 13126
E-mail: sturr@oswego.edu

By navigating the Web, you can see these four original documents:


2. Summary

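  1. Search Terms: only lowercase search terms work consistently
  2. Boolean Operators: both uppercase and lowercase operators are handled correctly
  3. Truncation: right-hand truncation (*) gives correct results only when stemming is turned off
  4. Stemming: Porter stemming available as an option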

3. Upper/Lowercase for Search Terms & Boolean Operators

The use of upper or lowercase for either search terms or Boolean operators can affect the results of WAIS searches.

a. Search terms

freeWAIS-sf processes lowercase search terms correctly. However, it returns inconsistent results when search statements include uppercase search terms.
In particular, freeWAIS-sf returns 0 (zero) documents when uppercase search terms are combined with either the AND or the NOT operator.

b. Search terms and Boolean operators

A major challenge when searching a WAIS database is whether to use uppercase or lowercase Boolean operators. freeWAIS-sf correctly handles both uppercase and lowercase operators.
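
Since only lowercase search terms work consistently, and operators are accepted in either case, one simple precaution is to lowercase the whole search statement before submitting it. The following is a minimal sketch of that idea in Python; the function name normalize_statement is hypothetical and is not part of freeWAIS-sf.

```python
def normalize_statement(statement: str) -> str:
    """Lowercase every term and operator and collapse repeated whitespace.

    Hypothetical helper (not part of freeWAIS-sf): it relies only on the
    observations above that lowercase terms search consistently and that
    Boolean operators are accepted in either case.
    """
    return " ".join(statement.lower().split())


if __name__ == "__main__":
    print(normalize_statement("Physics AND (Letter OR Families)"))
    # prints: physics and (letter or families)
```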


4. Hierarchy of Boolean Operators

Boolean operators and the use of parentheses are supported in freeWAIS-sf 2.x. Since operations in parentheses are evaluated first, it is advisable to use parentheses in search statements to ensure consistent results.

In freeWAIS-sf, the hierarchy of Boolean operators is:

  1. Operations in parentheses
  2. NOT
  3. AND
  4. OR

NOT is evaluated before AND, which is evaluated before OR. Operations in parentheses are evaluated first of all.

Examples:

Search Statement          Executed as
A and B or C              (A and B) or C
A or B and C              A or (B and C)
(A or B) and C            (A or B) and C
A or B and C not D        (A or (B and (C not D)))
C not D and A or B        (((C not D) and A) or B)
((A or B) and C) not D    ((A or B) and C) not D
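
The hierarchy can be illustrated with a small precedence-climbing parser that adds the implied parentheses to a search statement. This is only a sketch of the rules listed above, not the freeWAIS-sf parser: the names PRECEDENCE, tokenize and parenthesize are hypothetical, and NOT is treated as a binary operator ("C not D") because that is how it appears in the examples.

```python
import re

# Higher number = evaluated earlier (NOT before AND before OR).
PRECEDENCE = {"not": 3, "and": 2, "or": 1}


def tokenize(statement):
    """Split a statement into parentheses and whitespace-separated words."""
    return re.findall(r"\(|\)|[^\s()]+", statement)


def parenthesize(statement):
    """Return the statement with every implied pair of parentheses added."""
    tokens = tokenize(statement)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def parse_primary():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of statement")
        pos += 1
        if tok == "(":
            inner = parse_expression(1)
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return inner
        if tok == ")" or tok.lower() in PRECEDENCE:
            raise ValueError("expected a search term, got %r" % tok)
        return tok

    def parse_expression(min_prec):
        nonlocal pos
        left = parse_primary()
        while True:
            tok = peek()
            op = tok.lower() if tok is not None else None
            if op not in PRECEDENCE or PRECEDENCE[op] < min_prec:
                return left
            pos += 1
            # Operators of equal precedence group left to right.
            right = parse_expression(PRECEDENCE[op] + 1)
            left = "(%s %s %s)" % (left, op, right)

    result = parse_expression(1)
    if pos != len(tokens):
        raise ValueError("unexpected token %r" % tokens[pos])
    return result


if __name__ == "__main__":
    print(parenthesize("A or B and C not D"))  # (A or (B and (C not D)))
    print(parenthesize("C not D and A or B"))  # (((C not D) and A) or B)
```

The sketch always wraps the outermost operation in parentheses as well, so "A and B or C" is printed as "((A and B) or C)".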


5. Truncation of Search Terms

Right-hand truncation, denoted with an asterisk (*), is listed as a feature of freeWAIS-sf.
This package also provides stemming, similar to automatic right-hand truncation, as an option (see later).

In freeWAIS-sf, right-hand truncation produces correct results as long as stemming is NOT turned on.


6. Stemming

Two types of stemming are available in various versions of WAIS: Porter and plural.
freeWAIS-sf offers only Porter stemming.

Porter stemming attempts to identify and index the word stem. If a word and its stem are different, only the word stem is indexed. Thus, it appears to the user that search terms are automatically truncated. For example, both physics and physical stem to physic.

Search Term    Retrieves
physics        physics or physical
physical       physics or physical
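
The effect of indexing only the word stem can be sketched with a toy index keyed by stems. The code below is an illustration, not freeWAIS-sf code: ASSUMED_STEMS is a hand-written stand-in for a real Porter stemmer that contains only the two stems quoted above, and the sample documents are invented.

```python
from collections import defaultdict

# Stand-in for a Porter stemmer (assumption): only the stems quoted in the
# text are listed; any other word is left unchanged.
ASSUMED_STEMS = {"physics": "physic", "physical": "physic"}


def stem(word):
    word = word.lower()
    return ASSUMED_STEMS.get(word, word)


def build_index(documents):
    """Map each stem to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.split():
            index[stem(word)].add(doc_id)
    return index


def search(index, term):
    """Stem the search term, then look up the stem in the index."""
    return sorted(index.get(stem(term), set()))


# Invented sample documents, for illustration only.
docs = {1: "physics lecture", 2: "physical exam", 3: "chemistry lecture"}
index = build_index(docs)

if __name__ == "__main__":
    print(search(index, "physics"))   # [1, 2]
    print(search(index, "physical"))  # [1, 2]
```

Because both words are stored under the single key "physic", searching for either one retrieves both documents, which is why the search term appears to be truncated automatically.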

Plural stemming attempts to identify and index the singular form of a search term. When searching either the singular or plural form of a term, both are retrieved:

Search Term    Retrieves
letter         letter OR letters
letters        letter OR letters
family         family OR families
families       family OR families

Although Porter stemming is based on a computer algorithm, the English language is not! This can cause search statements to return unexpected results. For example, play is stemmed to plai. These inconsistencies are compounded when combined with right-hand truncation.

Search Term    Retrieves
play           play, plays
play*          player, playful, playground
plai           play, plays
plai*          play, plays, plain, plains, plainly, plaintiff, plait

Data used to determine how truncation and stemming are implemented in freeWAIS-sf are presented below. They refer to freeWAIS-sf 1.1, which is the most recent version tested.
All searches were performed on a small database constructed for the purpose of testing truncation. The database has 10 records (one word per record) and was indexed using the paragraph format (-t para). waissearch was used to search the database. The words listed are those retrieved with each search statement. The database contains the following words: play, plays, player, playful, playground, plain, plainly, plains, plaintiff, plait.

Search      freeWAIS-sf 1.1                                     Expected
Statement   No Stemming               Stemming                  Results

play        play                      play, plays               play

plays       plays                     play, plays               plays

play*       play, plays, player,      player, playful,          play, plays, player,
            playful, playground       playground                playful, playground

plai        (none)                    play, plays               (none)

plai*       plain, plainly, plains,   play, plays, plain,       plain, plainly, plains,
            plaintiff, plait          plains, plainly,          plaintiff, plait
                                      plaintiff, plait

plain       plain                     plain, plains             plain

plain*      plain, plainly, plains,   plain, plainly, plains,   plain, plainly, plains,
            plaintiff                 plaintiff                 plaintiff

pla*        play, plays, player,      play, plays, player,      play, plays, player,
            playful, playground,      playful, playground,      playful, playground,
            plain, plainly, plains,   plain, plainly, plains,   plain, plainly, plains,
            plaintiff, plait          plaintiff, plait          plaintiff, plait
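
The pattern in these results can be reproduced with a small model in which the index stores stems and a truncated term is compared, as a plain prefix, against those stored stems. This is a sketch under stated assumptions, not a description of the freeWAIS-sf internals: the stems in ASSUMED_STEMS are hand-written to be consistent with the reported results (only "play is stemmed to plai" comes from the text), and the names stem and search are hypothetical.

```python
# The ten one-word records of the test database.
WORDS = ["play", "plays", "player", "playful", "playground",
         "plain", "plainly", "plains", "plaintiff", "plait"]

# Assumed stems (stand-in for the Porter stemmer); words that are not
# listed are assumed to be their own stem.
ASSUMED_STEMS = {"play": "plai", "plays": "plai", "playful": "play",
                 "plains": "plain", "plainly": "plainli"}


def stem(word):
    return ASSUMED_STEMS.get(word, word)


def search(term, stemming):
    """Return the database words retrieved by one search term."""
    key = stem if stemming else (lambda w: w)
    if term.endswith("*"):
        # Assumption: the truncated prefix is not stemmed; it is matched
        # directly against whatever the index stores for each word.
        prefix = term[:-1]
        return [w for w in WORDS if key(w).startswith(prefix)]
    return [w for w in WORDS if key(w) == key(term)]


if __name__ == "__main__":
    for term in ["play", "plays", "play*", "plai", "plai*",
                 "plain", "plain*", "pla*"]:
        print(term, "| no stemming:", search(term, False),
              "| stemming:", search(term, True))
```

Under this model "play*" misses play and plays when stemming is on because both are stored as "plai", which does not begin with "play"; for the same reason "plai*" picks them up.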



THIS PAGE REFERENCES:
© 1996-97 BioPD - University of Padova - Author: Leopoldo Saggin
Mail to: lsaggin@civ.bio.unipd.it - Last Revision: August 21, 1997