This page shows you how to submit queries to AltaVista.
More complete help can be found
here.
- paris "petite galerie" louvre
- Finds documents containing as many of these words and phrases as
possible, ranked so that documents with the most matches are presented first.
- A phrase is any string of adjacent words.
The preferred way to link words into a phrase is to use quotes.
- A lower-case search will also find matches of capitalized words. For example,
paris will find matches for paris, Paris, and
PARIS.
- Capital letters in a search will force an exact case match on the entire
word.
For example, submitting a query for parIS will search only for
matches of parIS. (Don't be surprised if there are none.)
- +noir +film -"pinot noir"
- Matches may be required or prohibited. Precede
a required word or phrase with + and a prohibited one with -.
This query finds documents containing
film and noir, but not containing pinot noir.
- antique;pump;organ
- Punctuation typed between words, with no white space, glues them into a
phrase, just as quotes do. The punctuation itself
is treated as white space, so this example is equivalent to
"antique pump organ" (that is, three words enclosed in quotes).
- quilt*
- This query matches pages that contain at least one word
such as quilt, quilts, quilting, quilted,
quilter, etc. Hint: The *-notation
is also useful for searching for variant spellings. For example,
alumi*m will find matches for both aluminum and the British English
aluminium. For more about its use, see the section on the *-notation below.
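The + and - prefixes and the quoted phrases described above can be sketched in a few lines of Python. This is only an illustration of the query syntax, not AltaVista's own parser; the function name parse_simple_query and the dictionary keys are invented for the example.

```python
import re

# Hypothetical helper, not part of AltaVista: an optional + or - sign,
# followed by either a "quoted phrase" or a run of non-space characters.
TERM = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')


def parse_simple_query(query):
    """Split a Simple Query into optional, required and prohibited terms."""
    parsed = {"optional": [], "required": [], "prohibited": []}
    for sign, phrase, word in TERM.findall(query):
        term = phrase if phrase else word
        key = {"+": "required", "-": "prohibited"}.get(sign, "optional")
        parsed[key].append(term)
    return parsed


print(parse_simple_query('+noir +film -"pinot noir"'))
```

For the query +noir +film -"pinot noir", the sketch lists noir and film as required and pinot noir as prohibited.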
Examples of Simple Queries
To find the documents most relevant to what you need, construct
your query as precisely as you can.
AltaVista ranks the documents found so the ones
matching the most words and phrases in the query are listed first.
Even so, you might not find exactly what you want
at the head of the list if your search is too general.
For example, suppose you wanted information about
the languages of American Indians but you did not know any specific language to search for.
You might start with the following query:
american indian language.
(The word-count numbers quoted here are not updated as new pages are indexed. They
serve as an example only.)
- american indian language
- result:
- word count: indian 395185, language 2048030, american 2654433.
100000 documents found containing as many of these words as possible,
in both upper and lower case.
- observation:
- This search is much too broad. Of the first ten documents found, the first few
appear relevant, but the rest are documents about
languages in the Asian subcontinent.
- strategy:
- Make clear how you want the query to be parsed.
In other words, link american and
indian together as a phrase. Include the plural of language
in the search also by using
the *-notation.
- "american indian" language*
- result:
- word count: american indian 30000, language* 2050463.
20000 documents found.
- observation:
- The documents found are now relevant to information about
American Indian languages, enabling you to refine your search further.
For example, suppose you want to know
more about the ojibwe language that was mentioned in one of the documents found by
this query.
- strategy:
- Require that the word ojibwe and its variants ojibway
and ojibwa be included in your next search.
Since this is an American Indian word, you could now omit
american indian from the search.
- language* +ojibw*
- result:
- word count: ojibw* 3625, language* 2050463. 1000 documents found.
- observation:
- Bingo!
Ranking Simple Queries
For Simple Queries,
AltaVista will rank the results based on a scoring algorithm; documents with a
higher score appear at the head of the ranking list.
A document has a higher score if the following hold:
- the query words or phrases are found in the first few words
of the document (for example,
in the title of a Web page or in the headers of Usenet
news articles).
- the query words or phrases are found close to one another
in the document.
- the document contains more than one instance of the query word or phrase.
You are therefore likely to find what you want close to the head of the
resulting list of matches.
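This page does not give AltaVista's actual scoring formula, so the following is only a toy illustration of the three criteria above. The function name toy_score and the way the three parts are weighted and added are assumptions made for the example.

```python
def toy_score(doc_words, query_words):
    """Toy ranking score; the weights are invented, not AltaVista's."""
    positions = {
        w: [i for i, d in enumerate(doc_words) if d == w] for w in query_words
    }
    found = {w: p for w, p in positions.items() if p}
    if not found:
        return 0.0
    firsts = [p[0] for p in found.values()]
    # 1. Query words near the start of the document score higher.
    early = sum(1.0 / (1 + first) for first in firsts)
    # 2. Query words that first appear close to one another score higher.
    proximity = 1.0 / (1 + max(firsts) - min(firsts)) if len(firsts) > 1 else 0.0
    # 3. Every additional occurrence of a query word adds to the score.
    repeats = sum(len(p) for p in found.values())
    return early + proximity + repeats


query = ["film", "noir"]
doc_a = "film noir classics and more film".split()
doc_b = "a long list of classics including some film and later noir".split()
print(toy_score(doc_a, query) > toy_score(doc_b, query))
```

Here doc_a scores higher than doc_b: its query words come first, sit next to each other, and one of them is repeated.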
Constraining searches
It is possible to restrict searches to certain portions of documents by
using the following syntax. The keyword (link, title, image, ...) should be
in lower case and immediately followed by a colon.
Constraining searches in Web pages:
- anchor:click-here
- Matches pages with the phrase click here in the text
of a hyperlink.
- applet:NervousText
- Matches pages containing the name of the Java applet class found in
an applet tag; in this case, NervousText.
- host:digital.com
- Matches pages with the phrase digital.com in the host name of
the Web server.
- image:comet.jpg
- Matches pages with comet.jpg in an image tag.
- link:thomas.gov
- Matches pages that contain at least one link to a page with
thomas.gov in its URL.
- text:algol68
- Matches pages that contain the word algol68 in any part of
the visible text of a page.
(That is, the word is not in a link or an image, for example.)
- title:"The Wall Street Journal"
- Matches pages with the phrase The Wall Street Journal
in the title.
- url:home.html
- Matches pages with the words home and html
together in the page's URL. Equivalent to url:"home html".
Constraining searches in Usenet news articles:
- from:napoleon@elba.com
- Matches news articles with the words napoleon@elba.com
in the From: field.
- subject:"for sale"
- Matches news articles with the phrase for sale in the
Subject: field.
- You can combine this with a word or phrase. For example,
subject:"for sale" "victorian chamber pots".
- newsgroups:rec.humor
- Matches news articles posted (or crossposted) in news groups
with rec.humor in the name.
- summary:invest*
- Matches news articles with the word invest,
investment, investiture, etc., in the summary.
- keywords:NASA
- Matches news articles with the word NASA in all caps
in the keyword list.
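As a rough sketch of the keyword:value syntax listed above, the following Python splits a query term into its keyword and value. The keyword sets are copied from the two lists above; the function name split_constraint is invented, and how AltaVista itself parses these terms is not described here.

```python
# Keywords taken from the lists above.
WEB_KEYWORDS = {"anchor", "applet", "host", "image", "link", "text", "title", "url"}
NEWS_KEYWORDS = {"from", "subject", "newsgroups", "summary", "keywords"}


def split_constraint(term):
    """Return (keyword, value) for a constrained term, else (None, term)."""
    keyword, colon, value = term.partition(":")
    # The keyword must be lower-case and immediately followed by a colon.
    if colon and keyword in WEB_KEYWORDS | NEWS_KEYWORDS:
        return keyword, value.strip('"')
    return None, term


print(split_constraint('title:"The Wall Street Journal"'))
print(split_constraint("Title:journal"))  # not lower-case, so not a keyword
```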
More about Words, Phrases, Capitalization, Accents, and the *-Notation
Words
AltaVista treats every page on the Web and every article of Usenet
news as a sequence of words. A word in this context means any string
of letters and digits delimited either by punctuation and other
non-alphabetic characters (for example, &, %, $, /, #, _, ~), or by
white space (spaces, tabs, line ends, start of document, end of
document). To be a word, a string of alphanumerics does not have to be
spelled correctly or be found in any dictionary. All that is
required is that someone typed it as a single word in a Web page
or Usenet news article. Thus, the following are words if they
appear delimited in a document: HAL5000,
Gorbachevnik, 602e21, www, http,
EasierSaidThanDone, etc.
The following are all considered to be two words because the
internal punctuation separates them: don't,
digital.com, x-y, AT&T, 3.14159,
U.S., All'sFairInLoveAndWar.
Only the words in a document are significant to AltaVista. AltaVista
does not index punctuation or white space, so you can use
AltaVista to look only for words and phrases, not punctuation.
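The definition of a word given above can be approximated with a single regular expression. This is a sketch of the rule as stated on this page, not AltaVista's indexer; the function name words is invented.

```python
import re


def words(text):
    """Split text into runs of letters and digits.

    Punctuation, white space and the underscore all act as delimiters.
    """
    return re.findall(r"[^\W_]+", text)


print(words("HAL5000"))    # one word
print(words("AT&T"))       # two words
print(words("3.14159"))    # two words
```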
Phrases
A phrase is a string of words that are adjacent in a document,
although they may be separated by any amount of white space or
punctuation. They do not have to be grammatical in any human
language--they just have to occur in a document as an adjacent
sequence of words. Some examples:
- President of the U.S.A. (6-word phrase)
- http://www.election.digital.com (5-word phrase)
Since the punctuation and white space are insignificant to AltaVista
(except that they delimit words), the phrases above are
indistinguishable from the following variants:
- President of the U S A
- http www election digital com
There are two conventions for typing a phrase in a query. The best
way, leading to the least ambiguity, is to type the phrase as "a
sequence of words separated by spaces and surrounded by double
quotes". However, as an alternative, you may type the words of
the phrase with punctuation (and no white space) between each pair
of words. For example, these are all equivalent as queries:
- "President of the U S A"
- President-of-the-U-S-A
- President/of/the/U/S/A
- President.of.the.U-S-A
The first is the one we generally recommend. Be aware that
the punctuation characters & | ! and ~ have meaning in Advanced
queries, and * indicates the *-notation used in both Simple and Advanced
queries.
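To see why these query forms are equivalent, the sketch below reduces each one to its sequence of words and then looks for that sequence in a document. The names words and contains_phrase are invented, and the comparison is case-sensitive for simplicity, so the capitalization rules are ignored here.

```python
import re


def words(text):
    """Runs of letters and digits; everything else is a delimiter."""
    return re.findall(r"[^\W_]+", text)


def contains_phrase(document, phrase):
    """True if the words of phrase occur adjacently, in order, in document."""
    doc, target = words(document), words(phrase)
    n = len(target)
    return n > 0 and any(doc[i:i + n] == target for i in range(len(doc) - n + 1))


queries = [
    '"President of the U S A"',
    "President-of-the-U-S-A",
    "President/of/the/U/S/A",
    "President.of.the.U-S-A",
]
# All four forms reduce to the same six words.
print(all(words(q) == words(queries[0]) for q in queries))
print(contains_phrase("The President of the U.S.A. spoke today.", queries[1]))
```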
Capitalization
Capital letters are considered distinct from lower-case letters.
When a word is found in a Web page or a news article, its
case is preserved when it is stored in the index.
When you enter a word in a query, therefore, it is always safe, and
generally recommended, to type it all in lower-case, because
lower-case letters indicate a case-insensitive match. If you
type any capital letters, you force an exact case match on the
entire word.
Thus, the word turkey in a query will
match any of turkey, Turkey, tUrKeY or
TURKEY occurring in a document. But the capitalized word
Turkey in a query will match only Turkey in the
document, and not any of the other capitalization variants.
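The capitalization rule can be written as a short comparison function. This is a sketch of the rule as described above; the function name case_matches is invented.

```python
def case_matches(query_word, doc_word):
    """Apply the capitalization rule to one query word and one document word."""
    if query_word == query_word.lower():
        # All lower-case query: case-insensitive match.
        return doc_word.lower() == query_word
    # Any capital letter in the query: exact case match on the entire word.
    return doc_word == query_word


print(case_matches("turkey", "TURKEY"))  # True
print(case_matches("Turkey", "turkey"))  # False
```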
Accents
Accents are treated in the same way as capitalization.
An accented word used in a query forces an exact match on the entire word.
For example, if you use éléphant in a query, you will
match only the French spelling for the pachyderm.
However, if you do not care to enter accents in the search window
(something which is browser-, platform-, and keyboard-dependent), you can always
safely omit the accents, thereby matching both the French and English spellings.
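The accent rule can be sketched in the same way, using Python's standard unicodedata module to strip accents. The function names are invented, and case is ignored here to keep the example short.

```python
import unicodedata


def strip_accents(word):
    """Remove combining accent marks, e.g. turning é into e."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_matches(query_word, doc_word):
    """Apply the accent rule to one query word and one document word."""
    query = unicodedata.normalize("NFC", query_word)
    doc = unicodedata.normalize("NFC", doc_word)
    if strip_accents(query) != query:
        # The query contains an accent: exact match on the entire word.
        return doc == query
    # Unaccented query: matches accented and unaccented spellings alike.
    return strip_accents(doc) == query


print(accent_matches("elephant", "éléphant"))  # True
print(accent_matches("éléphant", "elephant"))  # False
```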
The *-notation
To search for occurrences of any of a group of
words with a similar pattern, AltaVista provides the *-notation.
For example, you might want to search for matches of
sing, singer, singers,
singing. In this case,
place the *-notation at the end of the word whose inflections you want to
include in the search: sing*. But a word of warning:
AltaVista will also match words lexically unrelated to your query word. So the query
sing* will also find matches for
singe, single, singular, and for foreign words such as
French singulier.
The *-notation cannot be used without restriction. To make such queries
computationally feasible, AltaVista requires that the * be used only
after at least three letters. The *-notation will match from zero
up to five additional letters in lower-case only.
Capital letters and digits
will therefore not be matched.
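The restrictions just described translate naturally into a regular expression. The sketch below is an approximation built from the rules stated on this page, not AltaVista's implementation; the function name star_to_regex is invented, it accepts a single * only, and it ignores the capitalization rules.

```python
import re


def star_to_regex(pattern):
    """Turn a *-notation word such as colo*r into a compiled regex."""
    head, star, tail = pattern.partition("*")
    if not star or "*" in tail:
        raise ValueError("this sketch expects exactly one *")
    if len(head) < 3 or not head.isalpha():
        raise ValueError("the * must follow at least three letters")
    # The * stands for zero to five additional lower-case letters.
    return re.compile(re.escape(head) + "[a-z]{0,5}" + re.escape(tail))


colo = star_to_regex("colo*r")
col = star_to_regex("col*r")
print(bool(colo.fullmatch("colour")), bool(colo.fullmatch("collector")))
print(bool(col.fullmatch("collector")))
```

With these rules colo*r matches color and colour but not collector, while col*r matches all three.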
The *-notation can sometimes be useful for finding variant spellings:
for example,
cantalo* will find matches for cantaloup,
cantaloupe, cantalope, and their plurals.
But take care how you construct the query word. For example,
if you want to find matches for both color and colour, a query
of the form col*r is not the most efficient. This query will also
find matches for collector and atomic collider.
In this case, it is more efficient to submit the query colo*r,
which will find matches for both color and colour.
Finally, if your search using the *-notation finds too many matches,
AltaVista will ignore the query.
The query inte*, for example, produces the result:
Ignored inte*: 4292323
No documents match this query
The META tag: Controlling how your Web page is indexed by AltaVista
In the absence of any other information, AltaVista will index all words
in your document (except for comments), and will use the first few words
of the document as a short abstract.
It is, however, possible for you to control how your page is indexed by using
the META tag to specify both additional keywords to index, and a short description.
Let's suppose your page contains:
<META name="description"
content="We specialize in grooming pink poodles.">
<META name="keywords" content="pet grooming, Palo Alto, dog">
AltaVista will then do two things:
- It will index both fields as words,
so a search on either poodles or dog will match.
- It will return the description with the URL.
In other words, instead of showing the first couple of
lines of the page, a match will look like the following:
- Pink Poodles Inc
- We specialize in grooming pink poodles.
http://pink.poodle.org/ - size 3k - 29 Feb 96
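To check what a page declares in its META tags, you could read them back with Python's standard html.parser module. This is a sketch for page authors, not a description of how AltaVista reads the tags; the class name MetaExtractor is invented.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect the description and keywords META tags from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name in ("description", "keywords"):
            self.meta[name] = attributes.get("content") or ""


page = """<META name="description"
content="We specialize in grooming pink poodles.">
<META name="keywords" content="pet grooming, Palo Alto, dog">"""

extractor = MetaExtractor()
extractor.feed(page)
print(extractor.meta["description"])
print(extractor.meta["keywords"])
```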
Copyright © 1996
Digital Equipment Corporation.
All rights reserved.