Top Bar Search
The Top Bar Search provides a simple way to search the PDB for words and combinations of words that can be found in the text of PDB structures.
Furthermore, the auto-complete suggestion box of the Top Bar Search guides users toward focused and precise searches in an intuitive way.
Finally, the search modes, each activated by selecting the corresponding tab, let users restrict the search in advance to Authors, Macromolecules, Sequences or Ligands.
- PDB ID or Text
- Suggestion Categories
- Search Mode Icons
PDB ID or Text
The "PDB ID or Text" search is the default search option on the site when the user does not use any of the provided search suggestions.
A 4-character PDB ID is assigned to each new structure at the time of deposition. The IDs are automatically assigned and do not have meaning. However, they serve as the unique, immutable identifier of each entry in the Protein Data Bank. As such, they are used throughout the scientific literature (e.g. in journal articles and in other databases) to refer to entries in the Protein Data Bank. Hence, if the PDB ID of an entry in the Protein Data Bank is known, it is the most direct way to retrieve it from the database.
If the search term is not a valid PDB ID, a full text search is performed instead. This search is a Lucene full text search of the content of the structure files in mmCIF format. For example, a search for actin would return all structures that have the word actin appear somewhere in the mmCIF coordinate file.
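The rule above (look up by PDB ID when the term is one, otherwise fall back to full text) can be sketched as follows. This is only an illustration: the function name is hypothetical, and the pattern assumes that an ID is a digit followed by three letters or digits. The site may additionally check that the ID actually exists.

```python
import re

# Assumption: a PDB ID is one digit followed by three alphanumeric characters.
PDB_ID_PATTERN = re.compile(r"[0-9][A-Za-z0-9]{3}")

def choose_search(term: str) -> str:
    """Return 'pdb_id' if the term looks like a PDB ID, else 'full_text'."""
    if PDB_ID_PATTERN.fullmatch(term.strip()):
        return "pdb_id"
    return "full_text"

print(choose_search("4HHB"))   # looks like an ID
print(choose_search("actin"))  # falls back to full text
```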
The full text search also supports operator syntax. Currently we support the AND, OR and NOT search operators. We also support exact phrase syntax: when multiple terms are enclosed in double quotes ("), the query returns only hits containing that exact phrase. Here are some examples of how they are used:
- actin AND "skeletal muscle" returns all hits where both the word actin and the phrase skeletal muscle are present.
- actin OR "skeletal muscle" returns all hits where either the word actin or the phrase skeletal muscle or both are present.
- actin NOT myosin returns all hits where the word actin is present but the word myosin is not.
We also support grouping terms to form more complex queries. This is done with parentheses (). For example:
- BCL AND (apoptosis OR "Programmed cell death") returns hits that contain BCL and apoptosis, or BCL and Programmed cell death.
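To make the operator semantics above concrete, here is a minimal sketch that evaluates such a query against a single piece of text. It is not the Lucene implementation used by the site. The function names are hypothetical, matching is a simple case-insensitive whole-word test, and two terms written side by side without an operator are rejected rather than guessed at.

```python
import re

TOKEN = re.compile(r'"([^"]*)"|(\(|\))|([^\s()"]+)')

def tokenize(query):
    """Split a query into PHRASE, PAREN, OP and WORD tokens."""
    tokens = []
    for phrase, paren, word in TOKEN.findall(query):
        if paren:
            tokens.append(("PAREN", paren))
        elif word in ("AND", "OR", "NOT"):
            tokens.append(("OP", word))
        elif word:
            tokens.append(("WORD", word.lower()))
        else:
            tokens.append(("PHRASE", phrase.lower()))
    return tokens

def matches(query, text):
    """Return True if the text satisfies the boolean query."""
    tokens = tokenize(query)
    text = text.lower()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def advance():
        nonlocal pos
        pos += 1

    def parse_or():
        result = parse_and()
        while peek() == ("OP", "OR"):
            advance()
            right = parse_and()  # always parse, so the position advances
            result = result or right
        return result

    def parse_and():
        result = parse_not()
        # "a NOT b" is treated like "a AND NOT b"
        while peek() in (("OP", "AND"), ("OP", "NOT")):
            if peek() == ("OP", "AND"):
                advance()
            right = parse_not()
            result = result and right
        return result

    def parse_not():
        if peek() == ("OP", "NOT"):
            advance()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        kind, value = peek()
        if (kind, value) == ("PAREN", "("):
            advance()
            result = parse_or()
            if peek() != ("PAREN", ")"):
                raise ValueError("missing closing parenthesis")
            advance()
            return result
        if kind in ("WORD", "PHRASE"):
            advance()
            pattern = r"\b" + re.escape(value) + r"\b"
            return re.search(pattern, text) is not None
        raise ValueError(f"unexpected token: {value!r}")

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result

title = "Crystal structure of actin from rabbit skeletal muscle"
print(matches('actin AND "skeletal muscle"', title))
print(matches("actin NOT myosin", title))
```

In this sketch NOT binds tightest, then AND, then OR, and parentheses override that order. This is a common convention and is assumed here, not taken from the site's documentation.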
We also support wildcard searching, with either a single-character wildcard (?) or a multi-character wildcard (*). For example:
- BCL? returns hits containing BCL6, BCLX, etc.
- actin* returns all hits containing e.g. actin, actinin, acting, etc.
Note: Wildcard searches at the beginning of a word or phrase are NOT supported (e.g. *synthesis).
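The wildcard behavior described above can be imitated with Python's standard fnmatch module, where ? matches exactly one character and * matches any run of characters. This is only a sketch of the semantics, not the search engine's own matching. The function name is hypothetical, and fnmatch also gives special meaning to square brackets, which the search box may not.

```python
from fnmatch import fnmatchcase

def wildcard_hits(pattern, words):
    """Return the words matching a ?/* pattern, ignoring case."""
    if pattern.startswith(("*", "?")):
        # Mirrors the note above: leading wildcards are rejected.
        raise ValueError("wildcards at the beginning of a term are not supported")
    lowered = pattern.lower()
    return [w for w in words if fnmatchcase(w.lower(), lowered)]

print(wildcard_hits("BCL?", ["BCL6", "BCLX", "BCL", "BCL10"]))
print(wildcard_hits("actin*", ["actin", "actinin", "acting", "myosin"]))
```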
The options on top determine the scope of the search, i.e. whether to include all fields ("Everything") or to restrict the search to author names, macromolecule names, sequences, or chemical components ("Ligand"). The Search History and Previous Results are stored only for the currently active session.
Suggestion Categories
The suggestion box provides matches from several fields of the PDB and associated classifications or ontologies.
Apart from authors, macromolecules and organisms, there will often be suggestions from educational resources (Molecule of the Month articles), chemical names (of PDB ligands), various identifiers (PDB IDs, ligand 3-letter codes, PubMed, UniProt), journal names, common words in the PDB text, Enzyme or Domain classifications, and even protein sequences or chemical formulas.
For example, if the user pastes the text string "GNAAAAKKGSEQESVKEFLAKAKEDFLKKWETPSQN" into the search box, the suggestions will offer a BLAST search against the PDB, while typing the text "C14 O5 S" leads to a chemical formula search for PDB ligands.
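How the site recognizes these kinds of input is not described here. The sketch below shows one plausible heuristic, purely as an illustration. The function name, the minimum sequence length of 20, and the formula pattern (element symbols with optional counts, separated by spaces) are all assumptions.

```python
import re

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
MIN_SEQUENCE_LENGTH = 20  # assumed threshold, to avoid matching short words

# Assumed formula shape: element symbol plus optional count, space separated.
FORMULA = re.compile(r"[A-Z][a-z]?\d*(?:\s+[A-Z][a-z]?\d*)*")

def classify_input(text: str) -> str:
    """Guess whether the input is a protein sequence, a formula, or plain text."""
    stripped = text.strip()
    compact = "".join(stripped.split()).upper()
    if len(compact) >= MIN_SEQUENCE_LENGTH and set(compact) <= AMINO_ACIDS:
        return "sequence"
    if FORMULA.fullmatch(stripped) and any(ch.isdigit() for ch in stripped):
        return "chemical_formula"
    return "text"

print(classify_input("GNAAAAKKGSEQESVKEFLAKAKEDFLKKWETPSQN"))
print(classify_input("C14 O5 S"))
print(classify_input("actin"))
```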
The suggestion box provides just a few terms from each category that partially match the entered text, ranked both by their similarity to the input and by the number of results they will return.
In effect, the suggestion box also auto-completes hard-to-spell words. For instance, typing just the first few letters "pse" will bring up "Pseudomonas aeruginosa" as a suggestion for an organism search.
If the first few suggestions in a category do not include what the user is looking for, clicking the "more" link shows further suggestions from that particular category.
For example, a user looking for a rarely found organism like "Colinus virginianus (northern bobwhite)" who types "coli" in the search box will not see it immediately, since several variants of "E. coli" are much more common in the PDB. Clicking "more" under the Organism category brings it up in the longer list of suggestions.
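The "coli" example can be reproduced with a toy ranking that combines similarity to the input with the number of results. This is a sketch under stated assumptions, not the site's actual scoring: the function name, the similarity measure (fraction of the term covered by the input), the logarithmic weighting, and the hit counts are all made up for illustration.

```python
import math

def suggest(query, candidates, category, limit=3):
    """Rank (term, category, hits) candidates that partially match the query.

    limit=None returns the full list, like following the "more" link.
    """
    q = query.lower()
    scored = []
    for term, cat, hits in candidates:
        if cat != category or q not in term.lower():
            continue
        similarity = len(q) / len(term)    # closer to 1.0 means a closer match
        popularity = math.log10(1 + hits)  # dampen very large hit counts
        scored.append((similarity * popularity, term))
    scored.sort(reverse=True)
    ranked = [term for _, term in scored]
    return ranked if limit is None else ranked[:limit]

# Made-up hit counts, for illustration only.
CANDIDATES = [
    ("Escherichia coli", "Organism", 10000),
    ("Escherichia coli K-12", "Organism", 5000),
    ("Colinus virginianus", "Organism", 3),
    ("Colicin", "Macromolecule", 40),
]

print(suggest("coli", CANDIDATES, "Organism", limit=2))     # short list
print(suggest("coli", CANDIDATES, "Organism", limit=None))  # "more" list
```

With these toy numbers the rare organism is missing from the short list and appears only in the full list.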
Finally, the user may also choose to search directly for the entered text in one of the categories, bypassing the provided options. Suppose that the user wants to retrieve all structures by authors with the name "Korkegian" (with or without a middle name). This can be done by clicking the "Find all" link below the "Author" suggestions.
Search Mode Icons
There is an alternative way to use the search functionality for users who prefer to specify the type of search in advance.
By clicking on one of the words above the search box, a user can switch to a different "search mode".
This affects both the suggestions provided by the auto-complete box (which will be focused on that category) and the results returned when the user clicks the search button.
Select "Author" to search directly for entry authors (primary citation and deposition authors).
Only authors matching the input text will appear in the suggestion box.
The "Macromolecule" search mode matches names of macromolecules as defined by sequence reference databases (like UniProt) and their cross-references to the PDB.
This approach gives precise results and copes better with artifacts of the PDB experiments.
As an example, a user may select the "Macromolecule name" search mode by clicking on the word "Macromolecule" above the search box and search for "prothrombin".
The results will include all PDB entries that have cross-references to UniProt entries with that name, for different organisms (human, cow and mouse).
The user can then use the drill-down functionality to focus on a particular organism.
With the "Sequence" search mode, the user can type or paste a protein sequence like "GNAAAAKKGSEQESVKEFLAKAKEDFLKKWETPSQN" and directly run a BLAST search against the PDB.
This search mode also offers a link labeled "Options" right next to it.
The link leads to an advanced search form where more detailed search parameters can be specified.
In a similar way, the user may select the "Ligand" search mode and type a chemical (small molecule) name or 3-letter code. The result leads directly to the matching PDB ligand page. For example, if the user types "biotin", the Ligand Summary page of BTN will be one of the suggestions that pop up, and the user can click on it directly.
Similarly, a search for "aspartame" will match the ligand PME (N-L-alpha-aspartyl L-phenylalanine 1-methylester), which has the chemical synonym "Aspartame".
Alternatively, the user may click on the "Find all" link and retrieve all PDB ligands with the word "biotin" in their name.
The molecule names and synonyms used for matching are provided by the Chemical Component Dictionary.
The "Ligand" search mode also offers a link to an advanced interface (called "Options", just next to "Ligand") where the user can search for ligands by specifying Chemical structure, Name, or Formula.
Please contact us if, after reading the Top Bar Search explanations, additional help on searching is needed.