The Barley EST database (B-EST)

The database contains the original sequence data and the results of BlastX2 searches against NRPEP, a major non-redundant protein sequence database. The current number of entries is available on the status page.

cDNA libraries:

For information, click here or use the library button on the search results page.

Sequences:

Vector sequences and sequence ends were trimmed from the 5' and 3' ends until a 50 bp window contained fewer than two ambiguities. The maximum length was set to 700 bp. In a second step, CrossMatch (Green 1996) was used to detect remaining vector artefacts. Only sequences longer than 100 bp after this process were included in the dataset.
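The window-based trimming described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline actually used for B-EST: the function name trim_read is hypothetical, only "N" is counted as an ambiguity, the 700 bp limit is assumed to keep the 5'-most bases of the read, and the CrossMatch vector screen is not covered.

```python
AMBIGUOUS = set("Nn")  # assumption: only N/n counts as an ambiguity


def trim_read(seq, window=50, max_ambig=1, max_len=700, min_len=100):
    """Trim both ends of a read; return None if the result is too short.

    A window is "clean" when it holds fewer than two ambiguities
    (max_ambig=1), as stated in the text above.
    """
    def is_clean(start):
        chunk = seq[start:start + window]
        return sum(base in AMBIGUOUS for base in chunk) <= max_ambig

    last_start = len(seq) - window
    if last_start < 0:
        return None  # read is shorter than a single window

    # 5' end: move inwards until the first clean window is found.
    left = next((i for i in range(last_start + 1) if is_clean(i)), None)
    if left is None:
        return None  # no clean window anywhere in the read

    # 3' end: move inwards from the right until a clean window is found.
    right = next(i for i in range(last_start, -1, -1) if is_clean(i)) + window

    # Assumed truncation rule: keep at most max_len bases from the 5' side.
    trimmed = seq[left:right][:max_len]

    # Only sequences longer than min_len are kept.
    return trimmed if len(trimmed) > min_len else None
```

For example, a read with ten N bases on each side of a 200 bp clean core is cut back to the core plus one N on each side, because a window with a single ambiguity still passes the test.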

Sequence identifier:

The EST sequence names (e.g. HK01A01u) consist of three parts:

     characters 1-2: library code (click to go to the Library page)

     characters 3-7: plate address

     character 8: primer code

Primer code:

     r: M13rev, 5' cDNA end (3' cDNA end in case of HK library)

     T: T3,  5' cDNA end

     V: modification of T3, 5' cDNA end (only used in HW and HY library)

     S: SK, 5' cDNA end (only used in HO library)

     u: M13uni, 3' cDNA end (5' cDNA end in case of HK library)

     w: T7, 3' cDNA end (5' cDNA end in case of HC, HH, HL and HQ library)
     
     x: pTriplex2, 5' cDNA end (used in HJ library)
     
     y: SP6, 3' cDNA end (used in HC and HH library)
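As an illustration of the naming scheme and the primer table above, the following sketch splits an EST name into its three parts and looks up which cDNA end was sequenced. The names PRIMERS, EstName and parse_est_name are hypothetical and not part of B-EST; the table itself is copied from the primer code list, including the library-specific exceptions.

```python
from typing import NamedTuple

# primer code -> (primer name, default cDNA end, {library: exception})
PRIMERS = {
    "r": ("M13rev", "5'", {"HK": "3'"}),
    "T": ("T3", "5'", {}),
    "V": ("modified T3", "5'", {}),
    "S": ("SK", "5'", {}),
    "u": ("M13uni", "3'", {"HK": "5'"}),
    "w": ("T7", "3'", {"HC": "5'", "HH": "5'", "HL": "5'", "HQ": "5'"}),
    "x": ("pTriplex2", "5'", {}),
    "y": ("SP6", "3'", {}),
}


class EstName(NamedTuple):
    library: str        # characters 1-2
    plate_address: str  # characters 3-7
    primer: str         # primer name derived from character 8
    cdna_end: str       # "5'" or "3'"


def parse_est_name(name):
    """Split an 8-character EST name such as 'HK01A01u'."""
    if len(name) != 8:
        raise ValueError(f"expected 8 characters, got {len(name)}: {name!r}")
    library, plate_address, code = name[:2], name[2:7], name[7]
    if code not in PRIMERS:
        raise ValueError(f"unknown primer code {code!r} in {name!r}")
    primer, default_end, exceptions = PRIMERS[code]
    # Library-specific exceptions take precedence over the default end.
    cdna_end = exceptions.get(library, default_end)
    return EstName(library, plate_address, primer, cdna_end)
```

With this table, parse_est_name("HK01A01u") yields library HK, plate address 01A01, primer M13uni and the 5' cDNA end, since HK is one of the libraries in which the M13uni orientation is reversed. Primer codes are treated as case-sensitive, as written in the list.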

Sequence similarity searches:

Using the BlastX2 program, all sequences were compared to NRPEP, a database containing non-redundant protein sequences from GENBANK translations, PDB, SWISSPROT and PIR (for details on the databases see http://genome.dkfz-heidelberg.de/).
The first ten hits were included in B-EST. So far, similarity search results are available for the barley EST sequences generated at the IPK. The searches were conducted using BlastX2 from release 2.0.9 of the Blast2 (NCBI) suite of programs.
These programs apply a low-complexity filter (SEG) by default. Searches were performed using the default parameters (matrix: blosum62, -EXP=10, -WORD=3, -THRES=11, -EXT=15, -GAP=11, -LEN=1).

stackPACK consensus sequences:

We included Blast results for the consensus sequences produced by a clustering process in the database. They give the user information about redundancy and may yield more significant similarity scores than single EST sequences.

There are three different types of consensus sequences available in the B-EST database:

The first clustering project contains the results of a cluster analysis of 13,109 ESTs. These results use the normal stackPACK identifiers cl#ct#cn#, where each '#' stands for a running number.
The second project contains the results of a cluster analysis of 41,600 ESTs. The identifiers are extended with 'g01', e.g. cl#ct#cn#g01.
The third project contains the results of a cluster analysis of 111,090 ESTs. In this case the identifiers are extended with 'g02', e.g. cl#ct#cn#g02.
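The three identifier forms can be told apart with a small pattern match. The sketch below is only an illustration: the names CONSENSUS_ID, PROJECTS and parse_consensus_id are hypothetical, and the keys cl, ct and cn simply mirror the identifier prefixes without assuming what each running number means.

```python
import re

# cl#ct#cn# with an optional g01 or g02 suffix, '#' being a running number
CONSENSUS_ID = re.compile(r"^cl(\d+)ct(\d+)cn(\d+)(g01|g02)?$")

# suffix -> (clustering project, number of ESTs clustered), as listed above
PROJECTS = {
    None: (1, 13109),
    "g01": (2, 41600),
    "g02": (3, 111090),
}


def parse_consensus_id(identifier):
    """Return the running numbers and the clustering project of an identifier."""
    match = CONSENSUS_ID.match(identifier)
    if match is None:
        raise ValueError(f"not a stackPACK-style identifier: {identifier!r}")
    cl, ct, cn, suffix = match.groups()
    project, est_count = PROJECTS[suffix]
    return {
        "cl": int(cl),
        "ct": int(ct),
        "cn": int(cn),
        "project": project,
        "est_count": est_count,
    }
```

For instance, cl12ct3cn1g02 is assigned to the third project (111,090 ESTs), while cl12ct3cn1 without a suffix belongs to the first. Because cluster IDs change between clustering runs, a parsed identifier is only meaningful together with its project.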

In all cases primary and alternative consensus sequences are stored. For more information about these kinds of sequences, please have a look at the Egenetics webpage.

NOTE: Clustering data will change periodically; cluster IDs are therefore not suitable as permanent reference points.
EST clustering was performed with stackPACK 2.1 (see also http://www.egenetics.com). For information on the clustering process, see the ISMB99 EST clustering tutorial (http://www.sanbi.ac.za).

Requests for clones:

Dr. Patrick Schweizer, schweiz@ipk-gatersleben.de