Description of ClustScan-Professional
This document is intended for users who have no previous experience with ClustScan-Professional. It should give the reader a basic understanding of the concepts covered in the sections below.
This is a hands-on tutorial that walks the reader through the basic operations needed to use ClustScan-Professional, taking the erythromycin-producing polyketide synthase gene cluster (GenBank reference AY661566.1) as an example. It is not intended to be a complete reference. If you need extra help or information, contact firstname.lastname@example.org
To run ClustScan-Professional you will first need to install Java Runtime Environment (JRE) 6, which can be downloaded from: http://java.sun.com/javase/downloads/index.jsp.
Next, download and extract the ClustScan-Professional files from: http://reg.bioserv.pbf.hr/. Start the program by double-clicking on the ClustScan-Professional icon.
After you start ClustScan-Professional, its splash screen will appear.
When the program finishes loading, a login pop-up will appear.
Enter your username (e-mail address) and password and click ‘OK’. The login procedure might take a couple of minutes, depending on the size of your Workspace and the speed of your network connection. If you are logging in for the first time, your Workspace will be empty and the login procedure should finish quickly. To change your password, select the change password command from the Help pull-down menu.
When ClustScan-Professional finishes with the login procedure, the annotation interface will open.
Entering your Sequence
Use your mouse to select the Import DNA function from the File pull-down menu, or click once on the corresponding icon in the toolbar. The Import DNA dialog box will appear. ClustScan-Professional accepts input sequences either directly from the PBF-Server or as accession files in FASTA format (or in any other format supported by the ReadSeq biosequence conversion tool). Regardless of the input method, you must enter a name for your query sequence; this name will also be used for the export file of your annotated sequence.
Using accession files
Selecting & Submitting your Search
ClustScan-Professional annotates your query sequence in two stages: first, all genes are found using either “Genemark” or “Glimmer”; then PKS and/or NRPS gene clusters are annotated using “HMMER”. These tools can be selected from the Search pull-down menu or by clicking on the icons displayed in the menu bar.
How to use “Genemark”
“Genemark” is a gene prediction program developed at the Georgia Institute of Technology, USA. The program determines the protein-coding potential of a DNA sequence using Markov models of coding and non-coding regions in bacteria and archaea. This tool does not differentiate between overlapping genes, so the results must be parsed manually before exporting them to third-party software such as Artemis. Contact email@example.com for assistance with this.
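The manual parsing step mentioned above largely comes down to spotting predictions whose coordinates overlap. The Python sketch below is illustrative only: it assumes the Genemark predictions have already been exported as hypothetical `(name, start, end, strand)` tuples with 1-based inclusive coordinates, and it is not part of ClustScan-Professional.

```python
from typing import List, Tuple

# Hypothetical representation of a predicted gene: (name, start, end, strand),
# with 1-based, inclusive coordinates and start <= end on both strands.
Gene = Tuple[str, int, int, str]


def find_overlaps(genes: List[Gene]) -> List[Tuple[str, str, int]]:
    """Return (gene_a, gene_b, overlap_bp) for every pair of overlapping predictions."""
    ordered = sorted(genes, key=lambda g: g[1])
    overlaps = []
    for i, (name_a, start_a, end_a, _) in enumerate(ordered):
        for name_b, start_b, end_b, _ in ordered[i + 1:]:
            if start_b > end_a:
                break  # sorted by start, so no later gene can overlap gene a
            overlap = min(end_a, end_b) - start_b + 1
            overlaps.append((name_a, name_b, overlap))
    return overlaps


genes = [("g1", 1, 900, "+"), ("g2", 850, 2000, "+"), ("g3", 2100, 3000, "-")]
print(find_overlaps(genes))  # g1 and g2 share 51 bp
```

Deciding which member of each reported pair to keep remains a manual decision.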
When you select “Genemark”, a dialog box will appear; pull down the choose DNA menu bar to select your query sequence. You then have the option to replace the default prediction model with one for another species by pulling down the choose prediction model menu bar. Click the Start button to run the program.
The Genemark server operation dialog box will appear and report the progress of your annotation. The annotation can be stopped at any time by clicking the Cancel button in this dialog box. When the annotation is complete, the dialog box disappears and the message no operations to display at this time appears in the Progress field at the bottom of the screen.
Go now to the Viewing & Navigating your Results section, or proceed directly with the second stage of your annotation: “HMMER”.
How to use “Glimmer”
“Glimmer” is an alternative gene prediction tool offered by ClustScan-Professional and is the primary microbial gene finder used at The Institute for Genomic Research (TIGR), where it was first developed. “Glimmer” differentiates between overlapping genes, removing the need to parse the data manually before exporting it to third-party software.
When you select “Glimmer”, a dialog box will appear giving you the choice of either searching for genes in your query sequence directly using Glimmer, or using your own gene-prediction models. To use Glimmer directly, pull down the choose DNA menu bar to select your query sequence and click start to run the program. The results can be viewed in the annotation editor window. The results can also be saved as your own gene prediction model by clicking on the save model option and entering a label. Both Glimmer and your own gene-prediction models can be run using more stringent prediction criteria by clicking on the stringent box.
The Glimmer server operation dialog box will appear and report the progress of your annotation. The annotation can be stopped at any time by clicking the Cancel button in this dialog box. When the annotation is complete, the dialog box disappears and the message no operations to display at this time appears in the Progress field at the bottom of the screen.
Go now to the Viewing & Navigating your Results section, or proceed directly with the second stage of your annotation: “HMMER”.
How to use “HMMER”
“HMMER” translates your DNA query sequence into all six reading frames and then searches the resulting protein sequences using HMM profiles, either for existing protein families in the Pfam database or your own customised profiles.
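Six-frame translation itself is straightforward. The sketch below illustrates the idea in Python using the standard genetic code (NCBI translation table 1); the function names are hypothetical, and this is not the code ClustScan-Professional or HMMER uses internally. Frames -1 to -3 are read from the reverse complement.

```python
# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate codon by codon; '*' marks stop codons, 'X' unknown codons."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna: str) -> dict:
    """Return the translations of frames +1..+3 and -1..-3."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


print(six_frames("ATGAAATAG"))
```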
When you select “HMMER”, a ClustScan-HMMER dialog box will appear; pull down the choose DNA menu bar to select your query sequence. You can now limit your search by clicking on the add profile search button. A choose profile dialog box will appear.
Select the protein family you wish to search by pulling down the group menu bar and selecting either your own (“user”) customised HMM profiles, the entire “Pfam” database, “PKS”, “NRPS”, “PKS & NRPS” or “All”. Click on the add all button to add the HMM profiles; they will appear in the added list at the bottom of the dialogue box.
Alternatively, you can select specific HMM families by entering a keyword in the search box and pressing the search button. All HMM profiles containing the keyword will appear in the search result list. Click on the required profile (or hold down the Ctrl key and click to select several profiles), then click the add button to place the profile(s) in the added list at the bottom of the dialogue box. By clicking on add all, only the profiles in the added list will appear in the ClustScan-HMMER dialogue box.
Now select the level of stringency for the “HMMER” predictions by choosing either Stringent or Relaxed in the HMMER parameters box. Click Start to run the program.
The HMMER server operation dialog box will appear and report the progress of your annotation. The annotation can be stopped at any time by clicking the Cancel button in this dialog box. When the annotation is complete, the dialog box disappears, the message no operations to display at this time appears in the Progress field at the bottom of the screen, and a graphical representation of your results appears in the annotation editor.
Go now to the Viewing & Navigating your Results section.
How to build your own customised HMM profile
To create your own customised HMM profile, click on the create new button in the ClustScan-HMMER dialogue box. Choose whether you want to create an alignment or to use an existing alignment previously created in ClustalW.
To create an alignment, paste your sequences into the text area (to use an existing alignment, paste the alignment directly into the text editor). Clicking on check format lets you confirm that your sequences are in FASTA format. You can then choose the matrix used to construct the alignment from the matrix drop-down list: “BLOSUM”, “PAM”, “GONNET” or “ID”. Click on start align to create the alignment.
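The check format step can be pictured as a simple FASTA validator. The following sketch is an assumption about what such a check involves (a `>` header before any sequence, and only letters, `-` and `*` in sequence lines); ClustScan-Professional's own rules may be stricter or looser, and `parse_fasta` is a hypothetical name.

```python
import re

# Assumed alphabet for sequence lines: letters, gaps ('-') and stops ('*').
SEQUENCE_LINE = re.compile(r"[A-Za-z*\-]+")


def parse_fasta(text: str) -> list:
    """Return [(header, sequence), ...]; raise ValueError if text is not FASTA."""
    records = []
    header, chunks = None, []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # blank lines are tolerated
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError(f"line {line_no}: sequence data before first '>' header")
        elif not SEQUENCE_LINE.fullmatch(line):
            raise ValueError(f"line {line_no}: unexpected characters in sequence")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    if not records:
        raise ValueError("no FASTA records found")
    return records


print(parse_fasta(">seq1\nMKVLA\nTG\n>seq2\nMKIL"))
```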
Clicking on the next button will bring you to the build and calibrate alignment dialogue box, where you must provide a profile name.
Click on check profile name to ensure that your profile name does not already exist. Now select either local or global criteria for the profile and then click on build profile. The profile will be displayed in the text editor. Click on finish to return to the ClustScan-HMMER dialogue box. Your profile has now been added to the “user” group as a customised HMM profile.
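Behind the build profile step, a profile HMM starts from the per-column residue statistics of the alignment. The sketch below computes only those column frequencies, assuming `-` as the gap character; real builders such as HMMER's hmmbuild also weight sequences, add prior pseudocounts and model insertions and deletions, so treat this as an illustration of the idea rather than how ClustScan-Professional builds profiles.

```python
from collections import Counter


def column_frequencies(alignment: list, gap: str = "-") -> list:
    """Return one {residue: frequency} dict per alignment column, ignoring gaps."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] != gap)
        total = sum(counts.values())
        # An all-gap column yields an empty dict rather than dividing by zero.
        profile.append({aa: n / total for aa, n in counts.items()} if total else {})
    return profile


print(column_frequencies(["AC-", "AD-", "GCK"]))
```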
Go now to the How to use “HMMER” section.
Viewing & Navigating your Results
Default colours can be changed by selecting the preference option from the menu pull-down icon.
Viewing the results of “Genemark” or “Glimmer”
The forward and reverse DNA strands are represented by green bands, with the sequence labelled every 200 bp. Putative genes are represented by brown bands on the appropriate reading frame. ClustScan-Professional offers different zoom levels, which you can choose by clicking the zoom icon in the menu bar.
A pull-down menu will appear with the available zoom levels. You can view the coding sequence for each reading frame by selecting font from the zoom pull-down menu. This menu also gives the option to expand the viewing field to give an overview of the annotation. The annotation editor can also be expanded by clicking on the maximize icon, and you can navigate the entire sequence by dragging the scroll bar with your mouse or by clicking left or right.
The properties of each gene, such as the DNA coordinates and protein reading frame, can be viewed in the Details field. This can be accessed in two ways: either by double-clicking the gene of interest in the annotation editor, or by selecting the gene after expanding the Workspace. Use the “+”/“-” keys to navigate through the Workspace. A single click on any of these genes in the Workspace will take you to the corresponding coordinates in the annotation editor.
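The reading frame shown in the Details field follows from the gene's coordinates. The sketch below assumes one common convention (1-based inclusive coordinates, frame +1 starting at the first base, frame -1 starting at the last base of the sequence); ClustScan-Professional's own numbering may differ, and `gene_frame` is a hypothetical helper.

```python
def gene_frame(start: int, end: int, strand: str, seq_length: int) -> int:
    """Return the frame (+1..+3 or -1..-3) of a gene with 1-based inclusive coordinates.

    Assumed convention: frame +1 starts at base 1 of the forward strand and
    frame -1 starts at the last base (i.e. at the start of the reverse complement).
    """
    if strand == "+":
        return (start - 1) % 3 + 1
    if strand == "-":
        # On the reverse strand the gene is read from `end` towards `start`.
        return -((seq_length - end) % 3 + 1)
    raise ValueError("strand must be '+' or '-'")


print(gene_frame(5, 10, "+", 100), gene_frame(1, 98, "-", 100))
```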
Viewing the results of “HMMER”
Putative domains are shown in blue (the default colour) overlaying the corresponding gene in the annotation editor. Use the “+”/“-” keys to navigate through the Workspace. A single click on any of these domains will take you to the corresponding coordinates in the annotation editor.
You are now ready to edit your annotation and to construct a biosynthetic gene cluster!
Editing your annotation, assembly and displaying gene clusters
The first step in editing is to remove poor annotations from your query sequence. Simply click on a domain in the Workspace to locate it in the annotation editor; the domain information will also be displayed in the Details field. You need to decide on the significance of each domain hit. As a ‘rule of thumb’, a good E-value should be much less than 1, with a correspondingly high score. A domain can be deleted from your query sequence by re-selecting it in the Workspace and then clicking the delete icon. If you make a mistake, you can rescue your annotation by using the undo/redo functions from the Edit pull-down menu or the corresponding icons. If a domain's activity is predicted as inactive, the domain is likely to be non-functional, but you may not necessarily want to delete such domains from your dataset.
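The E-value rule of thumb can be applied programmatically to a list of hits. In this sketch the `DomainHit` record and the default cut-off of 0.01 are hypothetical choices for illustration; the appropriate threshold depends on your data and is not prescribed by ClustScan-Professional.

```python
from dataclasses import dataclass


@dataclass
class DomainHit:
    name: str      # domain label, e.g. "KS", "AT", "ACP"
    evalue: float  # expectation value reported for the hit
    score: float   # bit score reported for the hit


def filter_hits(hits, max_evalue=1e-2):
    """Keep hits whose E-value is at or below the threshold, best (lowest) first."""
    kept = [h for h in hits if h.evalue <= max_evalue]
    return sorted(kept, key=lambda h: h.evalue)


hits = [DomainHit("KS", 1e-50, 200.0), DomainHit("AT", 0.5, 5.0), DomainHit("ACP", 1e-5, 40.0)]
print([h.name for h in filter_hits(hits)])
```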
Next, assuming that biosynthetic genes form clusters, you will want to create a putative cluster from your query sequence. Navigate the annotation editor to find domains located on adjacent genes; these genes are the likely candidates to form a cluster. Remember that these genes may be encoded on different reading frames. Holding down the Ctrl key, select the first (leftmost) gene of the putative cluster in the annotation editor with a single click. The gene will now be surrounded by a black line. Still holding down the Ctrl key, select the final gene in your putative cluster with a single click; it will also be given a black border. Right-click and select the command create cluster.
All of the genes, together with gene and domain information, will now be displayed in a create cluster dialog box where there is a field for you to enter a name for your cluster.
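The idea that biosynthetic genes cluster together can be expressed as grouping neighbouring domain-carrying genes. The sketch below assumes genes are given as hypothetical `(name, start, end, number_of_domains)` tuples and uses an arbitrary 5,000 bp gap as the split point; it only suggests candidates, and the final cluster boundaries are still chosen by hand in the annotation editor.

```python
from typing import List, Tuple

# Hypothetical input: (gene_name, start, end, number_of_domains), 1-based coordinates.
Gene = Tuple[str, int, int, int]


def candidate_clusters(genes: List[Gene], max_gap: int = 5000) -> List[List[str]]:
    """Group domain-carrying genes whose spacing is at most max_gap bp.

    Genes without domains are skipped; they do not break a cluster by themselves.
    """
    clusters, current, last_end = [], [], None
    for name, start, end, n_domains in sorted(genes, key=lambda g: g[1]):
        if n_domains == 0:
            continue
        if current and start - last_end > max_gap:
            clusters.append(current)  # gap too large: close the current cluster
            current, last_end = [], None
        current.append(name)
        last_end = end if last_end is None else max(last_end, end)
    if current:
        clusters.append(current)
    return clusters


genes = [("a", 1, 1000, 3), ("b", 1200, 3000, 2), ("c", 3100, 3500, 0), ("d", 20000, 21000, 4)]
print(candidate_clusters(genes))
```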
The upper window gives a graphical overview of all the genes, together with the number of domains (in parentheses). Select the gene of interest with a single click. To rearrange the order of genes in the cluster, simply click on the corresponding icons located in the cluster editor. The bottom window will display the gene you have selected together with its domain annotations. Properties of the gene and domains will again be shown in the Details field. You are now ready to create modules from the domains within your genes.
Assembling domains into modules
You will need prior knowledge of the organization of typical PKS or NRPS enzymes to do this. For example, a PKS module catalyzing a single cycle of chain extension and complete reduction must contain the domains KS-AT-DH-ER-KR-ACP.
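The link between a module's reductive domains and the resulting β-carbon can be summarized with the usual textbook rule: KR alone gives a hydroxyl group, KR plus DH a double bond, and KR, DH and ER a fully reduced methylene. The sketch below encodes only that simplified rule; it ignores stereochemistry and domain inactivity (a domain predicted as inactive should simply be left out of the list), and it is not ClustScan-Professional's prediction code.

```python
def beta_carbon_state(domains: list) -> str:
    """Textbook outcome at the beta-carbon for one PKS extension module."""
    present = set(domains)
    if not {"KS", "AT", "ACP"} <= present:
        raise ValueError("an extension module needs at least KS, AT and ACP")
    if "KR" not in present:
        return "keto"                      # no reduction
    if "DH" not in present:
        return "hydroxyl"                  # KR only
    if "ER" not in present:
        return "double bond"               # KR + DH
    return "methylene (fully reduced)"     # KR + DH + ER


print(beta_carbon_state(["KS", "AT", "DH", "ER", "KR", "ACP"]))
```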
From the bottom window of the cluster editor, select the first domain of your module with a single click while holding down the Ctrl key. The domain will now be surrounded by a black line. Still holding down the Ctrl key, select the final domain in your module with a single click; it will also be given a black border. Right-click and select the command create module. A create module dialogue box will appear, displaying the properties of the domains you have selected. Enter a module name and click the create button.
Continue to create modules for each gene you have selected in your putative cluster.
Another way to get an overview of the catalytic domains encoded by an individual gene or by an entire gene cluster you have just annotated is to select the Count function in the Tools pull-down menu. A pop-up dialogue box will appear; clicking the choose DNA button brings up a list of genes or gene clusters from your Workspace. Select the annotation to be summarized, then click the browse button to choose a location on your hard drive where the summary will be saved in comma-separated (CSV), Excel-compatible format. Enter a file name and click on the open button. When the dialogue box reappears, click on the Write to file button; if successful, the message Write to file successful will appear. You can now view your annotation summary by opening the output file on your hard drive.
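The kind of per-gene domain tally produced by the Count function can be sketched with Python's `csv` module. The column layout below (one row per gene, one column per domain type, plus a total row) and the function name are assumptions for illustration; the actual layout of ClustScan-Professional's CSV output may differ.

```python
import csv
import io
from collections import Counter


def domain_count_csv(genes: dict) -> str:
    """genes maps gene name -> list of domain names; returns the tally as CSV text."""
    all_domains = sorted({d for doms in genes.values() for d in doms})
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["gene"] + all_domains)
    totals = Counter()
    for gene, doms in genes.items():
        counts = Counter(doms)
        totals.update(counts)
        writer.writerow([gene] + [counts[d] for d in all_domains])
    writer.writerow(["total"] + [totals[d] for d in all_domains])
    return out.getvalue()


print(domain_count_csv({"geneA": ["KS", "AT", "ACP", "AT"], "geneB": ["KS", "ACP"]}))
```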
You are now ready to find the putative linear structures of the polyketides and/or peptides encoded by your cluster.
In silico prediction of chemistry & exporting your data
ClustScan-Professional can predict the linear chemical structures of all possible products of your gene cluster. The Workspace displays a history of your annotation. Right-click the gene-cluster icon for your query sequence and select the command get molecules. A numbered list of molecules will then be displayed below the module icon. Double-clicking any of these displays the linear molecule in the molecule editor.
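“All possible products” implies enumerating combinations when some modules have more than one plausible outcome. The sketch below assumes a hypothetical list of alternative labels per module and uses `itertools.product`; it illustrates the combinatorics only and is not how ClustScan-Professional derives its molecules.

```python
from itertools import product


def enumerate_products(module_options: list) -> list:
    """Return every combination that picks one alternative per module."""
    return [list(choice) for choice in product(*module_options)]


# Hypothetical labels: module 1 and module 2 each have two plausible outcomes.
options = [["loading"], ["M1-a", "M1-b"], ["M2-a", "M2-b"]]
for combination in enumerate_products(options):
    print(combination)
```

The number of candidate products grows as the product of the number of alternatives per module, which is why ambiguous modules quickly multiply the list.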
An export file can be generated in GenBank, EMBL or XML format by selecting the export command from the File pull-down menu. A pop-up dialogue box will appear where you can record information pertaining to the annotation.
To export your data in any one of the three available formats (EMBL, GenBank or DDBJ), select the export annotation icon from the File pull-down menu. A pop-up form will appear; fill in the appropriate fields and press the EXPORT TO FILE button to export the data to a location of your choice. An example of the beginning of a file exported in GenBank format is shown below.
LOCUS 32010 bp DNA 06-May-2008
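The LOCUS line summarizes the exported record. As a quick sanity check after exporting, a hypothetical helper like the following can pull out the length, molecule type and date; the pattern is a loose assumption that tolerates a missing locus name, as in the example line above, and it is not a full GenBank parser.

```python
import re

LOCUS_RE = re.compile(r"^LOCUS\s+(?:(\S+)\s+)?(\d+)\s+bp\s+(\S+)")
DATE_RE = re.compile(r"(\d{1,2}-[A-Za-z]{3}-\d{4})\s*$")


def parse_locus(line: str) -> dict:
    """Extract name (None if absent), length, molecule type and date from a LOCUS line."""
    m = LOCUS_RE.match(line)
    if not m:
        raise ValueError("not a GenBank LOCUS line")
    date = DATE_RE.search(line)
    return {
        "name": m.group(1),
        "length_bp": int(m.group(2)),
        "molecule": m.group(3),
        "date": date.group(1) if date else None,
    }


print(parse_locus("LOCUS 32010 bp DNA 06-May-2008"))
```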
How to extract your sequence data in FASTA format
There may be situations where you would like to export the sequences of certain domains, for example, to construct sequence alignments. To do this, select the get fasta option in the Tools pull-down menu. Select individual or multiple clusters (hold down the Ctrl or Shift key while clicking). Clicking on the add domains button will call up the choose profile dialog box; refer to the How to use “HMMER” section for instructions on how to complete it. Select the cluster (or clusters) from the upper dialogue box and the domains from the lower dialogue box, again holding down the Ctrl or Shift key to select several. Click the browse button to choose a location on your hard drive where the sequences will be saved in FASTA format. Enter a file name and click on the open button. When the dialogue box reappears, click on the Get FASTA button. You can now view the extracted sequences by opening the output file on your hard drive.
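If you need to reformat or merge such files yourself, FASTA output is easy to produce. The sketch below writes `(header, sequence)` pairs with sequence lines wrapped at 60 characters; the header naming scheme is left to you and `to_fasta` is a hypothetical name.

```python
def to_fasta(records, width: int = 60) -> str:
    """Format (header, sequence) pairs as FASTA text, wrapping sequence lines."""
    lines = []
    for header, seq in records:
        lines.append(f">{header}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


print(to_fasta([("cluster1_KS_1", "MKVLAAGHT"), ("cluster1_AT_1", "MSRA")], width=4))
```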
CSDB (ClustScan-Professional DataBase) manager
You might want to save your data in the ClustScan database, which is in the public domain, or in a customized database that we can create and which you can restrict for your own use. Please contact us at firstname.lastname@example.org for terms and conditions.
CompGen: A module of ClustScan-Professional to generate recombinant clusters synthesizing novel chemical entities
CompGen is an integrated program package containing an expert system that models natural homeologous recombination in silico between two gene clusters annotated with ClustScan-Professional. CompGen can be accessed by selecting the Recombination option in the Tools pull-down menu. The clusters you want to recombine can be selected from the first and second cluster pull-down menus in the Recombination dialogue box. The default parameters used to model the recombination can be modified by the user. In our experience, homeologous recombination requires about 30 bp of identical sequence (the MEPS length parameter), flanked by around 200 bp of sequence (the MEPS extend parameter) sharing more than 75% similarity (the Required identity parameter). The CompGen recombination program finds identity and similarity between the two sequences and links their coordinates to the parent clusters previously annotated in ClustScan-Professional, so that the domains, linkers and dockers involved in the recombination process are automatically inherited by the recombinants.
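The MEPS parameters can be illustrated with a simple seed-and-extend search: find exact matches of MEPS length between the two sequences, then measure identity over the flanks. The sketch below is a simplification under stated assumptions (ungapped comparison, identity rather than similarity, MEPS extend applied on each side of the seed, identity measured over the seed plus flanks) and is not the CompGen algorithm; the function name is hypothetical.

```python
def find_recombination_sites(a, b, meps_length=30, meps_extend=200, required_identity=0.75):
    """Return (pos_a, pos_b, identity) for exact seeds whose flanks are similar enough."""
    # Index every k-mer (k = meps_length) of sequence b by its start position.
    index = {}
    for j in range(len(b) - meps_length + 1):
        index.setdefault(b[j:j + meps_length], []).append(j)
    sites = []
    for i in range(len(a) - meps_length + 1):
        for j in index.get(a[i:i + meps_length], []):
            matches = compared = 0
            # Ungapped comparison of the seed plus meps_extend bp on each side.
            for d in range(-meps_extend, meps_length + meps_extend):
                pa, pb = i + d, j + d
                if 0 <= pa < len(a) and 0 <= pb < len(b):
                    compared += 1
                    matches += a[pa] == b[pb]
            identity = matches / compared  # compared >= meps_length, never zero
            if identity > required_identity:
                sites.append((i, j, identity))
    return sites


print(find_recombination_sites("ACGTTGCA", "ACGTTGCA", meps_length=4, meps_extend=2))
```

Overlapping seeds on the same diagonal are all reported here; a fuller implementation would merge them into single recombination regions.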
Pressing the Recombine button starts the recombination process. Each recombinant cluster will appear as a separate object in a new folder entitled Recombinants which is created in the Workspace.
A single left click on a recombinant's icon gives details of the recombination, while a double click also gives a schematic representation of the recombination event. By clicking on the corresponding icon, the recombinant can be viewed and edited in the ClustScan annotation editor.
The predicted molecules that could be synthesized by each recombinant can be viewed by right-clicking its icon and selecting Get molecules. All molecules that could be synthesized from all possible recombination events between the two parent clusters can be viewed by right-clicking the corresponding icon and selecting Get all molecules. To obtain details of each molecule in text format (SMILES) or graphically (ball-and-stick model), refer to the “In silico prediction of chemistry & exporting your data” section.
Some of the recombinants are likely to produce the same molecules. These can be grouped together using the Group by smiles option, which can be selected by right-clicking the icon.
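Grouping by SMILES amounts to collecting recombinants under identical structure strings. The sketch below assumes a hypothetical mapping from recombinant name to SMILES string. Note that string equality only reflects identical molecules if the SMILES are canonical (for example, as produced by a cheminformatics toolkit such as RDKit); otherwise the same molecule written two ways would land in two groups.

```python
from collections import defaultdict


def group_by_smiles(recombinants: dict) -> dict:
    """Map each SMILES string to the list of recombinants predicted to make it."""
    groups = defaultdict(list)
    for name, smiles in recombinants.items():
        groups[smiles].append(name)
    return dict(groups)


print(group_by_smiles({"rec1": "CCO", "rec2": "CCC", "rec3": "CCO"}))
```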
Recombinants can be filtered to include or exclude module architectures inherited from either or both parent clusters, for example, Proper Start (loading module), Proper End (thioesterase domain) and Proper Start & End (loading module + thioesterase domain). The Stringent option conserves the biosynthetic order of subsequent modules, while the Relaxed option does not take the order of these modules into consideration. Individual modules and/or genes from either parent cluster can be included or excluded by selecting them in the appropriate dialogue box.
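The difference between the Stringent and Relaxed options can be pictured as an ordered versus an unordered containment test. In this sketch, modules are hypothetical string labels, and `passes_filter` illustrates the idea rather than CompGen's actual filter.

```python
def passes_filter(recombinant: list, required: list, stringent: bool = True) -> bool:
    """Check that all required modules are inherited; Stringent also demands their order."""
    if not set(required) <= set(recombinant):
        return False
    if not stringent:
        return True  # Relaxed: the order of the modules is not considered
    it = iter(recombinant)
    # Ordered-subsequence test: each `in` consumes the iterator up to the match.
    return all(module in it for module in required)


print(passes_filter(["load", "m2", "m1", "TE"], ["load", "m1", "m2"]))
print(passes_filter(["load", "m2", "m1", "TE"], ["load", "m1", "m2"], stringent=False))
```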
If you encounter any problems, contact us at: email@example.com