Description of ClustScan-Professional

The ClustScan-Professional program package is designed for rapid, semi-automatic annotation of modular biosynthetic clusters starting from DNA sequences. Potential protein-coding regions and catalytic domains are identified by programs running on a server, and the results are presented to the user in a graphical interface by a Java client program. The user can edit the results and assemble biosynthetic clusters. The activity and specificity of the biosynthetic clusters can be used to deduce the chemical structure of linear products and to make a simplified prediction of potential cyclic structures. The user can save results in their own workspace on the server or export them for use in third-party software.

Intended Audience

 

This document is intended for users who have no previous experience with ClustScan-Professional. It should give the reader a basic understanding of the following concepts:

  • What ClustScan-Professional is
  • How to get information using ClustScan-Professional
  • How to export information from ClustScan-Professional

 

This is a hands-on tutorial that walks the reader through the basic operations needed to use ClustScan-Professional, taking the erythromycin-producing polyketide synthase gene cluster (GenBank reference AY661566.1) as an example. It is not intended to be a complete reference. If you need extra help or information, contact novalis@novalis.hr.

Getting Started

To run ClustScan-Professional you will first need to install a Java Runtime Environment (JRE) 6. This can be downloaded from: http://java.sun.com/javase/downloads/index.jsp.

 

Next, download and extract the ClustScan-Professional files from: http://reg.bioserv.pbf.hr/. Start the program by double-clicking on the ClustScan-Professional icon.

After you start ClustScan-Professional, its splash screen will appear.

When ClustScan-Professional finishes loading, a login pop-up will appear. 
 

 

Enter your username (e-mail address) and password and click ‘OK’. The login procedure might take a couple of minutes, depending on the size of your Workspace and the speed of your network connection. If you are logging in for the first time, your Workspace will be empty and the login procedure should finish quickly. To change your password, select the change password command from the Help pull-down menu.

When ClustScan-Professional finishes with the login procedure, the annotation interface will open.

 

 

 

Entering your Sequence 

Select the Import DNA function from the File pull-down menu, or click once on the Import DNA icon. The Import DNA dialog box will appear. ClustScan-Professional accepts input sequences either directly from the PBF-Server or as FastA accession files (or any other format supported by the ReadSeq biosequence conversion tool). Regardless of the input method, you must enter a name for your query sequence; this will become the name of the export file for your annotated sequence.

 

 

Using accession files 

  1. With a single mouse click, select the main input field titled “browse” in the dialog box.
  2. Locate your file of choice and select it with a mouse click. Then select OK to start uploading the file onto the ClustScan-Professional server.
  3. You should now see the name of your query sequence in the Workspace and the forward and reverse DNA strands in the annotation window.
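
Before uploading, it can save time to check locally that your file really is in FASTA format and contains only DNA symbols. The following is a minimal Python sketch of such a check. It is not part of ClustScan-Professional; the function names and the allowed alphabet (A, C, G, T, N) are assumptions made for illustration.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def invalid_dna_symbols(sequence, allowed="ACGTN"):
    """Return the sorted characters of the sequence that are not allowed."""
    return sorted(set(sequence) - set(allowed))


# Toy input; replace with the contents of your own file.
example = ">my_query\nacgtacgt\nNNACGT\n"
records = read_fasta(example)
for name, seq in records:
    print(name, len(seq), invalid_dna_symbols(seq))
```

An empty list of invalid symbols suggests the file is plain DNA; anything else is worth inspecting before you upload.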

 

 

Selecting & Submitting your Search 

ClustScan-Professional allows you to annotate your query sequence in two stages: first finding all genes using either “Genemark” or “Glimmer”, and then annotating PKS and/or NRPS gene clusters using “HMMER”. These tools can be selected from the Search pull-down menu or by clicking on the corresponding icons displayed in the menu bar.

How to use “Genemark” 

“Genemark” is a gene prediction program developed at the Georgia Institute of Technology, USA. The program determines the protein-coding potential of a DNA sequence using Markov models for bacterial and archaeal coding and non-coding regions. This tool will not differentiate between overlapping genes, so the results must be parsed manually before exporting to third-party software such as Artemis. Contact novalis@novalis.hr for assistance with this.

When you select “Genemark”, a dialog box will appear. Pull down the choose DNA menu to select your query sequence. You then have the option to change the default prediction model to that of another species by pulling down the choose prediction model menu. Click the Start button to run the program.
 

The Genemark server operation dialog box will appear and provide information on the progress of your annotation. The annotation can be stopped at any time by clicking on the Cancel button in the dialog box. When the annotation is complete, the dialog box disappears and the message no operations to display at this time appears in the Progress field at the bottom of the screen.

Go now to the Viewing & Navigating your Results section, or proceed directly with the second stage of your annotation – “HMMER”. 
 

How to use “Glimmer” 
 

“Glimmer” is an alternative gene prediction tool offered by ClustScan-Professional and is the primary microbial gene finder used at The Institute for Genomic Research (TIGR), where it was first developed. “Glimmer” will differentiate between overlapping genes, negating the need for manual parsing of the data before exporting to third party software. 
 

When you select “Glimmer”, a dialog box will appear giving you the choice of either searching for genes in your query sequence directly using Glimmer, or using your own gene-prediction models. To use Glimmer directly, pull down the choose DNA menu bar to select your query sequence and click start to run the program. The results can be viewed in the annotation editor window. The results can also be saved as your own gene prediction model by clicking on the save model option and entering a label. Both Glimmer and your own gene-prediction models can be run using more stringent prediction criteria by clicking on the stringent box. 

 

The Glimmer server operation dialog box will appear and provide information on the progress of your annotation. The annotation can be stopped at any time by clicking on the Cancel button in the dialog box. When the annotation is complete, the dialog box disappears and the message no operations to display at this time appears in the Progress field at the bottom of the screen.

Go now to the Viewing & Navigating your Results section, or proceed directly with the second stage of your annotation – “HMMER”. 
 

How to use “HMMER” 

“HMMER” will translate your DNA query sequence into all six reading frames and then search this protein sequence data using HMM profiles for existing protein families within the Pfam database or your own customised profiles.
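
To make the idea of a six-frame translation concrete, here is a minimal Python sketch. It illustrates the general technique only and is not the code used by ClustScan-Professional; the function names and frame labels (+1 to +3, -1 to -3) are assumptions. It uses the standard amino-acid assignments, and incomplete codons at the end of a frame are simply dropped.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Return a dict mapping frame labels to protein translations."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


frames = six_frames("ATGGCCTAA")
for label, protein in frames.items():
    print(label, protein)
```

Stop codons appear as `*` in the output, so each frame can be split on `*` to obtain the open stretches that a profile search would then examine.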

 

 

When you select “HMMER”, a ClustScan-HMMER dialog box will appear. Pull down the choose DNA menu to select your query sequence. You can now limit your search by clicking on the add profile search button. A choose profile dialog box will appear.
 

Select the protein family you wish to search by pulling down the group menu and selecting either your own (“user”) customised HMM profiles, the entire “Pfam” database, “PKS”, “NRPS”, “PKS & NRPS” or “All”. Click on the add all button to display the HMM profiles; these will appear in the added list at the bottom of the dialogue box.

 

 

Alternatively, you can select specific HMM families by entering a key word in the search box and pressing the search button. All HMM profiles containing the key word will appear in the search result list. Click on the required profile (or hold down the Ctrl key on your keyboard and click on more than one profile), then click the add button to display the profile(s) in the added list at the bottom of the dialogue box. By clicking on add all, only the profiles listed in the added list will appear in the ClustScan-HMMER dialogue box.

 

Now select the level of stringency for the “HMMER” predictions by choosing either Stringent or Relaxed in the HMMER parameters box. Click start to run the program.

The HMMER server operation dialog box will appear and provide information on the progress of your annotation. The annotation can be stopped at any time by clicking on the cancel button in the dialog box. When the annotation is complete, the dialog box disappears, the message no operations to display at this time appears in the Progress field at the bottom of the screen, and a graphical representation of your results appears in the annotation editor.

 

Go now to the Viewing & Navigating your Results section. 
 

How to build your own customised HMM profile 

To create your own customised HMM profile, click on the create new button in the ClustScan-HMMER dialogue box. Choose whether you want to create an alignment or to use an existing alignment previously created in ClustalW. 
 

To create an alignment, paste your sequences into the text area (to use an existing alignment, paste the alignment directly into the text editor). By clicking on check format, you can ensure that your sequences are in FASTA format. You can then choose the matrix used to construct the alignment from the matrix drop-down list, selecting either “BLOSUM”, “PAM”, “GONNET” or “ID”. Click on start align to create the alignment.
 

 
 

Clicking on the next button will bring you to the build and calibrate alignment dialogue box where you must provide a profile name. 
 

 

You should click on check profile name to ensure that your profile name does not already exist. Now select either local or global criteria for the profile and then click on build profile. The profile will be displayed in the text editor. Now click on finish to go back to the ClustScan-HMMER dialogue box. Your profile is now added in “user” as a customised HMM profile.

Go now to the How to use “HMMER” section. 
 

Viewing & Navigating your Results 

Default colours can be changed by selecting the preference option from the menu pull down icon. 

Viewing the results of “Genemark” or “Glimmer” 

You can view your results by double-clicking on the name of your query sequence in the Workspace. Your results will appear in the annotation editor.

The forward and reverse DNA strands are represented by green bands with the sequences labelled every 200 bp. Putative genes are represented by brown bands on the appropriate reading frame. ClustScan-Professional offers different zoom levels, which you can choose by clicking the zoom icon in the menu bar.

A pull-down menu will appear with the possible zoom levels. You have the option of viewing the coding sequence for each reading frame by selecting font from the zoom pull-down menu. This menu also gives the option to expand the viewing field to give an overview of the annotation. The annotation editor can also be expanded by clicking on the maximize icon, and you can navigate the entire sequence by using your mouse to move the scroll bar or by clicking left or right.
 

The properties of each gene, such as the DNA coordinates and protein reading frame, can be viewed in the Details field. This can be accessed in two ways: either by double-clicking on the gene of interest in the annotation editor, or by selecting the gene after expanding the Workspace. Use the “+”/“-” keys to navigate through the Workspace. A single click on any of the genes listed there will take you to the corresponding co-ordinates in the annotation editor.

Viewing the results of “HMMER” 

Putative domains are shown in blue (the default colour), overlaying the corresponding gene in the annotation editor. Use the “+”/“-” keys to navigate through the Workspace. A single click on any of these domains will take you to the corresponding co-ordinates in the annotation editor.

You are now ready to edit your annotation and to construct a biosynthetic gene cluster! 

Editing your annotation, assembly and displaying gene clusters 

The first step in editing is to remove poor annotation from your query sequence. Click on a domain in the Workspace to move to its position in the annotation editor. Domain information will now also be displayed in the Details field. You need to decide on the significance of the domain information. As a ‘rule of thumb’, a good E-value should be much less than 1, with a correspondingly high score. A domain can be deleted from your query sequence by re-selecting the domain in the Workspace and then clicking on the delete icon. If you make a mistake, you can rescue your annotation by using the undo/redo functions from the edit pull-down menu or by clicking the undo/redo icons. If the activity is predicted as inactive, then the domain is likely to be inactive, but you may not necessarily want to delete these domains from your dataset.
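
If you have copied domain hits out of the Details field, the E-value rule of thumb can be applied as a simple filter. The Python sketch below is only an illustration: the record layout, the example values and the cut-off of 1e-5 are assumptions, not values prescribed by ClustScan-Professional.

```python
# Hypothetical domain hits; field names and numbers are illustrative only.
hits = [
    {"domain": "PKS_KS", "evalue": 1e-120, "score": 410.2},
    {"domain": "PKS_AT", "evalue": 3e-85, "score": 295.7},
    {"domain": "PKS_DH", "evalue": 0.4, "score": 8.1},
]


def split_by_evalue(hits, max_evalue=1e-5):
    """Separate hits into those to keep and those to review by hand.

    max_evalue is an assumed threshold standing in for 'much less than 1'.
    """
    keep = [h for h in hits if h["evalue"] <= max_evalue]
    review = [h for h in hits if h["evalue"] > max_evalue]
    return keep, review


keep, review = split_by_evalue(hits)
print("keep:  ", [h["domain"] for h in keep])
print("review:", [h["domain"] for h in review])
```

Hits that fall into the review list are candidates for deletion, not automatic deletions; the final decision remains yours.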

Next, assuming biosynthetic genes form clusters, you will want to create a putative cluster from your query sequence. Navigate the annotation editor to search for domains located on adjacent genes; these genes are the likely candidates to form a cluster. Remember, these genes may be encoded on different reading frames. Holding down the Ctrl button on your keyboard, select the first gene (working from left to right) in the annotation editor with a single mouse click. The gene will now be surrounded by a black line. Still holding down the Ctrl button, select the final gene in your putative cluster; again, a single click will give the gene a black border. A single right click will then allow you to select the command create cluster.

All of the genes, together with gene and domain information, will now be displayed in a create cluster dialog box where there is a field for you to enter a name for your cluster.

 

By clicking on the create button, you will next enter a cluster editor displaying two windows.

The upper window gives a graphical overview of all the genes together with the number of domains (in parentheses). Select the gene of interest with a single mouse click. To rearrange the order of genes in the cluster, click on the icons provided for this in the cluster editor. The bottom window will display the gene you have selected together with its domain annotations. Properties of the gene and domains will again be shown in the Details field. You are now ready to create modules from the domains within your gene cluster!
 

Grouping domains into modules 
 

You will need prior knowledge of the organization of typical PKS or NRPS enzymes to do this. For example, a PKS module catalyzing a single cycle of chain extension and complete reduction must contain the domains KS-AT-DH-ER-KR-ACP.
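
As a memory aid, the relationship between the domains in a PKS extension module and the degree of reduction can be written down in a few lines of Python. This is a simplified sketch based on textbook PKS logic, not the prediction method used by ClustScan-Professional: it ignores inactive domains, and loading modules (which lack a KS domain) are simply reported as incomplete extension modules. The function name and labels are assumptions.

```python
def describe_module(domains):
    """Classify a PKS extension module from its domain names.

    Names may carry a prefix such as 'PKS_', which is stripped first.
    """
    present = {name.split("_")[-1] for name in domains}
    missing = {"KS", "AT", "ACP"} - present
    if missing:
        return "incomplete (missing " + ", ".join(sorted(missing)) + ")"
    if {"KR", "DH", "ER"} <= present:
        return "fully reduced"
    if {"KR", "DH"} <= present:
        return "double bond"
    if "KR" in present:
        return "hydroxyl"
    return "keto"


print(describe_module(["PKS_KS", "PKS_AT", "PKS_KR", "PKS_ACP"]))
print(describe_module(["PKS_KS", "PKS_AT", "PKS_DH", "PKS_ER", "PKS_KR", "PKS_ACP"]))
print(describe_module(["PKS_AT", "PKS_ACP"]))
```

The set-based check deliberately ignores domain order; when you group domains in the cluster editor you still select them in the order in which they occur on the gene.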

From the bottom window of the cluster editor, select the first domain in your module with a single mouse click while holding down the Ctrl button on your keyboard. The domain will now be surrounded by a black line. Still holding down the Ctrl button, select the final domain in your module; again, a single click will give the domain a black border. A single right click will allow you to select the command create module. A create module dialogue box will appear displaying the properties of the domains you have selected. You can now enter a module name followed by a single click on the create button.

 

Continue to create modules for each gene you have selected in your putative cluster. 
 

When you have finished, you can view the genes, proteins and domain organization using the “+”/“-” buttons in the Workspace or using the annotation (above) and cluster (below) editors.

 

An alternative way to get an overview of the catalytic domains encoded by an individual gene or by the entire gene cluster you have just annotated is to select the Count function in the Tools pull-down menu. A pop-up dialogue box will appear; a single left click on the DNA chooses button will bring up a list of genes or gene clusters from your workspace. Select the annotation to be summarized, then click on the browse button to select a location on your hard drive where the summary will be saved in comma-separated (CSV), Excel-compatible format. Enter a file name and click on the open button. The dialogue box will appear again; click on the Write to file button. If successful, the message Write to file successful will appear. You can now view your annotation summary by opening the output file from your hard drive.
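
The kind of summary produced by Count can be pictured with the short Python sketch below, which tallies domains per gene and writes them as CSV. The column layout (gene, domain, count) is an assumption for illustration and may differ from the file that ClustScan-Professional actually writes; the domain list is the one shown for eryAI in the GenBank export example in this tutorial.

```python
import csv
import io
from collections import Counter

# Domains annotated on eryAI in the export example (loading module, M1, M2).
annotation = {
    "eryAI": [
        "PKS_AT", "PKS_ACP",
        "PKS_KS", "PKS_AT", "PKS_KR", "PKS_ACP",
        "PKS_KS", "PKS_AT", "PKS_KR", "PKS_ACP",
    ],
}


def domain_counts_csv(annotation):
    """Return CSV text with one row per (gene, domain) pair and its count."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["gene", "domain", "count"])  # assumed header row
    for gene in sorted(annotation):
        counts = Counter(annotation[gene])
        for domain in sorted(counts):
            writer.writerow([gene, domain, counts[domain]])
    return buffer.getvalue()


summary = domain_counts_csv(annotation)
print(summary)
```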

 

You are now ready to find the putative linear structures for polyketides and/or peptides encoded by your cluster. 
 

In silico prediction of chemistry & exporting your data 
 

ClustScan-Professional can predict the linear chemical structures for all possible products of your gene cluster. The Workspace displays a history of your annotation. A single right click on the gene-cluster icon for your query sequence will allow you to select the command get molecules. Selecting this command displays a numbered list of molecules below the module icon. Double-clicking on any of these will display the linear molecule in the molecule editor.

 

 

 

An export file can be generated using either GenBank, EMBL or XML formats by selecting the export command from the File pull-down menu. A pop-up dialogue box will appear where you can record information pertaining to the annotation.

 

 
 

To export your data in any one of the three available formats (EMBL, GenBank or DDBJ), select the export annotation icon from the File pull-down menu. A pop-up form will appear; fill in the appropriate fields and press the EXPORT TO FILE button to export the data to a location of your choice. An example of the beginning of a file exported in GenBank format is shown below.

 

LOCUS                            32010 bp    DNA                06-May-2008

           

ACCESSION  

  AUTHORS  

COMMENT    

DEFINITION 

KEYWORDS   

SOURCE     

VERSION    

  TITLE    

  ORGANISM 

REFERENCE  

FEATURES             Location/Qualifiers

     gene            289..10638

                     /gene="eryAI"

                     /note

     gene            10695..11969

                     /gene="Gene 2"

                     /note

     gene            12080..22783

                     /gene="eryAII"

                     /note

     gene            22784..32299

                     /gene="eryAIII"

                     /note

     Module          1

                     /region_name="LD"

     Region          327..1275

                     /region_name="PKS_AT"

                     /note

     Region          1407..1608

                     /region_name="PKS_ACP"

                     /note

     Module          1

                     /region_name="M1"

     Region          1680..2949

                     /region_name="PKS_KS"

                     /note

     Region          3252..4221

                     /region_name="PKS_AT"

                     /note

     Region          4998..5496

                     /region_name="PKS_KR"

                     /note

     Region          5829..6030

                     /region_name="PKS_ACP"

                     /note

     Module          1

                     /region_name="M2"

     Region          6105..7374

                     /region_name="PKS_KS"

                     /note

     Region          7686..8646

                     /region_name="PKS_AT"

                     /note

     Region          9333..9828

                     /region_name="PKS_KR"

                     /note

     Region          10170..10371

                     /region_name="PKS_ACP"

                     /note

     Module          1

                     /region_name="M3"

     Region          12180..13452

                     /region_name="PKS_KS"

                     /note

     Region          13761..14724

                     /region_name="PKS_AT"

                     /note

     Region          15477..15939

                     /region_name="PKS_KR"

                     /note

     Region          16278..16479

                     /region_name="PKS_ACP"

                     /note

     Module          1

                     /region_name="M4"

     Region          16554..17823

                     /region_name="PKS_KS"

                     /note

     Region          18123..19080

                     /region_name="PKS_AT"

                     /note

     Region          19194..19701

                     /region_name="PKS_DH"

                     /note

     Region          20574..21474

                     /region_name="PKS_ER"

                     /note

     Region          21504..21999

                     /region_name="PKS_KR"

                     /note

     Region          22332..22533

                     /region_name="PKS_ACP"

                     /note

     Module          1

                     /region_name="M5"

     Region          22908..24147

                     /region_name="PKS_KS"

                     /note

     Region          24453..25404

                     /region_name="PKS_AT"

                     /note

     Region          26133..26628

                     /region_name="PKS_KR"

                     /note

     Region          26964..27165

                     /region_name="PKS_ACP"

                     /note

     Module          1

                     /region_name="M6"

     Region          27249..28524

                     /region_name="PKS_KS"

                     /note

     Region          28836..29775

                     /region_name="PKS_AT"

                     /note

     Region          30453..30942

                     /region_name="PKS_KR"

                     /note

     Region          31251..31452

                     /region_name="PKS_ACP"

                     /note

     Region          31656..32283

                     /region_name="PKS_TE"

                     /note

     CDS             289..10638

                     /gene="eryAI"

                     /transl_table=11

                     /codon_start="FORWARD_1"

                     /translation="VVRGVARPSAPVVFVFP GQGAQWAGMAGELLGESRVFAAAMDAC

ARAFEPVTDWTLAQVLDSPEQSRRVEVVQPA LFAVQTSLAALWRSFGVTPDAVVGHSI

GELAAAHVCGAAGAADAARAAALWSREMIPL VGNGDMAAVALSADEIEPRIARWDDDV

VLAGVNGPRSVLLTGSPEPVARRVQELSAEG VRAQVINVSMAAHSAQVDDIAEGMRSA

LAWFAPGGSEVPFYASLTGGAVDTRELVADY WRRSFRLPVRFDEAIRSALEVGPGTFV

EASPHPVLAAALQQTLDAEGSSAAVVPTLQR GQGGMRRFLLAAAQAFTGGVAVDWTAA

YDDVGAEPGSLPEFAPAEEEDEPAESGVDWN APPHVLRERLLAVVNGETAALAGREAD

AEATFRELGLDSVLAAQLRAKVSAAIGREVN IALLYDHPTPRALAEALAAGTEVAQRE

TRARTNEAAPGEPVAVVAMACRLPGGVSTPE EFWELLSEGRDAVAGLPTDRGWDLDSL

FHPDPTRSGTAHQRGGGFLTEATAFDPAFFG MSPREALAVDPQQRLMLELSWEVLERA

GIPPTSLQASPTGVFVGLIPQEYGPRLAEGG EGVEGYLMTGTTTSVASGRIAYTLGLE

GPAISVDTACSSSLVAVHLACQSLRRGESSL AMAGGVTVMPTPGMLVDFSRMNSLAPD

GRCKAFSAGANGFGMAEGAGMLLLERLSDAR RNGHPVLAVLRGTAVNSDGASNGLSAP

NGRAQVRVIQQALAESGLGPADIDAVEAHGT GTRLGDPIEARALFEAYGRDREQPLHL

GSVKSNLGHTQAAAGVAGVIKMVLAMRAGTL PRTLHASERSKEIDWSSGAISLLDEPE

PWPAGARPRRAGVSSFGISGTNAHAIIEEAP QVVEGERVEAGDVVAPWVLSASSAEGL

RAQAARLAAHLREHPGQDPRDIAYSLATGRA ALPHRAAFAPVDESAALRVLDGLATGN

ADGAAVGTSRAQQRAVFVFPGQGWQWAGMAV DLLDTSPVFAAALRECADALEPHLDFE

VIPFLRAEAARREQDAALSTERVDVVQPVMF AVMVSLASMWRAHGVEPAAVIGHSQGE

IAAACVAGALSLDDAARVVALRSRVIATMPG NKGMASIAAPAGEVRARIGDRVEIAAV

NGPRSVVVAGDSDELDRLVASCTTECIRAKR LAVDYASHSSHVETIRDALHAELGEDF

HPLPGFVPFFSTVTGRWTQPDELDAGYWYRN LRRTVRFADAVRALAEQGYRTFLEVSA

HPILTAAIEEIGDGSGADLSAIHSLRRGDGS LADFGEALSRAFAAGVAVDWESVHLGT

GARRVPLPTYPFQRERVWLEPKPVARRSTEV DEVSALRYRIEWRPTGAGEPARLDGTW

LVAKYAGTADETSTAAREALESAGARVRELV VDARCGRDELAERLRSVGEVAGVLSLL

AVDEAEPEEAPLALASLADTLSLVQAMVSAE LGCPLWTVTESAVATGPFERVRNAAHG

ALWGVGRVIALENPAVWGGLVDVPAGSVAEL ARHLAAVVSGGAGEDQLALRADGVYGR

RWVRAAAPATDDEWKPTGTVLVTGGTGGVGG QIARWLARRGAPHLLLVSRSGPDADGA

GELVAELEALGARTTVAACDVTDRESVRELL GGIGDDVPLSAVFHAAATLDDGTVDTL

TGERIERASRAKVLGARNLHELTRELDLTAF VLFSSFASAFGAPGLGGYAPGNAYLDG

LAQQRRSDGLPATAVAWGTWAGSGMAEGPVA DRFRRHGVIEMPPETACRALQNALDRA

EVCPIVIDVRWDRFLLAYTAQRPTRLFDEID DARRAAPQAAAEPRVGALASLPAPERE

KALFELVRSHAAAVLGHASAERVPADQAFAE LGVDSLSALELRNRLGAATGVRLPTTT

VFDHPDVRTLAAHLAAELGGATGAEQAAPAT TAPVDEPIAIVGMACRLPGEVDSPERL

WELITSGRDSAAEVPDDRGWVPDELMASDAA GTRRAHGNFMAGAGDFDAAFFGISPRE

ALAMDPQQRQALETTWEALESAGIPPETLRG SDTGVFVGMSHQGYATGRPRPEDGVDG

YLLTGNTASVASGRIAYVLGLEGPALTVDTA CSSSLVALHTACGSLRDGDCGLAVAGG

VSVMAGPEVFTEFSRQGALSPDGRCKPFSDE ADGFGLGEGSAFVVLQRLSDARREGRR

VLGVVAGSAVNQDGASNGLSAPSGVAQQRVI RRAWARAGITGADVAVVEAHGTGTRLG

DPVEASALLATYGKSRGSSGPVLLGSVKSNI GHAQAAAGVAGVIKVLLGLERGVVPPM

LCRGERSGLIDWSSGEIELADGVREWSPAAD GVRRAGVSAFGVSGTNAHVIIAEPPEP

EPVPQPRRMLPATGVVPVVLSARTGAALRAQ AGRLADHLAAHPGIAPADVSWTMARAR

QHFEERAAVLAADTAEAVHRLRAVADGAVVP GVVTGSASDGGSVFVFPGQGAQWEGMA

RELLPVPVFAESIAECDAVLSEVAGFSVSEV LEPRPDAPSLERVDVVQPVLFAVMVSL

ARLWRACGAVPSAVIGHSQGEIAAAVVAGAL SLEDGMRVVARRSRAVRAVAGRGSMLS

VRGGRSDVEKLLADDSWTGRLEVAAVNGPDA VVVAGDAQAAREFLEYCEGVGIRARAI

PVDYASHTAHVEPVRDELVQALAGITPRRAE VPFFSTLTGDFLDGTELDAGYWYRNLR

HPVEFHSAVQALTDQGYATFIEVSPHPVLAS SVQETLDDAESDAAVLGTLERDAGDAD

RFLTALADAHTRGVAVDWEAVLGRAGLVDLP GYPFQGKRFWLLPDRTTPRDELDGWFY

RVDWTEVPRSEPAALRGRWLVVVPEGHEEDG WTVEVRSALAEAGAEPEVTRGVGGLVG

DCAGVVSLLALEGDGAVQTLVLVRELDAEGI DAPLWTVTFGAVDAGSPVARPDQAKLW

GLGQVASLERGPRWTGLVDLPHMPDPELRGR LTAVLAGSEDQVAVRADAVRARRLSPA

HVTATSEYAVPGGTILVTGGTAGLGAEVARW LAGRGAEHLALVSRRGPDTEGVGDLTA

ELTRLGARVSVHACDVSSREPVRELVHGLIE QGDVVRGVVHAAGLPQQVAINDMDEAA

FDEVVAAKAGGAVHLDELCSDAELFLLFSSG AGVWGSARQGAYAAGNAFLDAFARHRR

GRGLPATSVAWGLWAAGGMTGDEEAVSFLRE RGVRAMPVPRALAALDRVLASGETAVV

VTDVDWPAFAESYTAARPRPLLDRIVTTAPS ERAGEPETESLRDRLAGLPRAERTAEL

VRLVRTSTATVLGHDDPKAVRATTPFKELGF DSLAAVRLRNLLNAATGLRLPSTLVFD

HPNASAVAGFLDAELGTEVRGEAPSALAGLD ALEAALPEVPATEREELVQRLERMLAA

LRPVAQAADASGTGANPSGDDLGEAGVDELL EALGRELDGD" 

.......... 
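
If you want to post-process an exported file yourself, the feature lines shown above are regular enough to read with a few lines of Python. The sketch below is an assumption-laden illustration, not an official parser: it recognises only the feature keys seen in the example (gene, CDS, Module, Region), keeps only single-line qualifiers of the form /name="value", and skips everything else, including bare /note lines and the multi-line /translation.

```python
import re

FEATURE = re.compile(r"^\s*(gene|CDS|Module|Region)\s+(\S+)\s*$")
QUALIFIER = re.compile(r'^\s*/(\w+)="([^"]*)"\s*$')


def parse_features(text):
    """Return a list of feature dicts in the order they appear in the text."""
    features = []
    for line in text.splitlines():
        feature = FEATURE.match(line)
        if feature:
            key, location = feature.groups()
            start, _, end = location.partition("..")
            span = (int(start), int(end)) if end else None  # Module has no span
            features.append({"key": key, "span": span, "qualifiers": {}})
            continue
        qualifier = QUALIFIER.match(line)
        if qualifier and features:
            features[-1]["qualifiers"][qualifier.group(1)] = qualifier.group(2)
    return features


# Excerpt copied from the export example above.
sample = '''
     gene            289..10638
                     /gene="eryAI"
     Module          1
                     /region_name="LD"
     Region          327..1275
                     /region_name="PKS_AT"
                     /note
     Region          1407..1608
                     /region_name="PKS_ACP"
'''

features = parse_features(sample)
for f in features:
    print(f["key"], f["span"], f["qualifiers"])
```

Because Module and Region are not standard GenBank feature keys, general-purpose GenBank readers may or may not accept them; a small dedicated reader like this one avoids that question.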

How to extract your sequence data in FASTA format

There may be situations where you would like to export the sequences of certain domains, for example if you want to construct sequence alignments. To do this, select the get fasta option in the Tools pull-down menu. Select individual or multiple clusters (holding down the Ctrl or Shift key while left-clicking). Clicking on the add domains button will call up the choose profile dialog box; refer to the How to use “HMMER” section for instructions on how to complete it. Select the cluster (or clusters) from the upper dialogue box and the domains from the lower dialogue box, again holding down the Ctrl or Shift key while left-clicking. Click on the browse button to select a location on your hard drive where the sequences will be saved in FASTA format. Enter a file name and click on the open button. The dialogue box will appear again; click on the Get FASTA button. You can now view the exported sequences by opening the output file from your hard drive.
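
For readers who prefer to see what such an extraction involves, the Python sketch below cuts regions out of a DNA string and writes them as FASTA records. It is an illustration only: the function names are hypothetical, the coordinates are assumed to be 1-based and inclusive, and you should check that assumption against your own exported data before relying on it.

```python
def extract_region(dna, start, end):
    """Return the subsequence start..end (assumed 1-based, inclusive)."""
    if not 1 <= start <= end <= len(dna):
        raise ValueError(f"region {start}..{end} lies outside the sequence")
    return dna[start - 1:end]


def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text wrapped at 'width' columns."""
    lines = []
    for name, seq in records:
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Toy sequence and regions; real coordinates would come from your annotation.
dna = "AAACCCGGGTTT"
regions = [("domain_1", 4, 9), ("domain_2", 10, 12)]
fasta = to_fasta((name, extract_region(dna, s, e)) for name, s, e in regions)
print(fasta)
```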

 

CSDB (ClustScan-Professional DataBase) manager

You might want to save your data in the ClustScan database, which is in the public domain, or in a customized database that we can create and which you can restrict for your own use. Please contact us at clustscan@pbf.hr for terms and conditions.

 

CompGen: A module of ClustScan-Professional to generate recombinant clusters synthesizing novel chemical entities

CompGen is an integrated program package containing an expert system to model natural homeologous recombination in silico between two ClustScan-Professional annotated gene clusters. CompGen can be accessed by selecting the Recombination option in the Tools pull-down menu. Clusters you want to recombine can be selected from the first and second cluster pull down menus in the Recombination dialogue box. A list of default parameters to model the recombination can be modified by the user. In our experience, homeologous recombination requires about 30 bp of identical sequence (the MEPS length parameter), flanked by around 200 bp of sequence (MEPS extend parameter) sharing greater than 75% similarity (Required identity parameter). The CompGen recombination program finds identity and similarity in two sequences and links their co-ordinates to the parent clusters previously annotated in ClustScan-Professional so that the domains, linkers and dockers involved in the recombination process can be automatically inherited by the recombinants.
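
The three parameters can be pictured with a small Python sketch that looks for identical stretches shared by two sequences and then measures ungapped identity over the flanking region. This is not the CompGen algorithm, whose details are not given here; it is a simplified stand-in under stated assumptions: the extension is applied on each side of the identical core, identity is computed without gaps over core plus flanks, and the function and parameter names are hypothetical.

```python
def shared_kmers(a, b, k=30):
    """Return (i, j) start positions of every identical k-mer shared by a and b."""
    index = {}
    for i in range(len(a) - k + 1):
        index.setdefault(a[i:i + k], []).append(i)
    matches = []
    for j in range(len(b) - k + 1):
        for i in index.get(b[j:j + k], []):
            matches.append((i, j))
    return matches


def percent_identity(x, y):
    """Ungapped percent identity over the shorter of the two strings."""
    n = min(len(x), len(y))
    if n == 0:
        return 0.0
    return 100.0 * sum(1 for p, q in zip(x, y) if p == q) / n


def candidate_sites(a, b, k=30, extend=200, required=75.0):
    """Return (i, j, identity) for shared k-mers whose surroundings are similar."""
    sites = []
    for i, j in shared_kmers(a, b, k):
        left = min(i, j, extend)  # clip at sequence starts
        right = min(len(a) - i - k, len(b) - j - k, extend)  # clip at ends
        window_a = a[i - left:i + k + right]
        window_b = b[j - left:j + k + right]
        identity = percent_identity(window_a, window_b)
        if identity >= required:
            sites.append((i, j, identity))
    return sites


# Toy example with small parameters so the behaviour is easy to follow.
a = "TTTTACGTACGTAAAA"
b = "GGGGACGTACGTCCCC"
print(shared_kmers(a, b, k=8))
print(candidate_sites(a, b, k=8, extend=4, required=50.0))
```

Note that a long identical stretch yields one hit per overlapping k-mer, so a real implementation would merge neighbouring hits before reporting candidate sites.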

Pressing the Recombine button starts the recombination process. Each recombinant cluster will appear as a separate object in a new folder entitled Recombinants which is created in the Workspace.

A single left click on the recombinant icon will give you details of the recombination, while a double click will also give you a schematic representation of the recombination event. By clicking on the accompanying icon, the recombination can be viewed and edited in the ClustScan annotation editor.

The predicted molecules that could be synthesized by each recombinant can be viewed by a single right click on the recombinant icon and then selecting Get molecules. All molecules that could be synthesized from all possible recombination events between the two parent clusters can be viewed by a single right click on the Recombinants folder icon and selecting Get all molecules. To obtain details of each molecule in text format (SMILES) or graphically (ball-and-stick model), refer to the “In silico prediction of chemistry & exporting your data” section.

Some of the recombinants will likely produce the same molecules. These can be grouped together by using the option Group by smiles, which can be selected by a single right click on the Recombinants folder icon.
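
Grouping by SMILES amounts to collecting recombinants under the text string of their predicted product. The Python sketch below shows the idea with made-up recombinant names and toy SMILES strings; it is not the program's own implementation. It compares strings literally, so it assumes that identical molecules are always written with identical SMILES.

```python
from collections import defaultdict


def group_by_smiles(recombinants):
    """Map each SMILES string to the list of recombinants that produce it."""
    groups = defaultdict(list)
    for name, smiles in recombinants:
        groups[smiles].append(name)
    return dict(groups)


# Hypothetical recombinants with toy SMILES strings (not real predictions).
recombinants = [
    ("recombinant_1", "CCO"),
    ("recombinant_2", "CC(=O)O"),
    ("recombinant_3", "CCO"),
]

groups = group_by_smiles(recombinants)
for smiles, names in groups.items():
    print(smiles, names)
```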

Recombinants can be filtered to include or exclude module architectures inherited from either or both parent clusters, for example, Proper Start (loading module), Proper End (thioesterase domain), Proper Start & End (loading module + thioesterase domain). The biosynthetic order of subsequent modules can be conserved using the Stringent option, while the order of these modules is not taken into consideration using the Relaxed option. Individual modules and/or genes can be included or excluded from either parent cluster by selecting them in the appropriate dialogue box.

 

If you encounter any problems, contact us at: clustscan@pbf.hr