Mabellini Database

An online resource for the modelled structural proteome of Mycobacterium abscessus

How to cite:

Marcin J Skwark, Pedro H M Torres, Liviu Copoiu, Bridget Bannerman, R Andres Floto, Tom L Blundell, Mabellini: a genome-wide database for understanding the structural proteome and evaluating prospective antimicrobial targets of the emerging pathogen Mycobacterium abscessus, Database, Volume 2019, Issue 1, 2019, baz113, https://doi.org/10.1093/database/baz113

Query the database

The Mabellini Database offers six querying methods:

  1. Gene ID
  2. UniProt ID
  3. Pfam ID
  4. Gene Ontology (GO)
  5. Enzyme Commission (EC) number
  6. BLAST search

For each query type, you can use (i) the exact code for that query, (ii) a keyword to be matched against that type of annotation, or (iii) a partial keyword to be matched against that type of annotation.

Here are some example codes for searching:

Query Type      Code
Gene ID         MAB_1133c, MAB_2104c, MAB_2530c, MAB_2778c
UniProt ID      B1MKC9, B1MPD4, B1MBJ0, B1MC86
Pfam ID         PF00069, PF01180, PF00199, PF00162
Gene Ontology   GO:0004674, GO:0004152, GO:0016491, GO:0006096
EC number       1.3.5.2, 1.11.1.6, 2.7.2.3

Keywords and Partial Keywords

Partial keyword searches use asterisks as wildcards. An asterisk at the beginning of the pattern matches all entries with words ending in the partial keyword, whereas an asterisk at the end matches all words that begin with that pattern. An asterisk both before and after the pattern matches words containing that pattern anywhere. For example, you might be interested in all proteins whose function is related to the metabolism and processing of phosphate-containing groups. If you search for *phospho* in the gene search tab, you will get a total of 88 entries, such as:

MAB_0133 Probable phosphoglycerate mutase
MAB_0163c Probable phosphotransferase
MAB_0200c Probable phosphoesterase, PA-phosphatase related protein
MAB_0254c Putative phosphoglycerate mutase
MAB_0310c Putative cyclopropane-fatty-acyl-phospholipid synthase
MAB_0172 Probable phosphoesterase
MAB_0313c Putative aminoglycoside phosphotransferase

Analogously, you can perform such searches with the other querying methods, with the exception of the BLAST search, which takes a protein sequence as input. Searching for keywords in the Pfam tab, for example, returns entries that contain that keyword (or partial keyword) in the Pfam name.
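
The wildcard rules described in this section can be illustrated with a minimal Python sketch. It is not the code that Mabellini runs: the function names are hypothetical, and case-insensitive matching is an assumption, since the page does not say how case is handled for keyword searches.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a keyword with optional leading/trailing '*' into a word-level regex."""
    core = re.escape(pattern.strip("*"))
    # A leading asterisk lets the word start with any characters.
    prefix = r"\w*" if pattern.startswith("*") else ""
    # A trailing asterisk lets the word end with any characters.
    suffix = r"\w*" if pattern.endswith("*") else ""
    # Assumption: matching ignores case (not stated in the documentation).
    return re.compile(rf"\b{prefix}{core}{suffix}\b", re.IGNORECASE)


def matches(pattern: str, description: str) -> bool:
    """Return True if any word in the description satisfies the pattern."""
    return wildcard_to_regex(pattern).search(description) is not None


# Descriptions taken from the example results listed above.
descriptions = [
    "Probable phosphoglycerate mutase",
    "Probable phosphotransferase",
    "Putative cyclopropane-fatty-acyl-phospholipid synthase",
]
hits = [d for d in descriptions if matches("*phospho*", d)]
```

Here `*phospho*` selects all three descriptions, `*mutase` selects only the first, and `*phospho` selects none, because no word ends in "phospho".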

BLAST Search

To perform a BLAST search against the Mabellini database, simply enter an amino acid sequence in the search box. That sequence will be queried using BLASTP against a database containing only the proteins in the M. abscessus proteome (Proteome ID: UP000007137). The maximum E-value for the BLASTP search is 0.001, and results are displayed only if both identity and coverage are at least 30%.
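
The filtering criteria above can be written as a small Python sketch. The `BlastHit` class, its field names and the example values are hypothetical and only serve to illustrate the thresholds. Treating the limits as inclusive is also an assumption.

```python
from dataclasses import dataclass

MAX_EVALUE = 0.001    # maximum E-value stated above
MIN_IDENTITY = 30.0   # minimum identity, in percent
MIN_COVERAGE = 30.0   # minimum coverage, in percent


@dataclass
class BlastHit:
    """Hypothetical container for one BLASTP hit."""
    subject_id: str
    evalue: float
    identity: float   # percent identity
    coverage: float   # percent coverage


def passes_thresholds(hit: BlastHit) -> bool:
    """Apply the three display criteria (assumed to be inclusive limits)."""
    return (
        hit.evalue <= MAX_EVALUE
        and hit.identity >= MIN_IDENTITY
        and hit.coverage >= MIN_COVERAGE
    )


# Made-up hits, for illustration only.
example_hits = [
    BlastHit("hit_A", 1e-50, 82.0, 95.0),   # passes all three criteria
    BlastHit("hit_B", 1e-5, 25.0, 90.0),    # identity too low
    BlastHit("hit_C", 0.5, 45.0, 60.0),     # E-value too high
]
displayed = [h.subject_id for h in example_hits if passes_thresholds(h)]
```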

Sunburst queries

To facilitate navigation and exploration of the data available in Mabellini, sunburst queries are provided. Four different sunburst graphs can be explored in this way: (i) GO Biological Process, (ii) GO Molecular Function, (iii) GO Cellular Component and (iv) Enzyme Commission numbers. Links to all these queries are located on the home page.

Upon navigating to one of these pages, you will be presented with a hierarchical classification of the proteins that have been annotated in this way. Clicking on any section of the graph makes it morph automatically to provide easier access to the sub-levels of the clicked section. Hovering the mouse pointer over a given section of the graph shows a tooltip with the name of the section, its code, and the number of proteins that will be retrieved upon querying that level. Clicking the central circle navigates to the upper levels.

Clicking the query button, right below the sunburst wheel, retrieves all proteins annotated with that particular GO term (or EC number) or with any of its sub-levels, and presents them as a results table.
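
For EC numbers, "that level and all sub-levels" amounts to a prefix match on the dot-separated fields. The sketch below illustrates this under that assumption. The function names and protein labels are hypothetical, and the approach does not carry over to GO terms, whose hierarchy is not encoded in the identifier.

```python
def in_subtree(selected: str, annotation: str) -> bool:
    """True if the annotated EC number lies at or below the selected level."""
    selected_fields = selected.split(".")
    annotation_fields = annotation.split(".")
    # Compare whole fields, so that level "1.1" does not capture "1.11.1.6".
    return annotation_fields[: len(selected_fields)] == selected_fields


def query_level(selected: str, annotations: dict[str, list[str]]) -> list[str]:
    """Return the proteins with at least one EC number in the selected subtree."""
    return sorted(
        protein
        for protein, ec_numbers in annotations.items()
        if any(in_subtree(selected, ec) for ec in ec_numbers)
    )


# Hypothetical protein labels paired with EC numbers from the examples table.
annotations = {
    "protein_a": ["1.3.5.2"],
    "protein_b": ["1.11.1.6"],
    "protein_c": ["2.7.2.3"],
}
oxidoreductases = query_level("1", annotations)
```

Selecting level `1` returns `protein_a` and `protein_b`, whereas selecting `1.1` returns nothing, because fields are compared as whole units rather than as string prefixes.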

Multiple Hits and Browse Pages

Clicking the Browse tab retrieves a table showing the complete list of genes. All proteins in the UP000007137 proteome are listed, even those for which no model has been generated, in which case the row is highlighted in light red. This table is much like a Multiple Hits table, which returns a list of results based on your query input, so both are described jointly in this section.

These tables have the following fields: Gene ID, Gene Name, Gene Length, Percentage Modelled, UniProt, Pfam, Enzyme Commission and Available Models. The table can be reordered by any of the fields by clicking on the ▲▼ symbols in the table header, and filtered by entering terms in the fields right below each column header.

For the results table, the queried terms are presented on the top left area of the page.

Models Page

The Models Page is divided into three sections:

  1. Gene Information Cards
  2. Model Tables
  3. Model Viewer

Section 1 - Gene Information Cards

This section comprises information about the protein, such as the UniProt ID (which is also a link to the external UniProt entry), the name of the protein, its length and a brief description of its function, if known.

Furthermore, you can quickly query the database for proteins that are annotated with the same Pfam domains or the same EC numbers by clicking the green query button in the tables to the right.

In this section one can also find the protein sequence, highlighted according to the coverage achieved by the selected model in the tables right below it; i.e. the sequence will be highlighted in yellow to match the sequence of the three-dimensional structure shown in the model viewer below.

Section 2 - Model Tables

This section contains up to five models divided into two tables, according to the liganded state. Models without ligands are displayed in the first table while liganded models are displayed in the bottom one.

These tables contain the model quality as a percentile rank (Q1, see original manuscript), the model coverage, length, starting and ending residues, the name of the ligand (if present), the templates used in the modelling, and a download button that retrieves the model structure in PDB format. Both tables can be ordered by any of the numeric fields. Immediately below the table there is a “download all models” button, which generates a compressed file containing all structures.

Section 3 - Model Viewer

For models with ligands, residues within 4 Å of any ligand atom are displayed in ball-and-stick representation in

Viewer General Movement Controls

Representation styles

Five available colouring schemes:

Interactions

NOTE: All definitions follow the NGL constraints and the RCSB guidelines.

Programmatic Access

The currently implemented API includes queries by (i) identifiers (Pfam, Ordered Locus, UniProt, Enzyme Commission, Gene Ontology), (ii) text, (iii) ligand ID, (iv) single model retrieval and (v) best models. The queries return JSON files containing information about the genes and models, which can be easily parsed programmatically. The usage is described below.

Search by identifier:

Each of the identifiers is case insensitive. A query takes a single identifier and returns information about the matching gene (or genes), together with links to the model JSON files.

Usage:
http://mabellinidb.science/api/gene/[ID] - for ordered locus names
http://mabellinidb.science/api/oln/[ID] - alias for the former
http://mabellinidb.science/api/uniprot/[ID] - for UniProt IDs
http://mabellinidb.science/api/pfam/[ID] - for Pfam IDs
http://mabellinidb.science/api/EC/[ID] - for EC IDs
http://mabellinidb.science/api/GO/[ID] - for GO terms
Examples:
http://mabellinidb.science/api/gene/MAB_1234
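
A minimal Python sketch of an identifier query, using only the standard library, might look as follows. The helper names are hypothetical, and the structure of the returned JSON is not documented here, so the sketch returns the parsed object without assuming any keys. Passing a GO term with its "GO:" prefix is also an assumption.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://mabellinidb.science/api"
# Endpoint names as listed under "Usage" above.
ENDPOINTS = {"gene", "oln", "uniprot", "pfam", "EC", "GO"}


def identifier_url(endpoint: str, identifier: str) -> str:
    """Build the query URL for a single identifier."""
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint!r}")
    # Assumption: GO terms are passed with their colon left unescaped.
    return f"{BASE_URL}/{endpoint}/{quote(identifier, safe=':')}"


def fetch_json(url: str, timeout: float = 30.0):
    """Download a URL and parse the response as JSON (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


url = identifier_url("gene", "MAB_1234")
# record = fetch_json(url)  # uncomment to query the live service
```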

Free text search:

A case-sensitive search through the free-form fields of the database, that is, the names and descriptions of genes. The text is not tokenized, so the query is matched as a single string and words given in a different order do not match.

Usage:
http://mabellinidb.science/api/text/[text]
Example:
http://mabellinidb.science/api/text/cell%20wall
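
Because the query text is part of the URL path, spaces and other special characters need to be percent-encoded, as in the example above. The sketch below builds such a URL with the Python standard library and models the described matching behaviour as a plain case-sensitive substring test. Both function names and the sample description are hypothetical, and the substring model is an interpretation of the text above rather than the server's actual implementation.

```python
from urllib.parse import quote


def text_query_url(text: str) -> str:
    """Percent-encode free text and append it to the text endpoint."""
    return f"http://mabellinidb.science/api/text/{quote(text, safe='')}"


def would_match(query: str, field: str) -> bool:
    """Model of the described behaviour: case-sensitive, untokenized substring match."""
    return query in field


url = text_query_url("cell wall")

# Hypothetical description, for illustration only.
description = "Probable cell wall synthesis protein"
in_order = would_match("cell wall", description)       # words in the stored order
out_of_order = would_match("wall cell", description)   # words swapped
wrong_case = would_match("Cell Wall", description)     # different capitalisation
```

Under this model only the first query matches. The swapped and recapitalised variants do not.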

Single model retrieval:

Given the internal model ID (returned by the search functions above), returns the URLs of the model's PDB and mmCIF files, as well as model descriptors in terms of quality, templates and ligand status.

Usage:
http://mabellinidb.science/api/model/[ID]
Example:
http://mabellinidb.science/api/model/MAB_1668__rank01__apo__1-341__Q_0.836_0.378.pdb

The function also recognizes queries in the form [ordered locus name]__[model rank], with a double underscore as in the example below, which would retrieve the first ranked model for MAB_1234.

Example:
http://mabellinidb.science/api/model/MAB_1234__1
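
The internal model ID in the first example appears to encode several descriptors separated by double underscores. The Python sketch below splits such an ID into its parts. The field interpretation (locus, rank, ligand state, residue range, two quality scores) is inferred from that single example and is an assumption. The meaning of the two numbers after "Q" is not stated here, so they are returned unlabelled. All names in the code are hypothetical.

```python
import re
from typing import NamedTuple


class ModelId(NamedTuple):
    locus: str
    rank: int
    state: str      # e.g. "apo"; other values are not documented here
    start: int      # first modelled residue
    end: int        # last modelled residue
    scores: tuple   # the two numbers following "Q", meaning unspecified


# Pattern inferred from the single example ID above (an assumption).
_MODEL_ID = re.compile(
    r"^(?P<locus>.+?)__rank(?P<rank>\d+)__(?P<state>.+?)"
    r"__(?P<start>\d+)-(?P<end>\d+)"
    r"__Q_(?P<q1>\d+\.\d+)_(?P<q2>\d+\.\d+)\.pdb$"
)


def parse_model_id(model_id: str) -> ModelId:
    """Split an internal model ID into its apparent components."""
    match = _MODEL_ID.match(model_id)
    if match is None:
        raise ValueError(f"unrecognised model ID: {model_id!r}")
    return ModelId(
        locus=match["locus"],
        rank=int(match["rank"]),
        state=match["state"],
        start=int(match["start"]),
        end=int(match["end"]),
        scores=(float(match["q1"]), float(match["q2"])),
    )


def short_model_id(locus: str, rank: int) -> str:
    """Build the short form used in the second example, e.g. MAB_1234__1."""
    return f"{locus}__{rank}"


parsed = parse_model_id("MAB_1668__rank01__apo__1-341__Q_0.836_0.378.pdb")
```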

Retrieval of model IDs by ligand:

Given a three-letter ligand code (as per PDB small molecule dictionary), returns URLs for all the JSON records for models containing the ligand.

Usage:
http://mabellinidb.science/api/ligand/[ID]
Example:
http://mabellinidb.science/api/ligand/COA

Best models:

A parameter-free convenience function that returns the URLs of all the JSON records for the first ranked models.

Usage:
http://mabellinidb.science/api/bestModels