Database info

Tables description

SQL4PDB is an application for running SQL queries against a PostgreSQL relational database that stores a large number of PDB entries. Some entries are discarded because they are not relevant or are not proteins. A general description of every table in the database is given below. Each table represents a record inside a PDB file; for example, the SEQRES record is represented by the chain and residue tables. The number of tuples in each table is also listed.

Table name | Description | No. of tuples
Atom | Represents the atoms inside an amino acid with their information and coordinates | 890,552,918
Helix | Contains the information of a helix | 4,110,517
Helix_subchain | Contains the amino acid sub-chains of a helix | 42,582,125
Hetatom | Contains the information of the atoms in a ligand | 15,108,800
Heterogen | Represents the ligands contained by a protein with their name or chemical formula | 4,734,798
Keyword | Protein keywords | 629,679
Prot_chain | Represents a chain inside a protein, with the name, length and amino acid chain | 717,529
Prot_source | Source of a protein | 192,807
Residue | Represents an amino acid of a protein chain, with the position in the chain and the ID | 119,384,120
Sheet | Contains information of a sheet | 949,298
SSBond | Represents the disulfide bonds of a protein | 223,633
Standard_amino | Contains the general information of an amino acid | 31
Strand | Contains information about the strands of a sheet | 3,999,837
Protein | Contains the general information of a protein, like ID, name, classification or number of components | 135,199
Strand_subchain | Contains the amino acid sub-chain of a strand | 18,176,390
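
The way a PDB record maps onto tables can be illustrated with the SEQRES case mentioned earlier. The following is a minimal Python sketch, not the loader used by SQL4PDB: it assumes the fixed-column SEQRES layout of the PDB file format (chain ID in column 12, residue count in columns 14-17, residue names from column 20), and the field names chain_name, length, sequence, position and residue_id are hypothetical stand-ins for the real columns of Prot_chain and Residue.

```python
def parse_seqres(lines):
    """Turn SEQRES lines into chain rows and residue rows (hypothetical field names)."""
    sequences = {}  # chain ID -> residue names, in file order
    declared = {}   # chain ID -> residue count declared in the record
    for line in lines:
        if not line.startswith("SEQRES"):
            continue
        chain_id = line[11]                    # column 12: chain identifier
        declared[chain_id] = int(line[13:17])  # columns 14-17: residues in the chain
        # columns 20 onward: up to 13 three-letter residue names per line
        sequences.setdefault(chain_id, []).extend(line[19:].split())

    chain_rows, residue_rows = [], []
    for chain_id, names in sequences.items():
        if len(names) != declared[chain_id]:
            raise ValueError(
                f"chain {chain_id}: expected {declared[chain_id]} residues, got {len(names)}"
            )
        # One row per chain: name, length and the amino acid chain itself.
        chain_rows.append(
            {"chain_name": chain_id, "length": len(names), "sequence": " ".join(names)}
        )
        # One row per amino acid: its position in the chain and its ID.
        for position, name in enumerate(names, start=1):
            residue_rows.append(
                {"chain_name": chain_id, "position": position, "residue_id": name}
            )
    return chain_rows, residue_rows


# Made-up 15-residue chain split over two SEQRES lines (not a real PDB entry).
sample = [
    "SEQRES   1 A   15  GLY ILE VAL GLU GLN CYS CYS THR SER ILE CYS SER LEU",
    "SEQRES   2 A   15  TYR GLN",
]
chain_rows, residue_rows = parse_seqres(sample)
print(chain_rows[0])
print(residue_rows[0], residue_rows[-1])
```

In this sketch a chain yields a single chain row plus one residue row per amino acid, which is consistent with Residue holding far more tuples than Prot_chain.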

Some entries are also discarded because they use a different PDB format or are not important enough to analyze. The discarded entries are the following:

  • DNA/RNA entries.
  • Entries with RNA or ELECTRON CRYSTALLOGRAPHY experiment.
  • Entries with amino acids located in an alphanumeric position within a chain.
  • Entries that have a non-standard amino acid in a chain.
  • Entries without amino acid chains or without a SEQRES record.

Relational schema

The relational schema of the database is presented below. It shows every table in the database with its attributes. Attributes in bold are the primary key, that is, the identifier of a row within the database. Each connection represents an attribute that references the primary key of another table, so all the data belonging to an entry can be identified.
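
The primary key and reference mechanism described here can be tried out locally. The following is a minimal Python sketch that uses an in-memory SQLite database as a stand-in for PostgreSQL. The two tables borrow the names Protein and Prot_chain from the table list, but every column name (protein_id, name, chain_name, length) and the entry ID 0XYZ are hypothetical, chosen only for illustration; the real attributes are the ones shown in the schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces references when asked

# Hypothetical two-table excerpt: prot_chain.protein_id references protein's primary key.
conn.executescript("""
CREATE TABLE protein (
    protein_id TEXT PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE prot_chain (
    protein_id TEXT NOT NULL REFERENCES protein (protein_id),
    chain_name TEXT NOT NULL,
    length     INTEGER NOT NULL,
    PRIMARY KEY (protein_id, chain_name)
);
""")

# Made-up data, not taken from the real database.
conn.execute("INSERT INTO protein VALUES (?, ?)", ("0XYZ", "made-up example entry"))
conn.executemany(
    "INSERT INTO prot_chain VALUES (?, ?, ?)",
    [("0XYZ", "A", 21), ("0XYZ", "B", 30)],
)

# Following the reference collects all chain data that belongs to one entry.
rows = conn.execute(
    """
    SELECT p.protein_id, p.name, c.chain_name, c.length
    FROM protein AS p
    JOIN prot_chain AS c ON c.protein_id = p.protein_id
    WHERE p.protein_id = ?
    ORDER BY c.chain_name
    """,
    ("0XYZ",),
).fetchall()
print(rows)

# A chain that points at a protein that does not exist is rejected.
try:
    conn.execute("INSERT INTO prot_chain VALUES (?, ?, ?)", ("9ZZZ", "A", 10))
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```

The JOIN and REFERENCES syntax used here is also accepted by PostgreSQL, so the same query shape should carry over to the real tables once the hypothetical column names are replaced with the actual ones.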