SQL4PDB

Table descriptions

SQL4PDB is an application that allows users to execute SQL queries over a PostgreSQL relational database that stores a large number of PDB entries. Some entries are discarded, however, because they are not relevant or do not describe a protein. Below, a general description of every table in the database is given. Each table represents a record of a PDB file; for example, the SEQRES record is represented by the Prot_chain and Residue tables. The number of tuples in each table is also listed.

Table name	Description	Number of tuples
Atom	Represents the atoms of an amino acid, with their information and coordinates	890,552,918
Helix	Contains the information of a helix	4,110,517
Helix_subchain	Contains the amino acid sub-chains of a helix	42,582,125
Hetatom	Contains information about the atoms of a ligand	15,108,800
Heterogen	Represents the ligands of a protein, with their name or chemical formula	4,734,798
Keyword	Protein keywords	629,679
Prot_chain	Represents a chain of a protein, with its name, length and amino acid sequence	717,529
Prot_source	Source of a protein	192,807
Residue	Represents an amino acid of a protein chain, with its position in the chain and its ID	119,384,120
Sheet	Contains the information of a sheet	949,298
SSBond	Represents the disulfide bonds of a protein	223,633
Standard_amino	Contains general information about a standard amino acid	31
Strand	Contains information about the strands of a sheet	3,999,837
Protein	Contains general information about a protein, such as its ID, name, classification and number of components	135,199
Strand_subchain	Contains the amino acid sub-chains of a strand	18,176,390
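As a rough illustration of how a SEQRES record could be split across the Prot_chain and Residue tables, the Python sketch below groups the residue names of each SEQRES line by chain, then builds one chain row per chain and one residue row per position. Only the column positions (chain ID in column 12, residue names from column 20) follow the PDB file format; the row layouts (PDB ID, chain ID, length, sequence) and (PDB ID, chain ID, position, residue name) are hypothetical and are not the actual SQL4PDB columns.

```python
from typing import Dict, Iterable, List, Tuple


def parse_seqres(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect SEQRES residue names per chain, preserving file order."""
    chains: Dict[str, List[str]] = {}
    for line in lines:
        # Skip other records and lines too short to hold a residue name.
        if not line.startswith("SEQRES") or len(line) < 20:
            continue
        chain_id = line[11]           # column 12: chain identifier
        residues = line[19:].split()  # columns 20+: three-letter residue names
        chains.setdefault(chain_id, []).extend(residues)
    return chains


def to_rows(pdb_id: str, chains: Dict[str, List[str]]):
    """Build hypothetical chain rows and residue rows from parsed SEQRES data."""
    chain_rows: List[Tuple[str, str, int, str]] = []
    residue_rows: List[Tuple[str, str, int, str]] = []
    for chain_id, residues in chains.items():
        # One chain row: length and the full sequence as a space-separated string.
        chain_rows.append((pdb_id, chain_id, len(residues), " ".join(residues)))
        # One residue row per position, numbered from 1 along the chain.
        for position, name in enumerate(residues, start=1):
            residue_rows.append((pdb_id, chain_id, position, name))
    return chain_rows, residue_rows
```

A SEQRES record that spans several lines for the same chain is simply concatenated, so the chain length is the total number of residue names read for that chain.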

Some entries are discarded because they use a different PDB format or are not important enough to analyze. The discarded entries are the following:

DNA/RNA entries.
Entries whose experiment is RNA or ELECTRON CRYSTALLOGRAPHY.
Entries with amino acids located at an alphanumeric position within a chain.
Entries that contain a non-standard amino acid in a chain.
Entries without amino acid chains or without a SEQRES record.
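The discard rules above can be read as a filter applied to each entry before it is loaded. The sketch below is one possible way to express them; the `Entry` fields, the reason strings, and the comparison of experiment names against the entry's experiment text are assumptions made for illustration. The set of standard residue names is passed in rather than read from the Standard_amino table, and a position is treated as alphanumeric when it cannot be parsed as an integer (for example `52A`), which is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

# Experiment names taken verbatim from the discard list (matching rule is assumed).
EXCLUDED_EXPERIMENTS = {"RNA", "ELECTRON CRYSTALLOGRAPHY"}


@dataclass
class Entry:
    """Hypothetical summary of a parsed PDB entry; field names are illustrative."""
    pdb_id: str
    is_nucleic_acid: bool
    experiment: str
    has_seqres: bool
    # chain ID -> list of (position as written in the file, residue name)
    chains: Dict[str, List[Tuple[str, str]]] = field(default_factory=dict)


def is_numeric_position(position: str) -> bool:
    """True for positions such as '12' or '-3'; False for e.g. '52A'."""
    try:
        int(position)
        return True
    except ValueError:
        return False


def discard_reason(entry: Entry, standard_residues: Set[str]) -> Optional[str]:
    """Return why an entry would be discarded, or None if it is kept."""
    if entry.is_nucleic_acid:
        return "DNA/RNA entry"
    if entry.experiment.strip().upper() in EXCLUDED_EXPERIMENTS:
        return "excluded experiment"
    if not entry.has_seqres or not entry.chains:
        return "no amino acid chains or SEQRES record"
    for residues in entry.chains.values():
        for position, name in residues:
            if not is_numeric_position(position):
                return "alphanumeric residue position"
            if name not in standard_residues:
                return "non-standard amino acid"
    return None
```

Returning a reason string instead of a bare boolean makes it easy to log how many entries each rule removes.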

Relational schema

The relational schema of the database is presented below. It shows every table in the database together with its attributes. Bold attributes form the primary key that identifies each row, and each connection represents an attribute that references the primary key of another table, so all the data belonging to an entry can be traced through the database.
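To show how primary keys and references connect the tables, the sketch below builds a tiny in-memory database with Python's built-in `sqlite3` module, so it runs without a PostgreSQL server. The three tables and all column names are hypothetical simplifications of Protein, Prot_chain and Residue, not the real SQL4PDB schema; the SQL uses only standard `PRIMARY KEY`, `FOREIGN KEY` and `JOIN` syntax, which should carry over to PostgreSQL.

```python
import sqlite3

# Hypothetical, simplified schema: each child table references its parent's primary key.
SCHEMA = """
CREATE TABLE protein (
    protein_id TEXT PRIMARY KEY,
    name       TEXT
);
CREATE TABLE prot_chain (
    protein_id TEXT REFERENCES protein(protein_id),
    chain_id   TEXT,
    length     INTEGER,
    PRIMARY KEY (protein_id, chain_id)
);
CREATE TABLE residue (
    protein_id TEXT,
    chain_id   TEXT,
    position   INTEGER,
    name       TEXT,
    PRIMARY KEY (protein_id, chain_id, position),
    FOREIGN KEY (protein_id, chain_id) REFERENCES prot_chain(protein_id, chain_id)
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create the toy schema and fill it with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO protein VALUES (?, ?)",
                     [("1ABC", "demo A"), ("2XYZ", "demo B")])
    conn.executemany("INSERT INTO prot_chain VALUES (?, ?, ?)",
                     [("1ABC", "A", 2), ("1ABC", "B", 1), ("2XYZ", "A", 4)])
    conn.executemany("INSERT INTO residue VALUES (?, ?, ?, ?)", [
        ("1ABC", "A", 1, "MET"), ("1ABC", "A", 2, "LYS"), ("1ABC", "B", 1, "GLY"),
        ("2XYZ", "A", 1, "ALA"), ("2XYZ", "A", 2, "ALA"),
        ("2XYZ", "A", 3, "SER"), ("2XYZ", "A", 4, "GLY"),
    ])
    return conn


def residues_per_protein(conn: sqlite3.Connection):
    """Follow the key references protein -> chain -> residue and count residues."""
    query = """
        SELECT p.protein_id, COUNT(r.position)
        FROM protein AS p
        JOIN prot_chain AS c ON c.protein_id = p.protein_id
        JOIN residue AS r ON r.protein_id = c.protein_id AND r.chain_id = c.chain_id
        GROUP BY p.protein_id
        ORDER BY p.protein_id
    """
    return conn.execute(query).fetchall()
```

For the demo data, `residues_per_protein(build_demo_db())` returns `[('1ABC', 3), ('2XYZ', 4)]`. With foreign keys enforced, inserting a residue whose (protein_id, chain_id) pair has no matching chain fails with `sqlite3.IntegrityError`; PostgreSQL enforces foreign keys by default, without a pragma.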