CisBP Update Log

  • Build 2.00 This build contains the data used in the follow-up Cis-BP publication (Lambert et al., Nature Genetics 2019). Details are provided in that publication. Major updates include:
    • Methodological updates
      • Motif inferences are now performed using the improved “Similarity Regression” method described in Lambert et al., instead of the original “%DNA binding domain amino acid identity” method.
      • When two predicted DBDs overlap in a given protein, only the DBD with the most significant HMMER p-value is now retained.
      • Matches to the Pfam Myb/SANT domain are now further subclassified as either Myb (which binds DNA specifically; this class also includes Myb-like sequences, which are also likely to bind DNA) or SANT (which does not bind DNA specifically). This subclassification is used both to remove proteins that contain only SANT domains (which are not TFs) and to remove SANT domains from proteins that contain both Myb and SANT domains.
      • We now remove metazoan proteins that are 1-1 orthologs (reciprocal best BLAST hits) of human proteins hand-curated as false-positive TFs in a recent curation effort (Lambert et al., Cell 2018).
    • Data contents updates
      • We have updated the Pfam DBD models to use the latest versions. We now include three new models: EBF1 (COE1_DBD), FLYWCH, and ICP4 (Herpes_ICP4_N). We have removed the models for DP and SART-1, which are now known not to bind DNA with specificity.
      • We have updated the genome builds (protein sequences, protein and gene IDs, gene names, and gene aliases).
      • The set of human TFs now matches the set of 1,639 curated TFs provided in our recent publication (Lambert et al., Cell 2018).
      • We now include more genomes: 741 total (previously 340).
      • The new genomes result in more TFs: 392,333 known and putative eukaryotic TFs (previously 167,081).
      • We now include additional motifs: new motifs were obtained from 38 new sources, bringing the current database contents to 11,491 motifs for 4,559 distinct TFs (previously 6,559 motifs for 3,202 distinct TFs).
      • We have updated the motif sets to use the latest database builds for UNIPROBE, Transfac, JASPAR, and HOCOMOCO.
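
The overlap rule listed under "Methodological updates" (keep only the most significant of two overlapping predicted DBDs) can be illustrated with a small sketch. This is not the Cis-BP pipeline code. The DomainHit record, the function names, and the example coordinates and p-values are all hypothetical, and the sketch assumes that any shared residue counts as an overlap and that hits are resolved greedily from most to least significant; the log itself does not specify either detail.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DomainHit:
    """A predicted DBD match on one protein (hypothetical record layout)."""
    model: str      # Pfam model name
    start: int      # 1-based inclusive start position on the protein
    end: int        # 1-based inclusive end position on the protein
    p_value: float  # significance of the match; smaller is more significant


def overlaps(a: DomainHit, b: DomainHit) -> bool:
    """Assumption: two hits overlap if they share at least one residue."""
    return a.start <= b.end and b.start <= a.end


def resolve_overlaps(hits: List[DomainHit]) -> List[DomainHit]:
    """Keep a hit only if it overlaps no more significant hit already kept."""
    kept: List[DomainHit] = []
    # Visit hits from most to least significant (ascending p-value).
    for hit in sorted(hits, key=lambda h: h.p_value):
        if not any(overlaps(hit, other) for other in kept):
            kept.append(hit)
    # Report the survivors in order along the protein.
    return sorted(kept, key=lambda h: h.start)


# Illustrative input: two overlapping hits and one separate hit.
example_hits = [
    DomainHit("Homeobox", 10, 70, 1e-20),
    DomainHit("Homeobox_KN", 50, 90, 1e-5),
    DomainHit("zf-C2H2", 120, 145, 1e-8),
]
print([h.model for h in resolve_overlaps(example_hits)])
# prints ['Homeobox', 'zf-C2H2']
```

In this example the Homeobox_KN hit is dropped because it overlaps the more significant Homeobox hit, while the zf-C2H2 hit is kept because it overlaps nothing.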

  • Build 1.02 This build adds PBM experiments for 129 C. elegans TFs (Narasimhan et al., eLife 2015) and motifs derived from modENCODE ChIP-seq experiments (Boyle et al., Nature 2014). The motif inferences have also been updated.

  • Build 1.01 This is the first “official” build of the database (released on October 28, 2014).
    • Additions to this build:
      1. Incorporation of TF binding motifs from the HOCOMOCO database
      2. Update to the latest version of Transfac
      3. Update to the latest version of JASPAR
    • Data updates for this build:
      1. Inferences are now included based on data from Jolma et al., Cell 2013
      2. Several minor bug fixes relating to mappings of motifs to proper TF isoforms, multi-DBD TFs, etc.

  • Build 1.00 This build was "recalled" after it was pointed out that many of the motifs we included from the latest version of Transfac were based on purely computational predictions, which directly conflicts with our definition of a "Directly" determined motif (Direct motifs must be obtained from an experimental assay).

  • Build 0.90 This build contains the data used in the associated publication (Weirauch et al., Cell 2014).