Tips on Parameter Choice

Node (Gene Set inclusion) Parameters

  • Node-specific parameters filter the gene sets included in the enrichment map.

  • For a gene set to be included in the enrichment map it needs to pass both p-value and q-value thresholds.

P-value

  • All gene sets with a p-value at or below the specified threshold are included in the map.

FDR Q-value

  • All gene sets with a q-value at or below the specified threshold are included in the map.

  • The FDR Q-value that EM uses to filter gene sets depends on the type of analysis:

    • For GSEA, the FDR Q-value is the 8th column of the gsea_results file, named “FDR q-val”.

    • For Generic, the FDR Q-value is the 4th column of the generic results file.

    • For David, the FDR Q-value is the 12th column of the david results file, named “Benjamini”.

    • For Bingo, the FDR Q-value is the 3rd column of the Bingo results file, named “corr p-value”.
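The node filter described above can be sketched in a few lines of Python. This is only an illustration of the rule that a gene set must pass both thresholds, not Enrichment Map's own code; the `GeneSetResult` record, the `filter_gene_sets` function, and the example values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GeneSetResult:
    """One row of an enrichment results table (hypothetical structure)."""
    name: str
    p_value: float
    q_value: float


def filter_gene_sets(results, p_threshold, q_threshold):
    """Keep gene sets whose p-value AND q-value are at or below the thresholds."""
    return [
        gs for gs in results
        if gs.p_value <= p_threshold and gs.q_value <= q_threshold
    ]


# Made-up values, for illustration only.
results = [
    GeneSetResult("SET_A", p_value=0.001, q_value=0.02),
    GeneSetResult("SET_B", p_value=0.004, q_value=0.30),  # fails the q-value cut
    GeneSetResult("SET_C", p_value=0.040, q_value=0.05),  # fails the p-value cut
]

kept = filter_gene_sets(results, p_threshold=0.005, q_threshold=0.1)
print([gs.name for gs in kept])
```

Only SET_A passes both cuts here; loosening the thresholds to 0.05 and 0.25 would also admit SET_C.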

Edge (Gene Set relationship) Parameters

  • An edge represents the degree of gene overlap that exists between two gene sets, A and B.

  • Edge-specific parameters control the number of edges that are created in the enrichment map.

  • Only one coefficient type can be chosen to filter the edges.

Jaccard Coefficient

Jaccard Coefficient = [size of (A intersect B)] / [size of (A union B)]

Overlap Coefficient

Overlap Coefficient = [size of (A intersect B)] / [minimum of (size of A, size of B)]
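As a minimal sketch, both coefficients can be computed directly from Python sets. The function names are illustrative, and returning 0.0 for empty gene sets is an assumption made here to avoid division by zero; the gene identifiers are made up.

```python
def jaccard_coefficient(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:  # assumption: two empty sets have no similarity
        return 0.0
    return len(a & b) / len(union)


def overlap_coefficient(a, b):
    """Size of the intersection divided by the size of the smaller set."""
    a, b = set(a), set(b)
    smaller = min(len(a), len(b))
    if smaller == 0:  # assumption: an empty set overlaps with nothing
        return 0.0
    return len(a & b) / smaller


# Made-up gene sets sharing two genes (g3 and g4).
A = {"g1", "g2", "g3", "g4"}
B = {"g3", "g4", "g5", "g6", "g7", "g8"}

print(jaccard_coefficient(A, B))  # 2 shared / 8 in the union = 0.25
print(overlap_coefficient(A, B))  # 2 shared / 4 in the smaller set = 0.5
```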

Combined Coefficient

  • The combined coefficient is a weighted blend of the Jaccard and Overlap Coefficients.

  • The combined constant k lets the user shift weight between the two: raising k increases the weight of the Overlap Coefficient and decreases the weight of the Jaccard Coefficient by the same amount.

  • When k = 0.5, the combined coefficient is the average of the Jaccard and Overlap Coefficients.

Combined Constant = k
Combined Coefficient = (k * Overlap) + ((1-k) * Jaccard)
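The formula above translates directly into code. In this sketch the function name is illustrative, and restricting k to the range 0 to 1 is an assumption based on k acting as a weight.

```python
def combined_coefficient(overlap, jaccard, k=0.5):
    """Weighted blend: k weights Overlap, (1 - k) weights Jaccard."""
    if not 0.0 <= k <= 1.0:  # assumption: k is a weight between 0 and 1
        raise ValueError("k must be between 0 and 1")
    return k * overlap + (1 - k) * jaccard


# Example pair with Overlap = 0.5 and Jaccard = 0.25.
print(combined_coefficient(0.5, 0.25, k=0.5))  # average of the two: 0.375
print(combined_coefficient(0.5, 0.25, k=1.0))  # pure Overlap: 0.5
print(combined_coefficient(0.5, 0.25, k=0.0))  # pure Jaccard: 0.25
```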

Tips on Parameter Choice

P-value and FDR Thresholds

GSEA can be used with two different significance estimation settings: gene-set permutation and phenotype permutation. Gene-set permutation was used for Enrichment Map application examples.

Gene-set Permutation

Here are different sets of thresholds you may consider for gene-set permutation:

Very permissive:
  • p-value < 0.05

  • FDR < 0.25

Moderately permissive:
  • p-value < 0.01

  • FDR < 0.1

Moderately conservative:
  • p-value < 0.005

  • FDR < 0.075

Conservative:
  • p-value < 0.001

  • FDR < 0.05

For high-quality, high-coverage transcriptomic data, the number of enriched terms at the conservative threshold is usually 100-250 when using gene-set permutation.

Phenotype Permutation

Recommended:
  • p-value < 0.05

  • FDR < 0.25

In general, we recommend using permissive thresholds only if you are having a hard time finding any enriched terms.

Jaccard vs. Overlap Coefficient

  • The Overlap Coefficient is recommended when relations are expected to occur between large-size and small-size gene-sets, as in the case of the Gene Ontology.

  • The Jaccard Coefficient is recommended in the opposite case, when the gene-sets being compared are of similar size.

  • When the gene-sets are about the same size, the Jaccard Coefficient is about half of the Overlap Coefficient for gene-set pairs with a small intersection, whereas it is about the same as the Overlap Coefficient for gene-set pairs with a large intersection.

  • When using the Overlap Coefficient and the generated map has several large gene-sets excessively connected to many other gene-sets, we recommend switching to the Jaccard Coefficient.
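A small numeric check makes these relationships concrete. The sketch below works from set sizes alone; the function name and the sizes used are made up for illustration.

```python
def coefficients_from_sizes(size_a, size_b, size_intersection):
    """Return (Jaccard, Overlap) given the two set sizes and their intersection size."""
    union = size_a + size_b - size_intersection
    jaccard = size_intersection / union
    overlap = size_intersection / min(size_a, size_b)
    return jaccard, overlap


# Two gene-sets of 100 genes each, sharing only 10 genes.
j_small, o_small = coefficients_from_sizes(100, 100, 10)
print(round(j_small / o_small, 3))  # Jaccard is roughly half of Overlap

# The same two sizes, now sharing 90 genes.
j_large, o_large = coefficients_from_sizes(100, 100, 90)
print(round(j_large / o_large, 3))  # Jaccard is close to Overlap

# A 20-gene set fully contained in a 500-gene set.
j_nested, o_nested = coefficients_from_sizes(500, 20, 20)
print(j_nested, o_nested)  # Jaccard stays low, Overlap reaches its maximum
```

The nested case is why the Overlap Coefficient suits hierarchical collections such as the Gene Ontology: a small child term fully contained in a large parent term scores 1.0 on Overlap but only 0.04 on Jaccard. It also shows how a large gene-set can pick up many Overlap edges.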

Overlap Thresholds

  • 0.5 is moderately conservative and is recommended for most analyses.

  • 0.3 is permissive, and might result in a messier map.

Jaccard Thresholds

  • 0.5 is very conservative.

  • 0.25 is moderately conservative.
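To see how a threshold changes the number of edges, the sketch below scores every pair of gene sets and keeps the pairs at or above a cutoff. It is an illustration only: the function names and gene sets are made up, and treating the cutoff as inclusive is an assumption.

```python
from itertools import combinations


def overlap_coefficient(a, b):
    """Size of the intersection divided by the size of the smaller set."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def build_edges(gene_sets, similarity, cutoff):
    """Return (name_a, name_b, score) for each pair scoring at or above the cutoff."""
    edges = []
    for (name_a, genes_a), (name_b, genes_b) in combinations(gene_sets.items(), 2):
        score = similarity(genes_a, genes_b)
        if score >= cutoff:  # assumption: the cutoff is inclusive
            edges.append((name_a, name_b, score))
    return edges


# Made-up gene sets of six genes each.
gene_sets = {
    "SET_A": {"g1", "g2", "g3", "g4", "g5", "g6"},
    "SET_B": {"g4", "g5", "g6", "g7", "g8", "g9"},       # shares 3 genes with SET_A
    "SET_C": {"g1", "g2", "g10", "g11", "g12", "g13"},   # shares 2 genes with SET_A
}

strict = build_edges(gene_sets, overlap_coefficient, cutoff=0.5)
permissive = build_edges(gene_sets, overlap_coefficient, cutoff=0.3)
print(len(strict), len(permissive))
```

With these sets, the 0.5 cutoff keeps only the SET_A to SET_B edge, while the 0.3 cutoff also admits the weaker SET_A to SET_C edge.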