Querying the pangenome
======================

Cypher is Neo4j’s graph query language. It lets you ask specific questions of
the graph database or retrieve data from it. A Cypher query describes patterns
of nodes and relationships and filters those patterns based on labels and
properties. While using node and relationship patterns in database queries may
seem a little daunting, it is easy to pick up! This page contains some example
queries to help you get started. Feel free to email us if you have any
questions regarding Cypher queries.

More information on Neo4j and the Cypher language:

| Neo4j Cypher Manual v3.5
| Neo4j Cypher Refcard
| Neo4j API

**Match and return 100 nucleotide nodes**

.. code:: text

   MATCH (n:nucleotide) RETURN n LIMIT 100

**Find all the genome nodes**

.. code:: text

   MATCH (n:genome) RETURN n

**Retrieve the pangenome node**

.. code:: text

   MATCH (n:pangenome) RETURN n

**Match and return 100 genes**

.. code:: text

   MATCH (g:gene) RETURN g LIMIT 100

**Match and return 100 genes and order them by length**

.. code:: text

   MATCH (g:gene) RETURN g ORDER BY g.length DESC LIMIT 100

**The same query as before, but the results are now returned as a table**

.. code:: text

   MATCH (g:gene) RETURN g.name, g.address, g.length ORDER BY g.length DESC LIMIT 100

**Return genes which are between 100 and 250 bp. This can also be
applied to other features such as exons, introns or CDS.**

.. code:: text

   MATCH (g:gene) WHERE g.length > 100 AND g.length < 250 RETURN * LIMIT 100

**Find genes located on the first genome**

.. code:: text

   MATCH (g:gene) WHERE g.address[0] = 1 RETURN * LIMIT 100

**Find genes located on the first genome and first sequence**

.. code:: text

   MATCH (g:gene) WHERE g.address[0] = 1 AND g.address[1] = 1 RETURN * LIMIT 100

**Obtain genes between 100 and 250 nucleotides**

.. code:: text

   MATCH (g:gene) WHERE g.length > 100 AND g.length < 250 RETURN *

**Return pfam identifiers for mRNAs between 100 and 150 nucleotides long**

.. code:: text

   MATCH (n:mRNA)--(m:pfam) WHERE n.length > 100 AND n.length < 150 RETURN m.id

**Count all genes on a specific contig (here genome 1, sequence 1)**

.. code:: text

   MATCH (n:gene) WHERE n.address[0] = 1 AND n.address[1] = 1 RETURN count(n)

**Return all genes between 1000 and 1500 nucleotides and order them by length**

.. code:: text

   MATCH (n:gene) WHERE n.length > 1000 AND n.length < 1500 RETURN n ORDER BY n.length DESC

**Return the homology group matching your gene of interest**

.. code:: text

   MATCH (n:homology_group)--(m:mRNA)--(g:gene) WHERE g.name = 'GENE_NAME' RETURN *

**Return the genes of genome 1 that don’t have a homolog in the other genome**

.. code:: text

   MATCH (n:homology_group)--(m:mRNA)--(g:gene) WHERE n.num_members = 1 AND g.genome = 1 RETURN g

**Retrieve unique GO identifiers for mRNAs with a signal peptide**

.. code:: text

   MATCH (m:mRNA)--(g:GO) WHERE m.signalp_signal_peptide = true RETURN DISTINCT m.id, g.id

**Return all sequence nodes for a specific contig**

.. code:: text

   MATCH (n)-[r]->()
   WHERE exists(r.`a1_1`) AND (n:degenerate OR n:node)
   RETURN id(n), n.sequence, r.`a1_1`

**Return all sequence nodes for a specific contig between positions 1000 and 2000**

.. code:: text

   MATCH (n)-[r]->()
   WHERE exists(r.`a1_1`) AND (n:degenerate OR n:node)
     AND r.`a1_1`[0] > 1000 AND r.`a1_1`[0] < 2000
   RETURN id(n), n.sequence, r.`a1_1`

**Find SNP bubbles in the graph. For simplification we only use the FF relation**

.. code:: text

   MATCH p = (n:nucleotide)-[:FF]->(a1)-[:FF]->(m:nucleotide)<-[:FF]-(b1)<-[:FF]-(n)
   RETURN * LIMIT 50
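
**Build a length-filtered query from Python**

The length filters shown earlier on this page can also be assembled in a
script. The following is a minimal sketch, not part of PanTools: the helper
name ``length_range_query`` is hypothetical, and it only assumes that nodes
with the chosen label (for example ``gene``) carry a numeric ``length``
property, as in the queries on this page. It returns the Cypher text together
with a parameter map, so the numeric bounds are passed as parameters rather
than pasted into the query string.

.. code:: python

   import re

   # Cypher does not accept a parameter in place of a label, so the label is
   # validated against a conservative pattern and then interpolated.
   _LABEL = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


   def length_range_query(label, min_len, max_len, limit=100):
       """Hypothetical helper: return (query, params) for a length filter."""
       if not _LABEL.fullmatch(label):
           raise ValueError(f"invalid label: {label!r}")
       if min_len >= max_len:
           raise ValueError("min_len must be smaller than max_len")
       query = (
           f"MATCH (f:{label}) "
           "WHERE f.length > $min_len AND f.length < $max_len "
           "RETURN f ORDER BY f.length DESC LIMIT $limit"
       )
       params = {"min_len": min_len, "max_len": max_len, "limit": limit}
       return query, params


   # Same filter as the "between 100 and 250 bp" example, for gene nodes.
   query, params = length_range_query("gene", 100, 250)
   print(query)
   print(params)

With the official ``neo4j`` Python driver, the returned pair could be handed to
``session.run(query, params)``; the connection URI and credentials depend on
your own installation and are not shown here.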