Data, data, data everywhere.

Hypi offers a powerful query language as part of its platform. In this post we’ll take an introductory look at its syntax. 

Why a query language? Another query language!
We evaluated several options when we were designing our API. In the end we simply couldn’t find one that did everything we wanted AND was easy to extend to support our custom features.

In the end Hypi designed its own query language, HypiQL, which is modelled on the Apache Lucene query language. We went further and included some SQL-like features, e.g. SORT, FROM, LIMIT.

We’re currently working to launch machine learning and distributed compute, which the query language will integrate with (this is why it was important that we could extend it easily).

Another reason we went for HypiQL is that SQL and the other options we considered were not a good fit because our API is GraphQL based. GraphQL is already the means by which we perform data selection and it excels at this. What is missing from GraphQL is the capability to filter, which the standard left as an exercise for the reader!

HypiQL is made up of four major components.

<query> <sort> <from> <limit>, i.e. it is very SQL-like. This is intentional, to keep the learning curve as small as possible for third-party developers. If they’re familiar with SQL or the Lucene query language they’ll probably be able to write HypiQL queries just by guessing.

Let’s break the four components down but leave query for last since it is the most complex.

<sort> – For specifying how to sort matching results

SORT fieldName (ASC|DESC)? (, fieldName (ASC|DESC)?)*

Examples:

SORT a
SORT a ASC
SORT a DESC
SORT a, b.c DESC, c
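
To make the multi-field form concrete, here is a minimal Python sketch of what a sort such as SORT a, b.c DESC, c could mean when applied to plain dictionaries. It is not Hypi's implementation. It assumes that a field with no direction sorts ascending and that a dotted name such as b.c refers to a nested field; neither is stated in this post. The helper names get_path and apply_sort are made up for illustration.

```python
from typing import Any, List, Tuple


def get_path(obj: dict, path: str) -> Any:
    """Resolve a dotted field name such as 'b.c' against nested dicts (assumed meaning)."""
    for key in path.split("."):
        obj = obj[key]
    return obj


def apply_sort(rows: List[dict], keys: List[Tuple[str, str]]) -> List[dict]:
    """Sort by several (field, direction) pairs, the first pair being most significant."""
    result = list(rows)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields a correct multi-key ordering.
    for field, direction in reversed(keys):
        result.sort(key=lambda r: get_path(r, field), reverse=(direction == "DESC"))
    return result


rows = [
    {"id": "r1", "a": 1, "b": {"c": 1}, "c": 2},
    {"id": "r2", "a": 1, "b": {"c": 2}, "c": 1},
    {"id": "r3", "a": 0, "b": {"c": 0}, "c": 0},
    {"id": "r4", "a": 1, "b": {"c": 2}, "c": 0},
]

# SORT a, b.c DESC, c  (ASC assumed where no direction is given)
ordered = apply_sort(rows, [("a", "ASC"), ("b.c", "DESC"), ("c", "ASC")])
ordered_ids = [r["id"] for r in ordered]
```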

<from> – For specifying a paging token

FROM 'some token' is all there is to it. When you search, every object returned by the API includes its paging token in the special hypi field. Take the token from the last object we returned to you and pass it back. We will then send back the results that come after this object.
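
The paging flow can be sketched as a simple loop. This is a rough illustration in Python, not a Hypi client: fetch_page and token_of are hypothetical callables standing in for the API call and for reading the paging token out of an object's hypi field, and the "token" key used in the fake data is an assumption because the exact shape of that field is not shown in this post.

```python
from typing import Callable, Iterator, List, Optional


def iterate_all(
    fetch_page: Callable[[Optional[str]], List[dict]],
    token_of: Callable[[dict], str],
) -> Iterator[dict]:
    """Yield every object by repeatedly passing the last object's token back."""
    token: Optional[str] = None
    while True:
        # Hypothetical call: would run the query with FROM '<token>' when token is set.
        page = fetch_page(token)
        if not page:
            return
        yield from page
        token = token_of(page[-1])  # token of the last object returned


# In-memory stand-in for the API so the sketch can be run locally.
# The {"hypi": {"token": ...}} layout is an assumption, not the real response shape.
_DATA = [{"id": i, "hypi": {"token": str(i)}} for i in range(1, 8)]


def fake_fetch(token: Optional[str], page_size: int = 3) -> List[dict]:
    start = 0 if token is None else int(token)
    return _DATA[start:start + page_size]


ids = [obj["id"] for obj in iterate_all(fake_fetch, lambda o: o["hypi"]["token"])]
```

Passing the two callables in keeps the loop independent of how the request is actually made.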

<limit> – For limiting the number of results returned

LIMIT 50 is all there is to it. When you search we impose a maximum limit on the query, currently 1024.
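
As a rough illustration of how the components fit together, the Python sketch below joins them in the order <query> <sort> <from> <limit>. The function name build_hypiql is made up for this example. Clamping the limit on the client is a design choice of the sketch that mirrors the maximum mentioned above, and the token is simply wrapped in single quotes because this post does not describe any escaping rules.

```python
from typing import Optional, Sequence

MAX_LIMIT = 1024  # the current maximum stated in this post


def build_hypiql(
    query: str,
    sort: Sequence[str] = (),
    from_token: Optional[str] = None,
    limit: Optional[int] = None,
) -> str:
    """Join the components in the order <query> <sort> <from> <limit>."""
    parts = [query]
    if sort:
        parts.append("SORT " + ", ".join(sort))
    if from_token is not None:
        # Assumption: no escaping is applied to the token.
        parts.append(f"FROM '{from_token}'")
    if limit is not None:
        parts.append(f"LIMIT {min(limit, MAX_LIMIT)}")
    return " ".join(parts)


example = build_hypiql("a = 'some string'", sort=["a", "b.c DESC"], limit=5000)
paged = build_hypiql("a = 123", from_token="some token", limit=50)
```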

<query> – The filter(s)

Firstly, there are different types of queries. Currently these are:

  1. Term queries
  2. Phrase queries
  3. Prefix queries
  4. Wildcard queries
  5. Fuzzy queries
  6. Range queries
  7. Match all queries

Though they are not query types in their own right, we also support two assertions that can be combined with any of the available queries. These are:

  1. EXIST fieldName – only return results where fieldName exists
  2. NOT EXIST fieldName – only return results where fieldName does not exist

Term queries

A term query is a simple filter asking to return results that match the value provided exactly. Examples:

a = 'some string'
a = 123
a = 'some string' OR 123 AND 'abc'

Boolean logic is possible on all query types. The general form is as demonstrated on line 3 (the last example), which says “return objects where field a is some string or where a is 123 AND a is abc”.

This query is nonsensical because it will only return objects that have a set to abc.

The reason for this is that AND is treated as an absolute assertion: the given field MUST have the requested value in order for the object to match.
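
The behaviour described here can be modelled in a few lines of Python. This is only a sketch of the semantics as stated in this post (a value introduced by AND is required, and once a required value exists the OR values no longer decide the outcome). It is not how HypiQL is implemented, and the clause representation is invented for the example.

```python
from typing import Any, List, Tuple

# Each clause is (operator that introduced it, value). The first value has no
# operator in front of it, so this sketch treats it like an OR clause.
Clause = Tuple[str, Any]


def matches(field_value: Any, clauses: List[Clause]) -> bool:
    required = [v for op, v in clauses if op == "AND"]
    optional = [v for op, v in clauses if op == "OR"]
    if required:
        # Every AND value MUST match; OR values can no longer change the result.
        return all(field_value == v for v in required)
    return any(field_value == v for v in optional)


# a = 'some string' OR 123 AND 'abc'
clauses = [("OR", "some string"), ("OR", 123), ("AND", "abc")]
results = {v: matches(v, clauses) for v in ["some string", 123, "abc"]}
```

Under this model only the object whose a is abc matches, which is why the OR values in that example have no effect.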

Phrase queries

A phrase query is similar to what an end user might expect a search engine to do. If you search for New York, it will return objects containing this exact phrase or the individual words.

a ~ 'some string'
a ~ 123
a ~ 'some string' OR 123

Prefix queries

A prefix query takes the terms you’ve searched for and matches any object where the contents of the field start with those terms.

a ^ 'some string'
a ^ 123
a ^ 'some string' OR 123

Wildcard queries

A wildcard query takes the terms searched for and treats * and ? as special characters.

  1. * means match anything from this point onwards
  2. ? means match any single character at this position

a * 'some 
a * 123
a * 'some?str*' OR 123
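
The two special characters can be illustrated by translating a pattern into a regular expression. The Python sketch below is an approximation for illustration only: it matches the pattern against a whole string and is case sensitive, whereas this post does not say whether HypiQL matches whole field values or individual terms, or how it treats case.

```python
import re


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate * and ? into regex equivalents and escape everything else."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")  # anything from this point onwards
        elif ch == "?":
            parts.append(".")   # exactly one character at this position
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def wildcard_match(pattern: str, text: str) -> bool:
    return wildcard_to_regex(pattern).fullmatch(text) is not None


hit = wildcard_match("some?str*", "some string")
miss = wildcard_match("some?str*", "somestring")
```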

Fuzzy queries

A fuzzy query takes the terms searched for and tries to match words that are similar even if spelt slightly differently, e.g. name and game would match if you searched for tame.

~ a ~ 'some string'
~ a ~[1] 'some string' OR 'other string'
~ a ~[1,5] 'some string' OR 'other string'
~ a ~[1,5,10] 'some string' OR 'other string'
~ a ~[1,5,10,true] 'some string' OR 'other string'

The numbers and the boolean allow you to tweak how the fuzzy algorithm runs. The parameters are:

  1. max edits
  2. prefix length
  3. max expansion
  4. allow transpositions

In that order.
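
To give a feel for what “max edits” and “allow transpositions” control, here is a small Python sketch of an edit distance calculation. It assumes these parameters behave like the similarly named ones on Lucene's FuzzyQuery, which this post does not confirm, and it does not model prefix length or max expansion at all.

```python
def edit_distance(a: str, b: str, transpositions: bool = True) -> int:
    """Edit distance by dynamic programming.

    With transpositions=True a swap of two adjacent characters counts as
    one edit (optimal string alignment); otherwise it counts as two.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            if (
                transpositions
                and i > 1
                and j > 1
                and a[i - 1] == b[j - 2]
                and a[i - 2] == b[j - 1]
            ):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


def within_max_edits(term: str, candidate: str, max_edits: int, transpositions: bool = True) -> bool:
    return edit_distance(term, candidate, transpositions) <= max_edits


game_ok = within_max_edits("tame", "game", max_edits=1)
swap_with = edit_distance("tame", "tmae", transpositions=True)
swap_without = edit_distance("tame", "tmae", transpositions=False)
```

With one edit allowed, tame reaches game by a single substitution, and the swapped spelling tmae costs one edit with transpositions and two without.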

Range queries

Range queries are a means for you to search for content that falls within the given range.

a IN [0, 1)
a IN (0, 1]
a IN (0, 1)
a IN [0, 1]
//any of the above can also use boolean logic e.g.
a IN [0, 1 OR 5,10 AND 10, 11)
//all of the above also works for strings
a IN ['America', 'Jamaica')

These are the standard mathematical representations of ranges and work exactly as you’d expect i.e.

  1. [0, 1) – left inclusive, i.e. including 0, excluding 1
  2. (0, 1] – right inclusive, i.e. excluding 0, including 1
  3. (0, 1) – exclusive, i.e. not including 0 or 1, only the values between them
  4. [0, 1] – inclusive, i.e. including 0, 1 and everything in between
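
The four bracket combinations can be checked with a short Python sketch. It handles numeric bounds only and covers neither string ranges nor the boolean form shown earlier; parse_range and in_range are names made up for this illustration.

```python
import re
from typing import Tuple

# Opening bracket, low bound, high bound, closing bracket.
_RANGE = re.compile(r"^\s*([\[(])\s*(.+?)\s*,\s*(.+?)\s*([\])])\s*$")


def parse_range(text: str) -> Tuple[float, float, bool, bool]:
    """Return (low, high, low_inclusive, high_inclusive) for text like '[0, 1)'."""
    m = _RANGE.match(text)
    if m is None:
        raise ValueError(f"not a range: {text!r}")
    open_b, low, high, close_b = m.groups()
    return float(low), float(high), open_b == "[", close_b == "]"


def in_range(value: float, text: str) -> bool:
    low, high, low_inclusive, high_inclusive = parse_range(text)
    above = value >= low if low_inclusive else value > low
    below = value <= high if high_inclusive else value < high
    return above and below


checks = {
    "[0, 1)": (in_range(0, "[0, 1)"), in_range(1, "[0, 1)")),
    "(0, 1]": (in_range(0, "(0, 1]"), in_range(1, "(0, 1]")),
    "(0, 1)": (in_range(0, "(0, 1)"), in_range(1, "(0, 1)")),
    "[0, 1]": (in_range(0, "[0, 1]"), in_range(1, "[0, 1]")),
}
```

Each entry in checks records whether 0 and then 1 fall inside the range.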

Match all queries

A match all query is simply a query with the value *. It will return all documents unless other filters restrict it.

Combining the various queries works as expected:

a = 'something' OR b ~ 'New York' AND c ^ 'alpha' OR ~d~ "something" OR a IN [0,10)
