Data, data, data everywhere.
Hypi offers a powerful query language as part of its platform. In this post we’ll take an introductory look at its syntax.
Why a query language? Another query language!
We evaluated several options when we were designing our API. In the end, we simply couldn’t find one that did everything we wanted AND was easily extended to support our custom features.
In the end, Hypi designed its own query language, HypiQL, which is modelled on the Apache Lucene query language. We went further and included some SQL-like features, e.g. SORT, FROM and LIMIT.
We’re currently working to launch machine learning and distributed compute, which the query language will integrate with (this is why it was important that we could extend it easily).
Another reason we went for HypiQL is that SQL and the other options we considered were not a very good fit, because our API is GraphQL based. GraphQL is already the means by which we perform data selections, and it excels at this. What is missing from GraphQL is the capability to filter, which the standard left as an exercise to the reader!
HypiQL is made up of four major components.
<query> <sort> <from> <limit>
That is, it is very SQL-like. This is intentional, to keep the learning curve as small as possible for third-party developers. If they’re familiar with SQL or the Lucene query language, they’ll probably be able to write HypiQL queries just by guessing.
Let’s break the four components down but leave query for last since it is the most complex.
<sort> – For specifying how to sort matching results
SORT fieldName (ASC|DESC)? (, fieldName (ASC|DESC)?)*
SORT a
SORT a ASC
SORT a DESC
SORT a, b.c DESC, c
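To make the multi-field form concrete, here is a minimal Python sketch of the ordering that SORT a, b.c DESC, c describes, applied to in-memory dictionaries. It assumes that a field without a direction sorts ascending, as in SQL, which the grammar above does not state explicitly. The helper names parse_sort, get_path and apply_sort are hypothetical and are not part of Hypi's API.

```python
def parse_sort(clause):
    """Parse 'SORT a, b.c DESC, c' into a list of (field, descending) pairs."""
    body = clause.strip()
    if not body.upper().startswith("SORT "):
        raise ValueError("expected a SORT clause")
    keys = []
    for part in body[5:].split(","):
        tokens = part.split()
        if not tokens or len(tokens) > 2:
            raise ValueError(f"bad sort key: {part!r}")
        # Assumption: a missing direction means ascending, as in SQL.
        direction = tokens[1].upper() if len(tokens) == 2 else "ASC"
        if direction not in ("ASC", "DESC"):
            raise ValueError(f"bad direction: {tokens[1]!r}")
        keys.append((tokens[0], direction == "DESC"))
    return keys


def get_path(obj, path):
    """Follow a dotted path such as 'b.c' through nested dictionaries."""
    for name in path.split("."):
        obj = obj[name]
    return obj


def apply_sort(objects, clause):
    """Return a new list ordered the way the SORT clause describes."""
    result = list(objects)
    # Python's sort is stable, so sorting by the last key first and the
    # first key last produces the usual multi-key ordering.
    for field, descending in reversed(parse_sort(clause)):
        result.sort(key=lambda o, f=field: get_path(o, f), reverse=descending)
    return result
```

Here the first field is the primary ordering and later fields only break ties, which is how the comma-separated list reads when interpreted like SQL's ORDER BY.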
<from> – For specifying a paging token
FROM 'some token'
This is all there is to it. When you search, every object returned by the API includes its paging token in the special hypi field. Take the token from the last object we returned to you and pass it back. We will then send back the results after this object.
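The paging flow described above might look like the following Python sketch. Everything about the transport is assumed: run_query stands in for whatever function sends a HypiQL string to the API and returns a list of objects, and token_of stands in for however you read the paging token out of an object's hypi field, since the name of that sub-field is not given here. Both are hypothetical parameters, as is fetch_all itself.

```python
def fetch_all(run_query, token_of, base_query, page_size=50):
    """Collect every result by repeatedly passing back the last paging token.

    run_query  -- hypothetical callable: HypiQL string -> list of objects
    token_of   -- hypothetical callable: object -> its paging token
    base_query -- the <query> and optional <sort> parts, without FROM or LIMIT
    """
    results = []
    token = None
    while True:
        query = base_query
        if token is not None:
            # Resume after the last object of the previous page.
            # Sketch only: the token is not escaped.
            query += f" FROM '{token}'"
        query += f" LIMIT {page_size}"
        page = run_query(query)
        if not page:
            break  # an empty page means there is nothing left to fetch
        results.extend(page)
        token = token_of(page[-1])
    return results
```

The loop stops on an empty page rather than on a short page, because the server may cap the limit below the requested page size.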
<limit> – For limiting the number of results returned
LIMIT 50
This is all there is to it. When you search, we will impose a maximum limit on the query, currently
<query> – The filter(s)
Firstly, there are different types of queries. Currently these are:
- Term queries
- Phrase queries
- Prefix queries
- Wildcard queries
- Fuzzy queries
- Range queries
- Match all queries
Though they are not query types in their own right, we also support two assertions that can be combined with any of the available queries. These are:
EXIST fieldName – only return results where fieldName exists
NOT EXIST fieldName – only return results where fieldName does not exist
A term query is a simple filter asking to return results that match the value provided exactly. Examples:
a = 'some string'
a = 123
a = 'some string' OR 123 AND 'abc'
Boolean logic is possible on all query types. The general form is as demonstrated on line 3 (the last example), which says “return objects where field a is 'some string', or where a is 123 AND a is 'abc'”. This query is nonsensical: AND is treated as an absolute assertion requiring that the given field MUST have the requested value in order for it to match, so the results are restricted to objects whose a satisfies the AND clause.
A phrase query is similar to what an end user might expect a search engine to do. If you search for New York, it will return objects containing this exact phrase or the individual words.
a ~ 'some string'
a ~ 123
a ~ 'some string' OR 123
A prefix query will take the terms you’ve searched for and match any object where the contents of the field start with those terms.
a ^ 'some string'
a ^ 123
a ^ 'some string' OR 123
A wildcard query takes the terms searched for and treats * and ? as special characters.
* means match anything from this point onwards
? means match any single character at this position
a * 'some '
a * 123
a * 'some?str*' OR 123
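As an illustration of the two special characters, this Python sketch translates a wildcard pattern into a regular expression and tests values against it locally. It only models the semantics described above. How Hypi actually evaluates wildcards (tokenisation, case sensitivity, whether the whole field must match) is not covered here, and wildcard_to_regex and wildcard_match are hypothetical helpers. The sketch also assumes that * may match zero characters.

```python
import re


def wildcard_to_regex(pattern):
    """Translate * and ? into their regular-expression equivalents."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # anything from this point onwards
        elif ch == "?":
            parts.append(".")    # exactly one character at this position
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.compile("".join(parts), re.DOTALL)


def wildcard_match(pattern, value):
    """Return True if the whole value matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(value) is not None
```

With this model, the pattern some?str* matches "some string" but not "somestring", because ? must consume exactly one character before str.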
A fuzzy query takes the terms searched for and tries to match words that are similar even if spelt slightly differently e.g.
game would match if you searched for
~ a ~ 'some string'
~ a ~ 'some string' OR 'other string'
~ a ~[1,5] 'some string' OR 'other string'
~ a ~[1,5,10] 'some string' OR 'other string'
~ a ~[1,5,10,true] 'some string' OR 'other string'
The numbers and the boolean allow you to tweak how the fuzzy algorithm runs. The parameters are
In that order.
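This post does not say which similarity measure HypiQL uses. A common way to decide that two words are "similar even if spelt slightly differently" is edit distance, so the Python sketch below uses plain Levenshtein distance purely as an illustration. That HypiQL behaves this way is an assumption, and max_edits is a hypothetical parameter of this sketch, not one of the bracketed fuzzy parameters.

```python
def levenshtein(a, b):
    """Number of single-character inserts, deletes or substitutions
    needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def fuzzy_match(term, candidate, max_edits=2):
    """Treat candidate as a match if it is within max_edits of term."""
    return levenshtein(term, candidate) <= max_edits
```

For example, "color" and "colour" are one edit apart, so they match under the default threshold, while unrelated words of different lengths do not.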
Range queries are a means for you to search for content that falls within the given range.
a IN [0, 1)
a IN (0, 1]
a IN (0, 1)
a IN [0, 1]
//any of the above can also use boolean logic e.g.
a IN [0, 1 OR 5,10 AND 10, 11)
//all of the above also works for strings
a IN ['America', 'Jamaica')
These are the standard mathematical representations of ranges and work exactly as you’d expect i.e.
[0, 1) – left inclusive, i.e. including 0, excluding 1
(0, 1] – right inclusive, i.e. excluding 0, including 1
(0, 1) – exclusive, i.e. not including 0 or 1, only those between
[0, 1] – inclusive, i.e. including both 0 and 1 and everything in between
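The four bracket combinations can be modelled with a small Python helper. This is only a sketch of the interval semantics listed above: Range and parse_range are hypothetical names, the boolean form with OR and AND inside the brackets is not handled, and bounds must not contain commas. For strings it uses Python's ordinary string comparison. Whether HypiQL orders strings the same way is an assumption.

```python
import ast
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Range:
    low: Any
    high: Any
    include_low: bool    # True for '[', False for '('
    include_high: bool   # True for ']', False for ')'

    def contains(self, value):
        above = value >= self.low if self.include_low else value > self.low
        below = value <= self.high if self.include_high else value < self.high
        return above and below


def parse_range(text):
    """Parse a range such as "[0, 1)" or "['America', 'Jamaica')"."""
    text = text.strip()
    if len(text) < 2 or text[0] not in "[(" or text[-1] not in "])":
        raise ValueError(f"not a range: {text!r}")
    low_text, high_text = text[1:-1].split(",")
    return Range(
        low=ast.literal_eval(low_text.strip()),
        high=ast.literal_eval(high_text.strip()),
        include_low=text[0] == "[",
        include_high=text[-1] == "]",
    )
```

A square bracket includes its bound and a round bracket excludes it, so parse_range("[0, 1)").contains(0) is true while parse_range("[0, 1)").contains(1) is false.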
Match all queries
A match all query is simply a query with the value *. It will return all documents unless other filters restrict it.
Combining the various queries works as expected:
a = 'something' OR b ~ 'New York' AND c ^ 'alpha' OR ~d~ "something" OR a IN [0,10)
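Since a full HypiQL string is the filter followed by the SORT, FROM and LIMIT parts in that order, a small builder can keep the pieces in the right place. This is a hedged sketch: build_hypiql is a hypothetical helper, not something Hypi ships. It assumes the three trailing parts are optional, and it does no escaping or validation of the values passed in.

```python
def build_hypiql(query, sort=None, from_token=None, limit=None):
    """Assemble <query> <sort> <from> <limit> into one HypiQL string.

    sort       -- list of (field, direction) pairs; direction is
                  "ASC", "DESC" or None to leave it unspecified
    from_token -- paging token taken from a previously returned object
    limit      -- maximum number of results to ask for
    """
    parts = [query.strip()]
    if sort:
        keys = [f"{field} {direction}" if direction else field
                for field, direction in sort]
        parts.append("SORT " + ", ".join(keys))
    if from_token is not None:
        parts.append(f"FROM '{from_token}'")
    if limit is not None:
        parts.append(f"LIMIT {int(limit)}")
    return " ".join(parts)
```

For instance, build_hypiql("a IN [0,10)", sort=[("a", None), ("b.c", "DESC")], limit=50) produces a IN [0,10) SORT a, b.c DESC LIMIT 50.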