Querying with Apache Ignite

In my last blog post, I covered the basics of starting an Ignite node or a cluster. We also wrote a simple Hello World program in Java that starts an Ignite node from a jar, puts data into a cache (cache.put) and retrieves it by key (cache.get), i.e. a basic lookup.

However, Ignite is not limited to retrieving data by key! We can also do more advanced things, such as querying on fields and values. In this article, we will see how. Here are a few things to know about querying in Ignite:

Ignite loads all the data into its memory and performs all the computation in memory, so it is fast, very fast!
It internally runs an H2 engine (H2 is an in-memory/disk-based database)
We may write SQL-like queries to query data on the fields of interest
Ignite, by default, stores data in off-heap memory, thereby reducing the load on the JVM and the caveats around garbage collection

Quoting from the official website:

An embedded H2 instance is always started as a part of an Apache Ignite node process whenever ignite-indexing module is added to the node’s classpath. If the node is started from a terminal using ignite.sh{bat} script then copy {apache_ignite}\libs\optional\ignite-indexing directory to {apache_ignite}\libs\. If you use Maven then add the dependency below to a pom.xml file

<dependency>
    <groupId>org.apache.ignite</groupId>
    <artifactId>ignite-indexing</artifactId>
    <version>${ignite.version}</version>
</dependency>

Broadly categorising, we may query data using:

SQL queries
SQL fields queries (something like ‘SELECT * FROM <table> WHERE …’)
Scan queries
Text queries

Let’s explore them one by one. The code snippets below are taken from the full source code on GitHub, which is linked in the references.

SQL Queries

SQL queries are particularly helpful when we need the entire object, selected by simple conditions on its fields. If the cache runs in replicated mode, every node holds the full data set, so the query is executed locally on the node. In partitioned mode, however, the query is sent to the remote nodes, each node runs it against the data it stores locally, and the partial results are then aggregated into the final result set.

Note: Make sure to annotate the fields to query with @QuerySqlField. Optionally, we may have a field indexed by passing the argument index = true, i.e. @QuerySqlField(index = true).

// Requires: org.apache.ignite.cache.query.SqlQuery, org.apache.ignite.cache.affinity.AffinityKey
SqlQuery<AffinityKey<Integer>, Data> sql = new SqlQuery<>(Data.class, "index > ?");

// Find all data with index greater than 90.
System.out.println("Query result using SQL query: ");

cache.query(sql.setArgs(90)).getAll()
     .forEach(x -> System.out.println(x.toString()));
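The snippet above assumes a value class with annotated fields and a cache that has been told to index that class. Here is a minimal sketch of what that setup might look like. The field names index and subData are inferred from the snippets in this post; the cache name "dataCache", the class name QuerySetup and the choice of partitioned mode are assumptions made for illustration, not necessarily what the original project uses.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.cache.affinity.AffinityKey;
import org.apache.ignite.cache.query.annotations.QuerySqlField;
import org.apache.ignite.cache.query.annotations.QueryTextField;
import org.apache.ignite.configuration.CacheConfiguration;

import java.io.Serializable;

/** Value class stored in the cache (field names inferred from the post). */
class Data implements Serializable {
    // Queryable from SQL and backed by an index.
    @QuerySqlField(index = true)
    private final int index;

    // Queryable from SQL and also searchable through text queries.
    @QuerySqlField
    @QueryTextField
    private final String subData;

    Data(int index, String subData) {
        this.index = index;
        this.subData = subData;
    }

    int getIndex() { return index; }

    String getSubData() { return subData; }

    @Override public String toString() {
        return "Data[index=" + index + ", subData=" + subData + "]";
    }
}

class QuerySetup {
    /** Builds a cache configuration; "dataCache" is a hypothetical name. */
    static CacheConfiguration<AffinityKey<Integer>, Data> cacheConfig() {
        CacheConfiguration<AffinityKey<Integer>, Data> cfg =
            new CacheConfiguration<>("dataCache");
        cfg.setCacheMode(CacheMode.PARTITIONED);
        // Register the key/value pair so the annotated fields become queryable.
        cfg.setIndexedTypes(AffinityKey.class, Data.class);
        return cfg;
    }

    public static void main(String[] args) {
        // Starts a node with the default configuration and fills the cache.
        try (Ignite ignite = Ignition.start()) {
            IgniteCache<AffinityKey<Integer>, Data> cache =
                ignite.getOrCreateCache(cacheConfig());
            for (int i = 0; i < 100; i++)
                cache.put(new AffinityKey<>(i), new Data(i, "subdata" + i));
            System.out.println("Entries loaded: " + cache.size());
        }
    }
}
```

With a cache set up along these lines, the cache variable used in the query snippets of this post would be the IgniteCache returned by getOrCreateCache.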

Scan Query

Scan queries are highly scalable queries that best suit a partitioned, distributed cache. They filter the cache entries using a user-defined predicate, as follows:

The requested predicate is sent to each node in the cluster.
Each node queries its own cache for entries that satisfy the given predicate.
The predicate requester consolidates the results received from each node into a single set.

Note: Unlike SQL queries, scan queries do not need the fields to be annotated with @QuerySqlField, because the predicate is evaluated against the key and value objects themselves rather than against an index.

// Requires: org.apache.ignite.cache.query.ScanQuery
// Filter: keep only the entries whose value has an index greater than 90.
ScanQuery<AffinityKey<Integer>, Data> query = new ScanQuery<>();

query.setFilter((key, data) -> data.getIndex() > 90);

// Find all data with index greater than 90.
System.out.println("Query result using Scan Query: ");
cache.query(query).getAll()
     .forEach(x -> System.out.println(x.toString()));
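To make the three steps described in this section more concrete, here is a small model of them in plain Java. It does not use the Ignite API at all: the Node class, the modulo-based distribution of keys and all the names are hypothetical stand-ins, and Ignite's real affinity function is more involved. Treat it as a sketch of the scatter/gather idea only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiPredicate;

final class ScanQuerySketch {
    /** Hypothetical stand-in for a cluster node holding one slice of the cache. */
    static final class Node {
        final Map<Integer, Integer> localEntries = new TreeMap<>();

        // Step 2: the node checks only its own entries against the predicate.
        List<Integer> scanLocal(BiPredicate<Integer, Integer> filter) {
            List<Integer> hits = new ArrayList<>();
            for (Map.Entry<Integer, Integer> e : localEntries.entrySet()) {
                if (filter.test(e.getKey(), e.getValue()))
                    hits.add(e.getValue());
            }
            return hits;
        }
    }

    /** Spreads keys 0..entries-1 over the nodes; each value equals its key. */
    static List<Node> partition(int entries, int nodeCount) {
        List<Node> nodes = new ArrayList<>();
        for (int n = 0; n < nodeCount; n++)
            nodes.add(new Node());
        for (int key = 0; key < entries; key++)
            nodes.get(key % nodeCount).localEntries.put(key, key);
        return nodes;
    }

    /** Steps 1 and 3: hand the predicate to every node, then merge the results. */
    static List<Integer> scan(List<Node> nodes, BiPredicate<Integer, Integer> filter) {
        List<Integer> result = new ArrayList<>();
        for (Node node : nodes)
            result.addAll(node.scanLocal(filter));
        return result;
    }

    public static void main(String[] args) {
        List<Node> nodes = partition(100, 3);
        // Same condition as in the post: index greater than 90.
        System.out.println(scan(nodes, (key, index) -> index > 90).size()); // prints 9
    }
}
```

Note that no node ever sees another node's entries; only the matching values travel back to the requester, which is what makes this kind of query scale with the number of nodes.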

SQL Fields Query

Data can be queried from multiple caches as part of a single SqlQuery or SqlFieldsQuery. In this case, cache names act as schema names, as in conventional RDBMS SQL queries. The name of the cache used to create the IgniteCache instance that executes the query is the default schema name and does not need to be specified explicitly. Objects stored in other caches have to be prefixed with the names of their caches (additional schema names).

Note: Make sure to annotate the fields to query with @QuerySqlField. Optionally, we may have a field indexed by passing the argument index = true, i.e. @QuerySqlField(index = true).

// Find all rows whose subData matches the regular expression 'subdata5'.
SqlFieldsQuery sql = new SqlFieldsQuery("SELECT * from Data where (subData REGEXP 'subdata5')");

System.out.println("Query result using SQL fields query: ");
cache.query(sql).getAll()
     .forEach(x -> System.out.println(x.toString()));
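The snippet above only touches one cache. To illustrate the schema-name prefix described earlier in this section, here is a hedged sketch of a query that joins Data with a type stored in a second cache. The second cache "ownerCache", the Owner type and its name and dataIndex columns are all hypothetical and exist only for this example; the class name CrossCacheQuerySketch is likewise made up.

```java
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.SqlFieldsQuery;

import java.util.List;

final class CrossCacheQuerySketch {
    /**
     * Builds a query that joins Data (default schema, i.e. the cache the
     * query is executed on) with Owner from the hypothetical "ownerCache".
     */
    static SqlFieldsQuery joinQuery(int minIndex) {
        return new SqlFieldsQuery(
            "SELECT d.subData, o.name " +
            "FROM Data d JOIN \"ownerCache\".Owner o ON o.dataIndex = d.index " +
            "WHERE d.index > ?")
            .setArgs(minIndex);
    }

    /** Runs the query through the cache that holds Data and prints each row. */
    static void printRows(IgniteCache<?, ?> dataCache, int minIndex) {
        // A fields query returns one List per row, holding the selected columns.
        List<List<?>> rows = dataCache.query(joinQuery(minIndex)).getAll();
        for (List<?> row : rows)
            System.out.println(row.get(0) + " -> " + row.get(1));
    }
}
```

Data needs no prefix because the query runs on the cache that stores it, while Owner has to be qualified with the name of its own cache. Depending on how the two caches are partitioned, a join like this may also need the data to be collocated or distributed joins to be enabled on the query.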

Text Query

Text queries leverage Lucene indexes on field attributes. By default, the text is tokenised on the space character (” ”). Given a text string, Ignite searches the indexes across all the fields marked with @QueryTextField. All the records that have at least one field satisfying the text match condition are returned.

// Requires: org.apache.ignite.cache.query.TextQuery
TextQuery<AffinityKey<Integer>, Data> query = new TextQuery<>(Data.class, "subdata5");

// Find all data with at least one field whose value matches "subdata5"
System.out.println("Query result using text query: ");

cache.query(query).getAll()
     .forEach(x -> System.out.println(x.toString()));
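The matching rule described in this section (split on spaces, return a record if at least one text field matches) can be modelled in a few lines of plain Java. This is only a simplified model of the behaviour as described in this post, with made-up names; it is not how Lucene is implemented, and Lucene's actual analysis may differ, for instance in how it treats case and punctuation.

```java
import java.util.Arrays;
import java.util.List;

final class TextMatchSketch {
    /** True if any space-separated token of the field equals the search term. */
    static boolean fieldMatches(String fieldValue, String term) {
        if (fieldValue == null)
            return false;
        return Arrays.asList(fieldValue.split(" ")).contains(term);
    }

    /** A record matches when at least one of its text fields matches. */
    static boolean recordMatches(List<String> textFields, String term) {
        return textFields.stream().anyMatch(field -> fieldMatches(field, term));
    }

    public static void main(String[] args) {
        // One of the two fields contains the token, so the record matches.
        System.out.println(recordMatches(Arrays.asList("subdata5 extra", "other"), "subdata5")); // true
        // "subdata55" is a different token, so there is no match.
        System.out.println(recordMatches(Arrays.asList("subdata55"), "subdata5")); // false
    }
}
```

The second call shows the difference from the REGEXP condition used in the SQL fields query: a token-based match looks at whole tokens, whereas a regular expression can also match inside a longer value.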

Conclusion

Ignite is better suited to records that have a structured schema. Though it supports a NoSQL persistent store (which I will cover in my next blog post), handling unstructured data in a structured manner feels more like a retrofit of sorts. But that’s how Ignite is designed and implemented.
It is not, in any sense, a search framework, and hence shouldn’t be used as one.
In some cases, I have felt it would have been better if we could have composite queries (a combination of a text query with an SQL fields query, for example). But if we need that, we are probably not using Ignite the way it is supposed to be used. It is strictly an in-memory data grid.

Published by Sam Banerjee
