Stratio Streaming 0.4.0 Released

We are pleased to announce the release of Stratio Streaming 0.4.0. The API is available in the Maven Central repository; the engine and shell are available on the download page. Changelog: the engine now handles action requests more quickly, even in large event-processing batches. Actions in streams are now fully distributed. Heavy refactoring to discard unnecessary libraries. More unit tests added. […]

How to use the Meta API from a Python Script

We have received messages from people asking how to use Meta from Python. This post aims to answer that question, and we hope it will be useful to Stratio users. Queries can be sent to the Meta Server, using the Meta API, from a Python script using Py4j. This would allow us to […]

Stratio: a Certified Spark Distribution

Stratio has been certified by Databricks as a Spark distribution. More on the Databricks blog: When Stratio Met Spark: A True Love Story. Congrats to the Stratio team!

Connecting to the Stratio Big Data Platform using ODBC

We are proud to announce the first release of our ODBC connector. This connector is an entry point for external applications to connect to the Stratio Big Data Platform. By relying on the ODBC standard, we make it possible for BI tools to consume data from our platform. Additionally, the use of a well-known standard […]

Top-k queries in Cassandra: An embedded mapreduce approach

Stratio has just added top-k query support to its Lucene-based implementation of Cassandra secondary indexes. This implementation was originally designed to allow embedded full-text and multivariable search in Apache Cassandra. The previous release included an ad-hoc mechanism to perform distributed relevance queries based on Lucene’s scoring algorithm. The current release generalizes this […]
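
For readers unfamiliar with the general idea, here is a minimal Python sketch of a distributed top-k query in map/reduce style: each node returns only its k best-scoring rows, and a coordinator merges those partial lists into the global top k. This is not Stratio's implementation; the `(score, key)` row shape and the function names `local_top_k` and `merge_top_k` are hypothetical and chosen for illustration only.

```python
import heapq
from typing import Iterable, List, Tuple

# Hypothetical row shape for illustration: (relevance score, row key).
Row = Tuple[float, str]


def local_top_k(rows: Iterable[Row], k: int) -> List[Row]:
    """'Map' step: each node keeps only its k highest-scoring rows."""
    return heapq.nlargest(k, rows, key=lambda r: r[0])


def merge_top_k(partials: Iterable[List[Row]], k: int) -> List[Row]:
    """'Reduce' step: the coordinator merges per-node lists and keeps the global top k."""
    candidates = (row for partial in partials for row in partial)
    return heapq.nlargest(k, candidates, key=lambda r: r[0])


if __name__ == "__main__":
    # Three hypothetical nodes, each holding a few scored rows.
    node_a = [(0.9, "a1"), (0.2, "a2"), (0.7, "a3")]
    node_b = [(0.8, "b1"), (0.95, "b2")]
    node_c = [(0.1, "c1")]
    k = 3
    partials = [local_top_k(node, k) for node in (node_a, node_b, node_c)]
    print(merge_top_k(partials, k))  # [(0.95, 'b2'), (0.9, 'a1'), (0.8, 'b1')]
```

The merge is exact because any row in the global top k must also be in its own node's local top k, so each node only needs to ship k rows instead of its whole result set.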

Advanced Stratio Deep tweaking: bisect factor

In my previous post describing the general architecture of Stratio Deep, I explained the basic idea behind it: mapping each Cassandra data split to a Spark partition. In this post I’ll explain how this can lead to sub-optimal performance due to the loss of data locality during data fetch, and I’ll provide […]

Advanced search in C*

Stratio has released a Lucene-based implementation of Cassandra secondary indexes. It is open-sourced under the Apache License v2.0, and you can get it on […]. Apache Cassandra is a fast, robust and easily scalable database widely used by Stratio. Its NoSQL key-value/column-oriented data model requires denormalizing data for predefined queries. MapReduce frameworks such as Hadoop […]

A beginner’s guide to Stratio’s batch analysis tool

Over the past few months, while the Stratio platform was still in its early incubation stage, we had a feeling that the winning tools for next-generation Big Data deployments would be Apache Spark and Apache Cassandra. It was frustrating to see how little effort the community had made to efficiently integrate the two to […]

Meta 0.0.4 released

With the increasing popularity and availability of Big Data, more and more companies are solving problems they could not solve before, using batch processing tools (such as Hadoop), NoSQL databases (HBase, MongoDB, DynamoDB…), or stream processing tools (such as Storm). We are proud to announce the first release of Stratio Meta, which […]