Abstract
In the era of Big Data, analyzing vast quantities of data is a challenging problem, and the multi-way theta-join query is one of the most time-consuming operations. MapReduce is the most frequently discussed framework for massive data processing, and several join algorithms based on it have been proposed in recent years. However, the MapReduce paradigm itself may not suit every scenario, and multi-way theta-join appears to be one such case. Many multi-way theta-join algorithms for traditional parallel databases have been proposed over the years, but none has addressed the CMD (coordinate modulo distribution) storage method, although some equi-join algorithms for it have been proposed. In this paper, we propose a multi-way theta-join method based on CMD that takes advantage of the CMD storage method. Experiments suggest that it is a valid and efficient method that achieves significant improvement over methods applied on MapReduce.
| Original language | English |
|---|---|
| Pages (from-to) | 62-78 |
| Number of pages | 17 |
| Journal | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
| Volume | 8421 LNCS |
| Issue number | PART 1 |
| Publication status | Published - 2014 |
| Externally published | Yes |
| Event | 19th International Conference on Database Systems for Advanced Applications, DASFAA 2014 - Bali, Indonesia. Duration: 21 Apr 2014 → 24 Apr 2014 |
Keywords
- CMD
- Multi-way Theta-Join