Sybase Blog - Anything About Sybase ASE, REP, IQ.

Friday, January 28, 2011

Join Algorithms

Source: sybase.com and the web

Time complexity of a nested loop join is O(N * M), where N and M are the sizes of the two inputs.

Time complexity of a hash join is O(N + M), where N is the hashed (build) table and M is the lookup (probe) table. Hashing and hash lookups have constant complexity.

Time complexity of a merge join is O(N*log(N) + M*log(M)): the sum of the time to sort both tables plus the time to scan them.
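
To put these formulas side by side, here is a small back-of-the-envelope comparison in Python. The table sizes are made up purely for illustration; real optimizer costing also accounts for I/O, memory, and row widths.

import math

n = 1_000        # rows in the smaller input (hypothetical)
m = 1_000_000    # rows in the larger input (hypothetical)

nested_loop = n * m                                 # O(N * M)
hash_join   = n + m                                 # O(N + M)
merge_join  = n * math.log2(n) + m * math.log2(m)   # O(N*log(N) + M*log(M))

print(f"nested loop join: ~{nested_loop:,.0f} operations")
print(f"hash join       : ~{hash_join:,.0f} operations")
print(f"merge join      : ~{merge_join:,.0f} operations")

With these sizes the nested loop join needs on the order of a billion comparisons, while the hash join needs roughly a million operations.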

Hash Join Algo
====================
The hash join algorithm builds an in-memory hash table of the smaller of its two inputs, and then reads the larger input and probes the in-memory hash table to find matches, which are written to a work table. If the smaller input does not fit into memory, the hash join operator partitions both inputs into smaller work tables. These smaller work tables are processed recursively until the smaller input fits into memory.
The hash join algorithm has the best performance if the smaller input fits into memory, regardless of the size of the larger input. In general, the optimizer will choose hash join if one of the inputs is expected to be substantially smaller than the other.
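
A minimal in-memory sketch of the build-and-probe idea in Python. The tables and column names are made up, and the recursive partitioning used when the smaller input does not fit in memory is omitted:

from collections import defaultdict

def hash_join(small, large, key):
    # Build phase: hash the smaller input on the join attribute.
    buckets = defaultdict(list)
    for row in small:
        buckets[row[key]].append(row)

    # Probe phase: scan the larger input once, probing the hash table.
    result = []
    for row in large:
        for match in buckets.get(row[key], []):
            result.append({**match, **row})
    return result

# Hypothetical tables: customers is the smaller input, orders the larger.
customers = [{"cust_id": 1, "name": "Acme"}, {"cust_id": 2, "name": "Globex"}]
orders = [{"cust_id": 1, "order_id": 10}, {"cust_id": 2, "order_id": 11},
          {"cust_id": 1, "order_id": 12}]
print(hash_join(customers, orders, "cust_id"))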

Merge Join Algo
==================
The merge join algorithm reads two inputs which are both ordered by the join attributes. For each row of the left input, the algorithm reads all of the matching rows of the right input by accessing the rows in sorted order.
If the inputs are not already ordered by the join attributes (perhaps because of an earlier merge join or because an index was used to satisfy a search condition), then the optimizer adds a sort to produce the correct row order. This sort adds cost to the merge join.
One advantage of a merge join compared to a hash join is that the cost of sorting can be amortized over several joins, provided that the merge joins are over the same attributes. The optimizer will choose merge join over a hash join if the sizes of the inputs are likely to be similar, or if it can amortize the cost of the sort over several operations.
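
A simplified single-attribute merge join sketch in Python, assuming both inputs are lists of rows already sorted on the join key (the sort the optimizer would add for unordered inputs is not shown). Duplicate key values are handled by pairing each left row with the whole matching right group:

def merge_join(left, right, key):
    # Both inputs must already be ordered by the join attribute.
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lkey, rkey = left[i][key], right[j][key]
        if lkey < rkey:
            i += 1
        elif lkey > rkey:
            j += 1
        else:
            # Find the group of right rows sharing this key, then pair
            # every matching left row with each row in that group.
            j_start = j
            while j < len(right) and right[j][key] == lkey:
                j += 1
            while i < len(left) and left[i][key] == lkey:
                for r in right[j_start:j]:
                    result.append({**left[i], **r})
                i += 1
    return result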

Nested Loops Join Algo
==================
The nested loops join computes the join of its left and right sides by completely reading the right hand side for each row of the left hand side. (The syntactic order of tables in the query does not matter, because the optimizer chooses the appropriate join order for each block in the request.)

The optimizer may choose nested loops join if the join condition does not contain an equality condition, or if it is optimizing for first-row time.
Since a nested loops join reads the right hand side many times, it is very sensitive to the cost of the right hand side. If the right hand side is an index scan or a small table, then the right hand side can likely be computed using cached pages from previous iterations. On the other hand, if the right hand side is a sequential table scan or an index scan that matches many rows, then the right hand side needs to be read from disk many times. Typically, a nested loops join is less efficient than other join methods. However, nested loops join can provide the first matching row quickly compared to join methods that must compute their entire result before returning.
Nested loops join is the only join algorithm that can provide sensitive semantics for queries containing joins. This means that sensitive cursors on joins can only be executed with a nested loops join.
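
A minimal nested loops sketch in Python. Note how the right-hand side is re-read for every left-hand row, and how a matching row can be returned before the rest of the result is computed; the predicate argument stands in for an arbitrary, possibly non-equality, join condition (all names here are made up for illustration):

def nested_loops_join(left, right, predicate):
    # For each left row, scan the entire right-hand side again.
    for l in left:
        for r in right:
            if predicate(l, r):
                # Yielding lets the caller see the first match quickly,
                # without waiting for the whole join to finish.
                yield {**l, **r}

# Hypothetical non-equality join: match each order to the price band it falls in.
orders = [{"order_id": 1, "amount": 150}, {"order_id": 2, "amount": 900}]
bands = [{"band": "low", "lo": 0, "hi": 500}, {"band": "high", "lo": 500, "hi": 1000}]
for row in nested_loops_join(orders, bands,
                             lambda l, r: r["lo"] <= l["amount"] < r["hi"]):
    print(row)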

Tuesday, January 25, 2011

Data Explosion: Are We Ready?

We have social networking profiles on Orkut, Facebook, LinkedIn, and many more online applications.

Have you realized how many profiles have been created in the last five years? The number is huge.

Have you realized how much data is stored behind these profiles?

In India, ours is the first generation to use these profiles. Imagine the number of users two generations from now, and the amount of data that will bring.

When we make a call, when we shop, when we bank, when we access a social networking site and create a new profile, we are creating data.

Have you thought about how many people make calls and how many create new profiles in a single minute?
Every minute we are adding more data.

Do you really think we will not cross into terabyte and petabyte databases very soon?

We certainly will. And are our existing technologies capable enough for that? I can't say yes.

Second, it will be very tough to decide which data is obsolete in order to purge our databases, won't it?

We have started creating data very fast. We will certainly need to enhance our technologies; maybe
we need to think beyond our existing database technologies.

What are your thoughts?