Reset to default
Trending sort is based off of the default sorting method — by highest score — but it boosts votes that have happened recently, helping to surface more up-to-date answers.
It falls back to sorting by highest score if no posts are trending.
You have to do a look up for every where condition and for every join…on condition. The two work the same.
Suppose we write
select name from customer where customerid=37;
Somehow the DBMS has to find the record or records with customerid=37. If there is no index, the only way to do this is to read every record in the table comparing the customerid to 37. Even when it finds one, it has no way of knowing there is only one, so it has to keep looking for others.
If you create an index on customerid, the DBMS has ways to search the index very quickly. It’s not a sequential search, but, depending on the database, a binary search or some other efficient method. Exactly how doesn’t matter, accept that it’s much faster than sequential. The index then takes it directly to the appropriate record or records. Furthermore, if you specify that the index is “unique”, then the database knows that there can only be one so it doesn’t waste time looking for a second. (And the DBMS will prevent you from adding a second.)
Now consider this query:
select name from customer where city='Albany' and state="NY";
Now we have two conditions. If you have an index on only one of those fields, the DBMS will use that index to find a subset of the records, then sequentially search those. For example, if you have an index on state, the DBMS will quickly find the first record for NY, then sequentially search looking for city=’Albany’, and stop looking when it reaches the last record for NY.
If you have an index that includes both fields, i.e. “create index on customer (state, city)”, then the DBMS can immediately zoom to the right records.
If you have two separate indexes, one on each field, the DBMS will have various rules that it applies to decide which index to use. Again, exactly how this is done depends on the particular DBMS you are using, but basically it tries to keep statistics on the total number of records, the number of different values, and the distribution of values. Then it will search those records sequentially for the ones that satisfy the other condition. In this case the DBMS would probably observe that there are many more cities than there are states, so by using the city index it can quickly zoom to the ‘Albany’ records. Then it will sequentially search these, checking the state of each against ‘NY’. If you have records for Albany, California these will be skipped.
Every join requires some sort of look-up.
Say we write
select customer.name from transaction join customer on transaction.customerid=customer.customerid where transaction.transactiondate="2010-07-04" and customer.type="Q";
Now the DBMS has to decide which table to read first, select the appropriate records from there, and then find the matching records in the other table.
If you had an index on transaction.transactiondate and customer.customerid, the best plan would likely be to find all the transactions with this date, and then for each of those find the customer with the matching customerid, and then verify that the customer has the right type.
If you don’t have an index on customer.customerid, then the DBMS could quickly find the transaction, but then for each transaction it would have to sequentially search the customer table looking for a matching customerid. (This would likely be very slow.)
Suppose instead that the only indexes you have are on transaction.customerid and customer.type. Then the DBMS would likely use a completely different plan. It would probably scan the customer table for all customers with the correct type, then for each of these find all transactions for this customer, and sequentially search them for the right date.
The most important key to optimization is to figure out what indexes will really help and create those indexes. Extra, unused indexes are a burden on the database because it takes work to maintain them, and if they’re never used this is wasted effort.
You can tell what indexes the DBMS will use for any given query with the EXPLAIN command. I use this all the time to determine if my queries are being optimized well or if I should be creating additional indexes. (Read the documentation on this command for an explanation of its output.)
Caveat: Remember that I said that the DBMS keeps statistics on the number of records and the number of different values and so on in each table. EXPLAIN may give you a completely different plan today than it gave yesterday if the data has changed. For example, if you have a query that joins two tables and one of these tables is very small while the other is large, it will be biased toward reading the small table first and then finding matching records in the large table. Adding records to a table can change which is larger, and thus lead the DBMS to change its plan. Thus, you should attempt to do EXPLAINS against a database with realistic data. Running against a test database with 5 records in each table is of far less value than running against a live database.
Well, there’s much more that could be said, but I don’t want to write a book here.
Let’s say you’re looking for a friend in another city. One way would be to go from door to door and ask whether this is the house you’re looking for. Another way is to look at the map.
The index is the map to a table. It can tell the DB engine exactly where the thing you’re looking for is. Thus, you index every column that you think you will have to search for, and leave out the columns that you are just reading data from, and never searching for.
Don’t think about optimizing databases. Think about optimizing queries.
Generally, you optimize one case at the expense of others. You just have to decide which cases you’re interested in.
I don’t know about MySql tools but in MS SqlServer you have a tool that shows all of the operations a query would take and how much of the processing time of the entire query would take.
Using this tool helped me to understand how queries are optimized by the query optimizer much more than I think any book could help because what the optimizer does is often not easy to understand. By tweaking the query and possibly the underlining database I could see how each change affected the query plan. There are certain key points in writing queries but to me it looks like you already have an idea of those so optimizing in your case is much more about this than any general rules. After a few years of db development I did look at a few books specifically aimed at database optimization on the SQL Server and found very little useful info.
Quick googling came up with this: http://www.mysql.com/products/enterprise/query.html which sounds like a similar tool.
This was of course on a query level, database level optimizations are again a different kettle of fish, but there you are looking at parameters such as how your database is divided on the hard drives etc. At least in SqlServer you can select to divide tables to different hdd’s and even disk plates and this can have a big effect because the drives and drive heads can work in parallel. Another is how you can build your queries so that the database can run them in several threads and processors in parallel, but both of these issues again depend on the database engine and even version you are using.
[Caution: Most of this Answer does not apply to MySQL. I bring this up because the OP tagged the Question with mysql.]
“I’m interested particularly in how indices will affect joins”
As an example, I’ll take the case of equijoin (SELECT FROM A,B WHERE A.x = B.y).
If there are no indexes at all (which is possible in theory but I think not in SQL), then basically the only way to compute the join is to take the entire table A and partition it over x, take the entire table y and partition it over y, then match the partitions, and finally for each pair of matching partitions compute the result rows. That’s costly (or even outright impossible due to memory restrictions) for all but the smallest tables.
Same story if there do exist indexes on A and/or B, but not any of them has x resp. y as its first attribute.
If there does exist an index on x, but not on y (or conversely), then another possibility opens up : scan table B, for each row pick value y, lookup that value in the index and fetch the corresponding A rows to compute the join. Note that this still won’t win you much if no other further restrictions apply (AND z = …) – except in the case where there are only few matches between x and y values.
If ordered indexes (hash-based indexes are not ordered) exist on both x and y, then a third possibility opens up : do a matching scan on the indexes themselves (the indexes themselves are likely to be smaller than the tables themselves, so scanning the index itself will take a shorter time), and for the matching x/y values, compute the join of the corresponding rows.
That’s the baseline. Variations arise for joins on x>y etc.