Blog ► How the TwoCans Q&A System Works

How the TwoCans Q&A System Works

Post by Blake
...since I'm sure you're all wondering!

A few notes about SQL

If you know SQL, you can skip this paragraph. If you're not familiar with SQL databases, this is all you need to know to understand: SQL databases are comprised of many tables. I (the programmer) define what columns are in each of the tables. However, the actions of the user will create, update, or delete rows in this table. The set of tables and their columns are called "the schema". The rows of the table are called "entries". For example, on this website there is a table called "users" and each row represents one registered user. When you registered, a row was added. There are columns for your username, email address, password, etc. Each time you log in, there's a column that indicates the last time you used the website which gets updated. This is how dynamic websites generally work.

How the Q&A used to be

For the Q&A on TwoCans, initially there were only two tables. "questions" and "answers". Each question has a unique ID#. The questions table would of course have the question ID#, the user that asked it, and the text of the question itself. The answers table would have the question ID# it was an answer to, who answered it, a 1 or 0 depending on if the user had read the answer, and of course the text of the answer.

Problems

This was all fine and dandy at first when there were only a few thousand questions and answers. However a few problems arose once this number got closer to a million.
  • When you answer questions, you don't want to see questions you've already answered. This required me to check not only the questions table, but also the answers table to see if you have any answers for a particular question. When there are a million answers, this is a very slow operation.
  • When you pick a random item in a table, SQL does this by selecting the contents of the WHOLE table, ordering it randomly, and then taking the first one. Shuffling a whole table to get 1 entry is a very slow operation.
  • Some people's ancient questions were still getting answers while others were not getting answers to their new questions.

At this point the page load time had slowed to about 2 seconds. Additionally, the database would crash and need to be rebooted every few minutes. I had to take the site down and rewrite it in a panic.

The New (Current) System

Working off a giant data set is something that should be avoided whenever possible. Therefore, there are now several tables that allow me to creatively work around doing things directly to the questions/answers table.

The Active Pile

There is a table called "active_questions". This table contains all the questions that have been asked or re-activated in the last 24 hours. This table usually contains around 600 entries at any given moment, last I checked. This table contains just the question ID#'s of active questions and a few other bits of interesting information such as the number of answers the question has, the user who asked it, the language it was asked in, and the time it was last activated or asked. This entry is deleted once it is more than 24 hours old. However the question itself is not deleted.

When you load the Answer page, the question you receive is coming from this table. The text of the question still comes from the questions table, however the full questions table is indexed against the question ID# so looking that up is not a big deal in terms of page load time. If you understand SQL, you know indexes are magical. If you do not understand SQL, then understand this: indexes are magical.

However, selecting a question randomly and checking to see if you already answered it is still a bit slow. So there are a few more optimizations...

The Question Plate

As I said earlier, the way SQL picks a random row out of a table is to shuffle the order of a whole table and then pick the first row. This is stupid slow for 1 row. It also seems silly if I need to do this many times one after another. So there is a table called "question_plate". What I do instead is just get the last 100 questions in the active_questions table and return them. No random ordering. Once the rows are received, they are randomly ordered by PHP and are inserted into the question_plate table along with your user ID# attached to them. This is your personal queue of randomly ordered questions. You may notice the page load is a bit slow once in a while, but just slightly.

After this point, each time you load the page, the next question in your queue is pulled out and fetched from the questions table. This is a nearly instantaneous operation (because of the index on the question ID#). Your queue is cleared and restocked after every few minutes or when you run out of questions in your queue. Those page loads are slightly slower than the others, but not really noticeable.

The Question Acknowledgement

The last bit of information is in a table called question_ack. This table contains a list of question ID#'s paired with user ID#'s. This list indicates the questions that you have "acknowledged". This means you have either answered the question or pressed the "Skip Forever" button.

When a question is pulled from the active_questions table to put in your queue, it is checked against the question_ack table. If an entry exists that corresponds to you and that question, it is ignored. Also, once an entry is pulled out of the queue, it is checked again. This prevents bugs such as if you have to browsers open and are answering questions in both. Imagine if you left your browser open on question #123456 and you opened a new browser and after a while your queue was restocked. Then when you got to #123456 in your second browser and answered it, then go back and answer it in the old browser, that would be a problem because you've answered the same question twice. This prevents that.

That's pretty much it. Now you have instant page loads instead of 5-second-long page load times. I hope I didn't put you to sleep too much.
SPiEkY at September 2nd, 2011, 7:00 AM
Huh. Cool, I always wondered how it worked.
Sya at September 2nd, 2011, 7:00 AM
Must reread this when it's not midnight.
antidaeophobic at September 2nd, 2011, 10:21 AM
got to the third paragraph and my brain exploded
Hydrogen777 at September 2nd, 2011, 12:48 PM
Cool stuff. I'm learning SQL soon in my DBMS class. ^_^
The_Erotic_Hobo at September 2nd, 2011, 2:12 PM
Very cool. I hope to learn that stuff someday.
SuperJesus2.0 at September 2nd, 2011, 2:59 PM
Me too
asdfgh at September 2nd, 2011, 4:25 PM
Interesting.
SPiEkY at September 2nd, 2011, 7:29 PM
Cool stuff. I'm learning SQL soon in my DBMS class. ^_^


I accidentally misread that as "BDSM class". Now I feel foolish.

<.<

Edit: It's worth noting that I have dyslexia.
Antagonist at September 2nd, 2011, 7:33 PM
Best class ever.
Harok at September 2nd, 2011, 7:33 PM
I was wondering.
Chickadee at September 2nd, 2011, 10:29 PM
That's hilarious.

And thanks, Blake. Pretty nifty.
millrcrt at September 3rd, 2011, 12:15 AM
That's pretty cool. Thanks.
talonhand94 at September 4th, 2011, 12:45 AM
I still don't get this whole programming thing. My dad is a programmer. I think I'll ask him.
soulgirlKC at September 7th, 2011, 2:38 PM
^So is mine. Maybe he'd explain this for me.
drodge at September 7th, 2011, 3:03 PM
So, is there a way to get to the older questions that remain active? I find that once you burn through the pool of fresh questions, they start to get very repetitive very quickly. Soon, you end up with about 20-30 questions that seem to just churn over. I'd love to see a link to a questions_old table. I would be understanding that this pool would take significantly longer to load based on the pool size.
asdasasefasd at September 10th, 2011, 10:29 PM
I like what kind of people post in the comments on this post, awesome people.
Firebelley at September 11th, 2011, 1:49 AM
This is absolutely brilliant!
asdasasefasd at September 14th, 2011, 4:47 AM
I was thinking about this while in school, this website is taking over my life. @_@
Sya at September 14th, 2011, 1:31 PM
I was like that last year, but j visited here and IRC so often that they blocked both.
asdasasefasd at September 14th, 2011, 7:42 PM
I only check my messages in school during lunch.
Actsline at January 5th, 2012, 3:05 PM
Great Site. I hope I could create like this site too, but I am still thinking for a different useful purpose. I am an IT student 3rd yr. college and studied web programming. All I want is a free web hosting as .com on the internet, Internet offers a free but they have 2 dots in an address, one dot you should pay.
awesomeguy at January 7th, 2012, 12:03 AM
You're gonna have to pay for a .com address, at least without a subdomain. I know that .tk TLDs are free though.
theking at April 27th, 2012, 1:47 PM
I still don't get it... But I guess that doesn't matter.
tigerlivie at May 20th, 2012, 6:41 AM
Huh, that's surprisingly simple. Very cool and effective though. I wondered why you needed to shuffle the entries anyway; particularly if you're just taking the same initial 100.
wyatt at July 21st, 2012, 8:41 AM
i love this guy
Person at August 6th, 2012, 1:10 AM
How does the messaging system work?
cookiemonster105 at September 21st, 2013, 1:23 AM
got to the third paragraph and my brain exploded

This made my day.
hummahz at January 5th, 2015, 8:14 PM
So, when we load a list of our questions, do we again load from the giant data pile?
onehappycan at January 16th, 2017, 11:46 PM
Scrolled through article to determine length.

Skipped straight to comments.
alfanewmerik at May 24th, 2018, 4:48 AM
hummahz said:
So, when we load a list of our questions, do we again load from the giant data pile?

I have now claimed your soul, even if you are inactive.
Garfield.Me.It'sGarfield at June 14th, 2019, 7:58 PM
If you ever need a new mod:
Antimony Pentaflouride
ちょたの_chotano at June 14th, 2019, 8:05 PM
Any reason for reviving more than a year old thread?

Edit: Also on a forum thread that has nothing to do with moderating/moderators?
PoikiChoi at February 2nd, 2021, 7:45 PM
hey what happens if I revive this for literally no reason?
jaxxie at February 2nd, 2021, 8:19 PM
Ka-ban is what happens. Also, thread locked.