View Full Version : Could use advice/thoughts on bidirectional relationship

12-29-2011, 06:50 PM
I was thinking about this earlier, and I'm not quite sure how its done, or what the thought process behind it is, but...

If you have a bidirectional relationship, such as facebook friends, whats the mindset behind creating the table to represent that? My initial thought was a table with basically userID and friendID, where both are foreign keys pointing to a user table. But then I realized if you did that, you'd either have to create two entries representing the relationship in both directions, which would be redundant data, or you'd have to write the query to search both columns.

Is there another technique I don't know of? Am I on the right path? Is there another reasoning I don't know?

12-29-2011, 08:04 PM
I believe one thing I saw a forum do in their MySQL structure is have a "friendid" column that contains a list of user ids that are friends of the user (separated by some deliminator, such as "," or " "). You would just load the user row, get the entry, split the list of ids, and do whatever you want with them. Of course, this isn't literally bidirectional (as your solution isn't either), but you can make it seem bidirectional by making sure each added friend has the user's id in their "friendid" column.

Old Pedant
12-29-2011, 08:48 PM
Oh, PLEASE do not even *THINK* of using Apothem's "solution"!!!

That's the worst advice you could get.

Sorry, but true. *NEVER* store a delimited list in a single field in *ANY* relational database.

Just for starters (and I mean that), how would you then use SQL to answer a simple question such as "How many users have 'John Doe' in their FRIENDS list?"

Oh, you can code it, but it will be a really ugly query and it will be slow, as there is no way to take advantage of any indexing.

But now give a shot at coding "How many users have 'John Doe' in their FRIENDS list and, in addition, have at least 4 other friends in common?"

We'll wait for you to give us an answer sometime in 2013.


Old Pedant
12-29-2011, 08:57 PM
The right answer is the usual one: NORMALIZATION.

A simple MANY-TO-MANY table:

CREATE TABLE friends (
userid INT REFERENCES users(userid),
friendid INT REFERENCES users(userid),
PRIMARY KEY (userid,friendid)

Whether you store "John is a friend of Mary" as one row in that table or as two (that is, so you also have "Mary is a friend of John") is up to you. The extra space needed is miniscule (8 bytes or so, per friend link) and it does simplify queries. But it's really no big deal to join to the friends table via both fields:

SELECT username
FROM users, friends
WHERE users.userid IN ( friends.userid, friends.friendid )

If you do that, you would surely want to also have a separate (non-unique) index on the friendid field. And that could quite possibly eat up most of the space advantage of not storng the friend relationship both ways.

Anyway, one way or both ways is not a really important decision. You could always add the other way later if you saw more reasons for it. But either one is orders of magnitude better than storing a delimited list.

And, by the by, answering a question such as ""How many users have 'John Doe' in their FRIENDS list and, in addition, have at least 4 other friends in common?" is really trivial now, especially if you store both directions of the friend relationship (but only a tiny bit more complex if you don't).

12-29-2011, 09:32 PM
"How many users have 'John Doe' in their FRIENDS list?"
Well we know the OP wants a bidirectional relationship between two users. If you suppose that he implements it correctly, the total list of user IDs in John Doe's freindid is the total people who have John Doe in their friend list. Though the process of getting to that point may be (MUCH) longer than what you have said.

As for the other question, that's a really good question. I have absolutely no idea. Such a feature would definitely, at least from my knowledge, take many queries.

Though to my inexperienced mind your solution and OP's initial solution are fairly similar. I have not played with many functionality that InnoDB (I only played with MyISAM) provides, so I am just ignorant of how much more power things like references bring. What does references really do? What is its benefits?

Old Pedant
12-29-2011, 09:50 PM
It's not just the REFERENCES capability that is in question here.

You can use REFERENCES with MyISAM, but MyISAM won't enforce the qualities as does INNODB. INNODB, along with most any decent database engine (even Microsoft's "toy" database Access does this) *enforces* referential integrity. That means that if you try, in this example, to add a friendid of 9999 and there is not userid in the users table with that value, then you get an error. Similarly, if you tried to delete a record from the users table that had a userid that is in use as a foreign key (REFERENCES, in your words) in another table, you get an error and the delete is disallowed.

All foreign keys must refer to existing primary keys at all times.

On top of that, you can optionally add characteristics to foreign keys. For example, CASCADE DELETE. When that is declared on a foreign key (REFERENCES link), if the record with the primary key is deleted, then the *ALL* records with the matching foreign key are *automatically* also deleted. Referential integrity is thus still enforced. (There is also a CASCADE UPDATE option which is much less used.)


But, again, it's not the referential integrity that is the important part in this case. Or at least not the most important part.

In *ANY* database, NORMALIZATION is the most important part.

If you aren't familiar with it, GOOGLE is your friend.

FWIW, to find out all users that 'John Doe' is a friend of using your scheme, you would need to code something like this:

SELECT * FROM friends
WHERE CONCAT(',', listOfFriends, ',') LIKE CONCAT( '%,', $johns_userid, ',%' )

(That assumes that you use comma for the delimiter in the list...use any non digit character if you wish.)

First of all LIKE is about the slowest operation that MySQL (or any DB) can do. It's complex and requires character-by-character scanning of fields. In the case of MySQL, it also can *NOT* be perfomed using an index. MySQL will only do it by scanning every single record in the given table.

Compare that to this:

SELECT u.username
FROM users AS u, friends AS f
WHERE u.userid = f.userid
AND f.friendid = $johns_userid

Assuming that the field friends.friendid is indexed, that query is as efficient as it gets in any database.

NOTE: Actually, the code equivalent to the version using the list of friends would simply be

SELECT * FROM friends
WHERE friendid = $johns_userid

Old Pedant
12-29-2011, 09:59 PM
"How many users have 'John Doe' in their FRIENDS list and, in addition, have at least 4 other friends in common?"

SELECT U1.username AS user1,
U2.username AS user2,
COUNT(*) AS mutualFriends,
SUM( IF(U1.userid=$johns_userid,1,0) ) AS johnIsFriend
FROM users AS U1, users AS U2, friends AS F1, friends AS F2
WHERE U1.userid = F1.userid
AND U2.userid = F2.userid
AND U1.friendid = U2.friendid
GROUP BY U1.username, U2.username
HAVING johnIsFriend > 0 AND mutualFriends >= 5

Untested, but I think that's right. And really pretty simple *AND* should be quite fast.

12-29-2011, 10:03 PM
I see. Very interesting. When I continue to program a project that uses a database, I'll definitely look up normalization for the type of database I will be working on.


What I was thinking of (i.e. "my method") was something like this:

SELECT friendid FROM users WHERE userid = $uid

Then lets say you're using PHP and you retrieved the result, and that friendid is not null:

$results = <get cell> // I can't remember it off the top of my head
$friends = explode(",", $results);
sizeof($friends); // total friends

Just a single query; fast and efficient.

Suppose you also want to get all the names like you have:

$list = implode("','", $friends);

SELECT username FROM users WHERE userid IN('{$list}')

If userid is index, as they normally are, it would still be relatively fast right?

Of course you can immediately see that my solution adds a layer of indirection, while yours does not.

Edit: To be quite blatant, my mind cannot break down your second (longer) query :P

Old Pedant
12-29-2011, 10:14 PM
Your solution requires that the *ENTIRE* users table (well, at least one field in each record) be copied from MySQL to PHP where you would then do the filtering in PHP code.

If you have a miniscule number of records, and your site is not very busy, that's an okay solution.

But if you had a million (or 100 million!) users, a la Facebook, that solution would be disastrous.

When at all possible, filtering of records should be done in the database, not in PHP (or any other database client).

Remember, the database engine runs in a DIFFERENT PROCESS than the web server. And MySQL could even run on a different *computer* than the web server--almost always true in shared hosting systems [and I know it's true with GoDaddy, for example].

So for the web server (which includes PHP) to get data from the database, it has to *COPY* the data *FROM THE OTHER PROCESS OR COMPUTER* into the process space of the web server.

When the two are on the same machine, this will usually be done via pipes or shared memory. Reasonably efficient, but it still involved the DB engine putting the data from its internal buffers into one end of the pipe (or shared memory) and then PHP pulling the data from the pipe/shared memory and putting it into PHP's internal buffers. That is *TWO* EXTRA operations on every byte of data. When the two servers are on different machines, the data has to actually be sent across a *wire*, which (by the nature of inter-machine operations) could entail another two extra copy operations, not to mention the delay in converting bytes to bits to send them and from bits to bytes to receive them. INTER PROCESS COMMUNICATIONS are expensive and should be kept to a minimum.

Little of this matters a whole lot in toy websites. If you don't have more than, say, 1000 web hits per day then code things any way you wish. But when you get up to hundreds of hits per minute, you'll be sorry if you didn't pay attention to the bits and bytes.

Old Pedant
12-29-2011, 10:17 PM
Also, this solution is wrong:

$list = implode("','", $friends);
SELECT username FROM users WHERE userid IN('{$list}')

Oh, it will work. But assuming userid is an INT field, as it should be, then the apostrophes are unneeded *AND* they will slow up MySQL (admittedly by a tiny amount) as it converts each string to a number before comparing to the int value of userid.

If you must use that solution, just use:

$list = implode(",", $friends);

SELECT username FROM users WHERE userid IN({$list})

01-01-2012, 12:35 AM
As always Old Penant, you are a source of wisdom.

Ok, so whether I store one relationship and join on both columns or create both relationships really just falls down to personal taste and how else it interacts with my code, if I understand you correctly. In that case, I was on somewhat the right track. Thanks as always.

Old Pedant
01-01-2012, 05:40 AM
I personally feel that storing both directions will be easier and probably perform better in the long run, but I doubt that it matters much until you get up into the many thousands of users.