If the database of Twitter was implemented using SQL Server, particularly the Twitter user profile and followers, how would the database structure look like? Here's an attempt of designing a database structure that can be used to store profile information and connections (also known as Followers and Followings) similar to the ones used by Twitter.
Disclaimer: The author of this article does not work for Twitter and has not worked for Twitter. It is just the author's attempt on designing the database that may work on such site as Twitter. Any similarities to the database design implemented by Twitter are mere coincidences (and luck).
The basic building block or the basic element in Twitter is an account. When a user signs up, the user is asked for the following information:
- Full Name
- Email Address
- Password
- User Name
Given these information, we can now create the first table in our database design, which we will call the Account table:
CREATE TABLE [dbo].[Account](
[AccountID] INT IDENTITY(1,1) NOT NULL,
[Email] VARCHAR(100) NOT NULL,
[FullName] NVARCHAR(100) NOT NULL,
[Password] VARBINARY(100) NOT NULL,
[UserName] VARCHAR(15) NOT NULL,
[FollowerCount] INT NOT NULL DEFAULT ( 0 ),
[FollowingCount] INT NOT NULL DEFAULT ( 0 ),
[TweetCount] INT NOT NULL DEFAULT ( 0 ),
CONSTRAINT [PK_Account] PRIMARY KEY CLUSTERED ( [AccountID] ASC )
)
GO
Other user information such as the profile photo and address can be added to this table later but the basic requirements for now are just these columns.
Once an account has been created, the next thing a user usually does aside from posting their first tweet is follow other twitter users. This relationship between Twitter users can be stored in a table which we will call the Follower table:
CREATE TABLE [dbo].[Follower](
[FollowerID] INT IDENTITY(1,1) NOT NULL,
[AccountID1] INT NOT NULL REFERENCES [dbo].[Account] ( [AccountID] ),
[AccountID2] INT NOT NULL REFERENCES [dbo].[Account] ( [AccountID] ),
CONSTRAINT [PK_FriendRequest] PRIMARY KEY CLUSTERED ( [FriendRequestID] ASC )
)
GO
When a user follows another Twitter user, a record is inserted in this table where [AccountID1] will be the account ID of the current user and the [AccountID2] will be the account ID of the Twitter user being followed. After this, the number of followers and followings should now be updated in the [dbo].[Account] table. This can automatically be done using an INSERT trigger on the [dbo].[Follower] table:
CREATE TRIGGER [IncreaseAccountCounts] ON [dbo].[Follower]
AFTER INSERT
AS
UPDATE A
SET [FollowerCount] = [FollowerCount] + 1
FROM [dbo].[Account] A INNER JOIN [inserted] B
ON A.[AccountID] = B.[AccountID2]
UPDATE A
SET [FollowingCount] = [FollowingCount] + 1
FROM [dbo].[Account] A INNER JOIN [inserted] B
ON A.[AccountID] = B.[AccountID1]
GO
Similarly, if a user decides to unfollow another Twitter user, the corresponding relationship will be deleted from the [dbo].[Follower] table and the number of followers and followings for the affected users will be updated using a DELETE trigger:
CREATE TRIGGER [DecreaseAccountCounts] ON [dbo].[Follower]
AFTER DELETE
AS
UPDATE A
SET [FollowerCount] = [FollowerCount] - 1
FROM [dbo].[Account] A INNER JOIN [deleted] B
ON A.[AccountID] = B.[AccountID2]
UPDATE A
SET [FollowingCount] = [FollowingCount] - 1
FROM [dbo].[Account] A INNER JOIN [deleted] B
ON A.[AccountID] = B.[AccountID1]
GO
To get the top 10 Twitter users with the most followers, the following query can be used:
SELECT TOP 10 B.[AccountID], B.[FullName], COUNT(*) AS [Followers]
FROM [dbo].[Follower] A INNER JOIN [dbo].[Account] B
ON A.[AccountID2] = B.[AccountID]
GROUP BY B.[AccountID], B.[FullName]
ORDER BY COUNT(*) DESC
To make these queries execute fast, indexes can be created on the [AccountID1] and [AccountID2] columns:
CREATE UNIQUE NONCLUSTERED INDEX [IX_Follower1] ON [dbo].[Follower] (
[AccountID1] ASC, [AccountID2] ASC
)
GO
CREATE UNIQUE NONCLUSTERED INDEX [IX_Follower2] ON [dbo].[Follower] (
[AccountID2] ASC, [AccountID1] ASC
)
GO