21 March 2007

Statistics and Databases

As I explained in one of my articles, I don't really like storing the players data in a database. Instead, we use flat files, which, IMHO, are easier to work with and faster.
But a disadvantage of flat files vs. a database is that it is hard to obtain statistic information. For example, checking to see the total quantity of an item in the game using flat files requires opening every single player file, which is slow and inconvenient. Then if you want to determine some other statistic information, you will need to reopen all those files again.

Luckily, our server is designed not to need any such statistics. If some statistics are needed, they are saved in a special file, and loaded whenever they are needed, or just kept in the memory.

But often times it is a good idea to have a quick glance at some stuff that is going on, for example seeing how many gold coins are in the game, and the top 100 players in terms of how many gold coins they have. This is useful for many purposes, including balancing the economy, creating new formulas, adjusting some rates, and even checking to see if there are any item duplication bugs.
As previously mentioned, it is hard to get all this data without a database.
Which is why I worked at actually storing all the stuff I am interested in an Sqlite DB.

Sqlite is a public domain database (SQL) library. It's not a database server, such as MySql or MSSql. Because of this, there is less time involved in accessing the database (no indirection layer), and you don't have to worry about having a server running all the time.
Because it is public domain, we can embed some server code in the program that is generating the statistics. With GPL code it is slightly more complicated (in theory we can use GPL code in our server, and not be required to make our source code public, because we do not distribute it; however, GPL3 is going to change that so we'd rather not use any GPL code at all).

Writing the program was relatively easy, Sqlite is very easy to use, at least on Windows. The problem was in having a script that would feed that program with every player file name. As I am developing under Windows, I didn't have native access to the powerful and very useful Unix commands, such as bash and find. The Dos/Windows .bat is not that powerful, and I couldn't find a way to get all the files in a directory, recursively, and send them to a program of my choice.
So what I did was download Cygwin, which is for Windows what Wine is for Linux (well, sort of).
One problem that I had is the fact that it would seem that the bash version that comes with it is using an internal version of "find". So when typing: "find ." from the bash prompt, it complains about some bad parameters... Eventually, I just renamed find.exe as find_1.exe and used "find_1 .", which seemed to work better.
The script looked like this: "for i in $(./find_1 .); do ./stats.exe $i; done" (Lachesis helped me with it).
The player files I used were taken from the previous night's backup.
The script worked, but it was very, very slow. It took about 4 hours to process 101K files. I am not sure if this was because of Cygwin, or because of Sqlite, my program was pretty fast.
I talked with Learner about it, and we are going to modify the program to get it's own files, so it will not have to rely on an external script. Making it get it's own files will also make it faster, as it doesn't have to open and close the database for every player file.

Despite for all those small problems, once the database was created, the end result was very nice. I was able, in seconds, to get the total quantity of the gold coins in the game (about 200M). Not bad, that would be about 2K gold coins per player, on average. Of course, as with the real life, the amount of gold coins in the game is not evenly distributed, some players have over 2 million coins, while others have considerably less. In fact, here is the top quantities of gold coins for the players (I am going to publish only the quantities, not the player names, for privacy reasons).

3117212
2788486
2581151
2486154
2076652
1679400
1616314
1603558
1528248
1428579
1386501
1373042
1301485
1284118
1256488
1203808
1164721
1134737
1101526
1099434
1061115
1058531
1027778
996269
960114
948841
948784
882865
858820
842955
809451
786818
765751
746009
740049
723096
711036
674987
674701
671717
657964
642701
632000
597026

The irony is, more than a half of the players in this top never bought anything from us, so the game can very well be played without paying anything. As for those who do buy items from us, I suspect that the reason they are in the top is because they like the game a lot and play a lot. After all, you wouldn't buy items if you didn't play a lot, right?

Another interesting thing is that a lot of rare items are pretty abundant in the game. For example, there are 15K enriched fire essences, and they sell for around 3.5Kgc each. If they belonged to one single person, and if that person had enough buyers, he could sell them for about 20K USD (not bad). Of course, they do not belong to one single player, and very few people have more than 20 in their inventory. The reason there are so many of them in the game is that for a year or so, I didn't check on how many were entering the game vs. how many were exiting the game, and we had quite some disparity (about 2 entering for each that was exiting). Now that I made them more rare, more are exiting than entering, so in a few months their numbers should decrease by a few K.
It is also worth mentioning the fact that some of those players are no longer playing, so the real quantity might be something close to 10K, but that is impossible to know for sure.

To end this long post, I am glad I spent some time to write this program and use Sqlite. It proved to be quite useful, and will help with the ongoing task of constantly improving Eternal Lands.

7 Comments:

Anonymous Anonymous said...

I doubt that the richest guy has _only_ 3 million gc.

22/3/07 12:22  
Anonymous Anonymous said...

cmd dir /n

22/3/07 12:45  
Blogger Radu said...

Well, yeah, dir can be used to get all the file names like that. But I was looking at a way to be able to put each filename into a variable, so that they can be individually feed to a program as a parameter.

22/3/07 15:02  
Anonymous Anonymous said...

>>As for those who do buy items from us, I >>suspect that the reason they are in the >>top is because they like the game a lot >>and play a lot.
I disagree. Those who play a lot can make ingame gc, they don't have to spend $ on the game. And those who don't have time for this or are evil rich irl spend $ on items from the shop :)

22/3/07 15:21  
Blogger Radu said...

Well, of course that someone who buys items from the shop can make more money than those who don't. But the point I was trying to make is that a lot of people are on that top, without paying anything to us (or paying for minor things, such as a p2p race).
And someone who plays the game a lot, and is good at saving money will generally have at least as much money as someone who plays little buy buys items from the shop.

22/3/07 17:45  
Anonymous Anonymous said...

know this is a little old but

you could have done something like

dir /b /s > filelist.txt

then just read all the files from
the file that is generated.

21/4/07 00:33  
Anonymous Anonymous said...

There's a possibility i misinterpreted this, but it seems SQLite doesnt use GPL for its licensing, at least it doesnt look like that for their site at least, it just says its public domain, and that it doesnt really have a license, that's also why they allow people to ask them to provide a commercial license if we use SQLite on a commercial product where people dont beleave in public domain code.

But of course, i could probably have misinterpreted this, so if i did, i'm sorry

19/5/07 06:22  

Post a Comment

<< Home