10:08:46 #startmeeting data persistence on koha
10:08:46 Meeting started Fri Feb 18 10:08:46 2011 UTC. The chair is hdl. Information about MeetBot at http://wiki.debian.org/MeetBot.
10:08:46 Useful Commands: #action #agreed #help #info #idea #link #topic.
10:08:53 Hi
10:09:08 Let's start the meeting.
10:10:13 Is anyone here for the record?
10:11:03 * hdl Henri-Damien LAURENT, BibLibre
10:11:26 * kf Katrin Fischer, BSZ - following the discussion
10:11:34 Colin Campbell, PTFS-Europe
10:11:34 * fredericd Frédéric Demians, Tamil
10:11:46 hi Henri-Damien
10:12:32 So, 4 of us...
10:12:35 OK.
10:13:23 We have been playing with Plack in order to test data persistence and improve performance.
10:13:46 http://plackperl.org/
10:13:56 #link http://plackperl.org/
10:14:46 We have pushed a branch on koha-community.org where you can find .psgi files to launch a Plack server.
10:15:26 http://git.koha-community.org/gitweb/?p=wip/koha-biblibre.git;a=shortlog;h=refs/heads/wip/Plack
10:15:43 There is also http://git.koha-community.org/gitweb/?p=wip/koha-biblibre.git;a=shortlog;h=refs/heads/wip/data_persistence
10:16:50 The data_persistence branch contains some patches that modify C4::Context to close the connections to Zebra when they are no longer used, at least in SimpleSearch.
10:17:05 We have been testing that on some servers with nginx.
10:17:21 The results are quite good in terms of performance.
10:18:43 do you have figures?
10:19:23 You can launch it with: KOHA_CONF=/home/users/koha/sites/univ_lyon3/etc/koha-conf.xml PERL5LIB=/home/koha/src plackup -S Starlet --workers 32 --port 5000 /local/home/users/koha/src/prod.psgi
10:20:12 fredericd: well, I had some... but I do not have them right here.
10:20:31 But you can use the file.
10:22:07 If I check out the wip/Plack branch from git.koha-community.org, what do I need to do to get it working?
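[Editor's note: the prod.psgi file passed to plackup above is not shown in the log. As an illustration only, a minimal .psgi wrapper for a tree of CGI scripts might look like the following; the paths and the use of Plack::App::CGIBin are assumptions, not the actual contents of the wip/Plack branch.]

```perl
# prod.psgi -- hypothetical sketch, NOT the file from the wip/Plack branch.
# Serves a directory of CGI scripts as persistent PSGI apps.
use strict;
use warnings;
use Plack::Builder;
use Plack::App::CGIBin;

my $root = '/home/koha/src';    # illustrative path

builder {
    # Serve static assets from Plack itself; the meeting suggests letting
    # the front-end web server do this instead -- either approach works.
    enable 'Static',
        path => qr{\.(?:css|js|png|gif)$},
        root => "$root/koha-tmpl";
    Plack::App::CGIBin->new( root => $root )->to_app;
};
```

This is a configuration fragment; it only does something useful when started under plackup with a real script tree behind it.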
10:22:17 It needs some tweaking of the Apache configuration to have static files served directly and a reverse proxy on port 5000.
10:22:26 You need to install Plack:
10:22:36 cpanm Task::Plack
10:22:54 (you can install cpanminus for that)
10:23:14 Task::Plack comes with 3 servers you can test:
10:23:27 plackup, Starlet, and another one.
10:23:37 Starman, I think.
10:24:16 installation in progress...
10:24:17 The stuff is working quite well... but it is not ready for production at the moment.
10:24:33 Because of some problems:
10:24:44 a) circular references in the code
10:24:50 b) Zebra search
10:25:14 c) lack of caching in the code
10:25:39 You mean that Zebra search slows down if the connection isn't reset regularly?
10:25:47 yes.
10:26:25 This is why I pushed the data_persistence branch... which destroys the connection to Zebra when things are over.
10:27:05 But the fact is that when you are running with Unix sockets, it leaves zombies on the system... so one has to kill them.
10:27:52 With a TCP connection, that problem is not so hard,
10:28:12 since there is a TCP timeout.
10:30:20 About circular references in the code, I sent some mails to the list.
10:30:39 But had very little feedback.
10:31:33 Some from ColinC, quite helpful...
10:32:09 But I would like to know if we can set up a plan in order to solve those problems and work out solutions.
10:34:28 a hitlist of obstacles
10:35:05 About circular references: I think we could at least identify the functions which add the additional dependencies...
10:35:32 I gave a list of some of the circular references
10:36:49 on the lists.
10:37:21 I cannot find the link right now.
10:38:08 But about Zebra search, I think that the solution we proposed could be a nice option.
10:38:33 If you could test it and provide some feedback...
10:39:17 And about the lack of caching, chris proposed the usage of Memcached.
10:39:42 And I think it is quite a good start.
10:41:29 But this would also require adding some local variables to store the information
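[Editor's note: the web-server tweak mentioned at 10:22:17 (static files served directly, everything else reverse-proxied to the Plack server on port 5000) is not spelled out in the log. A rough nginx equivalent could look like this; server name and paths are placeholders.]

```nginx
# Hypothetical fragment: serve static files directly,
# reverse-proxy everything else to plackup/Starlet on port 5000.
server {
    listen 80;
    server_name koha.example.org;            # placeholder

    location /intranet-tmpl/ {
        root /home/koha/src/koha-tmpl;       # placeholder path
    }

    location / {
        proxy_pass http://127.0.0.1:5000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```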
and not make any calls.
10:41:50 This would be much more efficient.
10:45:20 not sure I followed that last part
10:47:16 I have done some work on C4::Languages (which still has some bugs... but it was a first step and provided good improvements) and on C4::Context.
10:47:55 http://git.biblibre.com/?p=koha;a=blob;f=C4/Languages.pm;h=c00695b607b0f4d1ac8a5f8f4a397b03a97a849d;hb=bug/BLO/MT5496
10:48:16 Thomas Dukleth, Agogme, New York [lost attention to the time but has read back.]
10:49:18 http://git.biblibre.com/?p=koha;a=blob;f=C4/Context.pm;h=eb8b7ada8dd1f55c8af2e32041d747b6d41de5c5;hb=bug/BLO/MT5496
10:51:20 ColinC: what are you not following?
10:51:46 Do you think I am wrong?
10:52:44 I'm not sure what point you were trying to make about Memcached
10:54:36 Memcached is not generalized in all the modules.
10:56:24 And when you edit the thing, the old data sticks in Memcached.
10:57:11 You have to flush the cache regularly, or flush the cache on edits.
10:57:25 flush... or edit the structure.
10:58:47 But Memcached is not really adding much performance... unless you use it for complex structures.
10:59:09 hdl: What does it mean to "edit the thing" or "edit the structure"?
10:59:55 You cache preferences; when you edit a preference, the cache is not updated
11:00:31 (at least with the system as it is now).
11:01:57 You have to wait 5 (or whatever the timeout in the file is) for it to be updated correctly...
11:02:35 For future reference, I think the Cache::Memcached interface has a more fine-grained API than Memoize::Memcached.
11:03:09 hdl: Are the deficiencies to which you are referring deficiencies of Memcached or deficiencies of the current implementation work for Koha?
11:03:33 hdl: or both
11:04:32 both, in fact...
11:06:54 The problem is that the Memcached code would have to be quite localized and centralized... in order to be easily generalized.
11:07:40 But it would require finer granularity for some other things.
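[Editor's note: the stale-cache problem hdl describes (a cached preference not updated until its timeout expires) is independent of Memcached itself; the fix is to invalidate the cached key inside the code path that performs the edit. A minimal, self-contained sketch of that flush-on-edit pattern, using a plain hash with expiry times in place of a real Cache::Memcached handle; all names are illustrative, not Koha code.]

```perl
#!/usr/bin/perl
# Sketch of cache-with-invalidation-on-edit. A plain hash stands in
# for Cache::Memcached so the pattern is runnable without a daemon.
use strict;
use warnings;

my %cache;       # key => [ value, expiry_epoch ]
my %sysprefs = ( OPACBaseURL => 'http://old.example.org' );   # fake "database"
my $TTL = 300;   # fall-back expiry, like a memcached TTL

sub get_pref {
    my ($key) = @_;
    my $entry = $cache{$key};
    return $entry->[0] if $entry && $entry->[1] > time;
    my $value = $sysprefs{$key};                  # "database" read on miss
    $cache{$key} = [ $value, time + $TTL ];
    return $value;
}

sub set_pref {
    my ( $key, $value ) = @_;
    $sysprefs{$key} = $value;                     # "database" write
    delete $cache{$key};  # flush on edit: no stale window, no waiting for TTL
}

print get_pref('OPACBaseURL'), "\n";              # first read populates cache
set_pref( 'OPACBaseURL', 'http://new.example.org' );
print get_pref('OPACBaseURL'), "\n";              # fresh value immediately
```

With a real Cache::Memcached handle the same shape applies: `$memd->delete($key)` in the update path instead of `delete $cache{$key}`.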
11:09:16 For instance: systempreferences do not change often... but biblios change far more often.
11:10:20 I think there is quite a bit of refactoring to be done on some code to take advantage of any caching structure
11:11:15 Can you provide an example of how you see things?
11:13:35 or does anyone have some precise idea?
11:13:56 Because the model has been CGI-script based, nothing has been treated as possibly living longer than the script's runtime
11:20:03 ColinC: What improper presumptions have been made in consequence of that legacy design?
11:22:01 ColinC: But is there any concrete example of a nice application design you have in mind?
11:22:17 we have operations that intertwine updates and reads to persistent/non-persistent data
11:23:46 Are you thinking of Evergreen?
11:23:55 I've used memcached in a high-throughput environment; it gave many improvements, but it was important to concentrate caching on what benefited from it
11:26:16 you mean to use caching only where it is really needed?
11:27:07 Yes
11:27:13 ColinC: Are you suggesting that the performance improvement would be minimal for data which is used infrequently or changes often and therefore needs careful update management?
11:27:29 thd: i think so
11:28:28 But there are many places: branches, categories, issuingrules, systempreferences, languages, message_preferences...
11:28:34 where it can be used.
11:29:17 As a crude example: the approach to, e.g., the item and patron records in an issue transaction is different to the circ rules governing that transaction
11:29:40 and what hdl said
11:31:08 ColinC: Why is having multiple data access rules especially problematic?
11:31:46 It's not
11:32:59 Is there anything else to say now?
11:33:12 ColinC: What did you mean in referring to item and patron records for issuing transactions as different from circ rules?
11:33:19 Are you interested in playing with Plack?
11:33:32 Are you interested in testing?
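[Editor's note: the Zebra fix hdl describes at 10:26:25 (the data_persistence branch destroying the connection when a search is over) amounts to not letting one ZOOM connection live for the whole lifetime of a persistent worker. A hedged sketch of that pattern with ZOOM-Perl; the actual branch code may differ, and the host, port, database name and query handling here are placeholders. It requires Net::Z3950::ZOOM and a running Zebra server.]

```perl
# Sketch: open a ZOOM connection per search and destroy it afterwards,
# instead of reusing one connection for the worker's whole lifetime.
use strict;
use warnings;
use ZOOM;

sub simple_search {
    my ($query) = @_;
    my $conn = ZOOM::Connection->new(
        'localhost', 9998,                 # placeholder Zebra target
        databaseName => 'biblios',
    );
    my @results;
    eval {
        my $rs = $conn->search_pqf($query);
        push @results, $rs->record($_)->raw() for 0 .. $rs->size - 1;
        $rs->destroy;
    };
    my $err = $@;
    $conn->destroy;   # the key point: no long-lived connection to go stale
    die $err if $err;
    return \@results;
}
```

The trade-off discussed in the meeting: with Unix sockets the torn-down side can leave zombie processes to clean up, whereas TCP connections simply time out.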
11:33:58 Are you interested in any of the 3 problems I listed?
11:34:01 Who?
11:34:34 thd: the life of the cached data differs
11:34:55 hdl: Is there a bug and patch for the Zebra connection issue?
11:38:27 no.
11:38:41 not yet, in fact.
11:39:17 It is something we discovered with some of our tools...
11:39:30 but neglected to report...
11:39:47 But this behaviour is well known with Zebra and YAZ.
11:40:25 hdl: I see reports on the koha mailing list of people not running Zebra as a daemon. Does that help with the issue or merely create other performance problems?
11:41:04 I think it would not help, in fact.
11:41:46 It is due to the Z39.50 protocol being connection-oriented... and since Koha initiates one connection and uses that... in a data-persistent model... you always use the same one.
11:42:04 And... it ends up being very slow.
11:42:43 OK. If no one has anything else to say... let's end the meeting.
11:42:51 What causes the slowness for an open connection?
11:43:05 yes...
11:43:36 send 1000 queries down a single connection and it will slow down...
11:43:54 and Zebra will take some more memory
11:44:13 If anyone has questions about Plack and how to test...
11:45:00 If I can help in any way, if you have any question or if you think of any example... please do ask.
11:46:47 #endmeeting