10:08:46 <hdl> #startmeeting data persistence on koha
10:08:46 <huginn> Meeting started Fri Feb 18 10:08:46 2011 UTC. The chair is hdl. Information about MeetBot at http://wiki.debian.org/MeetBot.
10:08:46 <huginn> Useful Commands: #action #agreed #help #info #idea #link #topic.
10:08:53 <hdl> Hi
10:09:08 <hdl> Let's start the meeting.
10:10:13 <hdl> Is there anyone for the record?
10:11:03 * hdl Henri-Damien LAURENT, BibLibre
10:11:26 * kf Katrin Fischer, BSZ - following the discussion
10:11:34 <ColinC> Colin Campbell, PTFS-Europe
10:11:34 * fredericd Frédéric Demians, Tamil
10:11:46 <fredericd> hi Henri-Damien
10:12:32 <hdl> So, 4 of us...
10:12:35 <hdl> Ok.
10:13:23 <hdl> We have been experimenting with Plack in order to test data persistence and improve performance.
10:13:46 <hdl> http://plackperl.org/
10:13:56 <hdl> #link http://plackperl.org/
10:14:46 <hdl> We have pushed a branch to koha-community.org with the .psgi files needed to launch a Plack server:
10:15:26 <hdl> http://git.koha-community.org/gitweb/?p=wip/koha-biblibre.git;a=shortlog;h=refs/heads/wip/Plack
10:15:43 <hdl> There is also http://git.koha-community.org/gitweb/?p=wip/koha-biblibre.git;a=shortlog;h=refs/heads/wip/data_persistence
10:16:50 <hdl> The data_persistence branch contains some patches that modify C4::Context to close the connections to Zebra when they are no longer used, at least in SimpleSearch.
10:17:05 <hdl> We have been testing that on some servers with nginx.
10:17:21 <hdl> The results are quite good in terms of performance.
10:18:43 <fredericd> do you have figures?
10:19:23 <hdl> you can launch it with: KOHA_CONF=/home/users/koha/sites/univ_lyon3/etc/koha-conf.xml PERL5LIB=/home/koha/src plackup -S Starlet --workers 32 --port 5000 /local/home/users/koha/src/prod.psgi
10:20:12 <hdl> fredericd: well, I had some... but I do not have them to hand right now.
10:20:31 <hdl> But you can use the file.
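[Editor's note: the prod.psgi file hdl launches above lives on the wip/Plack branch linked in the log. As a rough illustration only, a minimal .psgi file for wrapping Koha's CGI scripts under Plack might look like the sketch below; the paths and middleware choices are assumptions, not the actual branch contents.]

```perl
#!/usr/bin/env perl
# Hypothetical minimal prod.psgi sketch. Plack::App::CGIBin compiles the
# existing CGI scripts once and keeps them in memory across requests,
# which is where the persistence gain comes from.
use strict;
use warnings;
use Plack::Builder;
use Plack::App::CGIBin;

# Illustrative path; a real deployment would point at the Koha source tree.
my $app = Plack::App::CGIBin->new( root => '/home/koha/src' )->to_app;

builder {
    # Serve template/static assets from disk instead of through Perl.
    enable 'Static',
        path => qr{^/(?:intranet-tmpl|opac-tmpl)/},
        root => '/home/koha/src/koha-tmpl/';
    $app;
};
```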
10:22:07 <fredericd> If I check out the wip/Plack branch from git.koha-community.org, what do I need to do to get it working?
10:22:17 <hdl> it needs some tweaking of the Apache configuration, to have static files served directly and a reverse proxy to port 5000
10:22:26 <hdl> you need to install Plack
10:22:36 <hdl> cpanm Task::Plack
10:22:54 <hdl> (you can install cpanminus for that)
10:23:14 <hdl> Task::Plack comes with 3 servers you can test:
10:23:27 <hdl> plackup, Starlet, and another one.
10:23:37 <hdl> Starman, I think
10:24:16 <fredericd> installation in progress...
10:24:17 <hdl> It works quite well... but it is not ready for production at the moment,
10:24:33 <hdl> because of some problems:
10:24:44 <hdl> a) circular references in the code
10:24:50 <hdl> b) Zebra search
10:25:14 <hdl> c) lack of caching in the code
10:25:39 <fredericd> You mean that Zebra search slows down if the connection isn't reset regularly?
10:25:47 <hdl> yes.
10:26:25 <hdl> This is why I sent a data_persistence branch... which destroys the connection to Zebra when the search is over.
10:27:05 <hdl> But the fact is that when you are running with unix sockets, it leaves zombies on the system... so one has to kill them.
10:27:52 <hdl> With a TCP connection, the problem is not so hard,
10:28:12 <hdl> since there is a TCP timeout.
10:30:20 <hdl> About the circular references in the code, I sent some mails to the list,
10:30:39 <hdl> but had very little feedback.
10:31:33 <hdl> Some from ColinC was quite helpful...
10:32:09 <hdl> But I would like to know if we can set up a plan in order to solve those problems and work out solutions.
10:34:28 <ColinC> a hitlist of obstacles
10:35:05 <hdl> About the circular references: I think we should at least identify the functions which add the extra dependencies...
10:35:32 <hdl> I gave a list of some of the circular references
10:36:49 <hdl> on the lists.
10:37:21 <hdl> I cannot find the link right now.
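[Editor's note: hdl's "tweaking of the Apache configuration" — static files served directly, everything else reverse-proxied to port 5000 — could look roughly like the following mod_proxy sketch. The hostname and paths are illustrative, not taken from the branch.]

```apache
# Hypothetical Apache sketch: serve template/static files from disk and
# reverse-proxy all other requests to the Plack/Starlet backend on :5000.
<VirtualHost *:80>
    ServerName koha.example.org
    DocumentRoot /home/koha/src/koha-tmpl

    # Do not proxy static asset paths; Apache serves them directly.
    ProxyPass /intranet-tmpl !
    ProxyPass /opac-tmpl !

    # Everything else goes to the persistent Plack server.
    ProxyPass        / http://127.0.0.1:5000/
    ProxyPassReverse / http://127.0.0.1:5000/
</VirtualHost>
```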
10:38:08 <hdl> But about Zebra search, I think the solution we proposed could be a nice option.
10:38:33 <hdl> It would help if you could test it and provide some feedback.
10:39:17 <hdl> And about the lack of caching, chris proposed using Memcached,
10:39:42 <hdl> and I think it is quite a good start.
10:41:29 <hdl> But this would also require adding some local variables to store the information, rather than making a call every time.
10:41:50 <hdl> This would be much more efficient.
10:45:20 <ColinC> not sure I followed that last part
10:47:16 <hdl> I have done some work on C4::Languages (which still has some bugs... but it was a first step, and provided a good improvement) and C4::Context:
10:47:55 <hdl> http://git.biblibre.com/?p=koha;a=blob;f=C4/Languages.pm;h=c00695b607b0f4d1ac8a5f8f4a397b03a97a849d;hb=bug/BLO/MT5496
10:48:16 <thd> Thomas Dukleth, Agogme, New York [lost track of the time, but has read back]
10:49:18 <hdl> http://git.biblibre.com/?p=koha;a=blob;f=C4/Context.pm;h=eb8b7ada8dd1f55c8af2e32041d747b6d41de5c5;hb=bug/BLO/MT5496
10:51:20 <hdl> ColinC: what are you not following?
10:51:46 <hdl> Do you think I am wrong?
10:52:44 <ColinC> I'm not sure what point you were trying to make about Memcached
10:54:36 <hdl> Memcached is not used throughout all the modules.
10:56:24 <hdl> And when you edit something, the old value sticks in Memcached.
10:57:11 <hdl> You have to flush the cache regularly, or flush the cache on edits...
10:57:25 <hdl> or edit the cached structure.
10:58:47 <hdl> But Memcached does not really add much performance... unless you use it for complex structures.
10:59:09 <thd> hdl: What does it mean to "edit the thing" or "edit the structure"?
10:59:55 <hdl> you cache preferences; when you edit a preference, the cache is not updated
11:00:31 <hdl> (at least with the system as it is now)
11:01:57 <hdl> You have to wait for 5 minutes, or whatever time is set in the file, for it to be updated correctly...
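[Editor's note: hdl's "local variables + Memcached" idea — check an in-process variable first, fall back to Memcached, and only then hit the database — can be sketched as below. `_fetch_preference_from_db` and the key names are hypothetical; this is not the code in C4::Context on the linked branch.]

```perl
# Hypothetical read-path sketch of a two-level cache for system
# preferences in a persistent (Plack) worker.
use strict;
use warnings;
use Cache::Memcached;

my $memd = Cache::Memcached->new( { servers => ['127.0.0.1:11211'] } );
my %local_cache;    # lives as long as the persistent worker process

sub cached_preference {
    my ($name) = @_;

    # 1. Cheapest: the in-process hash (no network call at all).
    return $local_cache{$name} if exists $local_cache{$name};

    # 2. Next: Memcached, shared between workers.
    my $value = $memd->get("syspref_$name");

    # 3. Last resort: the database, then repopulate Memcached.
    if ( !defined $value ) {
        $value = _fetch_preference_from_db($name);   # hypothetical DB helper
        $memd->set( "syspref_$name", $value, 300 );  # expire after 5 minutes
    }
    return $local_cache{$name} = $value;
}
```

The 5-minute expiry illustrates the staleness problem hdl describes: until the entry expires or is invalidated, an edited preference keeps serving its old value.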
11:02:35 <ColinC> For future reference, I think the Cache::Memcached interface has a more fine-grained API than Memoize::Memcached
11:03:09 <thd> hdl: Are the deficiencies to which you are referring deficiencies of Memcached, or deficiencies of the current implementation work for Koha?
11:03:33 <ColinC> hdl: or both
11:04:32 <hdl> both, in fact...
11:06:54 <hdl> The problem is that the code changes for Memcached would have to be quite localized and centralized... in order to be easily generalised.
11:07:40 <hdl> But it would require more fine-grained control for some other things.
11:09:16 <hdl> For instance: systempreferences do not change often... but biblios change far more often.
11:10:20 <ColinC> I think there is quite a bit of refactoring to be done to some code to take advantage of any caching structure
11:11:15 <hdl> Can you provide an example of how you see things?
11:13:35 <hdl> Or does anyone have a precise idea?
11:13:56 <ColinC> Because the model has been CGI-script based, nothing has been treated as possibly living longer than the script's runtime
11:20:03 <thd> ColinC: What improper presumptions have been made as a consequence of that legacy design?
11:22:01 <hdl> ColinC: But is there any concrete example of a nice application design you have in mind?
11:22:17 <ColinC> we have operations that intertwine updates and reads to persistent/non-persistent data
11:23:46 <hdl> Are you thinking of Evergreen?
11:23:55 <ColinC> I've used memcached in a high-throughput environment; it gave many improvements, but it was important to concentrate caching on what benefitted from it
11:26:16 <hdl> you mean to use caching only where it is really needed?
11:27:07 <ColinC> Yes
11:27:13 <thd> ColinC: Are you suggesting that the performance improvement would be minimal for data which is used infrequently, or which changes often and therefore needs careful update management?
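[Editor's note: the fine-grained invalidation ColinC and hdl are circling around — flushing on edits rather than waiting for expiry — amounts to a write path that updates the database and then deletes the stale cache entries. A sketch under the same assumptions as the read-path example (helper names are hypothetical):]

```perl
# Hypothetical write-path sketch: invalidate both cache layers when a
# preference is edited, so readers do not serve stale values.
use strict;
use warnings;
use Cache::Memcached;

my $memd = Cache::Memcached->new( { servers => ['127.0.0.1:11211'] } );
my %local_cache;    # the per-worker cache from the read path

sub set_preference {
    my ( $name, $value ) = @_;

    _store_preference_in_db( $name, $value );   # hypothetical DB helper

    # Drop the stale copies; the next read repopulates both layers.
    delete $local_cache{$name};
    $memd->delete("syspref_$name");
}
```

Note the limitation hdl implies: `delete` only clears the Memcached copy shared by all workers; each worker's private `%local_cache` still holds the old value until that worker itself invalidates or restarts, which is one reason per-process caching needs care in a persistent model.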
11:27:29 <hdl> thd: I think so.
11:28:28 <hdl> But there are many places: branches, categories, issuingrules, systempreferences, languages, message_preferences...
11:28:34 <hdl> where it can be used.
11:29:17 <ColinC> As a crude example: the approach to, e.g., the item and patron records in an issue transaction is different to the approach to the circ rules governing that transaction.
11:29:40 <ColinC> and what hdl said
11:31:08 <thd> ColinC: Why is having multiple data access rules especially problematic?
11:31:46 <ColinC> It's not
11:32:59 <hdl> Is there anything else to say now?
11:33:12 <thd> ColinC: What did you mean in referring to the item and patron records in issuing transactions as different from circ rules?
11:33:19 <hdl> Are you interested in playing with Plack?
11:33:32 <hdl> Are you interested in testing?
11:33:58 <hdl> Are you interested in any of the 3 problems I listed?
11:34:01 <hdl> Who?
11:34:34 <ColinC> thd: the lifetime of the cached data differs
11:34:55 <thd> hdl: Is there a bug and patch for the Zebra connection issue?
11:38:27 <hdl> no.
11:38:41 <hdl> not yet, in fact.
11:39:17 <hdl> It is something we discovered with some of our tools...
11:39:30 <hdl> but neglected to report...
11:39:47 <hdl> But this behaviour is known with Zebra and YAZ.
11:40:25 <thd> hdl: I see reports on the koha mailing list of people not running Zebra as a daemon. Does that help with the issue, or merely create other performance problems?
11:41:04 <hdl> I think it would not help, in fact.
11:41:46 <hdl> It is due to the Z39.50 protocol being connection-oriented... Koha initiates one connection and reuses it, so in a data-persistent model you always use the same one.
11:42:04 <hdl> And... it ends up being very slow.
11:42:43 <hdl> ok. if no one has anything else to say... let's end the meeting.
11:42:51 <thd> What causes the slowness for an open connection?
11:43:05 <hdl> yes...
11:43:36 <hdl> send 1000 queries over a single connection and it will slow down...
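[Editor's note: hdl's data_persistence fix — destroying the Zebra connection when the search is over, instead of reusing one connection for the life of a persistent worker — can be sketched with the ZOOM Perl API, which Koha uses to talk to Zebra. Host, port, and database name below are illustrative, and this is not the actual patch on the branch.]

```perl
# Hypothetical sketch: open a fresh ZOOM connection per search and tear
# it down afterwards, avoiding the long-lived connection that slows down
# and grows Zebra's memory after thousands of queries.
use strict;
use warnings;
use ZOOM;

sub simple_search {
    my ($pqf_query) = @_;

    # Illustrative connection parameters (TCP, so stale connections at
    # least time out; unix sockets were leaving zombie processes).
    my $conn = ZOOM::Connection->new(
        'localhost', 9999,
        databaseName => 'biblios',
    );

    my $rs      = $conn->search_pqf($pqf_query);
    my @records = map { $rs->record($_)->raw } 0 .. $rs->size - 1;

    # Release the result set and close the Z39.50 connection explicitly.
    $rs->destroy;
    $conn->destroy;

    return \@records;
}
```

The trade-off is paying a connection setup cost on every search in exchange for predictable per-query latency and bounded Zebra memory use.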
11:43:54 <hdl> and Zebra will take up more and more memory
11:44:13 <hdl> If anyone has questions about Plack and how to test it...
11:45:00 <hdl> If I can help in any way, if you have any question, or if you think of any example... please do ask.
11:46:47 <hdl> #endmeeting