10:08:46 <hdl> #startmeeting data persistence on koha
10:08:46 <huginn> Meeting started Fri Feb 18 10:08:46 2011 UTC.  The chair is hdl. Information about MeetBot at http://wiki.debian.org/MeetBot.
10:08:46 <huginn> Useful Commands: #action #agreed #help #info #idea #link #topic.
10:08:53 <hdl> Hi
10:09:08 <hdl> Let's start the meeting.
10:10:13 <hdl> Is anyone here, for the record?
10:11:03 * hdl Henri-Damien LAURENT BibLibre
10:11:26 * kf Katrin Fischer, BSZ - following the discussion
10:11:34 <ColinC> Colin Campbell, PTFS-Europe
10:11:34 * fredericd Frédéric Demians, Tamil
10:11:46 <fredericd> hi Henri-Damien
10:12:32 <hdl> So 4 ...
10:12:35 <hdl> Ok.
10:13:23 <hdl> We have been playing with Plack in order to test data persistence and improve performance.
10:13:46 <hdl> http://plackperl.org/
10:13:56 <hdl> #link http://plackperl.org/
10:14:46 <hdl> We have pushed a branch on koha-community.org where you can have psgi files to launch a plack server
10:15:26 <hdl> http://git.koha-community.org/gitweb/?p=wip/koha-biblibre.git;a=shortlog;h=refs/heads/wip/Plack
10:15:43 <hdl> There is also the http://git.koha-community.org/gitweb/?p=wip/koha-biblibre.git;a=shortlog;h=refs/heads/wip/data_persistence
10:16:50 <hdl> data_persistence branch, which contains some patches manipulating C4::Context to remove the connections to zebra when they are no longer used, at least in SimpleSearch
10:17:05 <hdl> We have been testing that on some servers with nginx
10:17:21 <hdl> The results are quite good in terms of performance.
10:18:43 <fredericd> do you have figures?
10:19:23 <hdl> you can launch using : KOHA_CONF=/home/users/koha/sites/univ_lyon3/etc/koha-conf.xml PERL5LIB=/home/koha/src plackup -S Starlet --workers 32 --port 5000 /local/home/users/koha/src/prod.psgi
10:20:12 <hdl> fredericd: well, I had... But I do not have them right here.
10:20:31 <hdl> But you can use the file.
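[Editor's note: for readers unfamiliar with the format, a .psgi file is just Perl code that evaluates to an application coderef. A minimal generic sketch follows — this is not Koha's actual prod.psgi referenced in the command above:]

```perl
# app.psgi -- minimal PSGI application (generic sketch, not Koha's prod.psgi).
# A PSGI app is a coderef mapping the request environment hash to a
# [status, headers, body] triple.
my $app = sub {
    my $env = shift;
    return [ 200, [ 'Content-Type' => 'text/plain' ], [ "OK\n" ] ];
};
$app;    # the file must evaluate to the app coderef for plackup
```

Any PSGI server can then serve it, e.g. `plackup -S Starlet --workers 32 --port 5000 app.psgi`, matching the invocation quoted above.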
10:22:07 <fredericd> If I checkout wip_pack branch from git.koha-communty.org, what do I need to do to get it working?
10:22:17 <hdl> it needs some tweaking of the apache configuration to have static files served directly, and a reverse proxy on port 5000
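[Editor's note: such an apache setup could look roughly like the sketch below, using mod_alias and mod_proxy; the static asset path is illustrative, not Koha's actual layout:]

```apache
# Serve static assets straight from disk (path is illustrative)
Alias /opac-tmpl/ /usr/share/koha/opac/htdocs/opac-tmpl/

# Exclude static files from the proxy, then reverse-proxy everything else
# to the Plack server listening on port 5000
ProxyPass        /opac-tmpl/ !
ProxyPass        /           http://localhost:5000/
ProxyPassReverse /           http://localhost:5000/
```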
10:22:26 <hdl> you need to install Plack
10:22:36 <hdl> cpanm Task::Plack
10:22:54 <hdl> (you can install cpanminus for that)
10:23:14 <hdl> Task::Plack comes with 3 servers you can test
10:23:27 <hdl> plackup, Starlet, and another one.
10:23:37 <hdl> Starman, I think.
10:24:16 <fredericd> installation in progress...
10:24:17 <hdl> It is working quite well... but it is not ready for production at the moment.
10:24:33 <hdl> Because of some problems:
10:24:44 <hdl> a) Circular references in code.
10:24:50 <hdl> b) zebra search
10:25:14 <hdl> c) lack of caching in the code.
10:25:39 <fredericd> You mean that zebra search slows down if the connection isn't reset regularly?
10:25:47 <hdl> yes.
10:26:25 <hdl> This is why I sent a branch, data_persistence... which destroys the connection to zebra when things are over.
10:27:05 <hdl> But the fact is that when you are running with sockets, it leaves zombies on the system... So one has to kill them
10:27:52 <hdl> With a tcp connection, that problem is not so hard,
10:28:12 <hdl> since there is a tcp timeout
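[Editor's note: the destroy-when-done approach described above can be sketched with the Perl ZOOM API; this is an assumed simplification — host, port, database name and the PQF query are illustrative, and `search_pqf` stands in for whatever query path Koha's SimpleSearch actually uses:]

```perl
use strict;
use warnings;
use ZOOM;

# Open a fresh connection per search and free it when the search is over,
# instead of keeping one connection alive for a persistent worker's lifetime.
my $conn = ZOOM::Connection->new('localhost:9998/biblios');
my $rs   = $conn->search_pqf('@attr 1=4 persistence');   # PQF title query
printf "%d hits\n", $rs->size();
$rs->destroy();
$conn->destroy();   # releases the socket so stale connections don't pile up
```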
10:30:20 <hdl> About circular references in the code, I sent some mails to the list.
10:30:39 <hdl> But had very little feedback
10:31:33 <hdl> Some from ColinC quite helpful...
10:32:09 <hdl> But I would like to know if we can set up a plan in order to solve those problems and work out solutions
10:34:28 <ColinC> a hitlist of obstacles
10:35:05 <hdl> About circular references: I think that if we could at least get the functions which add the additional dependencies...
10:35:32 <hdl> I gave a list of some of the circular references.
10:36:49 <hdl> on the lists.
10:37:21 <hdl> i cannot find the link right now.
10:38:08 <hdl> But about zebra search, i think that the solution we proposed could be a nice option.
10:38:33 <hdl> If you could test and provide some feedback
10:39:17 <hdl> And about the lack of Caching, chris proposed the usage of Memcached
10:39:42 <hdl> And I think it is quite a good start.
10:41:29 <hdl> But this would also require adding some local variables to store the information, and not making any calls.
10:41:50 <hdl> This would be much more efficient.
10:45:20 <ColinC> not sure I followed that last part
10:47:16 <hdl> I have done some work for C4::Languages (which still has some bugs... But it was a first start... and provided a good improvement.) and C4::Context
10:47:55 <hdl> http://git.biblibre.com/?p=koha;a=blob;f=C4/Languages.pm;h=c00695b607b0f4d1ac8a5f8f4a397b03a97a849d;hb=bug/BLO/MT5496
10:48:16 <thd> Thomas Dukleth, Agogme, New York [lost attention to the time but has read back.]
10:49:18 <hdl> http://git.biblibre.com/?p=koha;a=blob;f=C4/Context.pm;h=eb8b7ada8dd1f55c8af2e32041d747b6d41de5c5;hb=bug/BLO/MT5496
10:51:20 <hdl> ColinC: what are you not following?
10:51:46 <hdl> Do you think I am wrong ?
10:52:44 <ColinC> I'm not sure what point you were trying to make about Memcached
10:54:36 <hdl> Memcached is not generalized in all the modules.
10:56:24 <hdl> And when you edit the thing, the old stuff sticks in Memcached.
10:57:11 <hdl> you have to flush the cache regularly, or flush the cache on edits.
10:57:25 <hdl> Or edit the structure.
10:58:47 <hdl> But Memcached is not really adding much performance... unless you use it for complex structures.
10:59:09 <thd> hdl: What does it mean to "edit the thing" or "edit the structure"?
10:59:55 <hdl> you cache preferences; when you edit a preference, the cache is not updated
11:00:31 <hdl> (at least with the system as it is now)
11:01:57 <hdl> You have to wait for 5 (or whatever time is set in the file) for it to be updated correctly...
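[Editor's note: the flush-on-edit idea discussed here can be sketched with Cache::Memcached — cache reads with a short expiry as a safety net, and delete the key explicitly whenever the value is edited. The accessor names below are hypothetical, not Koha's actual API:]

```perl
use strict;
use warnings;
use Cache::Memcached;

my $cache = Cache::Memcached->new({ servers => ['127.0.0.1:11211'] });

# Hypothetical read path: serve from the cache, fall back to the database,
# and keep a short expiry so a missed invalidation heals itself.
sub get_syspref {
    my ($name) = @_;
    my $value = $cache->get("syspref:$name");
    return $value if defined $value;
    $value = fetch_syspref_from_db($name);        # assumed DB accessor
    $cache->set( "syspref:$name", $value, 300 );  # 5-minute fallback expiry
    return $value;
}

# Hypothetical write path: invalidate the key immediately on edit,
# so readers never have to wait out the expiry to see the new value.
sub set_syspref {
    my ( $name, $value ) = @_;
    write_syspref_to_db( $name, $value );         # assumed DB accessor
    $cache->delete("syspref:$name");
}
```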
11:02:35 <ColinC> For future ref I think the Cache::Memcached interface has more fine grained api than Memoize::Memcached
11:03:09 <thd> hdl: Are the deficiencies to which you are referring deficiencies of memcached or deficiencies of the current implementation work for Koha?
11:03:33 <ColinC> hdl or both
11:04:32 <hdl> both in fact...
11:06:54 <hdl> The problem is that the editing for Memcached would have to be quite localized and centralized... in order to be easily generalised.
11:07:40 <hdl> But it would require some finer granularity for some other things.
11:09:16 <hdl> for instance: systempreferences do not change often... but biblios change far more often
11:10:20 <ColinC> I think there is quite a bit of refactoring to be done to some code to take advantage of any caching structure
11:11:15 <hdl> Can you provide an example of how you see things?
11:13:35 <hdl> or does anyone have a precise idea?
11:13:56 <ColinC> Because the model has been cgi script based, nothing has been treated as possibly living longer than the script's runtime
11:20:03 <thd> ColinC: What improper presumptions have been taken in consequence of that legacy design?
11:22:01 <hdl> ColinC: But is there any concrete example of a nice application design you have in mind?
11:22:17 <ColinC> we have operations that intertwine updates and reads to persistent/non-persistent data
11:23:46 <hdl> Are you thinking of evergreen ?
11:23:55 <ColinC> I've used memcached in a high-throughput environment; it gave many improvements, but it was important to concentrate caching on what benefited from it
11:26:16 <hdl> you mean to use caching only where it was really needed ?
11:27:07 <ColinC> Yes
11:27:13 <thd> ColinC: Are you suggesting that the performance improvement would be minimal for data which is used infrequently or changes often and therefore needs careful update management?
11:27:29 <hdl> thd: i think so
11:28:28 <hdl> But there are many places : branches, categories, issuingrules, systempreferences, languages, message_preferences...
11:28:34 <hdl> where it can be used.
11:29:17 <ColinC> A crude example: the approach to, e.g., the item and patron records in an issue transaction is different from the approach to the circ rules governing that transaction
11:29:40 <ColinC> and what hdl said
11:31:08 <thd> ColinC: Why is having multiple data access rules especially problematic?
11:31:46 <ColinC> Its not
11:32:59 <hdl> Is there anything else to say now ?
11:33:12 <thd> ColinC: What did you mean in referring to item and patron record for issuing transactions as different from circ rules?
11:33:19 <hdl> Are you interested in playing with plack ?
11:33:32 <hdl> Are you interested in testing ?
11:33:58 <hdl> Are you interested in any of the 3 problems I listed ?
11:34:01 <hdl> Who ?
11:34:34 <ColinC> thd: the life of the cached data differs
11:34:55 <thd> hdl: Is there a bug and patch for the Zebra connection issue?
11:38:27 <hdl> no.
11:38:41 <hdl> not yet in fact.
11:39:17 <hdl> It is something we discovered with some of our tools...
11:39:30 <hdl> But neglected to report...
11:39:47 <hdl> But this is a known state of affairs with zebra and yaz.
11:40:25 <thd> hdl: I see reports on the koha mailing list of people not running Zebra as a daemon.  Does that help with the issue or merely create other performance problems?
11:41:04 <hdl> I think it would not help in fact.
11:41:46 <hdl> It is due to the z39.50 protocol, which is connection-oriented... And since koha initiates one connection and reuses it... in a data-persistent model you always use the same one.
11:42:04 <hdl> And .... It ends up being very slow.
11:42:43 <hdl> ok. if noone has anything else to say... Let's end the meeting.
11:42:51 <thd> What causes the slowness for an open connection?
11:43:05 <hdl> yes...
11:43:36 <hdl> send 1000 queries to a single connection and it will slow down...
11:43:54 <hdl> and zebra will take some more memory
11:44:13 <hdl> If anyone has questions about Plack and how to test...
11:45:00 <hdl> If I can help in any way, if you have any question, or if you think of any example... please do ask.
11:46:47 <hdl> #endmeeting