UX Pattern: Using In-Memory Data in Servoy to Accelerate Delivery
Today we’re going to talk about in-memory data sources. As is the tradition with the Tech Webinar Series, we’re going to start with demos and then do slides if we have time. I was feeling a little ambitious when I put this one together, because I wanted to demo four or five different things. I selected this topic because when I’m working with customers, helping them figure out how to implement something or how to solve problems, the need for in-memory data sources comes up a lot, and there’s a bit of confusion about what it is, when to use it, and how best to use it. I’ve been hearing similar things from my colleagues as well, who actually prompted me to do this topic. So I think it’s good to get this one out there so that people can learn.

Let’s dive right into the demos. The first demo and the first use case is caching aggregate values. Aggregates often come from many tables, and they can be expensive to query if you want to show them in a grid like we have here; that cost is compounded when you’re showing lots of different aggregates. One workaround is to fetch the aggregates in a single SQL query, which is obviously a lot more efficient, but then how do you display the result in an application? How do you work with it with the same data binding? In this case we’re taking the good old Northwind example database, which has a products table, and of course products appear in many orders. The gist of it is that we want to show some information about how the products were ordered in a grid. This grid is an in-memory table based on exactly that, and I’m going to show you how it was generated.

If I switch over to the IDE, first of all I want to point out that since Servoy 8.1, in-memory data sources have supported a design-time feature. In the Solution Explorer you can come over to your solution, down to the Data Sources node, and when you expand it you get this In Memory node; you can see I have some tables that I’ve created. If you want to work with in-memory data and you know the structure ahead of time, it really helps to come here and create a new in-memory data source. When you open it up, it looks like the table editor for any other real relational database table. In this case I’m keeping the product ID and creating a few numeric columns to capture the aggregates, and that’s it.

Something that was added in 8.2 (we covered this a little in the 8.2 launch webinar) is a new event specifically for in-memory data sources. It used to be that you had to figure out when you wanted to load them, maybe as the form is showing, or when the user clicks a navigation item. Now we can do that lazily: there is an onLoad event for the data source, and you can bind it to a handler. If we look at the code, we’re using the query builder APIs to build the whole query, and you can see that we’re summarizing the sales, the number of units, the number of times each product was ordered, and so on. The magic happens on this line: databaseManager.createDataSourceByQuery. We give it the name of the data source we want and the query builder object, and we can optionally limit it to a certain number of rows.
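To make that concrete, here is a minimal sketch of what such an onLoad handler could look like. It assumes a Northwind-style example_data server with an order_details table, and assumes the aliases match the columns of the design-time in-memory table; the names are illustrative, not the exact code from the demo:

```javascript
/**
 * onLoad handler for the product_aggregates in-memory data source.
 * Servoy passes in the name of the mem table that is being loaded.
 * @param {String} memDataSourceName
 */
function onLoadProductAggregates(memDataSourceName) {
	var q = databaseManager.createSelect('db:/example_data/order_details');
	q.result.add(q.columns.productid);
	// Aggregate once in SQL instead of firing per-row related queries;
	// the aliases are assumed to match the mem table's column names
	q.result.add(q.columns.quantity.sum, 'total_units');
	q.result.add(q.columns.orderid.count, 'times_ordered');
	q.groupBy.add(q.columns.productid);
	// -1 means no row limit; this fills (or refills) the mem table
	databaseManager.createDataSourceByQuery(memDataSourceName, q, -1);
}
```

Because the handler receives the data source name, the same pattern can be reused for several in-memory tables.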
In this case, I put in minus one, which flags that I want all the records. I should also point out that this event handler passes in the name of the data source that’s trying to load, so I just use that back when I actually create the data source. That’s pretty much all there is to it: the results of the query are put into this temporary table, and I can build forms against it and do all sorts of other things with it.

In the designer, if we go and look at the form for the cached aggregates, you can see that I just used the Servoy Extra Table component and started dropping the columns on there. If we look at the data providers, you may notice that I’m actually throwing some relations in there too. I created relations from the product aggregates in-memory table to the real products table, and from there I even put in another data provider that hops over a second relation from products to the product categories. When we look back in the client, we see the product, the category, and then all of the aggregates. The reason I’m showing you that is to point out that once you create the design-time data source, it’s just like any other real relational database table: you can make relations to and from in-memory tables.

You can do other interesting things too. For example, if I go back to the product aggregates data source, you can see that I’ve created a calculation. I have the total number of times a product was ordered, and I have the total units that were ordered, but I also wanted to know, on average, how many units were in each order. So I created a simple little calculation to divide one aggregate by the other, and you can see that I’m able to drop that right onto the form as average units per order. I could have done that in the query as well and let the database handle it, but I did it this way to point out that everything you can do to a regular physical data source, you can do to an in-memory data source.
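As a sketch, a Servoy calculation like that is just a function on the data source whose columns are in scope; the column names here are assumptions:

```javascript
/**
 * Calculation on the product_aggregates mem table: average units per order.
 * total_units and times_ordered are assumed column names.
 */
function avg_units_per_order() {
	if (!times_ordered) {
		return 0; // guard against divide-by-zero before the aggregates load
	}
	return total_units / times_ordered;
}
```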
So that’s the first demo. The use case is really making aggregated data easier to work with and better performing.

Speaking of performance, the second demo is similar, but it solves a slightly different problem. I want to show you a form that is inherently query intensive and may have some performance problems, look at the query statistics, and then revisit the same user experience with an in-memory implementation and show you the difference in the queries. So I’m going to bring up Database Performance, go to my example database statistics, and clear them. Now I’m going to hit this query-intensive form. What it’s doing is looking at order details, but it’s also joining in a lot of data through related records: the product, the product’s supplier, the product’s category, the customer on the order, and the employee who sold the order. Maybe I’ll page through a few pages, and now let’s take a look at the queries. Just to show that grid and flip through the records a bit, I’ve done somewhere over 500 individual queries. Overall performance wasn’t too bad, but you can imagine that if one of those queries slows down by a few milliseconds and you multiply that by hundreds and hundreds of queries, you start to have a performance problem.

In fact, when I work with Servoy customers who experience performance problems, the most common cause is, of course, trips to the database, but specifically the sheer volume of queries that get generated; not that the queries are necessarily bad queries or that the database wasn’t indexed right, just that Servoy was generating lots and lots of queries. It’s important to point out that relations and related found sets really help when a user is walking through an application, but they’re not always the most efficient way to grab a bunch of related data and show it, because anyone who does SQL knows you could do this in one query, not 500.

So let’s take a look at the one-query approach. What I have here is the same form, but against an in-memory data source that’s optimized with a single SQL query. I’ll go back to Database Performance for the in-mem table and clear those statistics. If I do this right, I can show this form, and you can see it loaded quickly; if I reload the statistics, you can see there were just a few queries. One is the one-time insert into the temp table, and then one actually loads the found set. So the original query, the one that does all the joins, is a single query. If I clear this, page through the grid, and reload the statistics, you can see that it’s only loading the next batch of records in the found set, in chunks, without all of the additional queries for the relations. So again, the use case here is to take a query-intensive form and optimize it with a single SQL query.

If we quickly look at the code that loaded this, the load order details method, the query is pretty easy to generate, especially because if you’ve already created relations, you can leverage those in the query builder. Then again I’m calling createDataSourceByQuery, and this gets called on the onLoad event of that data source. I created a design-time data source here, order_details (not to be confused with my real order_details table), and you can see that it just has placeholders for the related columns. It’s kind of like a SQL view; if you’ve worked in SQL databases and created views, it’s quite similar.
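A minimal sketch of that kind of denormalizing load, assuming Northwind-style table and column names; the join shown is built explicitly, though the query builder can also derive joins from existing relations:

```javascript
/**
 * onLoad for the order_details mem table: one joined query instead of
 * hundreds of related-record queries. All names are illustrative.
 */
function onLoadOrderDetails(memDataSourceName) {
	var q = databaseManager.createSelect('db:/example_data/order_details');
	// join the products table so the product name comes back in one query
	var jProduct = q.joins.add('db:/example_data/products', 'p');
	jProduct.on.add(q.columns.productid.eq(jProduct.columns.productid));
	q.result.add(q.columns.orderid);
	q.result.add(q.columns.quantity);
	q.result.add(jProduct.columns.productname, 'productname');
	// further joins (supplier, category, customer, employee) work the same way
	databaseManager.createDataSourceByQuery(memDataSourceName, q, -1);
}
```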
Okay, the next demo is a bit different. So far we’ve only used in-memory data sources to look at data retrieved from a relational database through queries. Often people need to import a file, go through the records, check and validate something, and then maybe bulk-insert them into another database table, or run some process and throw them away. In those cases we’re really working with non-SQL, non-relational data. I wanted to show an example of this, so I went out on the Internet and found a nice big CSV file showing flood insurance policies for the state of Florida; kind of coincidentally, Florida just got hit with a hurricane, but this is older data. I want to show this non-SQL data, so I’ll click this tab, and you can see a little bit of lag time while it’s inhaling that CSV file. But once it does, I have this stuff in a database table, and I can build a form against it and do all kinds of other things with it. You can see that I can click through it just like any other form.

I have a map as well; the data included location information, so I decided to show it on a map. But I also want to show that when you use an in-memory data source, it’s a real SQL table; it’s not just a data set that you throw up on a form. It can be something you show on multiple forms, something you build found sets against and search. So I brought in the svySearch module, and you can see there are some 36,500 of these records, so it’s not a small data set. I might want to do some searching. Say I search for Citrus County, and that brings it down to 384; maybe I want masonry construction, but not reinforced masonry construction, and now I’ve got it down to just 46 records. The reason I’m showing that is, again, to reinforce that once you have the in-memory data source, you can do anything you would do with a normal data source. In this case I wanted to show querying within it: I didn’t just load all those records up front onto a form; it’s a whole bunch of records, and I’m using the extension to search within that pretty large table.

Let’s take a look at how it was done. I’ll bring up my flood policies design-time data source, and you can see there are quite a number of columns that were in that CSV file. If I go to the onLoad event, it takes me back to my handler, and what I do here is a little bit different. First, I start with an empty JSDataSet and set what all the column names are going to be. One of the important things when you go from a data set or a query to an in-memory data source is that the column names should match, so in this case I just laid them out so they match the design-time data source schema exactly. Then all I do is read in the text file and parse the lines, and the magic happens at JSDataSet.createDataSource, where you give it the in-memory table name. So when that form shows, the onLoad event gets called and the file is read in one time, and then we can work with it in memory. The searching was really just query builder against the in-memory data source; I won’t show you that because it’s kind of irrelevant. I just wanted to show you that it’s possible.
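A sketch of that file-to-mem-table flow; the file path, column names, and types are assumptions, and the naive comma split stands in for real CSV parsing:

```javascript
/**
 * onLoad for the flood_policies mem table: read a CSV once and load it
 * into memory. All names here are illustrative.
 */
function onLoadFloodPolicies(memDataSourceName) {
	// column names must match the design-time mem table schema exactly
	var columns = ['policy_id', 'county', 'construction', 'latitude', 'longitude'];
	var dataset = databaseManager.createEmptyDataSet(0, columns);
	var text = plugins.file.readTXTFile('/data/flood_policies.csv');
	var lines = text.split('\n');
	for (var i = 1; i < lines.length; i++) { // i = 1 skips the header row
		if (lines[i]) {
			dataset.addRow(lines[i].split(',')); // naive parse, for the sketch
		}
	}
	// column types, parallel to the columns array
	dataset.createDataSource(memDataSourceName, [JSColumn.TEXT, JSColumn.TEXT,
		JSColumn.TEXT, JSColumn.NUMBER, JSColumn.NUMBER]);
}
```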
So now we get to some best practices. One question that comes up is: you’re working with data in memory, it’s cached, so how do you update that cache? Normally you get Servoy’s built-in data broadcasting for free: you’re looking at a form based on a table, another user in another session changes the record you’re looking at, you get a data broadcast, and you see it updated. The Servoy application server is managing the caches of all the relational data sources. With in-memory data sources, once you load them, they’re essentially offline, and it’s up to you to maintain them.

Let’s look at an example of how we could do that. I’m going to go back to the product aggregates and find one of the products here, the crab meat, and I want to show you that when you update a record, the aggregate doesn’t automatically update. I created a form over here to show an order, and you can see that product appears on the order. Let me close some of these views to free up some real estate. What I’ve done is add a table event handler: if I open my real, physical order_details table and look at the events, I’ve added an afterRecordUpdate handler, so if I modify one of those records, I’m handling a trigger. Why don’t I go in here, set a breakpoint, and show you what happens. Say I change the quantity here to 5,000; the total should go up by about 45,000. When I click out, the save triggers the afterRecordUpdate event, and I’m calling my update product aggregates method, which sort of re-queries, but takes the product ID and puts it into the where clause down here. When I get the result, I put it back into the product aggregates record and save it. So I’m handling the trigger and saving the new value back. Whoops, I want to resume from the debugger. If I resume and come back to the client and look at the product aggregates, you can see it went up; it was down around twenty-some thousand, and now it’s gone up to 69,000. That happened because of that table event. So that’s one place where you can opportunistically update the cache.

The other scenario is when a user in another client session goes into that order and makes the change. That’s the next thing I want to show you. Because I’m in Servoy Developer, there’s only one debug client for each client type, so I have to launch the classic Web Client. It’s not quite as beautiful, but it’ll get the job done. Oops, that was the wrong one. What I want to do here is change this back down to something much smaller, and we should see the value go way back down on the product aggregate. As I click out, it’s broadcasting. Again, I had set a breakpoint; oh wait, that was within the same client, sorry. If I come over to the NG Client, you can see that it’s changed; I didn’t have to reload or refresh anything, I’m still sitting on that same form, and it’s gone from 69,000 back down to 19,000. In this case the data broadcast was handled, and it did the same thing. Just to show you how that’s done: if we look at the solution settings, the data broadcast handler is here; let me move the console out of the way, there we go. If I go into this handler, I’m just checking the data source: if it’s the order details, then I go ahead and run the same update aggregates method. I’m just getting the primary keys that were affected and calling the same method, which re-queries the aggregates and loads them into those records. So: two kinds of events, table events and data broadcast events, to handle keeping the cache up to date.
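Sketched out, the two hooks might look like this; updateProductAggregates is a hypothetical shared helper, and the data source names, columns, and primary-key layout are assumptions:

```javascript
// 1) Table event (afterRecordUpdate) on the real order_details table;
//    fires when *this* client saves a change
function afterRecordUpdate(record) {
	updateProductAggregates([record.productid]);
}

// 2) Solution-level onDataBroadcast handler; fires when *another*
//    client's change is broadcast to this one
function onDataBroadcast(dataSource, action, pks, cached) {
	if (dataSource == 'db:/example_data/order_details') {
		var productIds = [];
		for (var i = 1; i <= pks.getMaxRowIndex(); i++) {
			// assumes productid is the second column of the pk data set
			productIds.push(pks.getValue(i, 2));
		}
		updateProductAggregates(productIds);
	}
}

// Shared helper: re-query the aggregate per product and write it back
// into the matching mem:product_aggregates record
function updateProductAggregates(productIds) {
	var fs = databaseManager.getFoundSet('mem:product_aggregates');
	for (var i = 0; i < productIds.length; i++) {
		var q = databaseManager.createSelect('db:/example_data/order_details');
		q.result.add(q.columns.quantity.sum, 'total_units');
		q.where.add(q.columns.productid.eq(productIds[i]));
		var ds = databaseManager.getDataSetByQuery(q, 1);
		if (fs.find()) {
			fs.productid = productIds[i];
			if (fs.search() > 0) {
				fs.total_units = ds.getValue(1, 1);
				databaseManager.saveData(fs.getRecord(1));
			}
		}
	}
}
```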
We have a little time left, and there’s one other thing I wanted to show you that’s kind of for fun but also has some real implications: actually sharing an in-memory data source. Up until now, everything we’ve done has been within an individual client session. When I loaded those flood policies and it read in that CSV file, all of that went into an in-memory table in my client session. If someone else went to the same form in a different client session, they would also load that CSV file, and then there would be two copies of the same in-memory table for two different clients. This has implications for the memory footprint on the application server, and also for whether I can have those clients see the same data; maybe one person edits it and the other person wants to get a data broadcast event. So there are situations where you might actually want a shared data source, much like a regular physical data source, but with an in-memory implementation.

Let me show you how you can do that quickly. We’ll just look at the shared table here. I wrote a little script that took the suppliers and the customers from the example database and smushed them together into a single in-memory table, and loaded that here in the form. If I look at it in the Web Client (I think it needs to be reloaded anyway), both clients are pointing at the same data source, so you get data broadcasting and all that.

Now let me show you how you do it, and then we can talk about why you might want to. If I look at the actual fixed data sources I have, I made one here called mem_db. This is actually using the HSQLDB database, the same engine used for in-memory tables. When you make an in-memory data source, you’re actually making a real table in a real SQL database implementation; it just happens to be in memory, and it’s the HyperSQL database, which ships by default with Servoy. What we can do is make a fixed connection to that same database engine and create a static, design-time table. In this case I decided to make it file-backed, so when the server shuts down it writes the cached data out to a file, and when it starts up it reads it back in; that gives me some persistence between sessions. You can see that on this data source I created one table, the shared table, which just has the company name in it. Then I can go and build forms against that same data source in every client.

The reason I’m showing you this is that there may be times when you actually want some in-memory data shared across clients. It could be metadata that is fixed but still needs to be loaded dynamically. It’s not very common, but it’s a useful side note. And I just want to point out that this is a real database: you have implicit access to it through all the APIs for working with in-memory data sources, and you also have explicit access to it through the data sources; you just need to create a connection to it. It’s always there, because it ships with Servoy.
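For reference, a dedicated connection to the HSQLDB engine is just an ordinary named database connection; its settings might look something like this (the server name, file path, and credentials are illustrative; `jdbc:hsqldb:file:` gives the file-backed persistence described above, while `jdbc:hsqldb:mem:` stays purely in memory):

```
Server name:  mem_db
JDBC driver:  org.hsqldb.jdbc.JDBCDriver
URL:          jdbc:hsqldb:file:/servoy_data/mem_db   (file-backed, survives restarts)
              jdbc:hsqldb:mem:shared_mem             (purely in-memory alternative)
User:         sa                                     (HSQLDB's default account)
```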
All righty, let’s do a quick overview of what we just saw, and then we’ll take some questions.

Again, what are in-memory data sources? They’re an in-memory implementation of database tables. They use the HyperSQL database, which is a fully featured relational database; it just happens to be an in-memory implementation. When you use an in-memory data source, you get all of the CRUD operations that you get with a physical data source, and you get to leverage all of Servoy’s data binding features: you can build data-bound forms, make relations to and from in-memory data sources, build value lists against them, and write queries against them through find mode, through the query builder, and so on.

Why would you work with in-memory data sources? One reason we saw is working with arbitrary, non-SQL data sources: consuming web services and showing them in a client, or consuming flat files like we saw with the flood policies. Another reason is performance optimization: you can greatly reduce the number of queries by writing a custom query and loading the results into an in-memory table, and then you can still do data binding things with it, like build a form and show it. Another way to optimize performance is to take data that you know is read-only and preload it into an in-memory data source, so that every time you hit that data source you don’t have to keep going to disk. Also, and I didn’t show an example of this, it can make difficult data structures easier to work with. Sometimes I work with customers who, for whatever reason, can’t change their database schema; it may be really over-normalized, or under-normalized with big, wide tables with lots and lots of columns. What they can do is define the data structures they’d rather work with and load the data into in-memory structures that are much easier to build applications against. That’s a good use case that we didn’t demo.

There are two ways we saw to load data. One is from a data set: if you get data from a file, a web service, or anywhere else, you can put it into a JSDataSet, which is just a little row-and-column object, and create a data source from that. The other way is through the query builder, which is really just shorthand for going through a data set; once you have a query builder object, you could load it into a data set and create the data source from that, but you can also create the data source directly from the query. That’s nice because you can load lots and lots of records without first pulling them into a data set. One thing to be sure of with either approach is that the column names on the data set, or the aliases in the query, must match the target data source.

There are also two ways to define the data sources. What we saw in this demo was all design time: every one of those data sources, the flood policies, the order details, the product aggregates, was built at design time by right-clicking the Data Sources node and creating it in the table editor. You may encounter a situation (we didn’t demo this) where you don’t know what the schema is going to be in advance; maybe you’re just consuming a flat file and you want to show the user whatever rows and columns are in it and let them work with it. In that case, you’ll probably want to get into the solution model a bit. You can create a data source at runtime, without it existing at design time, through the same APIs, but then you don’t have the benefit of design-time data binding, so you’d have to take a form, read it via the solution model, modify it, set the data source, and so on. I would only do that when necessary, when you really don’t know the schema ahead of time.
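As a rough sketch of that runtime-only path, under the assumption that the data arrives as a JSDataSet whose schema is only known at runtime (all names and positions are illustrative):

```javascript
/**
 * Show a runtime-created mem table via the solution model.
 * @param {JSDataSet} dataset data whose schema is unknown at design time
 */
function showDynamicData(dataset) {
	// create the mem table on the fly; the dataset's columns define the schema
	// (a types array can be passed when the types cannot be inferred)
	var uri = dataset.createDataSource('dynamic_import');
	// build a form against it at runtime; no design-time data binding here
	// (real code would remove or reuse an existing form with the same name)
	var jsForm = solutionModel.newForm('dynamicForm', uri, null, false, 600, 400);
	var y = 10;
	for (var i = 1; i <= dataset.getMaxColumnIndex(); i++) {
		jsForm.newTextField(dataset.getColumnName(i), 10, y, 200, 20);
		y += 30;
	}
	forms['dynamicForm'].controller.show();
}
```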
It’s important to remember that in-memory data sources will not be updated by built-in data broadcasting, so you want to take advantage of table events and data broadcast events to update your cache, if it indeed needs to be updated because it relies on other tables. Also keep in mind that these are in-memory implementations, so there are implications to creating lots and lots of in-memory tables. By default, when you create an in-memory data source, it is per client. The example I gave was that flood policies data set with 36,000-plus records in it: if I had 100 clients and each of them read in that same CSV file, that would put quite a big footprint on the application server. So in cases where you think an in-memory data source is going to be shared, where a group of clients are basically going to hit the same data, consider the last trick I showed you, which is to make a real, dedicated connection. That’s a bit different; it’s just like any other connection, clients will share it, and they’ll be updated via data broadcasting.

That does it for the overview. I will leave these links up, which show both the web page where we will post a recording of this video and the Servoy forum, where we post the slides and the example solution for each of these webinars, so you can download them later, ask questions, and give us feedback. Do you have any questions, John?

Yes, there are a few questions, and they keep coming in. The first question is from Juan: would it be possible to create the empty data set with the same structure that the in-mem data source has, like with an easy method? Otherwise, if we ever change the in-mem data source structure, we would need to change the code as well.

I guess the question refers to where I showed the list of columns that are going to be in the data set. Well, you can get the JSTable object, so you could read that if you needed to. I can imagine that if the structure were changing, you’d probably have to change your query and such too, and you wouldn’t know in advance what the change was going to be, so you couldn’t fully automate it. But you could get the JSTable object for the data source and read it to get the list of columns.
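Something along these lines, as a sketch (the mem table name is illustrative):

```javascript
// Derive the data set's columns from the mem table's own definition,
// so the code follows design-time schema changes automatically
var jsTable = databaseManager.getTable('mem:flood_policies');
var columns = jsTable.getColumnNames();
var dataset = databaseManager.createEmptyDataSet(0, columns);
```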
Then a question from Andre: does the in-mem data source work with the Jasper reports plugin? It will work for found set based reporting, because like any other data source, once you fill the in-memory data source with data, you can create found sets against it, and the Jasper reports plugin takes found sets as one of the things you can pass in. The plugin can also do direct SQL, in which case you could connect to the in-memory database, but it won’t exist outside the client session, so that would be a bit tricky. I would go the found set reporting route; hope that’s a good answer.

So if you define a dedicated connection, like you showed in the last example, then you can also make Jasper connect to that, or not? Yeah, exactly: if you made the dedicated connection, then Jasper could use it outside the client session. Is that a TCP connection, so that outside applications can connect to it? It’s just a regular database connection. Okay, then that should work.

And there’s a question from Mats: how would the performance look if the source tables were all very large, for example millions of records, for the preloaded relational data set? In general, the performance of something that’s fully cached will be faster than something that goes to disk. Loading the data initially would probably be pretty slow, but that’s a one-time event; working with the cached in-memory table will be faster than going to disk, and that’s one of the benefits of using it.

There will be limitations to the number of records you can support, but that has more to do with the hardware and the RAM allocated to the application server. I have read that the HyperSQL database can handle a large volume of transactions and a large number of records, up into the many millions, so the limitation isn’t really the database; it’s the infrastructure.

So, to conclude: the speed will be extremely good, but keep an eye on load times and on memory usage, because it’s all in memory. Typically, with a small table, millions of rows should not be a big deal if the rows are just a handful of bytes each, though there’s a bit of overhead. But if you start loading billions of rows into memory, it will start consuming serious resources, so keep an eye on your resource usage.

A question from Ruben: is it possible to nullify an in-mem data source, to release resources for example? There isn’t anything in the API to make it go away, but you can call createDataSource again with an empty data set, because that will just drop and reload the table. You can call createDataSource on the same data source as many times as you want, and it will effectively drop and reload the records in that in-memory table, so if you wanted to get rid of the records, you could do it with an empty data set. I’m not sure exactly when the memory gets released; that has more to do with the database implementation and garbage collection. So I’m not sure if that fully answers your question, but it should be possible.

All right, it looks like that’s all the questions. As for the comment from Bob, “Sean always does a great job demonstrating a technique with a real-life use case”: thank you, Bob, you just made my day. All right, thanks everybody for attending. We will be posting this recording soon in the archive, and on November 1st we will be back with a new best-practices session. So thanks, Sean, for presenting, thanks everybody for attending, and head to the forum for some discussion.