• Creating duplicate InfluxDB databases/datasets in InfluxCloud using SELECT INTO

    InfluxCloud Chronograf currently doesn’t provide a fast and easy way of creating duplicates of your existing databases. Their support (which comes with the paid service) is sadly also very slow, and it’s a bit of a chore to get them to do anything for you within a reasonable timeframe unless it’s something super-critical.

    So, if you need to back up and re-create a DB multiple times because you’re looking to test some destructive actions on your data (such as updating records) and you want to avoid dealing with the “support”, here’s a hacky workaround which might end up being the faster thing to do.

    Meet Mr. SELECT INTO. What SELECT INTO does is copy all the data from all of your measurements into a newly created database (which has to be created before running the query). It also keeps your existing data structure intact (measurement names and which records go where). In its simplest form, and under ideal circumstances, the query you’ll need to execute looks like this:
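    A minimal sketch of the vanilla query (the database names “mydb” and “mydb_copy” and the “autogen” retention policy are placeholders, adjust them to your setup):

    SELECT * INTO "mydb_copy"."autogen".:MEASUREMENT FROM "mydb"."autogen"./.*/ GROUP BY *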

    A few details that are not so great about the vanilla SELECT INTO approach:

    1. Continuous Queries will not be copied over.
    2. If you try to run the SELECT INTO query from the Chronograf “Data Explorer” and your dataset is not trivially small, the Chronograf request carrying your query will time out after 60s and the query will simply stop. It won’t even continue running in the background.
    3. Data types of some values might change for no apparent reason. For example, INT columns (values) might get converted to FLOATs during the copy even though all the data stored in those columns initially was INT data (?!). This is interesting because it might come back to bite you later, once you try to execute aggregate functions to backfill some of the downsampled measurements with new/fixed data. What exactly is the issue with that? The SELECT INTO query might not convert INTs to FLOATs for the downsampled measurements (which are typically populated by CQs), but only for the “lowest resolution” measurements (the measurements you insert data into explicitly). So, after you’ve happily copied over all the data, if you try to insert a max(thought_it_was_an_int) from the default measurement into a downsampled one, it will now try to store a FLOAT into an INT column and this will obviously return an exception. Happy days.

     

    What this means:

    1. Probably not a big deal if you just want to mess with the data, but keep it in mind.
    2. You’ll have to break your data down into smaller datasets to copy over (for example into one-day chunks using a WHERE time … clause). You’ll probably need an SDK library (such as InfluxData.Net) to automate the process.
    3. You’ll have to ditch the wildcard SELECT from the vanilla example and manually select (and explicitly cast) all the columns that you want to copy over to the new DB. This makes the process slightly more complicated, but at this point you’re probably automating everything through a small script / migration app anyway, so it shouldn’t be too much additional work.

     

    So, here’s a proof-of-concept code sample that explicitly selects columns to copy over, forces casting to the INT data type (or keeps the data type intact), selects specific source and target measurements (tables) and selects only a single day’s worth of data. You’ll have to iterate through 1-2 loops to inject your desired source and target table names and time ranges to cover all the data that you’ve got in your DB.
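    A sketch of what such a query template could look like (all of the names here – “mydb”, “mydb_copy”, the “sensor_data” measurement and its fields – are made-up placeholders; swap in your own and loop over your measurements and days):

    SELECT "SensorId"::integer AS "SensorId", "Value"::integer AS "Value", "Note"
    INTO "mydb_copy"."autogen"."sensor_data"
    FROM "mydb"."autogen"."sensor_data"
    WHERE time >= '2017-06-01T00:00:00Z' AND time < '2017-06-02T00:00:00Z'
    GROUP BY *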

    I really hope this helps and saves you some time. If you have any questions, feel free to leave a comment.

    #FastNotFastButMaybeFaster

    Cheers!

  • Delayed continuous query data resampling with InfluxDB

    If you’re manually time-stamping your data and perhaps also batch-saving it to InfluxDB, there’s a high probability that some of your data points are arriving slightly late and are not being taken into account when the continuous query kicks in and creates a downsampled record.

    Let’s take a 1-hour interval as an example. If your data point gets written into InfluxDB even a second after the full hour, it won’t be used by the continuous query to calculate the downsample for the previous hour. A CQ which was set to run every hour will run precisely at the beginning of each hour.

    Now, the best way to deal with this would be to have a way of simply delaying the CQ execution by a certain amount of time, so that it runs only when we’re certain that all the data points have already arrived. Unfortunately, this is currently not possible with InfluxDB, but there is another approach which, although not perfect, will help us out if we’re looking for downsampled data that is as precise and correct as possible.

    What we can do is specify a RESAMPLE interval when we define a CQ. If we take the 1-hour example, this means we can specify for how long the CQ will keep re-downsampling the data for the same hour. A simple timeline example will help in understanding the basic idea:

    Say we collected some data between 5am and 6am. The CQ runs at exactly the 6h mark and creates a new “downsample record” for the previous hour. However, a point with a timestamp of 5h59m47s might arrive in InfluxDB at 6h00m18s. The problem is that this late point won’t end up in either the 5-6h downsample (because the CQ had already run at that point) or the 6-7h downsample (because its timestamp does not belong to that time interval).

    However, if we specified a RESAMPLE interval of, say, 2h in our CQ, the CQ would run on the 5-6h interval dataset for the second time at the 7h mark, but this time it would take the late data point into account as well. If we specified a RESAMPLE interval of 3h, the CQ would run for the same 5-6h interval 3 times, and so on.

    Here’s a sample piece of code which does what was just described:
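    A minimal sketch of such a CQ (the “mydb” database, the “sensor_data” and “sensor_data_1h” measurements and the “Value” field are placeholder names, assuming the default “autogen” retention policy):

    CREATE CONTINUOUS QUERY "cq_sensor_data_1h" ON "mydb"
    RESAMPLE FOR 2h
    BEGIN
      SELECT mean("Value") AS "Value"
      INTO "mydb"."autogen"."sensor_data_1h"
      FROM "sensor_data"
      GROUP BY time(1h), *
    END

    The RESAMPLE FOR 2h part is what makes the CQ re-calculate the previous hour’s downsample on the next run as well, so a late point still ends up in the correct record.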

    Earlier I mentioned that this approach is not perfect. Why? Because we’re doing the downsampling twice. Depending on the amount of data that you have, this might be an unnecessary performance hit. But if you have many late points, it might also be critical not to just ignore them. So it’s a trade-off between precision and performance, and you’ll have to evaluate your use case and decide whether this approach will work for you or not.

    Hope it helped. If you have any questions, drop them in the comments. Cheers.

  • How to use WHERE LIKE clause in InfluxDB

    In this short post I’ll explain how to write a WHERE LIKE clause in InfluxDB. The syntax is slightly different from what you would use in standard SQL. There is no LIKE clause per se; instead, the syntax resembles regex pattern matching with an equality (=~) or negation (!~) operator.

    Below are two very simple examples. The first one returns all the records where SensorName starts with abcd and the second one returns all the records where SensorName doesn’t start with abcd.
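    A minimal sketch (assuming a measurement called “sensor_data”, which is just a placeholder name):

    SELECT * FROM "sensor_data" WHERE "SensorName" =~ /^abcd/
    SELECT * FROM "sensor_data" WHERE "SensorName" !~ /^abcd/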

    This should be enough to get you started. Cheers!

     

  • Filtering collections by non-existing child entities in LLBLGen Pro ORM

    This is an interesting one. Let’s take a very standard scenario as an example. Say we have a one-to-many relation (customer-order) between two tables (diagram made using QuickDBD):

    And say we want to fetch all the customers that haven’t made a single order yet. It’s sort of the opposite of filtering by nested entities, which was covered in an earlier post of the series. In this case we want to fetch all the customers that have a zero count of child entities. To do this, we’ll have to use the JoinHint and DBNull LLBLGen constructs. Here is how:
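    A minimal sketch using the adapter template (the entity, field and relation names – CustomerEntity, OrderFields, OrderEntityUsingCustomerId – are assumed from the customer-order example above; adjust them to your own generated code):

    // Fetch all customers that have no related orders.
    var customers = new EntityCollection<CustomerEntity>(new CustomerEntityFactory());

    var bucket = new RelationPredicateBucket();
    // LEFT JOIN customers to orders so customers without any orders stay in the result set.
    bucket.Relations.Add(CustomerEntity.Relations.OrderEntityUsingCustomerId, JoinHint.Left);
    // Keep only the rows where no order got joined (the order PK ends up being NULL).
    bucket.PredicateExpression.Add(OrderFields.OrderId == DBNull.Value);

    using (var adapter = new DataAccessAdapter())
    {
        adapter.FetchEntityCollection(customers, bucket);
    }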

    If you have any further questions, please leave them below. Cheers!

  • Filtering collections by multiple field values in LLBLGen Pro ORM

    Sometimes we need to fetch a very specific subset of records from the database. For example, when we already have a list of specific Ids of the records that we need to fetch. That’s what this post will cover.

    To accomplish this kind of filtering with LLBLGen we need to use the FieldCompareRangePredicate. Its usage is very similar to the FieldLikePredicate from the previous post of the series.
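    A minimal sketch, assuming the adapter template and a CustomerEntity with a generated CustomerFields class (the Id list is just example data):

    // Ids of the specific records we want to fetch.
    var customerIds = new List<int> { 3, 17, 42 };

    var customers = new EntityCollection<CustomerEntity>(new CustomerEntityFactory());

    var bucket = new RelationPredicateBucket();
    // Translates to: WHERE CustomerId IN (3, 17, 42)
    bucket.PredicateExpression.Add(new FieldCompareRangePredicate(CustomerFields.CustomerId, null, customerIds));

    using (var adapter = new DataAccessAdapter())
    {
        adapter.FetchEntityCollection(customers, bucket);
    }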

    That’s it. If you have any questions, leave them below. Enjoy!

     

  • Filtering collections by LIKE operator in LLBLGen Pro ORM

    This will be a simple one. It’s another useful way of filtering entity collections in LLBLGen Pro. If you don’t know anything about filtering, predicate expressions or buckets in LLBLGen, check out some of the previous posts from the series.

    What the LIKE operator enables us to do in SQL is filter collections by partial string matches, which is quite a common action if you have any search input fields in your app. For example, if we wanted to find all the products that have the word “blue” in their name, we would use the LIKE operator. The way to do it in LLBLGen is to use the FieldLikePredicate:
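    A minimal sketch, assuming the adapter template and a ProductEntity with a Name field in the generated code:

    var products = new EntityCollection<ProductEntity>(new ProductEntityFactory());

    var bucket = new RelationPredicateBucket();
    // Translates to: WHERE Name LIKE '%blue%'
    bucket.PredicateExpression.Add(new FieldLikePredicate(ProductFields.Name, null, "%blue%"));

    using (var adapter = new DataAccessAdapter())
    {
        adapter.FetchEntityCollection(products, bucket);
    }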

    There really isn’t much more to it, it’s pretty straightforward and simple.

    Hope it helped. If you have any questions or comments, drop them below. Cheers!

     
