Mission impossible – Jpeg File Size Prediction

12. März 2019AllgemeinJpeg, Jpeg Dateigröße, Predict jpeg file sizeAndreas

Vorhersage der Filegröße von JPEGs verschiedener Größe erweist sich als sehr unsicher.

tdlr; Eine Vorausberechnung scheint nicht möglich – es bleibt nur eine wiederholte „Testberechnung“ mit einer Optimierungslogik für die Skalierung.

Für die Monochrom-App – eine App zur Umsetzung von Leica-DNG Dateien in Schwarz-Weiß-Bilder – wünschte man sich eine Funktion zur Erstellung von Jpegs mit einer vorbestimmten Größe in Kilobyte.

Viele Foren sehen eine Begrenzung der Uploadgröße vor. Vorausgesetzt man möchte keine Qualitätsabstriche machen (mein Standard für JPEG Qualität ist 93 von 100), so muss die Pixelzahl durch Veränderung der Seitenmaße (bei Erhalt deren Verhältnisses) angepasst werden.

Schön wäre es, diese Seitenlängen im Vorhinein ausrechnen zu können — geht aber leider nicht. Der Jpeg-Standard verknüpft verschiedene Komprimierungswege, die darüber hinaus durchaus standardgerecht relativ frei kombiniert werden können (https://en.wikipedia.org/wiki/JPEG), namentlich Colorspace Conversion, Downsampling, Discrete Cosine Transformation, Quantization und anschließendes Entropy Encoding.

Aus meinem DNG-Bestand habe ich 631 .DNG Dateien mit 6000x4000px in monchrome Bilder umgerechnet und auf längste Bildseiten mit 100, 200, 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000 Pixeln skaliert (entsprechend 6666, 26666, 166666, 666666, 1.5 mio, 2.666 mio, 4.166 mio, 6 mio, 8.166 mio und 10.666 mio Bildpixeln)

In der Folge ergibt sich, dass die Bilddateigröße sich nicht sicher voraussagen läßt, wenn man die Pixelzahl durch Lanczos3-Skalierung in 1/4 Schritten verringert (Halbierung der Langs- und Breitseite).

Die Linien zeigen die Größenveränderung der resultierenden Jpeg-Datei aus Leica-DNG nach monochrom-Umwandlung bei unterschiedlichen längsten Bildseiten (px100 — px4000) mithilfe Lanczos3-Skalierung

Obenstehender Grafik ist die Veränderung der Dateigröße bei unterschiedlicher Skalierung einzelner Bilddateien (farbige Linien) zu entnehmen. Es deutet sich schon an, dass kein linearer Zusammenhang besteht. R-Plus ergibt einen exponentiellen fit mit R**2 zwischen 0.8 und 0.9 von einzelnen Längseitenlängen zum nächsten Halbierungsschritt (i.e. px2000 -> px1000). Die Genauigkeit reicht nicht für eine saubere Vorhersage der Bilddateigröße.

Box-and-Whiskerplots der Dateigröße von
N=631 Leica .DNG 6000×4000 -> Jpeg Grautöne (Black&White) skaliert (Lanczos3) auf verschiedene längste Seitenlängen (px100 — px4000)
Die rote Linie bei 500 KB zeigt das Uploadlimit des deutschen LUF.

Die rote Linie bei 500 KB in oben stehender Grafik zeigt das Uploadlimit des deutschen LUF. Wie zu erkennen ist, kann die Länge der Längsseite eines 500MB Jpeg (Q93, mit 8 Bit Grautönen) demnach zwischen px1000 und px3500 liegen – je nach Bildkomposition.

Was bleibt:

Berechnung jeder einzelnen Jpeg-Datei und Ermittlung der Dateigröße, Optimierungslogik anhand des Fehlers und Annäherung an eine geeignete Pixelzahl in asymmetrischen Schrittlängen*Fehlercoeff.

Etwa so:

Zielgröße: 490000 jpeg: 744593 width: 1802.2019 err^2: -0.0051
Zielgröße: 490000 jpeg: 616205 width: 1785.2429 err^2: -0.0025
Zielgröße: 490000 jpeg: 605390 width: 1778.2429 err^2: -0.0023
Zielgröße: 490000 jpeg: 600989 width: 1765.2429 err^2: -0.0022
Zielgröße: 490000 jpeg: 592734 width: 1704.2429 err^2: -0.0020
Zielgröße: 490000 jpeg: 555151 width: 1701.2429 err^2: -0.0013
Zielgröße: 490000 jpeg: 554161 width: 1664.2429 err^2: -0.0013
Zielgröße: 490000 jpeg: 529685 width: 1551.2429 err^2: -0.0008
Zielgröße: 490000 jpeg: 466955 width: 1648.2429 err^2: 0.0004
Zielgröße: 490000 jpeg: 521434 width: 1631.2429 err^2: -0.0006
Zielgröße: 490000 jpeg: 512253 width: 1624.2429 err^2: -0.0004
Zielgröße: 490000 jpeg: 508231 width: 1611.2429 err^2: -0.0003
Zielgröße: 490000 jpeg: 500654 width: 1550.2429 err^2: -0.0002
Zielgröße: 490000 jpeg: 466785 width: 1553.2429 err^2: 0.0004
Zielgröße: 490000 jpeg: 468561 width: 1590.2429 err^2: 0.0004
Zielgröße: 490000 jpeg: 487857 width: 1703.2429 err^2: 4.3e-05

auf Kosten der Dauer: 3.35 sec anstatt 1.45 sec (einschließlich der B&W Konversion)

Go (Golang) Array Specialities

12. August 2018AllgemeinAndreas

If calling a function in go with an array as parameter, „strange“ things happen from time to time. Some call them pitfalls – anyway it’s a must know for golang programming.

I will share some experiments:

https://play.golang.org/p/snqZWeqJWpw

package main

import (
  "fmt"
)

func shortenRef(a *[]int) {
  copy(*a, (*a)[3:])
  *a = (*a)[:3]
  fmt.Println(*a, len(*a))
  *a = make([]int, 10)
  fmt.Println(*a, len(*a))
}

func shortenArr(a []int) {
  copy(a, a[3:])
  a = a[:3]
  fmt.Println(a, len(a))
  a = make([]int, 10)
  fmt.Println(a, len(a))
}

func main(){
  a := []int{0,1,2,3,4,5,6,7,8,9}
  shortenRef(&a)
  fmt.Printf("call shortenRef(&a): %v len: %v\n", a, len(a))
  a = []int{0,1,2,3,4,5,6,7,8,9}
  shortenArr(a)
  fmt.Printf("call shortenArr(a): %v len: %v\n", a, len(a))
}

OUTPUT:

3 4 5] 3
[0 0 0 0 0 0 0 0 0 0] 10
call shortenRef(&a): [0 0 0 0 0 0 0 0 0 0] len: 10
[3 4 5] 3
[0 0 0 0 0 0 0 0 0 0] 10
call shortenArr(a): [3 4 5 6 7 8 9 7 8 9] len: 10

neovim boilerplate

18. Februar 2018AllgemeinAndreas

linux (Lubuntu 17.10) 2018.02
neovim with vim-plugin, vim-go and deoplete.vim code completion

prerequisites:
go -Installation
Python3 or Anaconda3 (- use pip for python neovim needed by deoplete)
export PATH=$PATH:/usr/local/go/bin
export GOROOT=/usr/local/go
export GOPATH=$HOME/go

$ sudo apt-get update
$ sudo apt-get install neovim
( –> vs 0.2.0 in ~/.local/share/nvim )

$ pip install neovim

$ cd ~/.local/share/nvim/
$ git clone https://github.com/Shougo/deoplete.nvim.git
$ curl -fLo ~/.local/share/nvim/site/autoload/plug.vim –create-dirs https://raw.githubusercontent.com/junegunn/vim-plug/master/plug.vim

$ mkdir ~/.local/share/nvim/plugged
$ mkdir ~/.config/nvim/

$ nvim ~/.config/nvim/init.vim


call plug#begin('~/.local/share/nvim/plugged')

Plug 'junegunn/vim-easy-align'

Plug 'https://github.com/junegunn/vim-github-dashboard.git'

Plug 'scrooloose/nerdtree', { 'on':  'NERDTreeToggle' }

Plug 'fatih/vim-go', { 'tag': '*' }

Plug 'nsf/gocode', { 'tag': 'v.20150303', 'rtp': 'vim' }

if has('nvim')
  Plug 'Shougo/deoplete.nvim', { 'do': ':UpdateRemotePlugins' }
else
  Plug 'Shougo/deoplete.nvim'
  Plug 'roxma/nvim-yarp'
  Plug 'roxma/vim-hug-neovim-rpc'
endif

Plug 'zchee/deoplete-go', { 'do': 'make'}

Plug 'terryma/vim-multiple-cursors'

call plug#end()

" neocomplete like
set completeopt+=noinsert
" deoplete.nvim recommend
set completeopt+=noselect

" Path to python interpreter for neovim
let g:python3_host_prog  = '/path/to/python'
" Skip the check of neovim module
let g:python3_host_skip_check = 1

" Run deoplete.nvim automatically
let g:deoplete#enable_at_startup = 1
" deoplete-go settings
let g:deoplete#sources#go#gocode_binary = $GOPATH.'/bin/gocode'
let g:deoplete#sources#go#sort_class = ['package', 'func', 'type', 'var', 'const']
let g:go_auto_type_info = 1

" Disable deoplete when in multi cursor mode
function! Multiple_cursors_before()
    let b:deoplete_disable_auto_complete = 1
endfunction
function! Multiple_cursors_after()
    let b:deoplete_disable_auto_complete = 0
endfunction

restart nvim
:PlugInstall
:GoInstallBinaries

___and you’re DONE___

Find position of bit in byte with one bit set

3. Dezember 2017AllgemeinAndreas

Gegeben ist ein Byte mit einem gesetzten Bit: i.e. 1,2,4,8,16,32,64,128

Die Position des gesetzten Bits soll gefunden werden:
i.e. Byte = 1 enstpricht Position 1
i.e. Byte = 64 entspricht Position 7

… ended up with deBruijn (without knowing then …)


package main

/*
int pos(int a){
  int p = (a & 3) | ((a >> 1) & 6) | ((a >> 2) & 5) | ((a >> 3) & 4) | ((a >> 4) & 15) | ((a >> 5) & 2) | ((a >> 6) & 1);
  return p;
}
*/
import "C"

import "fmt"

func main() {
  a := 1
  for i := 0; i < 8; i++ {
    e := (a & 1) | (a & 2) | (a >> 1 & 4) | (a >> 4 & 8) // 1, 2, 4, 8
    e |= ((a >> 1) & 2) | (a>>2)&1                       // 3
    e |= ((a >> 2) & 4) | (a>>4)&1                       // 5
    e |= ((a >> 3) & 4) | (a>>4)&2                       // 6
    e |= ((a >> 4) & 4) | ((a >> 5) & 2) | (a>>6)&1      // 7
    fmt.Println(e)
    a <<= 1
  }
  b := 1
  for i := 0; i < 8; i++ {
    a := b
    e := a & 3 // (a & 1) | (a & 2)         // 1, 2
    a >>= 1    // b>>1
    e |= a & 6 // (a & 4) | (a & 2)                // 4, 3
    a >>= 1    // b>>2
    e |= a & 5 // (a & 1) | (a & 4)                 // 3, 5
    // a >>= 1 // b>>3
    e |= (a >> 1 & 4) // 6
    a >>= 2           // b>>4
    //e |= (a & 1)                     // 5
    //e |= (a & 2)                     // 6
    //e |= (a & 4)              // 7
    //e |= (a & 8)                     // 8
    e |= a & 15              // 5, 6, 7, 8
    e |= (a>>1)&2 | (a>>2)&1 // 7
    fmt.Println(e)
    b <<= 1
  }
  c := 1
  for i := 0; i < 8; i++ {
    a := c
    e := (a & 3) | ((a >> 1) & 6) | ((a >> 2) & 5) | ((a >> 3) & 4) | ((a >> 4) & 15) | ((a >> 5) & 2) | ((a >> 6) & 1)
    fmt.Println(e, int(C.pos(C.int(a))))
    c <<= 1
  }

  // faster ! really this is as fast as 
  //        : about 1 multiply, 
  //                1 bitwise shift operation, 
  //                1 bitwise 'AND' operation 
  //                .. followed by one array uplook
  // (found out this is deBruijn style)
  // 111 101 001 100 010 110 011 000 '1101000111' (839*(1<<i))>>6)&7
  //
  // start with a map and a multipier '1101000111'

  var deBruijnMap = [8]byte{4, 5, 2, 6, 3, 1, 8, 7} // ->{5, 2, 4, 0, 1, 3, 7, 6}
  const mult = 839
  for i := 0; i < 8; i++ {
    a := 1 << uint(i)
    fmt.Println(deBruijnMap[((mult*a)>>6)&7])
  }

}

Go – running fixed number of concurrent go routines

29. November 2015AllgemeinAndreas

This snippet is about the control of running a fixed number of concurrent worker functions concurrently.

Lets start with a basic version, that does no error checking at all and that will simply panic in case one of the concurrent processes crashes.

Snippet 1


package main

import "fmt"

func main(){
   parallel(func(x int) error {
    fmt.Println("Parameter:", x)
    return nil
  }, []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, 3)
}

func parallel(fn func(int) error, sq []int, p int) error {
  var q = make(chan bool, p)
  defer func() { close(q) }()
  for _, s := range sq {
    q <- true
    go func(s int) {
      fn(s)
      <-q
    }(s)
  }
  for len(q) > 0 {
  }
        return nil
}

The function parallel takes the function to run in parallel (fn), a slice of parameters to start the concurrent workers (sq) with and the number of workers, you want to run in parallel. First a channel q of the capacity p (the number of concurrent processes) is instantiated. The defer function will care for proper closing. The for-loop takes care for starting the go routines with the appropriate values. By filling up the channel q the loop execution is blocked if the capacity is exhausted and p processes are running. Each return of a goroutine receives once from q and thus allows for a new loop until all parameter values are issued to the worker functions. The first for-loop exits and the second for-loop cares that all coroutines to finish before parallel returns.

Make sure your function fn in calling parallel(fn, sq, p) meets the function type (=signature) and the value table is adjusted properly then you are done with.

Note, that Snippet 1 panics if any of the concurrent processes crashes. Furthermore the second for loop will run in an endless loop, if one of the go routines didn’t return because ‚<-q' won't happen than. Such a code works well if the function called concurrently is guaranteed to non-panic or block (or if the project is an a state, where such panicking helps to improve logic). Snippet 2 cares for recovery and returns general error information and information on the parameter value that resulted in the crash.

Snippet 2


package main

import "fmt"
import "errors"

func main(){
    err, errlist := parallel(func(x int) error {
    fmt.Println(x, 100/x)
    return nil
    }, []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0}, 3)
    if err != nil {
  fmt.Printf("Error! -> %v\n", err)
  for _, err := range errlist {
    fmt.Println("  ", err)
  }
    }
}

func parallel(fn func(int) error, sq []int, p int) (err error, errlist []error) {
  var q = make(chan bool, p)
  var e = make(chan error, len(sq))
  defer func() {
    close(q)
    close(e)
  }()
  for _, s := range sq {
    q <- true
    go func(s int) {
      defer func() {
        if P := recover(); P != nil {
          e <- errors.New(fmt.Sprintf("func(%v) crashed: %v", s, P))
        }
        <-q
      }()
      fn(s)
    }(s)
  }
  for len(q) > 0 {
  }
  for len(e) > 0 {
    err = errors.New(fmt.Sprintf("%d concurrent process crashed", len(e)))
    errlist = append(errlist, <-e)
  }
  return err, errlist
}

------------------------------
Output from running this code.
3 33
4 25
5 20
6 16
1 100
7 14
2 50
9 11
10 10
8 12
Error! -> 1 concurrent process crashed
   func(0) crashed: runtime error: integer divide by zero

This gives better control on crashes and allows the main process to proceed even in case one of the concurrent functions crashes. Anyway it does not care properly for a concurrent function not returning.

Furthermore errors are printed out after all processes finished, which might be inconvenient on long running processes.

In the next Snippet 3 errors are channeled out to a concurrent running receiver loop.


package main

import "fmt"
import "errors"

func main(){
  // -> routine 4 concurrent error / result output
  errChan := make(chan error, 11)
  //ergChan := make(chan int, 11) // enable concurrent result output
  alldone := make(chan bool, 1)
  go func() {
    for done := false; !done; {
      select {
      case e := <-errChan:
        fmt.Println("Error -> ", e)
      // case ee := <-ergChan: // enable concurrent result output
      //   fmt.Println("Result -> ", ee)
      case done = <-alldone:
        break
      }
    }
  }()
  // <-

  parallel(func(x int) error {
    r := 100 / x
    fmt.Println(x, r)
    // ergChan <- r
    return nil
  }, []int{1, 2, 3, 0, 4, 5, 0, 6, 0, 7, 8, 9, 10}, 3, errChan)

  // let error output goroutine exit
  close(alldone)

  // ....
}

// func parallel with errors returned through an error channel
func parallel(fn func(int) error, sq []int, p int, eC chan error) {
  var q = make(chan bool, p)
  defer func() {
    close(q)
  }()
  for _, s := range sq {
    q <- true
    go func(s int) {
      defer func() {
        if P := recover(); P != nil {
          eC <- errors.New(fmt.Sprintf("func(%v) crashed: %v", s, P))
        }
        <-q
      }()
      eC <- fn(s)
    }(s)
  }
  for len(q) > 0 {
  }
}

I am aware that there is a number possibilities to make sure the number of concurrent running processes remains stable including the use of patterns to loop with a partial number of wait groups or monitoring a running counter and the like. None the less i think such use of channels is a particular elegant way to manage this problem.

GO: passing Arrays & Slices to functions

2. September 2015AllgemeinAndreas

When passing Arrays as parameters to functions some specialities need to be cared of. Go’s documentation stated, that arrays will be copied.

You would assume, the following function would, according to the steps, work like this: copy the array (when passing), sort the copy and return the median without changing the original array ‚l‘.

see go playground



package main

import (
        "fmt"
        "sort"
       )

func arrayMedian(list []int) (median int) {
   lenAR := len(list)
   if lenAR == 0 {
      return 0
   }
   sort.Ints(list) 
   if lenAR%2 == 0 {
      return (list[lenAR/2-1] + list[lenAR/2]) / 2
   }
   return list[lenAR/2]
}

func main(){
   l := []int{0,3,4,1,0,9,6,3,9,8}
   fmt.Println(l, arrayMedian(l), l)
}

Surprise – if looking at the output, you’ll find the original array ‚l‘ is sorted now.

Why? I guess, passing l by ‚arrayMedian(l)‘ didn’t pass the original array (or generates a copy) but passes a slice over ‚l‘ (rsp: a copy of the slice over ‚l‘), what actually means, that pointers to the ‚l’s list members are passed. The manipulation (here: sorting) is performed on the original array.

What you need to do is this instead:


func arrayMedian(list []int) (median int) {
   lenAR := len(list)
   if lenAR == 0 {
      return 0
   }

   ar := make([]int, lenAR)
   copy(ar, list[:])

   sort.Ints(ar)
   if lenAR%2 == 0 {
      return (ar[lenAR/2-1] + ar[lenAR/2]) / 2
   }

   return ar[lenAR/2]
}

Though you produce a copy manually, sort the copy and return the result by leaving the original array ‚l‘ unchanged.

How to include binary data into HTML5

6. Juli 2015AllgemeinAndreas

… without getting cross-site-scripting problems on local files / server less testing.

For some 3D data representation i needed some MB binary 3D data to be included in the HTML side for further use in threejs. The file should be useful for standalone opening after email-sending – loading data from a fileserver wasn’t a an option though. On the other hand writing 8bit data points into asciitext wasn’t an option either, because such file fast becomes to big to be handled properly: each datapoint 8bit needs to be represented by 1-3bytes plus separator or by 3 bytes fixed length and both would pump up the file size. Another option would be a base64 encoding of the raw binary data which also might be compressed (zipped) before encoding resulting in an about 1.3 larger b64 asciitext, that can be unzipped and parsed to regain the data. This solution is smart but involves „unzipping to get the asciitext“ meaning that there has to be imported/bundled a fast and reliable javascript zip-library.

This is what i finally came up with:
1. write the 8bit binary data into a png file
2. base64 encode the resulting run-length compressed png file
3. use <img src="data:image/png;base64,...."> to include that into the html5 file
4. create a hidden canvas of appropriate dimension and use context.drawImage and context.getImageData to extract the binary data which is now stored in an Uint8ClampedArray
5. calculate the needed Float32Array(s) for threejs from that Uint8ClampedArray

Note on step 1: you’ve the choice to use all 24bit of the pixel matrix in png (four 8bit binary data -> 1 pixel in png) and have some data length information elsewhere in your javascript to manage padding problems that might come up eventually. The lazy way is to encode only 1 Byte per pixel (grayscale png) and rely on the run length compression to deal with the resulting duplications. In the latter case, make sure you extract each forth value from the canvas context.getImageData method result.

This python code will provide you with the base64 encoded png data:


from PIL import Image
import base64

fle = file("testdata.bin") # binary data: 12960000 bytes
im = Image.frombytes('L', (3600,3600), fle.read(12960000))
im.save("testdata.png")

b64 = base64.b64encode(file('testdata.png').read())
b64 = '<img id="dimg" src="data:image/png;base64,%s">'%b64 
file('testdata64.html','wb').write(b64).close()

a faster bloomfilter for go/golang

6. März 2015AllgemeinAndreas

a bloom filter is a structure that is designed to register bit pattern in a way, you can securely say the bit pattern you present to the bloom filter is unique and unknown to the filter if the filter returns that. Unfortunately you cannot invert this phrase: if the bloom filter response marks the bit pattern to be included („known“) to the filter, there is a certain possibility that this is factual wrong.

To get an idea about bloomfilters and how they work have a look here:
https://en.wikipedia.org/wiki/Bloom_filter
and here: https://de.wikipedia.org/wiki/Bloom_filter
and do not miss the great links provided at the end of the articles.

There is a range of bloom filters implemented in go/golang (see https://github.com/search?utf8=✓&q=bloom+filter+language%3AGo&type=Repositories&ref=advsearch&l=Go).

This said i had my own ideas about what is fast in bloom filters and the result is this:
https://github.com/AndreasBriese/bbloom

go / golang (super) fast PRNG: BREEZE

6. Januar 2015AllgemeinAndreas

i became unhappy with the usual go PRNGs including salsa and the like. Despite knowing about being foolish to take such a task i did create a deterministic chaos based random generater (CBRNG) that produces random bits (tested with the NIST package) and is faster than go’s standard random generator, salsa20 and of course way faster than go’s crypto/random package.

you may find this and a lot of information about the how and why and however else and speed …
at my github site: https://github.com/AndreasBriese/breeze
and there is a pdf summary too: https://github.com/AndreasBriese/breeze/blob/master/breeze.pdf

!! Do not oversee the fact, it is not so far cryptoanalysed by any third party !!

Hash mich – Nachtrag zu PRNGs als Hashgeneratoren

4. August 2014AllgemeinAndreas

In Ergänzung zu den Hashfunktionen hier

Auch Pseudo-Random-Number-Generatoren PRNG können grundsätzlich dazu eingesetzt werden, Hashwerte zu erzeugen.

PRNGs erzeugen ausgehend von einem Seed eine feste Abfolge von zufällig verteilten Zahlen, die man dann zu einem Hashwert beliebiger Länge kombinieren kann. Ein Problem können sehr kurze Strings sein, die gehasht werden sollen, da, wie gesagt, erst eine ausreichend großer Seed nötig ist, um den PRNG zu initialisieren.

SipHash https://131002.net/siphash/ von Jean-Philippe Aumasson und Daniel J. Bernstein (der u. a. Salsa20 PRNG für eine kryptologische Stream-Verschlüsselung entwickelt hat, ist ein solcher PRNG, der für kurze Eingaben optimiert wurde. SipHash ist nach Angaben der Verfasser etwa genauso schnell wie MurmurHash3.

Update:
SipHash hat mich überzeugt. Es ist schnell und anscheinend ziemlich Kollisionsresistent. Ich konnte BruteForce jedenfalls keine Kollisionen für strings mit Länge 8 und 16 finden.

varCode

CodeSnippets Python, JS, LaTex, Go/Golang