rand_junk

a few times i've needed large amounts (tens to hundreds of gigabytes) of pseudo-random data, where the quality of the randomness matters less than the speed of its generation.

this software is released under the revised BSD license.

common use cases

a typical example is preparing a hard drive for encryption. reading /dev/urandom at about 8MB/s takes ages to fill modern-sized drives, and the initial fill is only meant to make it hard to estimate where the real data is.

another good example is comparing the compression ratio of different algorithms. just generate a few random files and try to compress them with different packers.

testing the bandwidth of high-speed networks can be a use case too. random data minimizes the influence of any compression provided by the underlying protocols. after sending and receiving the data you can also compare checksums on both ends, to verify that the transfer works correctly.

tool and usage

this is where this program comes in: it generates pseudo-random junk to stdout at about 0.7GB/s on a typical PC. this is enough to saturate modern drives, by an order of magnitude.

usage is simple. to build, just type 'make'. the Makefile is prepared to handle gcc 4.6 and gcc 4.7, since the program uses C++11 features. to generate a 100GB file of pseudo-random content just type:

./rand_junk.out 1001024 > my_rand_file.bin

there is also a simple tool that shows how many times each character occurred in the input (i.e. the distribution of output bytes):

./check.out < my_rand_file.bin

note that the 'check' tool is just a simple checker and was never optimized, so it is MUCH slower than 'rand_junk'.

how does it work

the idea is simple: it reads 1MB of pseudo-random data from /dev/urandom and launches two threads:

  1. constantly reading new content from /dev/urandom
  2. doing some pseudo-random modifications at high speed (xoring with random values, etc.)

at the same time the main thread keeps writing the buffer's content to stdout, until the specified amount of data has been produced. to make the data even more evenly distributed, each thread also xors the data with the previous content before publishing it. all of this happens in parallel (separate threads), which makes the output “random enough” for many typical uses where huge amounts of pseudo-random data are needed.

download

you can download rand_junk and do whatever you want with it. :) the current version is v2.0.1.

prjs/rand_junk.txt · Last modified: 2013/05/17 19:08 (external edit)