If you have an uptime counter with sufficient resolution, can't you just make that part of the temporary file name and not even worry about generating pseudo-random characters?
I have thought about that, but there are some problems with doing so.
The uptime count is processed at 10 millisecond intervals via the jiffy IRQ (triggered by a 100 Hz hardware timer). On a system running at double digit clock speeds, a lot can happen in 10 milliseconds. Consider that my POC V1.3 unit is running at 16 MHz. During the interval between jiffy IRQs, 160,000 clock cycles will have elapsed. That’s more than ample time for a task switch to occur. Here’s a possible scenario:
Immediately after the jiffy IRQ handler has processed the uptime counter and resumed the foreground, interrupted task ‘A’ calls the tempname() kernel API, which generates a pseudo-random filename from the uptime counter.
As tempname() exits back to the calling program after generating the filename, the kernel preempts task ‘A’ so task ‘B’, which has the highest priority of all waiting tasks, can be run.¹
Upon being restarted, task ‘B’ immediately calls tempname(). “Immediately” can be a very short time compared to the interval between jiffy IRQs.²
The filename given to task ‘B’ will be the same as that given to task ‘A’, as the uptime counter will not have changed from the time when task ‘A’ was suspended and task ‘B’ was restarted to when ‘B’ called tempname().
Another consideration is conversion of the uptime to a character string has the same issues as seen with converting any other binary number to a string of alphanumeric characters. Also, at best, the uptime counter can only offer 40 bits of data that changes at a very leisurely pace from the microprocessor’s perspective (once every 10,000 microseconds). 40 bits limits the resulting ASCII string to 13 characters if converted into a number. Converting it to hex is even worse—10 characters maximum. Whether 13 or 10, that’s too short a string for many applications.
On the other hand, I can make as many calls to the PRNG as I want in a very short period of time³, giving me 32 bits of pseudo-random data on each call, from which I can build a string of almost any length. Given that capability, the problem reduces to one of generating ASCII characters from PRN bytes that don’t fall into the ASCII space.
———————————————————— ¹Each time the kernel is about to return control to user space, it will determine if it should do a task switch.
²Timing calculations I have done while sketching out my kernel’s workflow suggest that task preemption will take ~100 microseconds to complete with the 816 running at 20 MHz and no I/O-bound processing involved (i.e., no swapping a sleeping task to disk).
³In most cases, preemption when an API call returns to user space won’t be an issue as far as the caller is concerned—all of its state will be preserved for a later restart. However, there are a few possible cases where preemption upon exit to user space may not be desirable. A call to the PRNG API might be one of those cases.
mktemp creates file (or directory) from template with XX's and return the name. So if it is "atomic" in the sense, that it not may be reentred before finishing, it will not create two same names even when called at the "same" time, as the second atempt will find the existing filemane and generate something else.
This approach is not universal for generating random strings, but it works for random files.
In your scenario in step 1 it generates a file with random name, in step 2 kernel switch the tasks, in step 3 the file already exists, so it have to generate another file(name)
I think, that similar effect could be done with system having some buffer, where the function would put the used random number (timestamp) and add one more digit as count of the same values there, so each call would return different value. And the buffer may be cleared once the timestamp changes, so it do not have be so big. (return from the function would be pair (timestamp, count of the same timestamps used)
(well so long, as timestamp is unique increasing, so it may fail with bad clocks, without RTC, on daylight changing time back or synchronising to external clock forsing step back in time ... but the question is, it it will be real problem at all)
mktemp creates file (or directory) from template with XX's and return the name. So if it is "atomic" in the sense, that it not may be reentred before finishing, it will not create two same names even when called at the "same" time, as the second atempt will find the existing filemane and generate something else.
Yes, a mktemp() call will (should) result in the creation of a uniquely-named file. However, that wasn’t the gist of my post. I was describing how it might be possible for two unrelated calls to a hypothetical tempname() function that creates filenames from the system’s uptime to end up with two tasks being given the same name. That is the concern, as the creation of any file has to start with assigning it a name.
Quote:
In your scenario in step 1 it generates a file with random name, in step 2 kernel switch the tasks, in step 3 the file already exists, so it have to generate another file(name)
You should re-read my scenario description. No file was created in step 1. All that step was doing was requesting a filename. So when the tempname() call returns in step 3, task ‘B’ won’t know if a file with the same name already exists in the same subdirectory—no attempt to create the file was made by task ‘A’ before it got preempted.
In order to avoid a name collision when it comes time to create the file, the creation process must determine if an identically-named file exists in the target subdirectory—that is, the file-creation function must offer atomicity as an option. For example, in the Linux kernel, the creat() API (create new file) may be called with the O_EXCL flag to guarantee that the call will fail if a file of the same name already exists in the target subdirectory. Without the O_EXCL flag in the call, creat() will “step on” an existing file with the same name. The availability of atomicity would be the means by which the system can compensate for cases in which maketemp() returns identical names.
Yes. "Random" and "Unique" have a lot of similarities, but "Unique" or even "Consecutive" could be the slightly better choice for this use case, IMHO.
In a practical system, the choice between “random” and “unique” would be influenced by the degree of security that is desired. For example, if a temporary file is being created on a closed system solely to hold the results of some intermediate computational steps, a short but unique name would suffice. On a UNIX or Linux system, a common practice is to append the task’s process ID (usually a 16-bit number) to a short character string. Process IDs are unique, so the filename will also be unique among all running tasks.
On the other hand, a randomly-generated filename with a lot of mixed-case and numeric characters would be better on a system that is potentially exposed to external access. Getting such a filename out of an uptime counter (or, for that matter, a process ID) would be difficult. That’s a job for a PRNG.
This is fairly well-worn territory...I do not recommend reinventing the wheel.
In most cases, I don’t as well. However, trying to devise a way to make wheels that run truer can be instructive, even if the final result is that of throwing away your newly-invented wheels and using the old ones. At the least, the reinvention process may cause one to develop an appreciation of the difficulty of making truly round wheels.
In my particular case, I have several areas of potential interest related to generating pseudo-random numbers, random character string generation being one of them. It is an area in which my programming knowledge is weak. So with that in mind, some reinvention seems to be useful to me, as I am bound to learn something.
Something we always have to keep in mind when discussing such a thing such as a PRNG algorithm is the type of hardware on which it will be run. Unlike the workstation on which I am typing this, my POC unit isn’t clocked at several gigahertz and doesn’t have an HPET to produce “noise” (although the counter/timer in the 28L92 can suffice). Also, its natural word size is 16 bits, unlike my workstation, which is 64-bit hardware. So whatever ultimately gets used as a PRNG has to work within those parameters, meaning a balance must be struck between maximum randomness and acceptable performance. If that involves some reinvention, that’s fine.
This is fairly well-worn territory in computer science and mathematics, and I do not recommend reinventing the wheel.
Indeed.
One thing I've found and seen recently is that some PRNGs while you think "cheap, fast and good enough for games" can produce some very surprising results. One case is the Minecraft game. The terrain is "procedurally generated" which is a fancy name for using a random number generator - but sometimes it goes awry with some seeds - e.g. some seeds will cause endless repetition of structures, or total lack of a particular structure. If you're interested there are some nice videos on YouTube about it.
In the Minecraft case, the PRNG algorithm is known and is used by players to generate millions of games to find a particularly "good" (or "bad") seed for the various types of game they wish to play.
Knowing the algorithm and seed in a commercial gambling machine could well be interesting though...