All times are UTC
Posted: Tue Jan 03, 2023 9:16 pm
Joined: Tue Mar 05, 2013 4:31 am
Posts: 1373
So, as the heading states... no more track/sector accesses.

As DOS/65 was a rewrite in 6502 assembly language to perform the same functions as CP/M, it was also based on accessing storage via track and sector parameters. As CP/M was based on the old IBM 3740 diskette format, the record size was made standard at 128 bytes, the sector data field size of the 3740 diskette format.

As storage devices started to standardize on a 512-byte sector size (even back in 1981), using such storage devices with CP/M and DOS/65 required that blocking/de-blocking be performed. Moving further forward, most modern storage devices no longer use cylinder/head/sector (CHS) access, but have moved to a Logical Block Address (LBA) scheme. This certainly makes more sense.

Back to DOS/65... the Primitive Execution Module (PEM, equivalent to CP/M's BDOS) uses Records internally, and translates those to track and sector calls based on the Disk Control Block (DCB) parameters. PEM then makes several calls to the SIM, which takes the track and sector calls, and translates them back to Records to access newer block devices such as Compact Flash, IDE or any other block device. Needless to say, this is just overhead processing within PEM and SIM that has no real advantage.
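To make that overhead concrete, here is a minimal Python sketch (not DOS/65 code) of the round trip described above. The sectors-per-track value and both function names are assumptions for illustration only; in the real system the geometry comes from the DCB.

```python
# Illustrative only: SECTORS_PER_TRACK and the function names are assumptions,
# not taken from the DOS/65 source.
SECTORS_PER_TRACK = 64  # hypothetical DCB value, in 128-byte records per track


def pem_record_to_track_sector(record):
    """Old PEM side: split a record number into (track, sector)."""
    return divmod(record, SECTORS_PER_TRACK)


def sim_track_sector_to_record(track, sector):
    """Old SIM side: rebuild the record number for a block device."""
    return track * SECTORS_PER_TRACK + sector


# The two translations cancel out for every record number tried here.
for record in (0, 1, 63, 64, 65535):
    track, sector = pem_record_to_track_sector(record)
    assert sim_track_sector_to_record(track, sector) == record
```

On a block device the second function simply undoes the first, which is why passing the record number straight through loses nothing.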

I decided to go thru PEM once again, and isolated the calls and routines that deal with translating records to track/sector information and either modified or re-wrote them to use record-level calls only. I also ended up making a small modification to the Disk Control Block (DCB) format. It's mostly the same, with two changes: the second field now defines the number of records per native disk block, and the third field now defines the number of reserved records (which must be a multiple of the second field, which is 4 for block devices with 512-byte blocks).

As for SIM, I rewrote the entire disk access section from scratch. SIM now uses no page zero space, is quite a bit smaller than before and technically should be a bit faster. I also added an additional field which defines a partition offset. This allows the DOS/65 section of the disk to start anywhere. This is a 4-byte long word parameter which is added to the calculated LBA for all block accesses. There's also another table of drive offsets that define the offset from the beginning of the partition for each logical drive defined. My default is to configure all 8 available drives at their maximum of 8MB per drive.
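A rough Python sketch of the LBA arithmetic described above may help; it is not the actual SIM code. The function name, the back-to-back drive layout and the example partition offset of 2048 are assumptions chosen only for illustration.

```python
# Sketch only: names, the drive layout and the partition offset are assumptions.
RECORDS_PER_BLOCK = 4   # 512-byte block / 128-byte record
RECORD_SIZE = 128


def record_to_lba(record, drive, partition_offset, drive_offsets):
    """Return (lba, byte offset inside the 512-byte block) for a record."""
    block_in_drive, record_in_block = divmod(record, RECORDS_PER_BLOCK)
    # Mask to 32 bits to mimic a 4-byte long word result.
    lba = (partition_offset + drive_offsets[drive] + block_in_drive) & 0xFFFFFFFF
    return lba, record_in_block * RECORD_SIZE


# Eight drives of 8 MB each: 8 MB / 512 bytes = 16384 blocks per drive,
# assumed here to be laid out back to back inside the partition.
BLOCKS_PER_DRIVE = (8 * 1024 * 1024) // 512
drive_offsets = [d * BLOCKS_PER_DRIVE for d in range(8)]
partition_offset = 2048  # hypothetical start of the DOS/65 area on the disk

lba, offset = record_to_lba(6, 1, partition_offset, drive_offsets)
# Record 6 on the second drive: block 1 of that drive, third record in the block.
# lba == 2048 + 16384 + 1 == 18433, offset == 256
```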

I currently have a beta version of this running on my original C02 Pocket SBC. The full code size, which includes 4KB of allocation maps for 8 drives, is 10KB with 6KB being for CCM, PEM and SIM. SIM still calls the C02BIOS for all device accesses. While I've not yet done any performance measurements, I have an 8 drive configuration running from a pure RAM loaded DOS/65 build, which really limits the available RAM (now a mere TEA size of 20KB). Still, I've managed to get the core utilities moved over and the assembler with most of the V2.1 apps. They've all assembled correctly and are running as well, provided you don't overrun the available memory.

I need to do some more serious testing on this, but that will be limited until I get back to my usual lair with the rest of my toys and such... the real testing will begin when I get it running on the 3.3V prototype SBC with 56KB of RAM and a 6GB Microdrive and the NXP SC28L92 DUART. So far... it seems quite good, but testing has been limited and such, not to mention I'm not yet reloading CCM and PEM via the warm boot call.... so I expect it to crater once in a while, especially if CCM gets overwritten (it happened when I assembled the basic compiler... as CCM was clobbered).

Once I get a solid version running, I'll post back and make the code available. As an FYI, ALL of the code makes heavy use of the CMOS instructions and addressing modes, which also helps reduce the code size.

_________________
Regards, KM
https://github.com/floobydust


Posted: Wed Jan 04, 2023 7:41 am
Joined: Wed Jan 01, 2003 6:32 pm
Posts: 32
I agree 100% with your analysis and your LBA approach. I did the same in my CPM-65.

I suggest checking the allocation map approach. In my case, I use a single 256-byte allocation map for all drives. This gives a maximum of 8 MB of disk space, with admittedly very large logical sectors of 4 KB.

Using a single allocation map means that you can have only one drive open for writing at any time, and must rebuild the map if you change the drive or the diskette before writing a new block. The performance penalty for this is surprisingly small, because it doesn't happen very often. File shuffling rarely occurs in my environment :-) So I never changed to a multi allocation map approach.

Happy debugging
Dietrich

_________________
My system: Elektor Junior Computer, GitHub https://github.com/Dietrich-L


Posted: Wed Jan 04, 2023 10:00 am
Joined: Thu May 28, 2009 9:46 pm
Posts: 8180
Location: Midwestern USA
floobydust wrote:
So, as the heading states... no more track/sector accesses.

Glad to read you are staying on track with this. Let’s hope your modifications don’t derail anything. :D

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Wed Jan 04, 2023 5:38 pm
Joined: Tue Mar 05, 2013 4:31 am
Posts: 1373
Thanks Dietrich,

I did take a look-see at your CPM-65 a while back... nice piece of work... very small footprint as well.

As DOS/65 already uses separate allocation maps for each drive, I've opted to stick with this. My current SIM config has 8 drives at 8MB each. I have several test scenarios that copy lots of data between multiple drives. My main development system has 56KB of contiguous RAM and I could easily add another 6KB (currently used by C02 Monitor) to this by changing the PLD configuration. The disk control block for each drive is configured for a 2KB block size, which requires 512 bytes per allocation map. Granted, this totals up to 4KB for allocation maps, but that could easily be halved by going to a 4KB block size, which I'll likely try out once the current code development is more complete.
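The map sizes above follow from one bit per allocation block. A small Python check of that arithmetic (the function name is made up for this sketch, and the one-bit-per-block layout is assumed from the numbers quoted):

```python
# Sketch of the allocation map arithmetic; one bit per allocation block is assumed.
KB = 1024
MB = 1024 * KB


def allocation_map_bytes(drive_bytes, block_bytes):
    """Bytes needed for a bitmap with one bit per allocation block."""
    blocks = drive_bytes // block_bytes
    return (blocks + 7) // 8  # round up to whole bytes


per_drive_2k = allocation_map_bytes(8 * MB, 2 * KB)  # 4096 blocks -> 512 bytes
per_drive_4k = allocation_map_bytes(8 * MB, 4 * KB)  # 2048 blocks -> 256 bytes
total_2k = 8 * per_drive_2k                          # 4096 bytes (4KB) for 8 drives
total_4k = 8 * per_drive_4k                          # 2048 bytes (2KB) for 8 drives
```

The 4KB case also matches the single 256-byte map mentioned for CPM-65.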

Oddly, initial performance with this beta version is slower than the 3.03 version (on my GitHub account) running on the same hardware. I haven't yet isolated the bottleneck, so I'm thinking about writing a SIM exerciser program that can perform some benchmarking against the SIM disk calls. I have a benchmark timer in my Monitor/BIOS with 10ms resolution and I can use that for accurate timings. Hopefully this can provide better insight into the performance loss (where there should be a gain). I would note that the current SIM calculates the required LBA and, if it's already loaded, doesn't load it again, which does save some execution time, mostly for record reads.

Despite the current performance hit, the system appears to be quite stable... but much more work to do before it's out of beta...

_________________
Regards, KM
https://github.com/floobydust


Posted: Wed Jan 04, 2023 5:48 pm
Joined: Tue Mar 05, 2013 4:31 am
Posts: 1373
BigDumbDinosaur wrote:
floobydust wrote:
So, as the heading states... no more track/sector accesses.

Glad to read you are staying on track with this. Let’s hope your modifications don’t derail anything. :D


Yes, more free time while I'm out of town, albeit with very limited hardware resources for testing and only a laptop for all of the work (I miss the big screen and the rest of the toys).

Fortunately, no derailments... yet, but hey, lots more to do, so the opportunity to crash it is still quite high :shock:

I need to get a handle on the performance hit, which is unexpected... I guess there's always the chance that other changes I made to PEM have managed to cause performance impacts within that module, but the routine that calculates and passes the record number to SIM is much smaller and faster than the track/sector code it replaced. The SIM code is also smaller than the track/sector code it replaced... hence the unexpected hit!

_________________
Regards, KM
https://github.com/floobydust


Posted: Thu Jan 05, 2023 1:36 am
Joined: Tue Mar 05, 2013 4:31 am
Posts: 1373
Well, my bad...

The basic design uses a core routine that checks the record required (for read or write), calculates the correct LBA for it and ensures that it's loaded in the host buffer. Just as long as the next required record maps to the same LBA (as there are 4 records per LBA) it doesn't reload it, which improves the read performance.

I forgot to copy the current LBA to its saved values, which are used as a comparison before loading a new LBA as needed. Needless to say, every record access was re-reading the calculated LBA from disk before grabbing the 128-byte record. Fortunately, that's fixed and the overall performance is easily on par with my older 3.03 version, maybe a slight tad better. I did manage to trim some of the code down a bit in the process and also determined that my changes to PEM (beyond the record based ones) were not causing any disk I/O performance issues.
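The effect of that missing copy is easy to reproduce in a toy model. The following Python sketch is only an illustration of the idea, not the SIM code: the class, the `remember_lba` switch and the fake disk are all invented here, and partition and drive offsets are left out for brevity.

```python
# Toy model of a one-block read cache; all names here are hypothetical.
class OneBlockCache:
    def __init__(self, read_block, remember_lba=True):
        self.read_block = read_block      # callable: lba -> 512 bytes
        self.remember_lba = remember_lba  # False reproduces the bug
        self.saved_lba = None
        self.buffer = None
        self.disk_reads = 0

    def read_record(self, record):
        lba, index = divmod(record, 4)    # 4 records per 512-byte block
        if lba != self.saved_lba:
            self.buffer = self.read_block(lba)
            self.disk_reads += 1
            if self.remember_lba:
                self.saved_lba = lba      # the step that was forgotten
        return self.buffer[index * 128:(index + 1) * 128]


def fake_disk(lba):
    """Stand-in for a block read: 512 bytes filled with the low byte of the LBA."""
    return bytes([lba & 0xFF]) * 512


good = OneBlockCache(fake_disk)
buggy = OneBlockCache(fake_disk, remember_lba=False)
for record in range(16):
    good.read_record(record)
    buggy.read_record(record)
# good.disk_reads == 4, buggy.disk_reads == 16
```

Both versions return the same data; only the number of disk reads differs, which is consistent with a bug that shows up as a slowdown rather than as corruption.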

So, time for yet more testing... where the bulk of it will be done next week once I'm back to my normal location. Overall I think this is a good update for DOS/65. I'm still thinking about delayed writes of changed LBAs to the disk to improve performance, but that will require some changes to the BIOS and its interrupt structure, hence a later update.

Oddly, with the exception of overwriting CCM (only 20KB of TEA space), I've not needed to reload CCM or PEM when running multiple applications, which include SD (super directory), Xmodem transfer, file copy, file compare, the assembler and pretty much anything else. Still, the goal is to have this version loaded into the reserved records on the first drive letter and be bootable.

_________________
Regards, KM
https://github.com/floobydust


Posted: Thu Jan 05, 2023 7:36 am
Joined: Thu May 28, 2009 9:46 pm
Posts: 8180
Location: Midwestern USA
floobydust wrote:
So, time for yet more testing... where the bulk of it will be done next week once I'm back to my normal location. Overall I think this is a good update for DOS/65. I'm still thinking about delayed writes of changed LBAs to the disk to improve performance, but that will require some changes to the BIOS and its interrupt structure, hence a later update.

Implementing a buffer pool shouldn't be too difficult, other than finding sufficient RAM in which to place it and a table to manage the pool.

If it were me doing it, I’d buffer disk blocks, not individual CP/M records. If records are buffered, then when one becomes “dirty,” two disk accesses are required to flush the record, as it would be necessary to fetch the block in which the dirty record belongs, re-write the record and then re-write the block. On the other hand, if blocks are buffered and a record in a block that is already buffered is re-written, you only need one disk access to re-write the block.

Either way, the interesting thing will be in working out how to periodically flush dirty buffers. Since CP/M is a uni-tasking environment, you don't have the luxury of having a process automatically run at periodic intervals to sync your buffers as they age.

One method would be to have some code (“buffer flusher”) that is executed when the PEM gets ready to return to a calling program. That code would scan the buffer pool table looking for dirty buffers that have aged out. When one is found, it would be written and marked clean. Other than the time required to write buffers, this process should have little effect on performance if the buffer pool table is properly organized and buffer age is stored in a binary format (not normal time-of-day format, which is too computationally-expensive).

In order to avoid having the buffer flusher run on every return from the PEM, a timer field somewhere in RAM should be decremented by your IRQ handler at one second intervals, and when it reaches zero, it would be reset and a flag would be set indicating it’s time to flush buffers. The timer would remain stopped until the flush-the-buffers flag has been cleared in the foreground.

Meanwhile, on each exit from the PEM, the flush-the-buffers flag would be checked and if set, the buffer-flusher code would be executed. Once that is done, the flag would be cleared. The IRQ handler, which would be polling the flag at periodic intervals, would see that it has been cleared and would restart timer.

There would also need to be a command to tell DOS/65 to immediately flush all buffers. Obviously, doing so would be necessary before powering down the system.
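The timer-and-flag handshake outlined above can be modelled in a few lines of Python. This is only a sketch under simplifying assumptions: the class and method names are invented, the IRQ is simulated by calling `irq_tick()` by hand, and every dirty buffer is flushed rather than only those that have aged out.

```python
# Simulation of the IRQ timer / foreground flag handshake; names are hypothetical.
class FlushScheduler:
    def __init__(self, interval_ticks):
        self.interval = interval_ticks
        self.timer = interval_ticks
        self.flush_due = False

    def irq_tick(self):
        """Called from the (simulated) one-second IRQ."""
        if self.flush_due:
            return                      # timer stays stopped until the flag is cleared
        self.timer -= 1
        if self.timer == 0:
            self.timer = self.interval  # reset for the next period
            self.flush_due = True

    def pem_exit(self, pool):
        """Called on each return from the PEM; returns the number of blocks flushed."""
        if not self.flush_due:
            return 0
        written = 0
        for buf in pool:
            if buf["dirty"]:
                buf["dirty"] = False    # a real system would write the block here
                written += 1
        self.flush_due = False          # lets the IRQ side restart the timer
        return written


pool = [{"lba": 10, "dirty": True}, {"lba": 11, "dirty": False}]
sched = FlushScheduler(interval_ticks=5)
early = sched.pem_exit(pool)            # flag not set yet: nothing is written
for _ in range(5):
    sched.irq_tick()
flushed = sched.pem_exit(pool)          # flag set: the one dirty buffer is written
```

An immediate flush-everything command would amount to setting the flag (or calling the flusher directly) regardless of the timer.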

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Thu Jan 05, 2023 10:57 pm
Joined: Tue Mar 05, 2013 4:31 am
Posts: 1373
All good points.... so far, the SIM code basically keeps a 512-byte LBA in a default buffer in low RAM. The SIM routine for reading transfers some pointers around and strips off the lower 2 bits of the low-order byte, which give the offset of the PEM Record (128 bytes) to be read and moved to memory at a location that PEM provides thru a separate call. Once the Read routine saves the 2-bit record offset, it calls a routine that calculates the LBA from the PEM Record and active drive number, then checks to see if it's in the LBA buffer. If not, it loads it and returns back to the SIM Read routine, which only needs to move one of 4 records from the buffer to memory. Then again, if the LBA that holds the requested PEM Record is already loaded, the routine just returns to the SIM Read routine. Needless to say, this is what I call a "cheap cache" as each LBA read has 4 PEM Records. This approach works fine... and once I got the code right, the read performance was good overall.

The problem from a performance view is writing the PEM Records. Right now, I'm using the same core routine to ensure the correct LBA is loaded into the buffer, then the SIM Write routine moves the updated Record into one of the 4 Record offsets in the LBA. Once this is done, I write the LBA out to disk. This becomes the bottleneck. I did some simple testing attempting to get a similar approach to reading records, but alas, it hasn't worked out: managing a dirty block flag to ensure an updated LBA is written out first, before a new LBA is loaded, has some odd issues, meaning things aren't getting written out properly, or at all.
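For comparison, here is a Python model of the two write strategies discussed: writing the LBA after every record versus holding it until a different LBA is needed. It is a sketch of the general dirty-flag idea, not the SIM code and not a diagnosis of the issues mentioned above; the class name, the dict used as a disk and the `close()` call are assumptions.

```python
# Model of write-through vs. deferred (dirty flag) writes; names are hypothetical.
class WriteBuffer:
    def __init__(self, disk, write_through):
        self.disk = disk                  # dict: lba -> bytearray(512)
        self.write_through = write_through
        self.lba = None
        self.buffer = None
        self.dirty = False
        self.disk_writes = 0

    def _flush(self):
        if self.dirty:
            self.disk[self.lba] = bytearray(self.buffer)
            self.disk_writes += 1
            self.dirty = False

    def _load(self, lba):
        if lba != self.lba:
            self._flush()                 # write the old block before replacing it
            self.buffer = bytearray(self.disk.get(lba, bytes(512)))
            self.lba = lba

    def write_record(self, record, data):
        assert len(data) == 128
        lba, index = divmod(record, 4)    # 4 records per 512-byte block
        self._load(lba)
        self.buffer[index * 128:(index + 1) * 128] = data
        self.dirty = True
        if self.write_through:
            self._flush()

    def close(self):
        self._flush()                     # final flush, e.g. at file close


def copy_16_records(write_through):
    disk = {}
    wb = WriteBuffer(disk, write_through)
    for record in range(16):
        wb.write_record(record, bytes([record]) * 128)
    wb.close()
    return wb.disk_writes, disk


through_writes, through_disk = copy_16_records(True)    # 16 block writes
deferred_writes, deferred_disk = copy_16_records(False)  # 4 block writes
```

In this model the two places where a deferred scheme can lose data are the flush before loading a different LBA and the final flush; a read that replaces the buffer has to go through the same flush.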

There is a chance that the onboard write-cache on the Microdrive (other system) might be more useful in enhancing the overall performance, but my end goal is still to get a single LBA buffered for record writes and hopefully improve the performance, especially for writing larger files, such as a file COPY program. Still a ways to go on this one... but tomorrow is a travel day... so timing is everything!

Oddly, the default blocking/deblocking and track/sector code performs better on write performance, so I need to dig thru that yet again... as it does have some flag bits depending on the type of write it's doing. It appears that it is managing to hold some record writes before writing the LBA to disk.

_________________
Regards, KM
https://github.com/floobydust


Posted: Sun Feb 05, 2023 8:18 pm
Joined: Tue Mar 05, 2013 4:31 am
Posts: 1373
I now have a bootable version running from the Microdrive on my prototype 3.3V system. The Monitor boot sequence is just hard-coded to load 20 blocks (10KB) from a section on the drive into the starting memory location and then JUMP to the SIM cold start to kick it off. Overall it runs quite well... this is a Beta version however, so it's not quite perfect. Still more work to do on SIM... plus many of the older and some newer utilities don't exit the same (meaning a warm boot), so they need to be cleaned up at some point.

The good news... it boots, runs well and you can assemble all of the utilities successfully. This includes the Basic compiler and runtime, which also work. Overall, I think CCM and PEM are pretty solid, SIM needs some better coding to get the disk write performance improved. The DCB is slightly changed for reserved records, vs reserved tracks. Other changes are noted with heavy commenting in the source files.

Note that a typical sysgen does not work here, as all DOS (CCM, PEM, SIM) related code uses a lot of CMOS instructions and addressing modes. The standard assembler is 6502 only. I've added some code to the Microdrive utility program to be able to write a section of RAM out to any starting location on the drive, hence a manual process for the time being. All of the WDC Tools assembled code creates a Motorola S19 record file, which can be downloaded via the C02 Monitor Xmodem-CRC support.

Here's a zip file that includes everything:

Attachment: DOS65-V320.zip [779.32 KiB]


Have fun!

_________________
Regards, KM
https://github.com/floobydust


Posted: Mon Feb 06, 2023 1:58 am
Joined: Thu May 28, 2009 9:46 pm
Posts: 8180
Location: Midwestern USA
floobydust wrote:
I now have a bootable version running from the Microdrive on my prototype 3.3V system...

Sounds like you’ve been busy with the assembler. :D

Let’s say I've built a 65C02 machine with mass storage that is addressed via LBAs and natively reads and writes 512 byte blocks. Let’s further say this hypothetical machine has 48 KB of RAM and firmware that provides a console I/O API, as well as a disk I/O API. What other facilities would your DOS/65 port need in order to run? Other than read and write, are there other disk operations you are using?

_________________
x86?  We ain't got no x86.  We don't NEED no stinking x86!


Posted: Mon Feb 06, 2023 7:24 am
Joined: Tue Mar 05, 2013 4:31 am
Posts: 1373
BigDumbDinosaur wrote:
floobydust wrote:
I now have a bootable version running from the Microdrive on my prototype 3.3V system...

Sounds like you’ve been busy with the assembler. :D

Let’s say I've built a 65C02 machine with mass storage that is addressed via LBAs and natively reads and writes 512 byte blocks. Let’s further say this hypothetical machine has 48 KB of RAM and firmware that provides a console I/O API, as well as a disk I/O API. What other facilities would your DOS/65 port need in order to run? Other than read and write, are there other disk operations you are using?


In short, that should be sufficient to run DOS/65. Note that the SIM module is what interfaces to the hardware routines that provide basic character I/O for a console and basic disk access at a block level for read and write.

As Richard's code has been around for many years, he has provided documentation on how to interface DOS/65 to generic hardware. You need to provide a simple set of functions for console I/O and Disk I/O, and not much else.
As DOS/65 was patterned after CP/M, consider CCM and PEM as the two main modules of code (CCP and BDOS) and SIM as the equivalent of BIOS to interface to your hardware.

The C02BIOS which is included has a set of functions that access the console and disk. The SIM module just interfaces to the C02BIOS.

I've included some of Richard's documentation here for reference.

Hopefully this helps. The CCM and PEM modules can be used as is, no changes needed. The SIM module would need to be changed to use your existing routines to access your hardware.


Attachments:
DOS-65 System Interface Guide B.pdf [253.12 KiB]
DOS-65 System Description B.pdf [160.09 KiB]
DOS-65 Bringing the System Up.pdf [105.72 KiB]

_________________
Regards, KM
https://github.com/floobydust