Let's make an OS together! [KERNEL ALREADY DONE!]

Gradius2000 · Post by **Gradius2000** » Thu Aug 31, 2017 1:45 am

Hello, everybody! I am the creator of a github page here:
https://github.com/GradiusLover2000/CJWJudyOS
This page is the source site for a brand-new nanokernel that switches tasks through interrupts! It is still an alpha release, and I need some people to test it with me. I know it may sound like something I can do myself, but I'm just trying to get the page popular.

Anyway, I can't wait to see your responses and how you use the kernel.asm on the page! Check out the GitHub link above for more info.
Gradius2000

GARTHWILSON · Post by **GARTHWILSON** » Thu Aug 31, 2017 3:22 am

Hi, and welcome. We wish you success.

I just saw this and won't be able to give it detailed attention for a while yet, but I might make some comments anyway, based on a quick look.

The first one is that some comments in the code would be welcome. Actually, you've done a couple of things that I always like for readability.
In the setup, you'll want to set the interrupt-disable flag (do the SEI) before writing to the IRQ vector, not after.
It looks like you could shorten the loops by starting X at the higher number, then DEX'ing and using the implied, automatic compare-to-zero instruction that's included in DEX, for your BNE's.
Similarly, there's no need to do the CMP #0 right after an LDA. It's an automatic part of the LDA anyway, so it's redundant.
At "call," after the LDA taskp, instead of CMP #1...CMP #2...etc., you could do DEA each time before the BEQ's instead, saving a byte each time.
The task stretches (for example task0r) are short enough, however, that you could use BNE instead and branch around each taskXr to the next DEA test, eliminating all the jumps.
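A rough 65C02 sketch of the suggestions above (SEI before the vector write, a count-down loop that relies on DEX setting Z, and DEA-based dispatch without CMP). This is not the repository's code: IRQVEC, irq, taskstate, taskp, NTASKS and task0..task2 are hypothetical names, IRQVEC stands for wherever the kernel keeps a RAM copy of its IRQ vector, and the syntax is ca65-style.

```
; Hypothetical names throughout; IRQVEC is an assumed RAM copy of the IRQ vector.
setup:  SEI                     ; mask IRQs first, so none can arrive
        LDA  #<irq              ; while the vector is only half written
        STA  IRQVEC
        LDA  #>irq
        STA  IRQVEC+1
        LDX  #NTASKS            ; count down instead of up
clear:  STZ  taskstate-1,X      ; 65C02 STZ; clears taskstate+NTASKS-1 down to +0
        DEX
        BNE  clear              ; DEX sets Z when X reaches 0, so no CPX is needed
        CLI                     ; unmask IRQs only once setup is complete

; Separate fragment: dispatch on the current task number.
call:   LDA  taskp              ; LDA sets Z by itself, so no CMP #0
        BEQ  task0
        DEC  A                  ; DEA: 1 byte, versus 2 for CMP #n
        BEQ  task1
        DEC  A
        BEQ  task2              ; taskp > 2 falls through past this point
```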
To shorten the source code (although it won't change the final assembled code), you could use macros to do the repetitive parts, for example
```
        LDA  $35
        PHA
        LDA  $2D
        PHA
        LDA  $25
        PHA
```
could be replaced with
```
        PUSH3  $35, $2D, $25
```
(BTW, what are these constants? They should probably have meaningful names.)
and
```
        LDA  $0D
        LDX  $15
        LDY  $1D
```
could be replaced with
```
        LDAXY  $0D, $15, $1D
```
The PLP, RTS pair can be shortened to RTI if you adjust the earlier-pushed address by 1.
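To illustrate the RTI point: RTS adds 1 to the address it pulls, so a PLP/RTS launch has to push the target address minus 1, whereas RTI pulls P, PCL and PCH and resumes at exactly the pulled address. A minimal sketch, with task_entry as a hypothetical label:

```
; task_entry is a hypothetical label. Push order is the same as for PLP/RTS,
; but the address is pushed as-is, with no -1 adjustment.
        LDA  #>task_entry       ; PCH first
        PHA
        LDA  #<task_entry       ; then PCL
        PHA
        LDA  #$20               ; then P, with the I flag clear
        PHA
        RTI                     ; pulls P, PCL, PCH and continues at task_entry
```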
```
        LDA  #$00
        LDX  #$00
        LDY  #$00
```
can be shortened by a couple of bytes with LDA #0, TAX, TAY. Actually, the same LDAXY macro mentioned above could be written to do it, watching for the numbers being the same, and using conditional assembly to lay down the TAX and/or TAY op codes if appropriate.
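A sketch of what those macros might look like, assuming ca65 syntax (other assemblers spell macro definitions and conditional assembly differently). LDAXYI is a hypothetical name for the immediate-value variant described above, which lays down TAX/TAY when a value matches the one already in A.

```
; ca65 syntax assumed. LDAXYI is a hypothetical name.
.macro  PUSH3  arg1, arg2, arg3 ; push three memory bytes via A
        LDA  arg1
        PHA
        LDA  arg2
        PHA
        LDA  arg3
        PHA
.endmacro

.macro  LDAXY  arga, argx, argy ; load A, X, Y from three addresses
        LDA  arga
        LDX  argx
        LDY  argy
.endmacro

.macro  LDAXYI  va, vx, vy      ; immediate values; reuse A when they match
        LDA  #va
  .if (vx) = (va)
        TAX                     ; 1 byte instead of LDX #vx (2 bytes)
  .else
        LDX  #vx
  .endif
  .if (vy) = (va)
        TAY                     ; 1 byte instead of LDY #vy (2 bytes)
  .else
        LDY  #vy
  .endif
.endmacro
```

With these, PUSH3 $35, $2D, $25 expands to the six-instruction sequence above, and LDAXYI 0, 0, 0 assembles to LDA #0, TAX, TAY: four bytes instead of six.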

You might get the whole thing down to one page, and make the resulting code more compact as well.

White Flame · Post by **White Flame** » Thu Aug 31, 2017 3:26 am

Well, it kind of goes to the definition of what a "kernel" is, as to how much scope is included for it to be considered done.

You do have a task switcher, but I don't see any stack management code which is pretty necessary for preemptive task switching. While 6502 kernels often don't do much memory management (and here you've chosen static partitioning), figuring out how to divvy up pages 0 and 1 between the tasks is still usually the job of the kernel (either software-provided or designs for programmer discipline), as well as if you're going the micro/nanokernel route, how message passing & I/O will be managed.

Because the task switcher is called quite often and is a steady source of overhead, you should look at collapsing the code down heavily. The big CMP/BEQ tree will take dozens of cycles per switch, so if you could simply use offsets into your tables instead of branching to variants of code, you'd get rid of both cycles & bytes. While this technically is optimization, it's pretty important to make this tight early on, because it can affect the structure of task state variables which will have wide-ranging design effects.
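To make the table-offset idea concrete, here is a rough 65C02 sketch (not the code in the repository) of a round-robin switcher that keeps one saved stack pointer per task in a table indexed by task number, so every switch takes the same short path regardless of which task is current. The names taskp, task_sp and NTASKS are hypothetical, the syntax is ca65-style, and it assumes each task already has its own slice of page 1 holding a frame in the order this handler pops it. Acknowledging the timer and distinguishing BRK from IRQ are left out.

```
; Sketch only. Hypothetical: taskp (current task number), task_sp (one saved
; SP byte per task), NTASKS. Each task must have its own slice of page 1.
irq:    PHA                     ; the CPU has already pushed PCH, PCL and P
        PHX                     ; 65C02: push X and Y without going through A
        PHY
        LDY  taskp              ; index by task number, no CMP/BEQ tree
        TSX
        TXA
        STA  task_sp,Y          ; remember where the outgoing task's stack ends
        INY                     ; round-robin to the next task
        CPY  #NTASKS
        BNE  setsp
        LDY  #0                 ; wrap around to task 0
setsp:  STY  taskp
        LDA  task_sp,Y
        TAX
        TXS                     ; switch to the incoming task's stack
        PLY                     ; restore in reverse push order
        PLX
        PLA
        RTI                     ; pulls the P, PCL, PCH saved when that task was preempted
```

Because A, X and Y are saved before X is reused, this layout also avoids depending on whatever the interrupted task left in X.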

Overall, I think a memory map of zeropage & comments would be necessary to understand the other specifics of the code so far. For instance, the purging/locks/stopping/loading features mentioned in the changelog aren't easily decipherable from the code as-is, to a new reader. But honestly without any stack handling (unless I'm missing it) I'm not sure what you have would actually work for any tasks which call subroutines or otherwise use the stack. However, it's certainly a reasonable beginning for experimentation. By writing more complex tasks to switch, you'll bump into the issues you need to deal with. If you can think about some of the issues above first, that could prevent you from baking in constraining designs too early.

Edit: Just noticed another issue. When an IRQ hits and it branches down to 'cont', the X register is being used without ever being initialized. If the tasks are required to keep a certain value in X that might be okay (though more difficult to program for and less robust), but otherwise it might be a bug.

BigDumbDinosaur · Post by **BigDumbDinosaur** » Thu Aug 31, 2017 4:09 am

Most wouldn't consider me a newbie when it comes to assembly language, especially code that works against the bare metal. That said, I haven't got a clue as to what is going on in your program. A comment would die of sheer loneliness in your source code. Not helping matters, you've got all sorts of "magic numbers" and "magic addresses" in there, with no obvious indication of what they mean and what they do.

Also, please take note of Garth's comments concerning programming style, optimizations, etc. Also, you should consider writing this for use with the 65C02, not the (obsolete) 6502. The former offers more instructions, some very useful (e.g., PHX, PHY, etc.), more addressing modes, and doesn't have the latter's errata, such as the JMP ($xxFF) bug.
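For comparison, two differences between the NMOS 6502 and the 65C02 that matter for a task switcher. This is a generic illustration, not code from the kernel; the $02FF address is arbitrary.

```
; NMOS 6502: X and Y can only reach the stack by way of A (5 bytes)
        PHA
        TXA
        PHA
        TYA
        PHA
; 65C02: the same saves in 3 bytes, and A is left untouched
        PHA
        PHX
        PHY
; Indirect JMP through a pointer that straddles a page boundary
        JMP  ($02FF)            ; NMOS 6502: high byte fetched from $0200 (the bug)
                                ; 65C02: high byte fetched from $0300, as intended
```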

First step, in my opinion, is for you to thoroughly comment your source code and define what all those magic numbers mean. Also useful would be a narrative that describes the manner in which your algorithm works and any "gotchas" the would-be user should know about.

Technically speaking, you don't really have a kernel in the accepted sense of the term because there don't appear to be any provisions for I/O support. Have you got any plans to incorporate such support into your program?

BigEd · Post by **BigEd** » Thu Aug 31, 2017 6:21 am

Welcome Gradius2000! Looks like you've got some code reviewers for free - hope it helps! Well done for getting this far and thanks for sharing your project.

Gradius2000 · Post by **Gradius2000** » Thu Aug 31, 2017 11:32 am

BigDumbDinosaur wrote:

Also, please take note of Garth's comments concerning programming style, optimizations, etc. Also, you should consider writing this for use with the 65C02, not the (obsolete) 6502. The former offers more instructions, some very useful (e.g., PHX, PHY, etc.), more addressing modes, and doesn't have the latter's errata, such as the JMP ($xxFF) bug.

First step, in my opinion, is for you to thoroughly comment your source code and define what all those magic numbers mean. Also useful would be a narrative that describes the manner in which your algorithm works and any "gotchas" the would-be user should know about.

Technically speaking, you don't really have a kernel in the accepted sense of the term because there don't appear to be any provisions for I/O support. Have you got any plans to incorporate such support into your program?

I do have plans for I/O support in a standard library, as well as a few drivers for peripheral chips. And by the way, the code is for the 65C02, because it uses some addressing modes that the 6502 doesn't have.
As for comments, I know my code very well, so I will add those today (8/31/17).

Gradius2000 · Post by **Gradius2000** » Thu Aug 31, 2017 12:28 pm

GARTHWILSON wrote:

You might get the whole thing down to one page, and make the resulting code more compact as well.

Thank you so much for your optimizations! They really helped, since I thought it was already optimized.

I have committed all that you said to the code. It should be better now WITH COMMENTS!

Macros, however, are very confusing on the assembler I'm using, so I won't use them yet.

Gradius2000 · Post by **Gradius2000** » Thu Aug 31, 2017 12:34 pm

White Flame wrote:

Edit: Just noticed another issue. When an IRQ hits and it branches down to 'cont', the X register is being used without ever being initialized. If the tasks are required to keep a certain value in X that might be okay (though more difficult to program for and less robust), but otherwise it might be a bug.

Can you tell me where you see the bug? Which line is it?

White Flame · Post by **White Flame** » Thu Aug 31, 2017 12:39 pm

Gradius2000 wrote:

Can you tell me where you see the bug? Which line is it?

Lines 254, 256, 258, and 260. The X register is used without being initialized in the interrupt code. (Again, that's unless the interrupted code is assumed to have kept X valid for the IRQ code's purposes, which would be a brittle assumption anyway; I mention it just for completeness.)

Gradius2000 · Post by **Gradius2000** » Thu Aug 31, 2017 12:42 pm

White Flame wrote:

Lines 254, 256, 258, and 260. The X register is used without being initialized in the interrupt code. (again, unless it's assumed that the interrupted code has kept X valid for the IRQ code's purposes).

Can you reread the code? I changed it around after optimizing, and lines 258 and 260 no longer exist...

White Flame · Post by **White Flame** » Thu Aug 31, 2017 12:45 pm

Yeah, that's a hefty shuffle, and doesn't look like it takes that early problematic branch anymore.

Can you explain how you manage sharing the stack between the multiple preempted tasks?

Gradius2000 · Post by **Gradius2000** » Thu Aug 31, 2017 12:47 pm

White Flame wrote:

Can you explain how you manage sharing the stack between the multiple preempted tasks?

Sorry, but I haven't added that feature yet, because I don't have my brain wrapped around sharing the stack.
Can someone send me their github username at salcj7400@gmail.com to collaborate with me on stack sharing?

Wait, does memory address $0100 contain a copy of the Stack Pointer register? I think I spotted that in this link: http://6502.org/source/kernels/minikernel.txt
If so, please confirm, because then I can add that feature!

White Flame · Post by **White Flame** » Thu Aug 31, 2017 12:55 pm

I'm pretty sure people here would prefer to continue discussing here. However you decide to deal with the stack depends a lot on what level of safety you want, and how much time you want to spend between task switches.

Some systems divided up the stack into 64 bytes per process, with 4 processes' stacks active/cached at a time in $01xx. Others copied out the used portion of the stack page into a process-specific page on every task switch. I'm sure there are other approaches, too. Non-preemptive multitasking can sometimes let the entire stack drain off before continuing to the next process, or not continue processing a task until everything above it on the stack is finished, depending on the model.

Obviously, there's no processor support for doing such things with the stack, so anything will be fully manual and a "creative" solution.
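One way to set up statically partitioned stacks is to pre-build, in each task's slice of page 1, the same frame an interrupt plus the register pushes would leave, so the very first switch into a task can use the normal restore path. This is a sketch under assumptions: ca65-style 65C02 syntax, hypothetical tables slice_top, task_lo, task_hi and task_sp, and a switcher that saves registers with PHA, PHX, PHY after the CPU pushes PCH, PCL and P.

```
; Hypothetical tables: slice_top (top offset of each task's slice in page 1),
; task_lo/task_hi (entry-point bytes), task_sp (saved SP per task).
; Run with IRQs masked. Requires NTASKS <= 128 for the BPL loop.
        LDY  #NTASKS-1
inits:  LDX  slice_top,Y        ; e.g. $FF,$BF,$7F,$3F for 4 x 64-byte slices
        LDA  task_hi,Y
        STA  $0100,X            ; PCH, where an IRQ would push it
        DEX
        LDA  task_lo,Y
        STA  $0100,X            ; PCL
        DEX
        LDA  #$20               ; P with I clear: the task starts with IRQs enabled
        STA  $0100,X
        DEX
        LDA  #0                 ; initial A, X and Y are all zero
        STA  $0100,X            ; A
        DEX
        STA  $0100,X            ; X
        DEX
        STA  $0100,X            ; Y
        DEX
        TXA
        STA  task_sp,Y          ; the SP value the switcher will load with TXS
        DEY
        BPL  inits              ; Y = NTASKS-1 down to 0

        STZ  taskp              ; start with task 0
        LDX  task_sp            ; task 0's saved SP
        TXS
        PLY                     ; same restore path the switcher uses
        PLX
        PLA
        RTI                     ; "return" into task 0
```

Each preempted task keeps a 6-byte frame (PCH, PCL, P, A, X, Y) in its slice. With 64-byte slices that leaves 58 bytes for the task's own calls and pushes; with 16-byte slices it leaves 10, only five nested JSRs with no other stack use. The loop itself never touches the stack, but anything already sitting inside a slice (such as a return address) is overwritten, which is why the sketch ends by jumping into a task rather than returning.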

Edit: The stack pointer is a register in the CPU itself, generally accessed via TXS and TSX. If you want it in $0100, you have to TSX, STX $0100 yourself.

Gradius2000 · Post by **Gradius2000** » Thu Aug 31, 2017 12:59 pm

White Flame wrote:

Edit: The stack pointer is a register in the CPU itself, generally accessed via TXS and TSX. If you want it in $0100, you have to TSX, STX $0100 yourself.

Oh my gosh, I completely didn't see that instruction in all my experience! Thank you!

Gradius2000 · Post by **Gradius2000** » Thu Aug 31, 2017 2:29 pm

Okay, everybody! The nanokernel still doesn't have I/O, but you can safely make a task that drives I/O at $F800. Anyway, the kernel now has 16-byte stacks for each of 8 tasks! I hope you guys are glad, too!

How do you think I did? Rate based on stability and design.
