What Sorts of Tools Do You Use to Unit Test Your Code?

load81 · Post by **load81** » Thu Aug 27, 2020 3:21 pm

I am working on a project to disassemble an 8k cartridge. The initial disassembly is done and the resulting source code assembles. The hashes match, so it technically works. But there aren't many macros yet and I'm still a long way from fully understanding the code.

Early on I noticed there were several variants of the cartridge, at least according to the TOSEC database. Even with the cartridge header stripped off several dumps had different sha1 hashes. Some of these will no doubt turn out to be minor dump errors. But, I suspect at least a few of them are bug fixes between production runs. I'd like to sort that all out, if I can.
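A minimal sketch of that kind of comparison, assuming each dump is already in memory as bytes and that the header length is known. The function names and the sample data are hypothetical; real header sizes depend on the dump format.

```python
import hashlib
from collections import defaultdict

def payload_sha1(dump, header_len=0):
    """SHA-1 of a dump with the first header_len bytes skipped."""
    return hashlib.sha1(dump[header_len:]).hexdigest()

def group_by_payload(dumps, header_len=0):
    """Map each payload hash to the names of the dumps that share it."""
    groups = defaultdict(list)
    for name, data in dumps.items():
        groups[payload_sha1(data, header_len)].append(name)
    return dict(groups)

def differing_offsets(a, b):
    """Offsets at which two equal-length payloads disagree."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

# Made-up data: two dumps differ only in their 3-byte header, one has a changed byte.
rom_a = bytes(16)
rom_b = bytes([0] * 15 + [1])
dumps = {'x': b'HDR' + rom_a, 'y': b'XXX' + rom_a, 'z': b'HDR' + rom_b}
groups = group_by_payload(dumps, header_len=3)
print(sorted(groups.values()))
print(differing_offsets(rom_a, rom_b))
```

A short list of differing offsets between two variants is a hint about whether a difference is a one-byte dump error or a larger patch.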

I read Using a running VICE session for development and realized just how powerful the VICE machine language monitor is. It quickly became clear that, with effort, it could be used for very rigorous unit testing.

So, I'm starting to think about how to use a unit test harness; something like the Test Anything Protocol appeals to me. It's text based, human readable, and easy to log. I realize 99% of this would be an effort in higher level scripting, likely with Bash or Perl. I could use TAP to check my macros and various chunks of code. With effort, I could probably integrate the VICE ML monitor too. Technically my assembler, 64tass, has .assert and .check functionality, but its use is undocumented and discouraged.
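For reference, a TAP stream is just a plan line (`1..N`) followed by one `ok` or `not ok` line per test. A minimal sketch of an emitter, where the function name `tap_report` and the test descriptions are made up for illustration:

```python
def tap_report(results):
    """Format (description, passed) pairs as a TAP stream."""
    lines = ['1..%d' % len(results)]            # the plan: how many tests follow
    for number, (description, passed) in enumerate(results, start=1):
        status = 'ok' if passed else 'not ok'
        lines.append('%s %d - %s' % (status, number, description))
    return '\n'.join(lines)

print(tap_report([('header bytes match', True),
                  ('checksum routine', False)]))
```

Because the output is plain text, it can be logged as-is or fed to any existing TAP consumer.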

Is anyone else leveraging unit tests to bug hunt or sanity-check their code? What are some tools and techniques I should be aware of? Any pointers and suggestions would be welcome. Also, even though their use is discouraged due to a pending syntax change, do any of you know how to get .assert and .check to work?

cjs · Post by **cjs** » Thu Aug 27, 2020 8:01 pm

Yes, I extensively unit test all my assembly code. I use the pytest framework in Python because it's by far the best unit test framework of the many dozens I've seen and the several I've written in the last twenty years.

At the moment you can find all of what's discussed below in my 8bitdev repo.

In my system, the file is first assembled with an assembler of choice. Currently my (rather horrible) top-level build script and the loaders support The Macroassembler AS and the ASxxxx assembler suite, but others would be easy enough to add. (The main work is in writing the code to read your assembler's symbol table output.) Then the unit test framework starts and, for each test, sets up a CPU simulator (currently available are py65 for 6502 and my own for 6800), loads the object file into it, loads the symbol table, and runs the test.
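As a rough illustration of the symbol-table step, here is a sketch that reads a listing into an object with attribute access. The `name = $hex` line format is a made-up stand-in (it is not the output format of AS or ASxxxx), and `load_symbols` is a hypothetical name.

```python
import re
from types import SimpleNamespace

# Hypothetical listing format: one "name = $hex" pair per line.
SYMBOL_LINE = re.compile(r'^\s*(\w+)\s*=\s*\$([0-9A-Fa-f]+)\s*$')

def load_symbols(text):
    """Return an object whose attributes are the symbols found in text."""
    symbols = {}
    for line in text.splitlines():
        match = SYMBOL_LINE.match(line)
        if match:                                # silently skip anything else
            symbols[match.group(1)] = int(match.group(2), 16)
    return SimpleNamespace(**symbols)

S = load_symbols('''
buf0ptr    = $0010
buf1ptr    = $0012
bi_readhex = $1000
; comment lines and anything else are ignored
''')
print(hex(S.bi_readhex))
```

Supporting another assembler would then mostly mean swapping the regular expression for one that matches its listing.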

Here's a sample set of 10 unit tests for a 6502 routine called `bi_readhex`, which, given a pointer to an ASCII representation of a hexadecimal number, converts it to a "bigint" (arbitrary-precision) binary number and stores that in an output buffer.

Code: Select all

#   Buffers used for testing deliberately cross page boundaries.
INBUF  = 0x6FFE
OUTBUF = 0x71FE

@pytest.mark.parametrize('input, output', [
    (b'5',               b'\x05'),
    (b'67',              b'\x67'),
    (b'89A',             b'\x08\x9A'),
    (b'fedc',            b'\xFE\xDC'),
    (b'fedcb',           b'\x0F\xED\xCB'),
    (b'80000',           b'\x08\x00\x00'),
    (b'0',               b'\x00'),
    (b'00000000',        b'\x00'),
    (b'087',             b'\x87'),
    (b'00000087',        b'\x87'),
])
def test_bi_readhex(m, R, S, input, output):
    print('bi_readhex:', input, type(input), output)

    m.deposit(INBUF, input)
    m.depword(S.buf0ptr, INBUF)
    m.depword(S.buf1ptr, OUTBUF)
    size = len(output) + 2               # length byte + value + guard byte
    m.deposit(OUTBUF, [222] * size)     # 222 ensures any 0s really were written

    m.call(S.bi_readhex, R(a=len(input)))
    bvalue = m.bytes(OUTBUF+1, len(output))
    assert (len(output),    output,  222,) \
        == (m.byte(OUTBUF), bvalue, m.byte(OUTBUF+size-1))

Some notes to help explain this:

1. The test is obviously parametrized, allowing me to use the same code body for many tests. The `input` and `output` parameters are obviously specified right there; the other three parameters are `m`, the simulated machine, `S` the symbol table loaded from the assembler output, and `R` a class allowing me to construct "register set" objects (there will be more on this below). All three of those are "fixtures"; simply adding an `m` to the parameter list tells pytest to go find the setup code for the simulated machine, run it, and pass in the object it produces.

2. The `print` statement prints to stdout; this is captured by pytest and won't be shown unless the test fails. (Though you can ask it to show output even from successful tests if you like.)

3. You can see that there are functions to deposit bytes and words into the simulator's memory. Here this is used to set up the input buffer and the pointers to the input and output buffers. `INBUF` and `OUTBUF` are just the constants defined earlier in the test code. `buf0ptr` and `buf1ptr` are symbols in the assembly code; `S.buf0ptr` returns the value of `buf0ptr`, which in this case is the address in memory where we store the pointer to that buffer.

4. `m.call()` starts executing code in the simulator; it starts at the given address (the `bi_readhex` symbol, here) and counts JSRs and RTSs until it finds the final RTS, where it stops and returns, unless it encounters a BRK instruction, in which case a (Python) exception will be thrown. (The list of "stop" opcodes can be specified, as can a different limit on the number of instructions to execute before throwing an exception.) If your JSRs and RTSs don't match, there are other ways of calling the code and running it to a given point, exiting without an exception on encountering a given opcode, etc.

5. `m.call` also takes a register set (which includes flags); here you can see that we set only register A, loading it with the length of the input buffer.

6. After it returns, we fetch some bytes from the simulator's memory and then assert that various values are what we expect them to be. There's almost never any need to write your own assertion functions; simply `assert EXPRESSION` and if it fails pytest will take it apart and show you the pieces, even telling you things like which individual elements in a list (or in this case, a sequence of bytes) are different from what's expected. That's why I can combine all my values above into 3-tuples and compare them; pytest will tell me which individual values in the tuples did not match and drill down even further into those if they're structured values.
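The interface described in notes 3 to 5 can be sketched with a toy machine. This is not the 8bitdev framework or py65: the class name `ToyMachine`, the four-opcode instruction set, and the simplified return-address handling are all assumptions made to keep the example self-contained.

```python
class ToyMachine:
    """Just enough of a 6502-like machine to show the testing interface."""
    JSR, RTS, NOP, BRK = 0x20, 0x60, 0xEA, 0x00

    def __init__(self):
        self.mem = bytearray(0x10000)

    def deposit(self, addr, data):
        self.mem[addr:addr + len(data)] = bytes(data)

    def depword(self, addr, value):
        self.deposit(addr, [value & 0xFF, value >> 8])    # little-endian

    def byte(self, addr):
        return self.mem[addr]

    def bytes(self, addr, n):
        return bytes(self.mem[addr:addr + n])

    def word(self, addr):
        return self.mem[addr] | (self.mem[addr + 1] << 8)

    def call(self, addr, maxops=10000):
        """Run from addr until the RTS that matches this call."""
        pc, stack, depth = addr, [], 0
        for _ in range(maxops):
            op = self.mem[pc]
            if op == self.JSR:
                stack.append(pc + 3)     # simplified: a real 6502 pushes pc+2
                pc = self.word(pc + 1)
                depth += 1
            elif op == self.RTS:
                if depth == 0:
                    return               # the final RTS: stop here
                depth -= 1
                pc = stack.pop()
            elif op == self.NOP:
                pc += 1
            elif op == self.BRK:
                raise RuntimeError('BRK at $%04X' % pc)
            else:
                raise NotImplementedError('opcode $%02X' % op)
        raise RuntimeError('gave up after %d instructions' % maxops)

m = ToyMachine()
m.deposit(0x1000, [0x20, 0x00, 0x20, 0xEA, 0x60])    # JSR $2000; NOP; RTS
m.deposit(0x2000, [0xEA, 0x60])                      # NOP; RTS
m.call(0x1000)                                       # returns at the outer RTS
m.depword(0x0010, 0x6FFE)
print(m.bytes(0x0010, 2))
```

Since memory starts out zeroed and opcode 0 is BRK, code that runs off into empty memory raises an exception instead of looping.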

This test unfortunately doesn't demonstrate register/flag comparisons, but those are done with objects constructed with R(), which can have "don't care" values to be used in comparisons. So typically I'd do something like `assert R(x=0x33, Z=1) == m.regs` to test just the x register value and Z flag, and on failure it would give me back something like the following, where the hyphens indicate the "don't care" values in the expected result:

Code: Select all

____________________________ test_bi_readhex[67-g] _____________________________
src/m65/bigint.pt:54: in test_bi_readhex
    assert R(x=0x33, Z=1) == m.regs
E   assert Unexpected Registers values:
E     6502 pc=---- a=-- x=33 y=-- sp=-- ------Z-
E     6502 pc=1069 a=FE x=00 y=FF sp=FF nv--diZC
----------------------------- Captured stdout call -----------------------------
bi_readhex: b'67' <class 'bytes'> b'g'
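One way such "don't care" comparisons could be built is sketched below. This is not the actual `R` class from the repo; the class name `Registers`, its field list, and the use of `None` as the wildcard are assumptions.

```python
class Registers:
    """Register set in which None means "don't care"."""
    FIELDS = ('pc', 'a', 'x', 'y', 'sp', 'N', 'V', 'D', 'I', 'Z', 'C')

    def __init__(self, **values):
        unknown = set(values) - set(self.FIELDS)
        if unknown:
            raise TypeError('unknown registers: %s' % sorted(unknown))
        for name in self.FIELDS:
            setattr(self, name, values.get(name))    # unspecified -> None

    def __eq__(self, other):
        if not isinstance(other, Registers):
            return NotImplemented
        # A field matches if either side doesn't care or the values agree.
        return all(
            getattr(self, n) is None
            or getattr(other, n) is None
            or getattr(self, n) == getattr(other, n)
            for n in self.FIELDS)

    def __repr__(self):
        def show(value):
            return '-' if value is None else format(value, 'X')
        return 'Registers(%s)' % ' '.join(
            '%s=%s' % (n, show(getattr(self, n))) for n in self.FIELDS)

actual = Registers(pc=0x1069, a=0xFE, x=0x00, y=0xFF, sp=0xFF, Z=1, C=1)
print(Registers(x=0x00, Z=1) == actual)    # True
print(Registers(x=0x33, Z=1) == actual)    # False
```

The `__repr__` matters as much as `__eq__` here, since it is what a failing `assert` would display.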

It's worth mentioning that this sort of testing can also replace using a debugger in many circumstances; it's not difficult (but should be made easier!) to have the simulator stop at specified addresses and print out the current values of whatever registers and memory are of interest, for example. I can also generate execution traces, but those too want more work (for example, they currently don't show what memory was changed at every step).

Right now this whole thing is not really "productized" for use by others; the framework should be in a separate repo, with documentation and tutorials, etc. etc. I'm planning to get around to that one day, but it's still under pretty heavy development at the moment. However, I'm happy to do support, pair programming sessions, whatever, to help anybody who's interested in getting up to speed on this stuff.

Quote:

I realize 99% of this would be an effort in higher level scripting, likely with Bash or Perl.

Yeah, as someone who's been using Bourne shell since the '80s, Perl since the '90s, Ruby from the early 2000s onwards, and, over the last few years, Python, I can say you definitely should simply start with Python. I frequently ignore my own advice and use Bash to get something started and most of the time I regret it. (My top-level `Test` script in that repo is an excellent example.) The difference isn't as vast with Perl or Ruby, but it's still there and hurts in some important areas. (For example, you can't get something like pytest in Ruby or Perl because they don't give you access to the compilation system; pytest actually compiles the Python code in your tests differently from normal in order to instrument it so it can take apart structured variables in the way mentioned above.)

load81 · Post by **load81** » Thu Aug 27, 2020 8:47 pm

cjs wrote:

Yeah, as someone who's been using Bourne shell since the '80s, Perl since the '90s, Ruby from the early 2000s onwards, and, over the last few years, Python, I can say you definitely should simply start with Python. I frequently ignore my own advice and use Bash to get something started and most of the time I regret it. (My top-level `Test` script in that repo is an excellent example.) The difference isn't as vast with Perl or Ruby, but it's still there and hurts in some important areas. (For example, you can't get something like pytest in Ruby or Perl because they don't give you access to the compilation system; pytest actually compiles the Python code in your tests differently from normal in order to instrument it so it can take apart structured variables in the way mentioned above.)

Thanks! I'll look into it and give it a try.

I know what you mean with Bash. Sometimes Bash is the right answer for simple problems. The reality is, I often write simple prototype code in Bash and use that code as a rough outline. Then, I rewrite everything in whatever language I'm going to actually use, often Python. I like Python for API-to-API type stuff, but I don't usually find low level work appealing in the language.

I was experimenting with Raku (ex-Perl 6) for a while. It has grammars, which are named regexes with recursion thrown in to allow for some very complex parsing to take place. After that, I began an oddly intense Perl 5 kick that isn't slowing down. There is an interesting proposal called Cor; it's a new object model. It is sort of Ruby-like to my eye. That proposal got me to give Perl a second look after a long hiatus, one-liners and short one-off "data munging" scripts notwithstanding. It's neat to see how much the language has changed over the years.

I never did get into Ruby, though I do find a lot of Ruby code is visually appealing. I think the next language I'm going to try to tackle is Forth, just because it's so different from everything else. On that note, DurexForth (C64) looks very cool.

Martin_H · Post by **Martin_H** » Thu Aug 27, 2020 11:31 pm

I unit test my code using Py65Mon launched from a Makefile. No fancy framework, I just look for known good output.

Here's a link to my repo:
https://github.com/Martin-H1/6502/blob/ ... n/Makefile

cjs · Post by **cjs** » Fri Aug 28, 2020 2:07 am

load81 wrote:

I know what you mean with Bash. Sometimes Bash is the right answer for simple problems.

Well, Bash is pretty good for expressing the running and combining of programs. But still, it is possible to write libraries in other languages to make that almost as easy, if not sometimes easier. (Check out Rash: The Reckless Racket Shell for a particularly stunning example.) I really ought to just be at least trying to start with Python and the shell package when I'm about to write a script. The problem is, I've been writing shell scripts too damn long and my brain turns off as soon as I need one because I can write them almost automatically. (And maybe I'm just a bad person. :-/ Surely they must have meetings for people like me. "Hi. My name is Curt, and I write Bash.")

Quote:

I like Python for API-to-API type stuff, but I don't usually find low level work appealing in the language.

I'm not sure what you mean by "low-level," but I wrote a 6800 CPU simulator in Python, which is a completely standalone program (it uses only two functions from the standard libraries), and found it to be just fine. (Or as fine as one can be with run-time type checking and Algol-style syntax, anyway. :-)) You can find the core of it in the three `op*` files here; it's barely over 500 lines of code, plus another 700-odd lines for 400 unit tests (again, thanks to pytest, plus a handful of admittedly complex functions). Those counts include comments, which I count as code.

Quote:

I was experimenting with Raku (ex-Perl 6) for a while. It has grammars, which are named regexes with recursion thrown in to allow for some very complex parsing to take place.

Well, as soon as I hear the words "parsing" and "regex" in the same sentence I tend to run away as quickly as possible; parsing is probably more subject to Jamie Zawinski's dictum¹ than anything else I've encountered. Those do look like a better way of building regular expressions, though, if you're ever in one of those rare situations where complex regular expressions are actually advisable to use.

----------
¹ "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems"

SamCoVT · Post by **SamCoVT** » Fri Aug 28, 2020 1:04 pm

Martin_H wrote:

I unit test my code using Py65Mon launched from a Makefile. No fancy framework, I just look for known good output.

I'll second py65 - it's written in Python and you can extend it. When working with scotws on TaliForth2, we implemented tests for all of the Forth words by extending py65 to handle input from test files and output to a test result file. It would also make sure all input was consumed and there were no Forth error messages in the output.

After instantiating and extending the py65 monitor class, you can load binaries into RAM, peek and poke things in memory, single step, set breakpoints, and run at full speed. You can also simulate hardware at various addresses by "subscribing" to reads or writes at the addresses you want - it will run your function to determine how to react. We added a 32-bit cycle counter to time Forth words.
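The "subscribe to reads or writes" idea can be illustrated with a small stand-in class. This is a sketch of the concept, not py65's own memory class; `WatchedMemory`, its method names, and the port addresses are all hypothetical.

```python
class WatchedMemory:
    """Memory that runs a handler for reads or writes at chosen addresses."""

    def __init__(self, size=0x10000):
        self._data = bytearray(size)
        self._on_read = {}
        self._on_write = {}

    def subscribe_read(self, addr, handler):
        self._on_read[addr] = handler

    def subscribe_write(self, addr, handler):
        self._on_write[addr] = handler

    def __getitem__(self, addr):
        handler = self._on_read.get(addr)
        if handler is not None:
            return handler(addr) & 0xFF      # the "hardware" supplies the byte
        return self._data[addr]

    def __setitem__(self, addr, value):
        handler = self._on_write.get(addr)
        if handler is not None:
            handler(addr, value & 0xFF)      # the "hardware" consumes the byte
        else:
            self._data[addr] = value & 0xFF

PUTC, GETC = 0xF001, 0xF004                  # illustrative port addresses
output = []
pending = list(b'OK')
mem = WatchedMemory()
mem.subscribe_write(PUTC, lambda addr, value: output.append(value))
mem.subscribe_read(GETC, lambda addr: pending.pop(0) if pending else 0)

mem[PUTC] = mem[GETC]        # what an "echo one character" routine would do
mem[PUTC] = mem[GETC]
print(bytes(output))         # b'OK'
```

A test harness can feed a whole test file through the input port and collect everything written to the output port for later comparison.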

Here is the test script we came up with.
https://github.com/scotws/TaliForth2/bl ... alitest.py

dmsc · Post by **dmsc** » Fri Aug 28, 2020 2:15 pm

Hi!

load81 wrote:

Is anyone else leveraging unit tests to bug hunt or sanity-check their code? What are some tools and techniques I should be aware of? Any pointers and suggestions would be welcome. Also, even though their use is discouraged due to a pending syntax change, do any of you know how to get .assert and .check to work?

For the FastBasic unit tests, I wrote my own emulator library: https://github.com/dmsc/mini65-sim/ . It emulates the full Atari 8-bit OS, but not the Atari hardware, so it can be used to test all the command line tools and BASIC samples.

Using this emulator, I built a simple test framework at https://github.com/dmsc/fastbasic/tree/master/testsuite , it reads test definition files like this: https://github.com/dmsc/fastbasic/blob/ ... -input.chk

Code: Select all

Name: Test statement "INPUT"
Test: run-fp
Input:
1
2
.
Output:
Start
?1        1
1         2
?18

"Test" says which test to apply, "run-fp" means compile with floating-point compiler, then run the resulting program. "input" data is passed to the emulator as console input, "output" data is checked to match the one given. The above is accompanied with the following basic program: https://github.com/dmsc/fastbasic/blob/ ... -input.bas

Code: Select all

' Test for statement "INPUT"
? "Start"
input a%
? err(), a%
input ; b%
? err(), b%
input a%
? err()

Note that the emulator is used first to run the command line compiler, so the full process is tested as it would work on the Atari.
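To make the file layout concrete, here is a sketch of a reader for a test definition like the one shown above. It is not the FastBasic test suite's own code; it assumes only what the sample shows (`Key: value` header lines, then `Input:` and `Output:` sections), and the function names are hypothetical.

```python
def parse_test_definition(text):
    """Split a test definition into header fields, input and output lines."""
    test = {'name': None, 'test': None, 'input': [], 'output': []}
    section = None
    for line in text.splitlines():
        if line.strip() == 'Input:':
            section = 'input'
        elif line.strip() == 'Output:':
            section = 'output'
        elif section is not None:
            test[section].append(line)           # data line, kept verbatim
        elif ':' in line:
            key, _, value = line.partition(':')  # header such as "Name: ..."
            test[key.strip().lower()] = value.strip()
    return test

def output_matches(test, actual_output):
    """True if the captured output equals the expected lines exactly."""
    return actual_output.splitlines() == test['output']

SAMPLE = '''Name: Test statement "INPUT"
Test: run-fp
Input:
1
2
.
Output:
Start
?1        1
1         2
?18
'''
parsed = parse_test_definition(SAMPLE)
print(parsed['name'], parsed['test'], parsed['input'])
```

A limitation of this sketch is that a data line consisting solely of `Input:` or `Output:` would be mistaken for a section header.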

Have Fun!

soci · Post by **soci** » Sat Aug 29, 2020 7:17 am

The .assert and .check directives in 64tass are not for code testing purposes as outlined above.

These directives were added a long time ago to prevent mistakes when programming banked memory systems. I needed them because often the wrong memory configuration was used, which resulted in memory trashing or garbage reads. Also, certain functions were only supposed to be called if the memory area(s) they operated on were available. Or worse, those functions could have been banked out themselves.

It was a sort of hack and their use was complicated. However they served their purpose and I got rid of a lot of bugs in my code while suffering their limitations.

I've chosen not to document them to discourage their use, as they will go away at some point once I figure out a proper replacement for them.

Somewhat platform specific but more on topic I think:

https://www.commocore.com/repository/c64unit
https://github.com/martinpiper/BDD6502
http://www.cactus.jawnet.pl/attitude/?a ... 8&which=15

load81 · Post by **load81** » Sat Aug 29, 2020 8:52 pm

soci wrote:

The .assert and .check directives in 64tass are not for code testing purposes as outlined above.

Soci, thanks for clearing that up. In other languages "assert" gets used a lot to mean "throw an error if this condition [which should never happen] actually occurs."

Your assembler is rock solid. You should absolutely have a Patreon or a cryptocurrency address for users to donate to.

cjs · Post by **cjs** » Mon Aug 31, 2020 9:25 am

soci wrote:

Somewhat platform specific but more on topic I think:
https://www.commocore.com/repository/c64unit
https://github.com/martinpiper/BDD6502
http://www.cactus.jawnet.pl/attitude/?a ... 8&which=15

soci, thanks for finding those! I always like to look at other systems and see how they compare to mine (and steal any good ideas that they have :-P).

I've had only a brief look at them so far, but I do have a couple of comments on them.

I'm not seeing much use of test generation from parameters in any of these systems. This is something that in my experience is not used so much in testing high-level languages, but I've found I use it quite heavily in testing assembly code. For example, in the article on CommTest they have the following tests for a "subtract" function which I find quite typical:

Code: Select all

context("when address is $0000") {
  it("results in address = $ffff") {
    writeWordAt(address, 0x0000)
    call
    assert(readBytesAt(address, 2) === Seq(0xff, 0xff))
  }
}

context("when address is $0001") {
  it("results in address = $0000") {
    writeWordAt(address, 0x0001)
    call
    assert(readBytesAt(address, 2) === Seq(0x00, 0x00))
  }
}

context("when address is $0100") {
  it("results in address = $00ff") {
    writeWordAt(address, 0x0100)
    call
    assert(readBytesAt(address, 2) === Seq(0xff, 0x00))
  }
}

There's a lot of code duplication here, which is precisely where pytest's test parametrization becomes so nice:

Code: Select all

@pytest.mark.parametrize('input, result', [
    (0x0000, 0xFFFF), (0x0001, 0x0000), (0x0100, 0x00FF),
])
def test_subtract(m, S, input, result):    # machine, Symbol table, test parameters
    m.depword(S.address, input)
    m.call('subtract')
    assert result == m.word(S.address)

Perhaps not such a big deal when you have just three cases, but it's not unusual for me to have two dozen or more test cases for a more complex function. And in some cases I derive great comfort from being able to programmatically generate an exhaustive list of inputs and results for functions with a small but not tiny range of inputs (say, 256 values).
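Generating such a list programmatically might look like the sketch below. It assumes, based only on the three cases above, that the routine under test decrements a 16-bit word with wraparound; `decrement16` and `CASES` are hypothetical names.

```python
def decrement16(value):
    """Reference model: 16-bit decrement with wraparound."""
    return (value - 1) & 0xFFFF

# Every input from $0000 to $0100 inclusive, which covers the wraparound
# at $0000 and the borrow across the page boundary at $0100.
CASES = [(n, decrement16(n)) for n in range(0x0000, 0x0101)]

print(len(CASES))
```

The resulting list can be passed straight to `@pytest.mark.parametrize('input, result', CASES)` in place of the hand-written tuples.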

You'll have noticed there's some plain English descriptions in the ConnTest test cases above. Moving towards such "plain English" descriptions is characteristic of "BDD," or "Behaviour-Driven Design." BDD6502 actually writes the tests in such form, as in this example:

Code: Select all

  Scenario: Simple Score add test

    Given I start writing memory at $400
    Given I write the following bytes
      | Score_ZeroCharacter+3 | Score_ZeroCharacter+4 | Score_ZeroCharacter+5 | Score_ZeroCharacter+6 | Score_ZeroCharacter+7 | Score_ZeroCharacter+8 | Score_ZeroCharacter+9 |

    Given I start writing memory at $500
    Given I write the following hex bytes
      | 05 04  06 04 03 01 |

    When I set register a to lo($500)
    When I set register x to hi($500)
    When I execute the procedure at ScoreAdd for no more than 103 instructions

    Then I hex dump memory between $400 and $407
    Then I expect to see $3ff equal 0
    Then I expect to see $400 equal Score_ZeroCharacter+3
    Then I expect to see $401 equal Score_ZeroCharacter+4
    Then I expect to see $402 equal Score_ZeroCharacter+7
    Then I expect to see $403 equal Score_ZeroCharacter+0
    Then I expect to see $404 equal Score_ZeroCharacter+2
    Then I expect to see $405 equal Score_ZeroCharacter+4
    Then I expect to see $406 equal Score_ZeroCharacter+9
    Then I expect to see $407 equal 0

How appealing BDD is to you I suppose depends on how much you like or hate verbosity (my instinct is to run as soon as I hear either of "BDD" or "plain English"), but this is what the above would look like in a "non-BDD" system:

Code: Select all

def test_simple_score_add(m, S, R):        # machine, Symbol table, Register set constructor
    m.deposit(0x400, score_zchar[3:10])
    m.deposit(0x500, b'\x05\x04\x06\x04\x03\x01')
    m.call(S.ScoreAdd, R(a=LSB(0x500), x=MSB(0x500)))
    assert b'\x00' + score_zchar[3:10] + b'\x00' == m.bytes(0x3FF, 9)
