The KimKlone: a radical 6502 redesign

Dr_Jefyll · on Oct 5, 2011

Thank you, 0x12 and all of you, for your interest in my quirky creation. Naturally I follow modern FPGA technology avidly, but I guess it's obvious that my project predates that sort of thing. 2901 style Bit-Slice components were the wonder of wonders not so long ago! But modern programmable logic brings us to another level altogether.

Another revolution of course is the community of the Internet. My KimKlone project was conceived and implemented as a one-man effort, and for years it remained a de facto secret since I had no way to publicize it and almost no-one to discuss it with. These are magical times we live in, and I'm grateful to share your ideas and comments.

-- Jeff

ChuckMcM · on Oct 4, 2011

This is nice stuff. I built a TV-Typewriter using Don Lancaster's Cheap Video Cookbook. He pioneered this technique of 'co-processing' with the 6502. However, if you were going to do this today, you would be better off just building a 6502 into an FPGA [1] or buying an existing design [2] and modifying it. Jan Gray did a nice RISC CPU [3] that clearly got some inspiration from the 6502, and it was very small in terms of gates (although that isn't such a big deal these days).

It is well worth your time to get an FPGA evaluation kit (my all-time favorite is the Altera DE-2) and learn for yourself how straightforward it is to build your own CPU, one which has instructions that do things you need vs. what the manufacturer thought you might need.

[1] http://opencores.com/project,lattice6502

[2] http://edablog.com/2010/05/06/mos-6502-cpu/

[3] http://fpgacpu.org/xsoc/cc.html

Don's book: http://www.abebooks.com/9780672215247/Cheap-Video-Cookbook-L...

kabdib · on Oct 4, 2011

World record for a stock 6502 without cooling is ~25 MHz, done one semi-drunk Friday evening in a lab at Commodore by Leonard Tramiel and associates.

"Let's see how fast this thing can go before the smoke gets out."

[Told to me by Leonard. I miss Friday beer-bashes surrounded by lab equipment, all kinds of stuff can happen. Richard Frick has the Atari distance record for a reverse-biased 555 timer; it blew its top about 15 feet.]

0x12 · on Oct 4, 2011

> Richard Frick has the Atari distance record for a reverse-biased 555 timer; it blew its top about 15 feet.

Haha! That's really funny, spectator sports for nerds :)

25 MHz is not too shabby. How did the RAM hold up at those speeds?

0x12 · on Oct 4, 2011

Extremely clever use of the illegal opcodes in the original 6502. Check out the way he fakes the output of the databus to the processor by pretending a different opcode was read than the one that was actually read.

Hacking at its finest.
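
To make the substitution idea concrete, here is a minimal sketch of a lookup table sitting between memory and the CPU. It is only an illustration: the table contents, the function names, and the assumption that substitution is gated by an "opcode fetch" signal (such as the 6502's SYNC output) are hypothetical and are not taken from the KimKlone's actual PROM.

```python
# Hypothetical model of opcode substitution on the data bus.
# The remap shown here is illustrative, NOT the KimKlone's real PROM contents.

NOP = 0xEA  # standard 6502 NOP opcode


def build_prom(remaps):
    """Return a 256-entry table: identity everywhere except the given remaps."""
    prom = list(range(256))
    for seen_in_memory, shown_to_cpu in remaps.items():
        prom[seen_in_memory] = shown_to_cpu
    return prom


def byte_seen_by_cpu(prom, bus_byte, is_opcode_fetch):
    """Substitute only on opcode-fetch cycles; operand bytes pass through."""
    return prom[bus_byte] if is_opcode_fetch else bus_byte


# Assumed example: the byte 0x13 in memory is presented to the CPU as a NOP,
# while external logic (not modeled here) does the real work.
PROM = build_prom({0x13: NOP})
```

In this model the CPU never sees the byte 0x13 as an opcode, yet the same value still reaches it untouched when it appears as an operand.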

yanowitz · on Oct 4, 2011

If you haven't watched this video about reverse engineering the 6502 (and how the decoder works, etc.), find the time. It's well worth the ~50 minutes. Mind blowing stuff: http://www.youtube.com/watch?v=reIYvmuWHhk&sns=tw

zokier · on Oct 4, 2011

I wonder how old this is. Sounds like something from mid '90s maybe?

zokier · on Oct 4, 2011

Downvotes?

minikomi · on Oct 4, 2011

Wow... Looking over some of the machines that guy had to work on is really great... And a testament to how far the printing / copying field has come!

soapdog · on Oct 4, 2011

is it down?

0x12 · on Oct 4, 2011

not for me.

hackermom · on Oct 4, 2011

Interesting work, but I couldn't help but notice one part: "The KimKlone represents an architectural extension of the 65C02. The most striking improvement is efficient linear access to a 16 Mbyte Address Space."

Incidentally, this is what the 65816 was made for.

rbanffy · on Oct 4, 2011

The 65816 address space was segmented, like the 8086's.

jjss · on Oct 4, 2011

Nope, the 65816 address space is flat. If you read the docs and look at the opcodes, I can see how it gives the impression of being segmented (with all the "bank" terminology), but it's a full 24-bit address space with no restrictions.

rbanffy · on Oct 4, 2011

Sort of, at least. IIRC, program code could not execute across bank boundaries, although you could do long jumps to any 24-bit address.
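
The distinction being argued here can be sketched in a few lines: data and jump targets are 24-bit, but the program counter itself is 16 bits and wraps inside the current program bank. The function names below are made up for illustration, and the model ignores everything except address arithmetic.

```python
# Simplified model of 65816-style program addressing.
# pbr is the 8-bit program bank, pc the 16-bit program counter.
# Function names are hypothetical; only the address arithmetic is modeled.


def next_fetch_address(pbr, pc):
    """Advance the PC by one byte; the carry out of bit 15 is NOT added to pbr."""
    pc = (pc + 1) & 0xFFFF
    return pbr, pc, (pbr << 16) | pc


def long_jump(target24):
    """A long jump loads both bank and PC from a full 24-bit target."""
    return (target24 >> 16) & 0xFF, target24 & 0xFFFF
```

Running off the end of bank 0x01 at 0x01FFFF lands at 0x010000, not 0x020000. Reaching 0x020000 takes an explicit long jump.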

dfox · on Oct 4, 2011

Which is probably the case also for this hack. It's probably possible to devise logic that would remove this limitation, but I assume that would be mentioned in the article.

0x12 · on Oct 4, 2011

Read the article, he's thought of that. There is a trick with two registers in there that allows you to pivot from one bank to another.

More explanation in the appendix on the instruction set. The opcodes are 13, 23, eb and fb:

http://www.laughtonelectronics.com/arcana/BrideOfSon%20KK%20...

Bottom of the list.

dfox · on Oct 4, 2011

It allows you to jump between banks, but not to overflow from one bank to another without an explicit jump on the boundary.

rbanffy · on Oct 4, 2011

You can always check if all address lines transitioned from 1 to 0 on an instruction fetch and increment whatever you use for PBR...

dfox · on Oct 4, 2011

It's conceptually simple, but there are two problems:

1) you have to detect near jumps that jump back inside one bank (at least from an instruction ending on 0xffff to an instruction starting on 0x0000; on the other hand, you can plausibly ignore this as it is a very unlikely case).

2) more importantly, I understand that this coprocessor contraption does not interact in any way with the lower 16 bits of the address bus. You would need to actually snoop on the address bus and detect the wraparound, and as this thing is built from MSI logic, detecting a 1 -> 0 transition on all bits of the address bus, while conceptually trivial, would require a significant amount of hardware. You could detect the 1 -> 0 transition only on the 15th bit of the address, but then you really need to detect jumps in microcode and disable this logic in case of a jump.
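
The two detection schemes discussed above can be compared in a small model. This is a sketch under stated assumptions: `prev_addr` and `addr` stand for the 16-bit addresses of two consecutive program-byte fetches, the bank register is 8 bits wide, and all names are hypothetical rather than anything from the KimKlone.

```python
# Hypothetical comparison of two wraparound detectors for a 16-bit address bus.
# prev_addr / addr: addresses of two consecutive program-byte fetches (assumed).


def full_wrap(prev_addr, addr):
    """All 16 address lines went from 1 to 0 (0xFFFF -> 0x0000)."""
    return prev_addr == 0xFFFF and addr == 0x0000


def bit15_fell(prev_addr, addr):
    """Cheaper check: only the top address line (mask 0x8000) went from 1 to 0."""
    return bool(prev_addr & 0x8000) and not (addr & 0x8000)


def next_bank(bank, prev_addr, addr, detector):
    """Increment the 8-bit bank register when the detector fires."""
    return (bank + 1) & 0xFF if detector(prev_addr, addr) else bank
```

Both detectors fire on a genuine rollover from 0xFFFF to 0x0000. The single-bit version also fires on an ordinary jump from, say, 0x9000 to 0x1000, which is why it would have to be disabled during jumps. Even the full comparison cannot tell sequential flow from a jump at 0xFFFF whose target is 0x0000, which is the rare case raised in point 1.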

delinquentme · on Oct 4, 2011

this is awesome ... but can we get a TLDR?

0x12 · on Oct 4, 2011

Guy expands the 6502 to a 16M address space by intercepting the databus, re-mapping unused opcodes, and making clever use of the spurious signals generated by the CPU when executing other undefined opcodes. He adds a few registers to make the whole thing transparent from an assembler programmer's point of view. In other words, there is no difference to the programmer between native and newly minted instructions.

On top of that, he boosts the speed of his Forth interpreter by concentrating on a frequently used construct called 'NEXT', in a way that should make anybody who has tried to optimize the inner loop of some VM or language proud. After all, what better way to optimize in such a situation than to be able to mold the instruction set to your desire?
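
For readers who have not met NEXT before, the following is a generic sketch of a threaded-code inner interpreter, showing why that one construct dominates run time. It is not the KimKlone's Forth: the word set, the `run` function, and the use of Python callables in place of code addresses are all simplifying assumptions.

```python
# Generic threaded-code inner interpreter (illustrative only).
# A "thread" is a list of words; NEXT fetches the next word and executes it.


def lit(n):
    """Build a word that pushes the constant n."""
    def push(stack):
        stack.append(n)
    return push


def add(stack):
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)


def dup(stack):
    stack.append(stack[-1])


def run(thread, stack):
    """Execute a thread; return how many times the NEXT step ran."""
    ip = 0
    dispatches = 0
    while ip < len(thread):
        word = thread[ip]  # NEXT: fetch the word IP points at,
        ip += 1            # advance IP,
        word(stack)        # and transfer control to the word.
        dispatches += 1
    return dispatches


stack = []
count = run([lit(2), lit(3), add, dup, add], stack)
```

The dispatch step runs once for every word executed, however trivial the word, so cycles saved in NEXT are saved across the whole program.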

He then uses this home-brew Frankenstein contraption as his benchtop computer for multiple years to do real work (instead of just shooting some pretty pictures and calling it a day).

hth

mjhall · on Oct 4, 2011

The most significant part (I think, probably wrong) is on page 5[0] where he details the invalid instructions that do more than a NOP and why they're useful.

[0] : http://www.laughtonelectronics.com/arcana/BrideOfSonPg5.html

Dr_Jefyll · on Oct 5, 2011

Sorry about the TLDR situation; it's on my list to revise the article by prepending an abstract. BTW suggestions and questions about the article are welcome.

@mjhall - you're quite right; the strange, phantom 65c02 operations dramatically expanded what I could do with this project. It's extremely cooperative of the CPU to generate a memory address while leaving all registers unchanged! Although it's true I could've used the PROM to map NOPs onto CMP or BIT instructions and gotten my addresses that way, that approach preserves the registers but still stomps the Flags. In contrast, the "LDD" operations are ideal for the job -- an opportunity handed to me on a silver platter!

-- Jeff