( Copyrighted by eMAST Technology Corp, 2000 )
( All rights reserved )

\ eForth Kernel for P16, 02dec98cht
\ from kernel28.seq, comment out debugging aids, 24nov95cht
\    Keep 0 page usage to 2FF.  300-3FF needed by system.
\ kern16 10jun88cht
\ kern24, 29nov00cht, for P24, 'if' does drop. add B> and >B for packing bytes.

comment:

The KERN24.F file contains most of the words which are written
in assembly for speed considerations.  P24 eForth is optimized
as all the words which can be written in assembly are so done.
However, much more optimization is achieved by a set of macros,
which try to convert the most commonly used high level Forth
words into machine instructions and packs these machine instruction
as tightly as possible.  The end results are that the code size
is significantly reduced and the execution speed greatly increased.

The use of macros will be further explained along with the code.

The code words in this file alse serve as programming examples
for the optimal use of the P24 CPU.  It is worth you while to study
them carefully, and use them as templates when you like to convert
high level application words into assembly.

The Forth Virtual Engine is:
   T              top of stack
   S              data stack     16 levels
   R              return stack   16 levels
Both the data and return stacks are in CPU.

The A register is used by the memory fetching and storing
instructions to provide address to the external memory.  When
not used to address memory, A can be used as a scatched
register.

In the MUL and DIV instructions, the A register serves as the
extension to the T register to hold the lower half of the partial
product or the divident.

Subroutine thread model eliminates IP, doColon, and EXIT.

16 levels of stacks are enough for most applications.  They will
wrap around when exhausted.

Memory allocation:
        0       Boot code
        10      Initial variables
        18      Kernel
        9A      Forth words
        700     RAM, variables
        710     Text buffer
        730     TIB
        780     User dictionary
        7FF     End of memory

'if' drops top element being tested.
'-if' does not drop, as it tests carry.

comment;

\ All the system variables are defined as macros.  They will be
\ assembled as literals in the form of LDI instructions.  On
\ execution, they will return their respective addresses on the
\ data stack.  It is assumed that the target system has RAM starting
\ from location $700.  For a different target system, you have
\ to change the locations in these macros.

\ HLD points to buffer for output numeric string.
\ SPAN variable to hold the length of input text string.
\ >IN offset to the text string currently being interpreted.
\ #TIB length of the input text string
\ 'TIB location of the terminal input buffer.
\ BASE base for number conversions
\ CONTEXT pointer to start dictionary searches
\ CP points to the top of the dictionary
\ LAST points to the name field of the last word
\ 'EVAL points to $INTERPRET or $COMPILE to evaluate words
\ 'ABORT points to error recovery routine
\ TEXT points to text buffer to unpack strings
\ tmp a scratch pad variable.
hex
CRR .( System variables ) CRR
: HLD 700 ldi ;         \ scratch
: SPAN 701 ldi ;        \ #chars input by EXPECT
: >IN  702 ldi ;        \ input buffer offset
: #TIB 703 ldi ;        \ #chars in the input buffer
: 'TIB 704 ldi ;        \ TIB
: BASE 705 ldi ;       \ number base

CRR
: CONTEXT 706 ldi ;    \ first search vocabulary
: CP 707 ldi ;         \ dictionary code pointer
: LAST 708 ldi ;       \ ptr to last name compiled
: 'EVAL 709 ldi ;       \ interpret/compile vector
: 'ABORT 70A ldi ;
: TEXT 710 ldi ;         \ unpack buffer
: tmp 70B ldi ;        \ ptr to converted # string

\ Many Forth words have corresponding P24 machine instructions
\ or can be represented by a short sequence of P24 machine
\ instructions.  Instead of representing them in subroutines,
\ they are defined as macros, which invoke the assembler
\ mneumonics to pack as many machine instructions to program
\ words.
\ Obviously, if a Forth words can be translated to less than
\ four machine instrucitons, there are gains in shorter code
\ sizes and faster execution speed.  However, there are also
\ significant gains when a Forth word is defined as a 4 machine
\ instruction macro, because it may continue the packing from
\ the previous word to the next word.
\ These macros together with the machine instructions
\       DUP, DROP, AND, XOR
\ tend to pack the code tightly.
CR .( macro words ) CR
: EXIT ret ;
: EXECUTE ( a ) push ret ;
: ! ( n a -- ) sta st ;
: @ ( a - n ) sta ld ;
: R> ( - n ) pop ;
: R@ ( - n ) pop dup push ;
: >R ( n ) push ;
: SWAP ( n1 n2 - n2 n1 )
   push sta pop lda ;
: OVER ( n1 n2 - n1 n2 n1 )
   push dup sta pop
   lda ;
: 2DROP ( w w  -- )
   drop drop ;
: + ( w w -- w ) add ;
: NOT ( w -- w ) com ;
:  NEGATE ( n -- -n )
   com 1 ldi add ;
: 1- ( a -- a )
   -1 ldi add ;
: 1+ ( a -- a )
   1 ldi add ;
: BL ( -- 32 )
   20 ldi ;
: +! ( n a -- )
   sta ld add st
   ;
: - ( w w -- w )
   com add 1 ldi add
   ;

\ Following words are complicated and have to be defined as
\ code word.
\ doVAR starts a variable or an array.  It returns the address
\ following doVAR.
\ doNEXT terminates a FOR-NEXT loop.  It decrements the counter
\ on the return stack.  It exits the loop when the count is 0.
CR .( kernel words ) CR
CODE doVAR
   pop ret
CODE doNEXT
   pop pop dup                \ decrement count
   if -1 ldi add push
      push ret             \ if index is not 0, loop back
   then
   drop 1 ldi add                  \ index is 0, exit loop and continue
   push ret

\ Following are Forth words which are too long for macros,
\ yet still easily expressible in machine instructions.
\ They are all commonly used Forth words.
\ UM+ ( n n -- sum carry ) is a special word in eForth to
\ provide carry in addition.  However, it is not used here
\ because carry is readily accessible using -if or BNC.
\ Note that BZ removes the flag tested from the stack, while
\ BNC does not disturb the data stack.  Thus BZ can be used
\ to code IF directly, and BNC will let -IF to test T repeatedly
\ using the SHL instruction.
CR
CODE 0< ( n - f )
   shl
   -if drop -1 ldi ret
   then
   dup xor ( 0 ldi )
   ret
CODE OR ( n n - n )
   com push com
   pop and com ret
CODE UM+  ( n n - n carry )
   add
   -if 1 ldi ret
   then
   dup dup xor ( 0 )
   ret
CODE ?DUP ( w -- w w | 0 )
   dup
   if dup ret then
   ret
CODE ROT ( w1 w2 w3 -- w2 w3 w1 )
   push push sta pop
   pop lda ret
CODE 2DUP ( w1 w2 -- w1 w2 w1 w2 )
   dup push push
   dup sta pop lda pop
   ret

CR
CODE DNEGATE ( d -- -d )
   com push com 1 ldi
   add
   -if pop ret
   then
   pop 1 ldi add ret
CODE ABS ( n -- +n )
   dup shl
   -if drop com 1 ldi add
       ret
   then
   drop ret

CR
CODE = ( w w -- t )
   xor
   if dup dup xor ret then
   -1 ldi ret
CODE 2! ( d a -- )
   sta push stp
   pop st ret
CODE 2@ ( a -- d )
   sta ldp ld ret
CODE COUNT ( a -- a+1 n )
   sta ldp push lda
   pop ret

\ B> adds one byte at b to the word at a.  It shifts the
\ existing data in a left by 8 bits.  Returns b+1 and a,
\ and is ready to pack in the next byte.
\ B> is used by PACK$ to pack a byte string into a packed
\ string.
\ >B unpacks three bytes in a and puts them at b.  Returns
\ a+1 and b+3 so it is ready to unpack the next word.  The
\ first byte unpacked is also return as a count, which is
\ useful when this word is the first word of a packed string.
\ >B is called by UNPACK$ to convert a packed string to a
\ counted byte string.
CR ( pack B> and unpack >B strings )
CODE B> ( b a -- b+1 a )
   push sta ldp push
   lda pop pop sta
   ld 
   shl shl shl shl
   shl shl shl shl 
   add st lda ret
CODE >B ( a b -- a+1 b+3 count )
   push sta ldp push
   lda pop pop ( a+1 n b ) sta 
   dup push 
   $FF ldi and pop
   $FFFF00 ldi and $FF ldi xor
   shr shr shr shr
   shr shr shr shr
   dup push 
   $FF ldi and pop 
   $FFFF00 ldi and $FF ldi xor
   shr shr shr shr
   shr shr shr shr
   $FF ldi and dup push
   stp stp stp ( a+1 c )
   lda pop ret

