Self-Modifying Code on a Commodore VIC-20

Note: All code listings are in lower case so that they are pastable into the VICE emulator. Otherwise, you will get graphics/uppercase PETSCII characters on paste.

Examining the structure of how the BASIC code is stored

User program RAM is in locations 4096 to 7679 (decimal) on an unexpanded VIC 20 (screen memory starts at 7680). The storage format of BASIC programs can be dumped with the following BASIC:

for i=4096 to 7680 - fre(1): ? i,chr$(peek(i));peek(i): next i 
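
The hard-coded addresses above only hold for an unexpanded machine. As a hedged alternative, BASIC keeps its own pointers in zero page: locations 43-44 should hold the start of the program text and 45-46 the start of variables, which sits just past the end of the program. The variable names s and e below are my own choice, and the lines are meant to be typed in direct mode:

```basic
print peek(43)+256*peek(44)
print peek(45)+256*peek(46)
s=peek(43)+256*peek(44):e=peek(45)+256*peek(46)-1
for i=s to e:? i;peek(i):next
```

On an unexpanded VIC 20 the first PRINT should show 4097, one byte after the leading null at 4096. Using these pointers instead of 7680-fre(1) limits the dump to the program text itself.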

I’ve taken the extra step of adding a slightly more sophisticated version of the above at line 10000 in the code below, so that I can RUN 10000 to dump memory locations with paging, skipping control and non-printable characters.

10 print "hi"
20 n=peek(4104)
30 x=peek(4105)
40 if n >= 90 then n=65
50 n=n+1
60 x=int(26*rnd(1)+65)
70 poke 4104,n
80 poke 4105,x
90 goto 10
9999 end
10000 b=4096:i=b
10010 e=7680-fre(0)
10020 c=0
10030 ls=20
10040 ? i,
10050 ch=peek(i)
10060 ? ch;
10070 if(ch>=32 and ch<=127)or(ch>=160 and ch<=254)then ? chr$(ch);
10075 ?
10080 if c>ls then ? "continue";: input wt$: c=0
10090 c=c+1
10100 i=i+1
10110 if i>e then end
10120 goto 10040

User program RAM dump

You’ll notice in the above that we start with a null character (0) at location 4096, followed by 12, 16, 10 and 0. The 12 and 16, at locations 4097 and 4098, are a pointer to the memory location of the next line of code (in “little endian” order, so 16 * 256 + 12 = 4108).

The next bytes, at locations 4099 and 4100, are 10 and 0. This is the line number for that line of code (again, in little endian format).
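
The little endian arithmetic can be checked directly from the keyboard. This is a minimal sketch that assumes the program above is loaded unchanged, so that line 10 is the first line in memory:

```basic
print peek(4097)+256*peek(4098)
print peek(4099)+256*peek(4100)
```

With the bytes described above (12, 16 and 10, 0), the first PRINT gives the next-line pointer, 4108, and the second gives the line number, 10.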

Once you get past these two 2-byte numbers, you reach the byte 153, which is the VIC 20 BASIC Keyword Code for the PRINT statement. All syntactically significant tokens (keywords and symbols) are reduced to a single byte (the TAB and SPC functions even include their left parenthesis as part of the code). The VIC-20 Programmer’s Reference Guide lists these values (for individual characters, the code is simply the PETSCII code):

VIC 20 BASIC Keyword Codes

You’ll notice that space (32) and double quote (34) are explicitly expressed, as are the individual digits of any number literals.

At the very end of the line is a 0/null again to terminate the line. (Fun part of this experiment: Setting a byte in the middle of the line to 0 makes the rest of the line unreadable by the BASIC interpreter!)
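
To see the tokens and literal characters of the first line side by side, a small routine can walk the bytes from 4101 until it hits the terminating 0. This is only a sketch: the line numbers 20000 and up are an arbitrary choice that keeps it clear of the dump routine, and treating every byte of 128 or more as a keyword token is a simplification that does not hold inside quoted strings, where such bytes can be ordinary PETSCII characters.

```basic
20000 a=4101
20010 b=peek(a)
20020 if b=0 then print a;"end of line":end
20030 if b>=128 then print a;b;"token"
20040 if b<128 then print a;b;chr$(b)
20050 a=a+1:goto 20010
```

Start it with RUN 20000. Because the new lines go at the end of the program, line 10 stays at the same addresses.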

Modifying the code

For an easy first attempt at this, I’m going to just change locations 4104 and 4105, which hold the letters in HI:

10 print "hi"

HI at 4104 and 4105
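
Before automating anything, the same change can be tried by hand in direct mode. This sketch assumes line 10 is still the first line of the program and has not changed length. The values 79 and 75 are the PETSCII codes for O and K:

```basic
poke 4104,79:poke 4105,75
list 10
```

LIST should then show the string in line 10 as OK instead of HI, since LIST simply decodes whatever bytes are in program memory.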

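In the code below, I’m cycling the original H through the alphabet (PETSCII 65-90) and replacing the original I with a random letter. Note that because line 50 increments n right after line 40 resets it to 65, the cycle wraps from Z (90) to B (66) and skips A: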

20 n=peek(4104)
30 x=peek(4105)
40 if n >= 90 then n=65
50 n=n+1
60 x=int(26*rnd(1)+65)
70 poke 4104,n
80 poke 4105,x
90 goto 10

The changing 2-letter strings from the self-modifying code
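
The POKEs only do the right thing while the two letters really sit at 4104 and 4105. One hedged safeguard is a check that the quote characters (PETSCII 34) are still at 4103 and 4106 before anything is modified. The line number 15 and the message text are my own choices. The check has to come after line 10, because a new line numbered below 10 would be stored ahead of it and shift line 10 in memory:

```basic
15 if peek(4103)<>34 or peek(4106)<>34 then print "layout changed":end
```

If line 10 is later edited to a different length, or another line is added in front of it, the program stops instead of poking into the wrong bytes.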

If you BREAK out of the program while it is running (RUN/STOP, which is the Esc key in the VICE emulator) and LIST the first few lines, you’ll see that the initial PRINT statement’s string has indeed changed:

The print statement has had its string changed.

What’s Next?

This is obviously a very trivial exercise in self-modifying code, but any modification that needs more than a 1:1 in-place replacement requires more planning. The lines of a program are variable in length, which means that inserting code requires shifting all subsequent code in memory. Shifting code in memory in turn requires updating every pointer that pointed to the original locations. The next exercise will probably be adding code to the end of the program rather than trying to insert it in the middle.
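
As a read-only first step toward that planning, the chain of next-line pointers can be followed to list each line’s number, start address and length in bytes. This is a sketch under the same assumptions as the rest of the article: the program starts at 4097, and the end of the program is marked by a next-line pointer of 0. The line numbers 20100 and up and the variable names a and nx are arbitrary.

```basic
20100 a=4097
20110 nx=peek(a)+256*peek(a+1)
20120 if nx=0 then end
20130 print peek(a+2)+256*peek(a+3);a;nx-a
20140 a=nx:goto 20110
```

Start it with RUN 20100. For the unmodified line 10 the row should read 10, 4097 and 11 (4108 minus 4097). Every pointer after an insertion point would need the inserted byte count added to it, which is why appending to the end is the easier case.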