Here are several different techniques that can be used to make
a small stripped image. Using these techniques I was able to make
a 450KB image that computes the "What's Hot"
page. Most of the techniques have sample code that can be run
(although running them will corrupt your image, since they remove
things from it). Several of the code segments use "bad"
coding style, relying on methods such as become: and instVarAt:.
These methods were required to do some low-level hacking and should
not be used otherwise.
If you know techniques that I've missed, please mail me, and I'll add them to the list.
There are many parts to the Smalltalk environment: browsers, interface widgets, database connections, etc. Some of these must be removed from a runtime image (e.g., Compiler, Browser, etc.), but many need not be. The base stripper included with VisualWorks strips only the required classes and does not try to analyze which classes are actually used. Other strippers ask you for your startup class and calculate all classes and methods that are reachable from that class. These strippers produce a smaller runtime than the VisualWorks stripper. If you have a test suite with 100% coverage, another stripping technique is to strip all methods not called while executing the test suite. This is a very aggressive technique and produces the smallest images.
One of the easiest ways to remove a few extra bytes from an image is to empty the subclasses instance variable in each class. While the subclasses instance variable is used a lot while browsing, it is seldom used when a program executes. Furthermore, if it is ever needed during execution, it can be calculated by reversing the superclass pointers. For example, we can replace the subclasses method in Behavior with this one:
    subclasses
        "Answer the receiver's subclasses, recomputed from the superclass pointers."
        ^subclasses == nil
            ifTrue: [Array new]
            ifFalse: [(Metaclass allInstances select:
                    [:each | each superclass == self class])
                        collect: [:each | each soleInstance]]
The ifFalse: case was changed so that it no longer returns
the contents of the subclasses instance variable; instead it
calculates the contents. We'll also need to change the addSubclass:,
removeSubclass: and allSubclassesDo:
methods to reflect this change. After changing these methods we
can then empty the subclasses variable:
    Smalltalk allClassesDo: [:each |
        (each instVarAt: 4) == nil
            ifFalse: [each instVarAt: 4 put: #()]]
Approximate savings: 8KB
There are a couple of things that can be done to eliminate some of
the blocks that are in an image. One is to define the value,
value:, value:value:, value:value:value:,
and valueWithArguments: methods in Object to return
self. Then, wherever we have a block like "[nil]",
we can replace it with nil instead. We must
be careful not to perform this optimization for classes that
redefine these messages (e.g., ValueModel), so it
is probably only safe on literal
blocks and self blocks where we already know the types.
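As a rough sketch of the first idea, the definitions might look like the following. The method bodies are an assumption based on the description above rather than code taken from a stripped image, and the at:ifAbsent: expressions are only an illustration of a call site.

```smalltalk
"Hypothetical methods added to Object (sketch only)."
value
    ^self

value: anObject
    ^self

"With those in place, a constant block can be replaced by the constant.
 Both workspace expressions below should answer nil."
Dictionary new at: #missing ifAbsent: [nil].
Dictionary new at: #missing ifAbsent: nil
```

The second expression works because at:ifAbsent: simply sends value to its last argument, and nil now answers itself.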
Another trick is to add the value messages to Symbol. These messages perform the symbol on their arguments. For example, we can define
    value: receiverObject
        ^receiverObject perform: self
Now we can use the symbol #first to represent the
block "[:each | each first]". Although
this is slightly slower than the block version, it probably will
not be noticed in an application.
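A short usage sketch, assuming Symbol>>value: has been added as shown above (the variable name rows is made up for the example):

```smalltalk
| rows |
rows := OrderedCollection new.
rows add: #(1 2); add: #(3 4).

"Both expressions should answer an OrderedCollection holding 1 and 3."
rows collect: [:each | each first].
rows collect: #first
```

This only works with enumeration methods that send value: to their argument as an ordinary message; control messages that the compiler inlines still need literal blocks.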
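    "Replace each literal block whose byte codes match the packed integer
     68950388 (taken as-is from the original code) with the first literal
     of that block's method."
    Smalltalk allBehaviorsDo: [:each |
        each selectors do: [:sel |
            | method |
            method := each compiledMethodAt: sel.
            method withAllBlockMethods do: [:meth |
                1 to: meth size do: [:i |
                    (meth at: i) class == BlockClosure ifTrue: [
                        ((meth at: i) method instVarAt: 1) == 68950388 ifTrue: [
                            meth at: i put: ((meth at: i) method at: 1)]]]]]]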
Approximate savings: 14KB
Every compiled method has byte code instructions that are
executed when the method is run. Some of the smaller methods can
store their byte codes directly in the compiled method as integers,
but most store them as ByteArrays. Many of these
byte arrays are identical across different methods and can
be shared to save space. This is some simple code to eliminate
the redundant byte codes in all compiled methods.
    | sets |
    sets := Array new: 25101.
    CompiledCode withAllSubclasses do: [:each |
        each allInstances do: [:inst |
            | bytes element set index |
            bytes := inst instVarAt: 1.
            bytes isInteger ifFalse: [
                index := (bytes inject: 0 into: [:s :e | s + e asInteger]) \\ 25101 + 1.
                set := sets at: index.
                set isNil ifTrue: [set := sets at: index put: Set new].
                element := set basicAt: (set findElementOrNil: bytes).
                element isNil
                    ifTrue: [set add: bytes]
                    ifFalse: [inst instVarAt: 1 put: element]]]]
Approximate savings: 119KB
Along with the byte codes, every compiled method has a literal
frame that it uses to look up message names, non-instance variable
references, classes, and literals such as 1.0 and
#(). Every time you have a literal in your code,
the compiler creates a new literal object and places it in the method (except
for SmallIntegers and Symbols). As a
result, there are many equivalent copies of the same literals
in the code. For example, there are many different empty arrays
that were created from #(). Each one of these empty
arrays takes 12 bytes, so it would be advantageous to use the
same empty array everywhere.
Here's a method that will remove duplicate literals from all methods:
    | sets process |
    sets := Dictionary new.
    #(#Array #ByteArray #FixedPoint #Float #Double #LargePositiveInteger
      #LargeNegativeInteger #ByteString #TwoByteString #BlockClosure
      #CompiledBlock) do: [:each | sets at: each put: Set new].
    process := [:anObject |
        1 to: anObject size do: [:i |
            | obj replaceWith set |
            obj := anObject at: i.
            set := sets at: obj class name ifAbsent: [nil].
            set notNil ifTrue: [
                replaceWith := set basicAt: (set findElementOrNil: obj).
                replaceWith isNil
                    ifTrue: [
                        set add: obj.
                        "Recurse into nested literal arrays."
                        obj class == Array ifTrue: [process value: obj]]
                    ifFalse: [anObject at: i put: replaceWith]]]].
    CompiledCode withAllSubclasses do: [:each |
        each allInstancesDo: [:inst | process value: inst]]
The main problem with this is that if you have code that modifies a literal, then it will now modify all equivalent literals in the system since they are the same object. This is not a major problem since you should never write code that modifies literals in methods.
Approximate savings: 62KB
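To make the risk concrete, here is a small hypothetical example; the class Example and both selectors are invented for illustration. Before the literals are shared, each method owns its own array. Afterwards both answer the same object.

```smalltalk
"Two hypothetical methods in some class Example."
weekendDays
    ^#(#saturday #sunday)

restDays
    ^#(#saturday #sunday)

"Workspace: once the two literals are the same object, this in-place
 update made through weekendDays also shows up in restDays."
| e |
e := Example new.
e weekendDays at: 1 put: #friday.
e restDays first
```

Code that needs a private array to modify can send copy to the literal first.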
Once you have performed the Literals step, you will have
some methods whose literal frames contain the same object repeated.
For example, you may have a method that refers to 0.5
many times. When it was compiled, every 0.5 in the
code was translated to a different 0.5 in the method's
literal frame. Since we have made these the same object in the
Literals step, we can eliminate the extra entries in the
literal frame. This is a major step that would require scanning
the byte codes and converting them. It might not be worth it
for the savings gained.
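Before committing to rewriting byte codes, it may help to estimate how many literal frame slots are actually redundant. The following sketch only counts them and changes nothing. It assumes, as the other snippets on this page do, that at: on a compiled method answers its literals, and it is only meaningful after the Literals step has been run.

```smalltalk
| duplicates |
duplicates := 0.
CompiledCode withAllSubclasses do: [:cls |
    cls allInstances do: [:meth |
        1 to: meth size do: [:i |
            | lit |
            lit := meth at: i.
            "Count slot i if an earlier slot already holds the identical object."
            ((1 to: i - 1) detect: [:k | (meth at: k) == lit] ifNone: [nil]) notNil
                ifTrue: [duplicates := duplicates + 1]]]].
duplicates
```

Multiplying the result by the size of one literal frame slot gives a rough upper bound on what this step could save.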
Another place to remove extra objects from the system is in the
variable dictionaries. Whenever you create a class variable, it
is stored in a dictionary in the classPool instance
variable of that class. Whenever that variable is referenced in
a method, its association is used in the literal frame. Most of
the time the dictionary is not referenced by an application. Furthermore,
it is considered bad style to reference the class variables by
going through the class pool dictionary (similar to using the
instVarAt: method).
    Smalltalk allClassesDo: [:each | each instVarAt: 8 put: nil]
Approximate savings: 16KB
Whenever you define a class with instance variables, Smalltalk creates an object for the class and gives it an array of the instance variable names. These names are used when compiling to create the appropriate byte codes, but they are not used when executing the code so they can be removed from an application.
The main problem with this is that the database utilities use
the update:to: method in Object. This
method requires the instance variable names to look up the position
of the variable to be updated. One solution to this problem is
to convert all variable names to "instVarN", but this
requires every update:to: message send to
pass its first argument as a literal.
    Smalltalk allBehaviorsDo: [:each | each instVarAt: 5 put: nil]
Approximate savings: 100KB
Every method that has been automatically generated by one of the
VisualWorks tools (UIBuilder, QueryEditor,
etc.) is an instance of AnnotatedMethod. These methods
contain a special dictionary that holds the method's resource types.
These resource tags are only used by the VisualWorks tools and
can be deleted from a runtime image:
    AnnotatedMethod allInstances do: [:inst |
        | newInst |
        newInst := CompiledMethod new: inst size.
        1 to: newInst size do: [:i |
            newInst at: i put: (inst at: i)].
        1 to: CompiledMethod instSize do: [:i |
            newInst instVarAt: i put: (inst instVarAt: i)].
        newInst become: inst]
Approximate savings: 24KB
Another simple thing that can gain some extra bytes is to shrink
the collections that automatically grow such as OrderedCollection,
MethodDictionary, Set, etc. This can
easily be done by executing code like:
    #(Set OrderedCollection Bag IdentitySet List) do: [:each |
        (Smalltalk at: each) allInstances do: [:inst |
            | newInst |
            newInst := inst class withAll: inst.
            newInst basicSize < inst basicSize ifTrue: [newInst become: inst]]].
    #(Dictionary IdentityDictionary SystemDictionary PoolDictionary) do: [:each |
        (Smalltalk at: each) allInstances do: [:inst |
            | newInst |
            newInst := inst class withAll: inst associations.
            newInst basicSize < inst basicSize ifTrue: [newInst become: inst]]]
Approximate savings: 22KB
Every compiled method is represented in the system by an instance
of CompiledMethod or one of its subclasses. For many similar
methods, the only difference between the instances is the mclass variable.
We can replace all of these methods with the same instance.
    | col ar |
    ar := Array new: 25119.
    col := OrderedCollection new: 4000.
    col := CompiledMethod allInstances.
    col := col select: [:each | each mclass ~~ VariableBinding].
    col do: [:each |
        | i hash |
        (each instVarAt: 1) isInteger ifFalse: [
            hash := (each instVarAt: 1)
                inject: 17171
                into: [:v :e | (e even ifTrue: [e bitShift: e \\ 4] ifFalse: [e]) + v].
            i := hash abs \\ 25119 + 1.
            (ar at: i) isNil ifTrue: [ar at: i put: OrderedCollection new].
            (ar at: i) add: each]].
    ar := ar select: [:each | each notNil and: [each size > 1]].
    ar do: [:each |
        1 to: each size do: [:i |
            i + 1 to: each size do: [:j |
                (each at: j) = (each at: i) ifTrue: [
                    | sel |
                    sel := (each at: j) mclass
                        selectorAtMethod: (each at: j)
                        ifAbsent: [nil].
                    sel notNil ifTrue: [
                        ((each at: j) mclass instVarAt: 2) at: sel put: (each at: i)]]]]].
    col do: [:each | each instVarAt: 2 put: Object].
Once an image is stripped, there are some items that can no longer
be used, but instance variables still reference them. For
example, every method has a sourceCode instance variable.
This instance variable holds an integer which the system can turn
into source code given the changes files. But since a stripped
system no longer contains the changes files, we do not need this
variable.
    | newClass |
    newClass := CompiledMethod copy.
    newClass instVarAt: 3 put: 20482.
    CompiledMethod allInstances do: [:each |
        | newInst |
        newInst := newClass basicNew: each basicSize.
        newInst
            instVarAt: 1 put: (each instVarAt: 1);
            instVarAt: 2 put: (each instVarAt: 2).
        1 to: newInst basicSize do: [:i | newInst at: i put: (each at: i)].
        newInst become: each].
    CompiledMethod class instVarAt: 7 put: newClass.
    CompiledMethod subclasses do: [:each |
        each superclass: newClass.
        (each instVarAt: 2) values do: [:method |
            method withAllBlockMethodsDo: [:blk |
                1 to: blk basicSize do: [:i |
                    (blk at: i) == CompiledMethod ifTrue: [blk at: i put: newClass]]]]].
    CompiledMethod := newClass
Approximate savings: 106KB
Another trick is to replace all of the long, meaningful names that programmers have given each program item with short computer-generated ones. For example, the base VisualWorks image has 16,000 different symbols for methods and classes. The average length of each symbol is 16 bytes. We could replace each long symbol with a three-byte one or even a zero-length one:
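    "noChange is assumed to hold the selectors that must keep their names;
     it is not defined in this snippet."
    Smalltalk allBehaviorsDo: [:each |
        each selectors do: [:sel |
            (sel isSymbol and: [(noChange includes: sel) not])
                ifTrue: [Symbol basicNew become: sel]]].
    ObjectMemory globalGarbageCollect.
    Symbol rehash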
The problem with this is that strings can be turned into symbols and performed. While it is generally bad practice to do so, it is common with specs that use triggers. For example, to specify a trigger on a widget you give it a model such as "name | trigger", which is parsed at runtime to split the model from the trigger.
Another more aggressive variation of this technique would be to not use symbols at all. All that is required is to have a unique object for each symbol, and to replace all references to the symbols with the new object. Furthermore, since we already have methods for each symbol, we could use the methods as the unique objects.
Still another saving that can be made for symbols is to remove many of the entries in the symbol table. If we do not create new symbols at runtime, then we don't need the symbol table. Although we will probably need to create symbols at runtime, we could limit the table's size. In a base image the symbol table is over 110KB.
Approximate savings: 200KB (symbol compression)
Another simple thing to do is to use the same method dictionaries. There are many classes that do not define any methods. Many times these are the metaclasses. We can make all of these classes use the same empty method dictionary. This will save 40 bytes per empty method dictionary. In the standard image, there are approximately 470 of these, so we would save 18,800 bytes.
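A minimal sketch of the empty-dictionary case follows. It assumes that slot 2 of a behavior holds its method dictionary, as the other snippets on this page do, and that nothing will add methods to these classes afterwards, since a method added to one class would then appear in all of them.

```smalltalk
| shared replaced |
shared := nil.
replaced := 0.
Smalltalk allBehaviorsDo: [:each |
    | dict |
    dict := each instVarAt: 2.
    (dict notNil and: [dict isEmpty]) ifTrue: [
        shared isNil
            ifTrue: [shared := dict]    "keep the first empty dictionary found"
            ifFalse: [
                dict == shared ifFalse: [
                    each instVarAt: 2 put: shared.
                    replaced := replaced + 1]]]].
"Rough estimate of bytes reclaimed, using the 40-byte figure quoted above."
replaced * 40
```

The discarded dictionaries are only reclaimed at the next garbage collection.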
Another variation of this technique would be to merge compatible method dictionaries. Two classes' method dictionaries can be merged if neither class overrides a method defined by the other class or its superclasses.
    | dictionaries classes i j notFound |
    classes := OrderedCollection new.
    Smalltalk allBehaviorsDo: [:each |
        each superclass == nil ifFalse: [classes add: each]].
    classes copy do: [:each |
        (each includesSelector: #doesNotUnderstand:) ifTrue: [
            each withAllSubclasses do: [:e | classes remove: e ifAbsent: []]]].
    classes := classes select: [:each |
        (each selectors detect: [:sel | Object includesSelector: sel] ifNone: [nil]) isNil].
    dictionaries := (classes collect: [:each |
        (each instVarAt: 2) ->
            (Array
                with: (OrderedCollection with: each)
                with: each allSelectors)])
            asSortedCollection: [:a :b | a key size < b key size].
    i := 1.
    [i < dictionaries size] whileTrue: [
        j := i + 1.
        notFound := true.
        [j <= dictionaries size & notFound] whileTrue: [
            | cls1 cls2 sel1 sel2 |
            cls1 := dictionaries at: i.
            cls2 := dictionaries at: j.
            sel1 := cls1 key keys.
            sel2 := cls2 value last.
            notFound := ((sel1 inject: true into: [:bool :each |
                    bool and: [(sel2 includes: each) not]])
                and: [
                    sel1 := cls2 key keys.
                    sel2 := cls1 value last.
                    sel1 inject: true into: [:bool :each |
                        bool and: [(sel2 includes: each) not]]])
                ifTrue: [
                    cls2 key keysAndValuesDo: [:key :val | cls1 key at: key put: val].
                    cls2 key: cls1 key.
                    cls2 value first do: [:each | each instVarAt: 2 put: cls1 key].
                    cls2 value first addAll: cls1 value first.
                    cls2 value last addAll: cls1 value last.
                    false]
                ifFalse: [true].
            j := j + 1].
        i := i + 1]
Approximate savings: 30KB
Comments or suggestions can be sent to brant@cs.uiuc.edu. Last updated on 05-Jul-96.