Assembly language notes~
#ASSEMBLY LANGUAGE#
A computer is composed of various components
-> so:
there must be instructions that tell the computer what to do
They are called:
Machine code, e.g. 10101010101001B
The CPU turns machine code into electrical signals (high and low voltage levels) that drive the computer
\\ Let's try sth.
DOSBox: debug, then -u
Each line of the list is:
an address in hex, some machine code in hex, an English-like mnemonic, and its operands
e.g.
073F:0100 7403 JZ 0105
Here:
073F:0100 is the memory address
7403 is the machine instruction
JZ 0105 is the assembly instruction
The translator from assembly instructions to machine instructions is the assembler.
An assembly language consists of:
1. assembly instructions: the assembler translates these into machine code
2. pseudo-instructions: they tell the assembler how to translate this or that part
3. symbols such as + - * /: handled by the assembler
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
Where are the assembly instructions stored?
In RAM (the memory modules), also called main memory
Most instructions are stored in main memory
\\ Let's try sth.
DOSBox: debug, then -u and -d
-u shows machine instructions and assembly instructions
-d shows data
-> So from appearance:
data is instruction, instruction is data
One more thing we can see:
The smallest unit of memory is shown as
2 hex digits
that is 1 B = 8 bits
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
The CPU holds only a small part of the instructions
Most of them are in memory
Data and instructions look the same in memory, only the CPU tells them apart
The CPU and memory are connected by wires on the main board
Three kinds of information are needed for the CPU and memory to communicate:
1. The address of the memory unit == address lines
2. The data stored in the memory unit == data lines
3. Whether to write or read == control lines
As the smallest memory unit is 1 B
-> so: one address refers to 1 B
One address line has only two states: 1 or 0
so a single line can only tell two addresses apart, it cannot express e.g. address No. 3.
So the number of lines depends on how much memory has to be reached.
Also, memory addresses obviously start from 0.
Let n = the number of address lines
They can express 2^n addresses
e.g.
To address 8 KB (8192 = 2^13 bytes)
we need 13 address lines.
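The 2^n rule can be checked with a few lines of Python. This is only a sketch; the function names `address_lines_needed` and `addressable_bytes` are made up for this note.

```python
def address_lines_needed(size_in_bytes):
    """Smallest n such that 2**n >= size_in_bytes (one address per byte)."""
    if size_in_bytes < 1:
        raise ValueError("size must be at least 1 byte")
    return (size_in_bytes - 1).bit_length()


def addressable_bytes(lines):
    """Number of different addresses that `lines` address lines can express."""
    return 2 ** lines


print(address_lines_needed(8 * 1024))  # 13
print(addressable_bytes(20))           # 1048576, i.e. 1 MB with 20 lines
```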
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
Address lines: determine how much memory can be addressed
Data lines: determine how much data can be transferred at once
Control lines: carry the kind of operation, e.g. read or write
As all data is just a series of binary bits
one line transfers one bit
e.g.
16 lines transfer 2 B of data at once
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
\\ Let's try sth.
DOSBox: debug, then -e B800:400
1 space 1 space 2 space 2 space
(watch the screen content change)
However, this address is not in main memory but in the video card
That is to say, the memory address space covers:
                          - main memory (RAM modules)   start address - end address
CPU control - main board  - video card                  start address - end address
                          - ROM
RAM:
can be read and written, but loses its content when the power is off
ROM:
cannot be written, but always keeps its content
////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
How does the CPU communicate with the keyboard, the microphone, ...?
Ports -> transfer data
The CPU is just a chip
the mouse and the keyboard also have chips
chip -> stores data and instructions
e.g.
CPU
| through the three kinds of lines
keyboard gives data to - port - gives data through the three kinds of lines - main board
How does the CPU reach a port? Like a memory address:
through the port number.
/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
GPU: graphics processing chip
Originally there was no GPU, its work was done by the CPU
Nowadays they divide the work because processing graphics is a huge job
The GPU has its own code.
We can treat B800:0000 and later as video card memory
Assembly language is for the CPU
Since there are three kinds of lines,
the CPU has places to store address information, data information and control information
Those are the registers
Assembly language controls the registers to control the CPU, and so the computer is controlled
AX is a data register
There are also address registers and so on
/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
AX, BX, CX, DX are general-purpose registers that store data (data registers)
They are 16-bit registers, holding values from 0 to 2^16-1
However, each can be divided into two 8-bit registers
The high eight bits are AH, the high 8-bit register
The low eight bits are AL, the low 8-bit register
Purpose one: to stay compatible with earlier programs written for 8-bit registers
Purpose two: a 16-bit register with 16 data lines can handle two kinds of data
1 byte: byte data
or
2 bytes: word data
for word data the high eight bits are the high byte
the other eight bits are the low byte
This is special to AX, BX, CX, DX because only they can be split from a 16-bit register into 8-bit registers
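A small Python sketch of how a 16-bit value such as AX splits into its high and low bytes. The helper names `split_word` and `join_bytes` are invented for this note, they are not CPU or assembler features.

```python
def split_word(word):
    """Return (high byte, low byte) of a 16-bit value, like (AH, AL) for AX."""
    word &= 0xFFFF
    return word >> 8, word & 0xFF


def join_bytes(high, low):
    """Rebuild the 16-bit value from its high and low bytes."""
    return ((high & 0xFF) << 8) | (low & 0xFF)


ah, al = split_word(0x1024)
print(f"AH={ah:02X} AL={al:02X}")      # AH=10 AL=24
print(f"AX={join_bytes(ah, al):04X}")  # AX=1024
```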
/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
Address Register
073F:0100
segment address : offset address
In the CPU
Segment address registers:    Offset address registers:
ds                            sp
es                            bp
ss                            si
cs                            di
                              ip
                              bx
Physical address = segment address * 16D (10H) + offset address
That's because:
The 8086 CPU has 20 address lines, but a register holds only 16 bits, i.e. four hex digits, so a single register can only express 0000 - FFFF, while the
addressing ability is actually 00000 - FFFFF
As a result:
The designers use two four-hex-digit values (segment and offset) to represent the five-hex-digit physical address, because the 8086 can only
hold a four-hex-digit value in one register.
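The formula can be written as a one-line Python function. This is a sketch: the name `physical_address` is made up here, and the final mask assumes the result is cut to 20 bits because there are only 20 address lines.

```python
def physical_address(segment, offset):
    """segment * 16 + offset, kept to 20 bits (20 address lines)."""
    return (((segment & 0xFFFF) << 4) + (offset & 0xFFFF)) & 0xFFFFF


print(f"{physical_address(0x073F, 0x0100):05X}")  # 074F0
print(f"{physical_address(0x1000, 0x0000):05X}")  # 10000
print(f"{physical_address(0x2000, 0x0000):05X}")  # 20000
```

Shifting left by 4 bits is the same as multiplying by 16 (10H), which is why the segment value just gets one extra hex 0 appended before the offset is added.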
\\ Let's try sth.
-a
mov bx,1000
mov ds,bx
mov ax,[0]
The above code:
1. puts 1000H into bx, a data register
2. puts the value of bx into ds, an address register. Since ds is a segment address register, the segment address in ds is now 1000H rather than the 073F
which we always see
3. this time reads data from a memory unit rather than a register. '[...]' represents a memory unit and its content is the offset address.
When the code runs, it takes the value of ds as the segment address by default. So this instruction moves the data at address 1000H * 10H + 0H,
that is 10000H, into ax, a data register.
* ps: here we can't write mov ds,1000 directly because the 8086 has no instruction that moves a constant into a segment register, so we need a transfer register, say bx,
to help us move the new segment address into ds.
The calculation is done by the address adder!
However! What's the cost?
With a fixed segment address the 16-bit offset only reaches the 64 KB that start at segment * 16. Once you enter the wrong segment address, no offset can reach the physical address you want.
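A Python sketch of that limitation. The helper names `reachable_range` and `offset_for` are invented for this note, and the sketch ignores the wrap-around that happens for segments near FFFFH.

```python
def reachable_range(segment):
    """Lowest and highest physical address reachable with this segment."""
    base = (segment & 0xFFFF) << 4
    return base, base + 0xFFFF


def offset_for(segment, physical):
    """Offset that reaches `physical` from `segment`, or None if out of reach."""
    base, top = reachable_range(segment)
    if not base <= physical <= top:
        return None
    return physical - base


print(offset_for(0x1000, 0x10000))  # 0
print(offset_for(0x073F, 0x10000))  # 35856, i.e. 8C10H
print(offset_for(0x1000, 0x20000))  # None: segment 1000H stops at 1FFFFH
```

The second line also shows that one physical address can be written as several different segment:offset pairs.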
\\ Let's try sth.
debug
-d segment address:offset address
-e segment address:offset address, then 1 space 1 space (repeat a few times)
-d segment address:offset address
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
Previously we always used -u and -d and found some relationship between them
Actually:
u : treats the bytes stored from a given start address as instructions.
d : treats the bytes stored from a given start address as data.
Originally, when you enter -u you always see 073F:....
However, ds, cs, ss and es all hold 073F. Which one does the address come from?
Also, when we run the instructions entered with -a (using -t), the value stored in IP changes as well.
\\ Let's try sth.
-r ds
0
-r
Then you will find ds has changed but the address still starts with 073F
Only when you change cs do you find that it changes.
Actually
the 8086 CPU treats the bytes at CS:IP as instructions
That's how the CPU tells data and instructions apart. Since data and instructions are the same in memory, just binary bits,
it's sensible to designate a certain part as instructions.
In fact: CS is called the Code Segment register (CS), IP is called the Instruction Pointer register (IP)
\\ Let's try sth.
-r cs
2000
-r ip
0
-e 2000:0000
B8 24 10 BB 24 10
-u
-t
-t
So this time the bytes from 2000:0000 on are treated as instructions
The following picture is a summarized process:
CPU memory
--------------------------------------------- -----------
AX------- address adder B8 20000H
CS 2000 ------------ 24 20001H B82410 : MOV AX,1024
------- IP 0000 10 20002H
BX------- ------------
||
------- ||
\/
instruction buffer -------------
------- <-- 20 address line
------- ------------
| data line
| ------------
------- -------------
------- input output circuits
executor
----------------------------------------------
When the CPU reads the bytes at CS:IP, it treats them as an instruction and puts them into the instruction buffer. Then IP += the length of the instruction. Then the instruction is executed.
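The read, advance IP, execute cycle can be imitated in Python. This is a toy sketch, not a real emulator: it only knows the two opcodes used above (B8 = mov ax,imm16 and BB = mov bx,imm16, 3 bytes each), and `run` and `MOV_IMM16` are names made up for this note.

```python
# Opcodes modelled in this sketch: B8 = mov ax,imm16 and BB = mov bx,imm16.
MOV_IMM16 = {0xB8: "AX", 0xBB: "BX"}


def run(memory, cs, ip, steps):
    """Fetch and execute `steps` instructions starting at CS:IP."""
    regs = {"AX": 0, "BX": 0}
    for _ in range(steps):
        address = ((cs << 4) + ip) & 0xFFFFF              # address adder
        opcode = memory[address]
        if opcode not in MOV_IMM16:
            raise ValueError(f"opcode {opcode:02X} is not modelled")
        length = 3
        buffer = bytes(memory[address:address + length])  # instruction buffer
        ip = (ip + length) & 0xFFFF                       # IP += instruction length
        # Execute: the two bytes after the opcode are the low and high byte.
        regs[MOV_IMM16[opcode]] = buffer[1] | (buffer[2] << 8)
    return regs, ip


memory = bytearray(0x100000)                              # 1 MB address space
memory[0x20000:0x20006] = bytes.fromhex("B82410BB2410")
regs, ip = run(memory, cs=0x2000, ip=0x0000, steps=2)
print(f"AX={regs['AX']:04X} BX={regs['BX']:04X} IP={ip:04X}")  # AX=1024 BX=1024 IP=0006
```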
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
Changing CS and IP in assembly language
Assembly instruction: jmp (transfer instruction)
\\ Let's try sth.
-a
jmp 2000:0
-t
Then you will find CS and IP have changed.
Since the 8086 CPU provides no mov that writes cs or ip (so mov ip,ax is not available either),
we can use jmp
Another syntax:
jmp register
== mov ip,register: the same meaning, but that mov form is not available
* ps: after a jmp is read, IP already points to the instruction that follows it, but executing the jmp overwrites IP, so that following instruction is only pointed to and never read into the instruction buffer.
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
debug tool
r   show or change the content of the registers
    -r register                           change the content of that register
d   show the content of memory
    -d segment address:offset address     show memory starting from the given address (if no address is given, a default is used, initially the same place as CS:IP)
    -d segment address:start offset address end offset address     show memory from the given start to the given end
u   show the content of memory as assembly instructions
    -u segment address:offset address     show memory starting from the given address
    -u segment address:start offset address end offset address     show memory from the given start to the given end
a   write assembly instructions into memory
    -a segment address:offset address     the assembly instructions entered next are stored in memory from this address
t   execute the instruction pointed to by CS:IP
e   change the content of memory
    -e segment address:offset address     start from this address, press space to move on and change byte after byte
    -e segment address:offset address "string"     store the ASCII codes of the string from this address on
#######################################################################################
#
# register
#
# #####################################################################################
Where does the instruction "call" store the IP address?
First let us look at the types of data
1. byte data   1 byte
2. word data   2 bytes
So word data takes two memory units in memory
the high address stores the high byte
the low address stores the low byte
That's why the machine code of "mov ax,1024" is "B8 24 10"
(suppose B8 is stored at 2000:0000)
24 is the low byte, so it is stored at the low address
10 is the high byte, so it is stored at the high address
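The low byte first layout can be tried in Python. A sketch only: `store_word` and `load_word` are helper names invented for this note, and the memory is just a small bytearray.

```python
def store_word(memory, address, value):
    """Store a 16-bit value: low byte at the low address, high byte above it."""
    memory[address] = value & 0xFF
    memory[address + 1] = (value >> 8) & 0xFF


def load_word(memory, address):
    """Read the two bytes back and combine them into one 16-bit value."""
    return memory[address] | (memory[address + 1] << 8)


memory = bytearray(16)
store_word(memory, 1, 0x1024)
print(memory[1:3].hex())               # 2410
print(f"{load_word(memory, 1):04X}")   # 1024
```

Python's built-in `int.to_bytes(2, "little")` produces the same byte order.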
##########################################################################################
Moving data from memory into a register
As we said, ds is a segment address register, and it is used as the default segment register for memory units
We can use instructions like: mov ax,ds:[0] or mov al,ds:[0]
Here we give the offset address, and the content of the memory unit in [] is read
That's how we operate on memory units
However, debug in DOSBox doesn't accept this format
Since the default segment register is DS, we can write the instruction as: mov ax,[0]
We just enter the offset address.
More importantly, the assembler decides which kind of data you want to read based on the size of the
register: ax means word data, al means byte data.
A brief summary of the forms of mov so far:
1. mov register, register
2. mov register, data
3. mov register, [offset address]
###########
mov add sub
###########
##############################################################################################
data segment
: it's just a convention for arranging data in memory
e.g. 10000H 01
     10001H 02
     10002H 03
this is how the byte data 1, 2, 3 is stored
However
10000H 01
10001H 00
10002H 02
10003H 00
10004H 03
10005H 00
this is how the word data 1, 2, 3 is stored
###############################################################################################
// now we can draw a conclusion:
CS:IP decides where the instructions are
DS decides where the data is
################################################################################################
#
# stack
#
################################################################################################
stack:
a data structure the program uses to store data
LIFO rule (last in, first out)
################################################################################################
For a stack we need two operations:
push register/memory   push the data in a register/memory unit onto the stack and update the pointer to the top of the stack
pop register/memory    pop the data off the stack into a register/memory unit and update the pointer to the top of the stack
Other things we need to consider
In the 8086 CPU
the segment address register SS and the offset address register SP together point to the top of the stack
As a result
push register/memory   SP -= 2, then store the data in the memory unit at SS:SP
pop register/memory    read the data at SS:SP into the register/memory unit, then SP += 2
As you can see:
the stack grows from high addresses to low addresses
# Here comes a question:
what if the stack is empty?
Then SS:SP points just above the stack, 2 higher than the address where the first word will be stored
PS: when you pop data it is not erased, it is still in memory, but once you push something else it will be overwritten
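The push and pop steps can be modelled in Python. This is a sketch under the rules stated above (push: SP -= 2 then store; pop: read then SP += 2); the class name `Stack8086` and its fields are invented for this note.

```python
class Stack8086:
    """Toy model of a stack addressed by SS:SP in a 1 MB memory."""

    def __init__(self, ss, sp):
        self.memory = bytearray(0x100000)
        self.ss = ss
        self.sp = sp

    def _top(self):
        return ((self.ss << 4) + self.sp) & 0xFFFFF

    def push(self, value):
        self.sp = (self.sp - 2) & 0xFFFF           # SP -= 2 first
        top = self._top()
        self.memory[top] = value & 0xFF            # low byte at low address
        self.memory[top + 1] = (value >> 8) & 0xFF

    def pop(self):
        top = self._top()
        value = self.memory[top] | (self.memory[top + 1] << 8)
        self.sp = (self.sp + 2) & 0xFFFF           # SP += 2 afterwards
        return value


stack = Stack8086(ss=0x2000, sp=0x0010)            # empty 16-byte stack
stack.push(0x2233)
print(f"SP={stack.sp:04X}")                        # SP=000E
print(f"{stack.pop():04X} SP={stack.sp:04X}")      # 2233 SP=0010
print(stack.memory[0x2000E:0x20010].hex())         # 3322, still in memory
```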
############################################################################################################################
As the top of the stack is decided by ss:sp at any time,
where we put the stack and how large it is are both decided by the programmer.
Once you decide the length (in bytes) and the lowest address of the stack,
the initial top ss:sp is simply the lowest offset + the length, e.g. + 16 (10H) for a 16-byte stack
!!!!
Let's try sth:
debug
-a
mov ax,2000
mov ss,ax
mov sp,10
-t
-t
You will find that when you execute mov ss,ax
the instruction mov sp,10 is executed too
! For now, just remember it. (After ss is loaded, the CPU holds off interrupts for one instruction so that ss and sp change together, and debug's single step relies on an interrupt.)
#################################################################################################################################
Stack out of bounds, overflow
Take an extreme example
e.g.
ss:sp = 2000:0010
after you execute push ax eight times
it becomes ss:sp = 2000:0000
if you push again
it becomes 2000:FFFE
That may overwrite other important data, and the stack no longer grows continuously from high to low
// safety problem
The CPU doesn't check this, so make sure the stack never goes out of its bounds
What about the max length?
It is of course limited by the range of SP
that is 65536 bytes, i.e. 32768 words
So ss:sp = 2000:0000 as the starting point lets the stack cover the whole range of SP: the first push wraps SP to FFFE, much like the example above.
Once sp reaches 0 again and you push, the data pushed first is overwritten.
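The wrap-around of SP is plain 16-bit arithmetic, which a short Python sketch can reproduce. The function names `sp_after_pushes` and `pushes_before_overflow` are made up for this note.

```python
def sp_after_pushes(sp, pushes):
    """SP after `pushes` word pushes; SP is 16 bits, so it wraps around."""
    return (sp - 2 * pushes) & 0xFFFF


def pushes_before_overflow(sp, lowest_offset):
    """How many words still fit between the current top and the stack's lowest offset."""
    return (sp - lowest_offset) // 2


print(f"{sp_after_pushes(0x0010, 8):04X}")   # 0000
print(f"{sp_after_pushes(0x0010, 9):04X}")   # FFFE, out of the 16-byte stack
print(pushes_before_overflow(0x0010, 0))     # 8
print(f"{sp_after_pushes(0x0000, 32768):04X}")  # 0000, the whole 64 KB was used
```

A program that wants to stay safe can compare the number of pushes it plans against `pushes_before_overflow` before pushing.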
###################################################################################################################################
The function of the stack
: temporarily store data
Now we can go back to the assembly instructions call and ret
call jumps to another piece of code
and then ret goes back to continue executing the remaining instructions.
So call actually pushes the address of the instruction after it onto the stack
and ret pops it back into IP.
!!!!
This explains why the CPU is designed to increase IP before executing an instruction, even one that can jump:
the stored IP already points to where execution should go on, so we can return there if necessary
This stack is also the function stack we often use in high-level programming languages, which is why using a function is called "calling" a function
ret is just "return": it pops the stack and execution goes on in the calling code.
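The bookkeeping done by call and ret can be sketched in Python with a list as the stack. The function names and the example addresses are assumptions for illustration, and the 3-byte length is that of a near call.

```python
def call(stack, ip, instruction_length, target):
    """Push the address of the instruction after the call, then jump to target."""
    return_ip = (ip + instruction_length) & 0xFFFF
    stack.append(return_ip)
    return target


def ret(stack):
    """Pop the saved address; execution continues there."""
    return stack.pop()


stack = []
ip = call(stack, ip=0x0100, instruction_length=3, target=0x0200)
print(f"IP={ip:04X} saved={stack[-1]:04X}")  # IP=0200 saved=0103
ip = ret(stack)
print(f"IP={ip:04X}")                        # IP=0103
```

Nested calls work the same way: each call pushes one more return address and each ret pops the most recent one, which is exactly the LIFO rule.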
#####################################################################################################################################
Although we said that the data still exists after a pop, since pop only changes sp,
when you try it in debug you will find this isn't true.
e.g.
debug
-a
mov ax,2233
push ax
pop ax
-t (once for each instruction)
-d (address to check)
You may find it has become A3 01 (actually 01A3)
In fact, this data already arrives when you execute mov ax,2233
you will see that a series of data has already been written
33 22 00 00 03 01 3F 07 A3 01
Here 33 22 is the value of ax
00 00 is the value of register BP
03 01 is of course the value of IP
3F 07 is the value of CS
The reason will be explained later
##########################################################################################################################################
Before we go to the coding part
there is sth. to mention:
1. safety
it's very dangerous to use commands like a and e to change memory directly
it may make a program or even the system crash
Usually we use the memory allocated by the operating system
it allocates separate memory to different programs and to itself
So with an operating system, a program gets memory with its permission
   a. when loading the program, the OS allocates memory
   b. while running, the program can ask for extra memory
PS: 0:200 - 0:2FF is a safe piece of memory, no program will use it, but it's too small
2. exe
source code
after compile -> link -> exe
when the exe is going to run, the OS gives it memory, which is safe
###############################################################################################################################################
#
# program
#
################################################################################################################################################
compile
compile asm -> obj
link
link obj -> exe
Why do we need two steps?
Because we can divide a project into many parts and compile them separately, so if anything changes we only recompile the changed part
#################################################################################################################################################
exe
How does the OS know how much memory to give the exe?
Because besides code the exe also carries descriptive information: how large the file is, where the program starts
That's what the label "start" together with the pseudo-instruction "end start" does:
it records the entry point of the program
Then the OS knows how to set cs:ip
PS: you can link the obj yourself and ask the linker to generate a map file, where you can see the extra information
####################################################################################################################################################
Source code
Assembly code is composed of:
1. assembly instructions
2. pseudo-instructions (they tell the assembler what to do)
3. symbols (the assembler deals with things like integers and calculations)
See this code:
assume cs:code,ds:data,ss:stack

data segment
        db 128 dup (0)          ; 128 bytes of data, all 0
data ends

stack segment stack
        db 128 dup (0)          ; 128 bytes of stack space
stack ends

code segment
start:  mov ax,stack            ; segment address of the stack segment
        mov ss,ax
        mov sp,128              ; empty stack: sp = lowest offset 0 + length 128
        mov ax,4C00H            ; return to the OS
        int 21H
code ends

end start
pseudo-instructions: segment, ends
define a segment that stores code, data or stack
pseudo-instruction: end
tells the assembler the program is over
pseudo-instruction: assume
associates the label of a segment with the corresponding segment register
assembly instructions
they are translated into machine code and stored in the exe
Label
it just represents an address; the label of a segment is translated into the segment address of that segment
Return of the program
mov ax,4c00h
int 21h
these two instructions perform the return
then the OS releases the memory occupied by this program
################################################################################################################################################
Who puts the exe into memory and lets it run?
In DOS, if program p1 is going to run, there must be a program p2 that loads it into memory and hands over control of the CPU
when it is done, p1 returns control to p2
Any general OS provides a shell program that users use to operate the computer system
In DOS it's called command.com, the shell of DOS (command interpreter)
When DOS starts, it finishes initialization and then runs command.com. When command.com has finished its startup tasks, it shows a prompt
with the current path, like "C:\" or "C:\WINDOWS"
So when the user wants to run an exe, command.com finds it, loads it into memory and sets CS:IP. Then command.com stops running and the CPU runs the exe. When
the exe finishes, control returns to command.com
#####################################################################################################################################################
Program tracking
We can use the command "debug" to trace a program
Debug.exe doesn't give up control of the CPU, so we can run the program one instruction at a time
debug loads the program into memory and sets cs:ip
When debug has finished loading the program into memory,
use "r" to check the registers: CX stores the length of the program in bytes
PSP area
from ds:0 to ds:FF, 256 bytes in total
DOS uses this area to communicate with the loaded program
Right after the PSP comes the loaded program, i.e. at segment ds + 10H
so its address is just cs:ip (if the code is put first)
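Since the PSP takes 256 (100H) bytes and a segment value counts in units of 16 bytes, the program begins 10H segments after ds. A Python sketch of that arithmetic; the function name and the example value 075AH for ds are assumptions, check the real value with "r".

```python
PSP_SIZE = 0x100  # 256 bytes, from ds:0 to ds:FF


def program_start_segment(ds):
    """Segment where the loaded program begins, right after the PSP."""
    return (ds + PSP_SIZE // 16) & 0xFFFF


ds = 0x075A  # hypothetical value shown by "r" after loading an exe
print(f"{program_start_segment(ds):04X}:0000")  # 076A:0000
```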
#########################################################################################################################################################
Something about coding:
1. if a hex number starts with a letter, e.g. B800H, write it as 0B800H
2. remember the difference between 10 and 10H
3. ; starts a comment
///////////////
conclusion
An exe is run by the CPU and the OS controls it. So running an exe starts with the OS allocating memory to it.
#############################################################################################################################################################