=head1 NAME

perloptree - The Perl op tree

=head1 DESCRIPTION

Various material about the internal Perl compilation representation
during parsing and optimization, before the actual execution
begins, represented as C<B> objects, the B<"B" op tree>.

The well-known L<perlguts>.pod focuses more on the internal
representation of the variables, and less on the structure, the
sequence and the optimization of the basic operations, the ops.

L<illguts>.pod, the "illustrated guts", explains the main data structures
in an easier to understand way than perlguts.

And we have L<perlhack>.pod, which shows e.g. ways to hack into
the op tree structure within the debugger. It focuses on getting
people to start patching and hacking on the CORE, not on
understanding or writing compiler backends or optimizations,
which is what the op tree is mainly used for.

=head1 Brief Summary

A brief summary is given in L<perlguts/"Compiled code"> and
at the top of F<op.c>.

When Perl parses the source code (via Yacc C<perly.y>), the so-called
op tree, a tree of basic perl OP structs pointing to simple
C<pp_>I<opname> functions, is generated bottom-up. Those C<pp_>
functions - "PP Code" (for "Push / Pop Code") - have the same uniform
API as the XS functions: all arguments and return values are
transported on the stack. For example, an C<OP_CONST> op points to
the C<pp_const()> function and to an C<SV> containing the constant
value. When C<pp_const()> is executed, its job is to push that C<SV>
onto the stack.

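The push/pop convention can be illustrated with a small C sketch. This is a toy model under loud assumptions, not perl's real code: C<StackOp>, C<toy_stack>, C<toy_pp_const>, C<toy_pp_add> and C<toy_run> are hypothetical names, and plain C<double> values stand in for SVs.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: real ops carry more fields and push SV pointers. */
typedef struct stack_op StackOp;
struct stack_op {
    StackOp *op_next;                   /* next op in execution order */
    StackOp *(*op_ppaddr)(StackOp *);   /* the "pp" function of this op */
    double value;                       /* stands in for the SV of a const op */
};

static double toy_stack[16];
static int toy_sp = 0;                  /* number of values on the stack */

/* In the spirit of pp_const: push the constant, continue with op_next. */
static StackOp *toy_pp_const(StackOp *o) {
    assert(toy_sp < 16);
    toy_stack[toy_sp++] = o->value;
    return o->op_next;
}

/* In the spirit of pp_add: pop two arguments, push the result. */
static StackOp *toy_pp_add(StackOp *o) {
    assert(toy_sp >= 2);
    double right = toy_stack[--toy_sp];
    double left = toy_stack[--toy_sp];
    toy_stack[toy_sp++] = left + right;
    return o->op_next;
}

/* A runloop in miniature: call pp functions until there is no next op. */
static double toy_run(StackOp *start) {
    toy_sp = 0;
    for (StackOp *o = start; o != NULL; )
        o = o->op_ppaddr(o);
    assert(toy_sp == 1);
    return toy_stack[0];
}

/* Build and run the op chain for "2 + 3": const, const, add. */
static double toy_demo(void) {
    StackOp add = { NULL, toy_pp_add, 0.0 };
    StackOp c3 = { &add, toy_pp_const, 3.0 };
    StackOp c2 = { &c3, toy_pp_const, 2.0 };
    return toy_run(&c2);
}
```

Every function has the same signature and communicates only through the shared stack, which is the point of the uniform API described above.
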
OPs are created by the C<newFOO()> functions, which are called
from the parser (in F<perly.y>) as the code is parsed. For
example the Perl code C<$a + $b * $c> would cause the equivalent
of the following to be called (oversimplifying a bit):

    newBINOP(OP_ADD, flags,
        newSVREF($a),
        newBINOP(OP_MULTIPLY, flags, newSVREF($b), newSVREF($c))
    )

See also L<perlintern/"OP TREES">

The simplest type of op structure is C<OP>, a L</BASEOP>: this
has no children. Unary operators, L</UNOP>s, have one child, and
this is pointed to by the C<op_first> field. Binary operators
(L</BINOP>s) have not only an C<op_first> field but also an
C<op_last> field. The most complex type of op is a L</LISTOP>,
which has any number of children. In this case, the first child
is pointed to by C<op_first> and the last child by
C<op_last>. The children in between can be found by iteratively
following the C<op_sibling> pointer from the first child to the
last.

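Walking the children of a list op can be sketched as follows. The struct is a deliberately reduced stand-in (C<KidOp> and the C<toy_> functions are hypothetical names); only the three pointers discussed above are modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: only the pointers needed to walk children. */
typedef struct kid_op KidOp;
struct kid_op {
    KidOp *op_sibling;   /* next child of the same parent, or NULL */
    KidOp *op_first;     /* first child, or NULL */
    KidOp *op_last;      /* last child, or NULL */
};

/* Count the children by following op_sibling from op_first. */
static int toy_count_kids(const KidOp *parent) {
    int n = 0;
    assert(parent != NULL);
    for (const KidOp *kid = parent->op_first; kid != NULL; kid = kid->op_sibling)
        n++;
    return n;
}

/* Build a list op with three children and count them. */
static int toy_demo_kids(void) {
    KidOp c = { NULL, NULL, NULL };
    KidOp b = { &c, NULL, NULL };
    KidOp a = { &b, NULL, NULL };
    KidOp list = { NULL, &a, &c };
    return toy_count_kids(&list);
}
```

Note that C<op_last> is never needed for the walk itself; it only gives direct access to the end of the sibling chain.
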
There are also two other op types: a L</"PMOP"> holds a regular
expression, and has no children, and a L</"LOOP"> may or may not
have children. If the C<op_sibling> field is non-zero, it behaves
like a C<LISTOP>. To complicate matters, if an C<UNOP> is
actually a null op after optimization (see L</"Compile pass 2:
context propagation"> below) it will still have children in
accordance with its former type.

The beautiful thing about the op tree representation is that it
is a strict 1:1 mapping to the actual source code, which is
proven by the L<B::Deparse> module, which generates readable
source for the current op tree. Well, almost.

=head1 The Compiler

Perl's compiler is essentially a 3-pass compiler with interleaved
phases:

=over 4

=item 1. A bottom-up pass

=item 2. A top-down pass

=item 3. An execution-order pass

=back

=head2 Compile pass 1: check routines and constant folding

The bottom-up pass is represented by all the C<"newOP"> routines
and the C<ck_> routines. The bottom-upness is actually driven by
F<yacc>. So at the point that a C<ck_> routine fires, we have no
idea what the context is, either upward in the syntax tree, or
either forward or backward in the execution order. The bottom-up
parser builds that part of the execution order it knows about,
but if you follow the "next" links around, you'll find it's
actually a closed loop through the top level node.

So when creating the ops in the first step, still bottom-up, a
check function (C<ck_ ()>) is called for each op. It may
theoretically modify the whole tree destructively, but
because it knows almost nothing, it mostly just nullifies the
current op. Or it might set the L</op_next> pointer. See
L</"Check Functions"> for more.

Also, the subsequent constant folding routine C<fold_constants()>
may fold certain arithmetic op sequences. See L</"Constant Folding">
for more.

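The idea of folding can be sketched on a toy tree. This is only an illustration of the concept, not a description of how C<fold_constants()> is implemented; C<FoldOp>, C<toy_fold> and the C<FOLD_> constants are hypothetical names, and C<double> stands in for a constant SV.

```c
#include <assert.h>
#include <stddef.h>

enum fold_type { FOLD_CONST, FOLD_ADD, FOLD_MULTIPLY };

typedef struct fold_op FoldOp;
struct fold_op {
    enum fold_type type;
    double value;        /* only meaningful for FOLD_CONST */
    FoldOp *first;       /* left operand, or NULL */
    FoldOp *last;        /* right operand, or NULL */
};

/* If both operands of a binary op are constants, turn the op itself
 * into a constant holding the computed result. Returns 1 if folded. */
static int toy_fold(FoldOp *o) {
    assert(o != NULL);
    if (o->type == FOLD_CONST || o->first == NULL || o->last == NULL)
        return 0;
    if (o->first->type != FOLD_CONST || o->last->type != FOLD_CONST)
        return 0;
    o->value = (o->type == FOLD_ADD)
        ? o->first->value + o->last->value
        : o->first->value * o->last->value;
    o->type = FOLD_CONST;
    o->first = o->last = NULL;
    return 1;
}

/* Fold "2 + 3 * 4" bottom-up, the order in which the ops are created. */
static double toy_demo_fold(void) {
    FoldOp c2 = { FOLD_CONST, 2.0, NULL, NULL };
    FoldOp c3 = { FOLD_CONST, 3.0, NULL, NULL };
    FoldOp c4 = { FOLD_CONST, 4.0, NULL, NULL };
    FoldOp mul = { FOLD_MULTIPLY, 0.0, &c3, &c4 };
    FoldOp add = { FOLD_ADD, 0.0, &c2, &mul };
    toy_fold(&mul);   /* inner op is built first: becomes const 12 */
    toy_fold(&add);   /* then the outer op: becomes const 14 */
    return add.type == FOLD_CONST ? add.value : -1.0;
}
```

Because the ops are created bottom-up, the inner multiplication is already a constant by the time the addition is checked, so one local rule is enough to fold the whole expression.
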
=head2 Compile pass 2: context propagation

The context determines the type of the return value. When the
context for a part of the compile tree is known, it is propagated
down through the tree. At this time the context can have 5 values
(instead of 2 for runtime context): C<void>, C<boolean>,
C<scalar>, C<list>, and C<lvalue>. In contrast with pass 1,
this pass is processed from top to bottom: a node's context
determines the context for its children.

Whenever the bottom-up parser gets to a node that supplies
context to its components, it invokes that portion of the
top-down pass that applies to that part of the subtree (and marks
the top node as processed, so if a node further up supplies
context, it doesn't have to take the plunge again). As a
particular subcase of this, as the new node is built, it takes
all the closed execution loops of its subcomponents and links
them into a new closed loop for the higher level node. But it's
still not the real execution order.

I<Todo: Sample>

Additional context-dependent optimizations are performed at this
time. Since at this moment the compile tree contains back-references
(via "thread" pointers), nodes cannot be C<free()>d now. To allow
optimized-away nodes at this stage, such nodes are C<null()>ified
instead of being C<free()>d (i.e. their type is changed to C<OP_NULL>).

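Nullifying instead of freeing can be sketched like this. The struct and the opcode numbers are made up for illustration (C<NullableOp>, C<toy_op_null> and the C<NOP_> constants are hypothetical); keeping the former type in C<op_targ> follows the description of L</op_targ> in this document.

```c
#include <assert.h>
#include <stddef.h>

/* Made-up opcode numbers, for illustration only. */
enum { NOP_NULL = 0, NOP_CONST = 5, NOP_NEGATE = 9 };

typedef struct {
    int op_type;   /* current type of the op */
    int op_targ;   /* toy: holds the former type once nullified */
} NullableOp;

/* Turn an op into a null op in place. The struct stays allocated,
 * so pointers from other ops to it remain valid. */
static void toy_op_null(NullableOp *o) {
    assert(o != NULL);
    if (o->op_type == NOP_NULL)
        return;                  /* already nullified */
    o->op_targ = o->op_type;     /* remember what it used to be */
    o->op_type = NOP_NULL;
}

/* Nullify a negate op twice; the second call must change nothing. */
static int toy_demo_null(void) {
    NullableOp o = { NOP_NEGATE, 0 };
    toy_op_null(&o);
    toy_op_null(&o);
    return o.op_type == NOP_NULL && o.op_targ == NOP_NEGATE;
}
```
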
=head2 Compile pass 3: peephole optimization

The actual execution order is not known till we get a grammar
reduction to a top-level unit like a subroutine or file that will
be called by "name" rather than via a "next" pointer. At that
point, we can call into peep() to do that code's portion of the
3rd pass. It has to be recursive, but it's recursive on basic
blocks, not on tree nodes.

So finally, when the full parse tree is generated, the "peephole
optimizer" C<peep()> runs. This pass is neither top-down
nor bottom-up, but in execution order, with additional
complications for conditionals.

It examines each op in the tree and attempts to determine "local"
optimizations by "thinking ahead" one or two ops and seeing if
multiple operations can be combined into one (by nullifying and
re-ordering the next pointers).

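One such local rewrite, re-pointing C<op_next> past nullified ops, can be sketched as below. This is a toy version of a single peephole rule with hypothetical names (C<PeepOp>, C<toy_peep>, C<toy_chain_length>); the real C<peep()> does far more than this.

```c
#include <assert.h>
#include <stddef.h>

typedef struct peep_op PeepOp;
struct peep_op {
    int is_null;       /* 1 if the op has been nullified */
    PeepOp *op_next;   /* next op in execution order */
};

/* Walk in execution order and re-point op_next past any null ops,
 * so the runloop never has to visit them. */
static void toy_peep(PeepOp *start) {
    for (PeepOp *o = start; o != NULL; o = o->op_next) {
        while (o->op_next != NULL && o->op_next->is_null)
            o->op_next = o->op_next->op_next;
    }
}

/* Number of ops reachable by following op_next. */
static int toy_chain_length(const PeepOp *start) {
    int n = 0;
    for (const PeepOp *o = start; o != NULL; o = o->op_next)
        n++;
    return n;
}

/* Chain a -> null -> null -> b shrinks to a -> b. */
static int toy_demo_peep(void) {
    PeepOp b = { 0, NULL };
    PeepOp n2 = { 1, &b };
    PeepOp n1 = { 1, &n2 };
    PeepOp a = { 0, &n1 };
    assert(toy_chain_length(&a) == 4);
    toy_peep(&a);
    return toy_chain_length(&a);
}
```

The null ops are not freed here; they are simply no longer on the execution path.
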
It also checks for lexical issues such as the effect of C<use
strict> on bareword constants. Note that since the last walk the
early sibling pointers for recursive (bottom-up) meta-inspection
are useless; the final exec order is guaranteed by the next and
flags fields.

If you write your own rpeep extension, beware that the default mode
of peep is to nullify ops.

=head1 basic vs exec order

The highly recursive Yacc parser generates the initial op tree in
B<basic> order. To save memory and run-time, the ops are not
copied into a sequential array in their final execution order;
only the next pointers are rehooked in C<Perl_linklist()> to the
so-called B<exec> order. So the exec walk through the
linked-list of ops is not too cache-friendly.

In detail, C<Perl_linklist()> traverses the op tree, and sets
op-next pointers to give the execution order for that op
tree. op-sibling pointers are rarely needed after that.

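The rehooking can be sketched as a post-order traversal: children run before their parent. The code below is a toy reconstruction with hypothetical names (C<LinkOp>, C<toy_linklist>), not a copy of C<Perl_linklist()>.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct link_op LinkOp;
struct link_op {
    char name;            /* label used only by the demo */
    LinkOp *op_first;     /* first child, or NULL */
    LinkOp *op_sibling;   /* next sibling, or NULL */
    LinkOp *op_next;      /* set here: next op in exec order */
};

/* Thread the subtree rooted at o in execution order and return the
 * first op to execute. o->op_next itself is left for the caller. */
static LinkOp *toy_linklist(LinkOp *o) {
    assert(o != NULL);
    if (o->op_first == NULL)
        return o;                        /* a leaf starts with itself */
    LinkOp *start = toy_linklist(o->op_first);
    for (LinkOp *kid = o->op_first; kid != NULL; kid = kid->op_sibling) {
        /* After a child's subtree comes the next sibling's subtree,
         * or the parent once all children are done. */
        kid->op_next = kid->op_sibling ? toy_linklist(kid->op_sibling) : o;
    }
    return start;
}

/* Tree for "$a + $b * $c": add(a, multiply(b, c)).
 * Returns 1 if the exec order comes out as a b c * +. */
static int toy_demo_link(void) {
    LinkOp c = { 'c', NULL, NULL, NULL };
    LinkOp b = { 'b', NULL, &c, NULL };
    LinkOp mul = { '*', &b, NULL, NULL };
    LinkOp a = { 'a', NULL, &mul, NULL };
    LinkOp add = { '+', &a, NULL, NULL };
    char order[6];
    int i = 0;
    for (LinkOp *o = toy_linklist(&add); o != NULL && i < 5; o = o->op_next)
        order[i++] = o->name;
    order[i] = '\0';
    return strcmp(order, "abc*+") == 0;
}
```

The tree shape (basic order) is untouched; only C<op_next> changes, which is why the same ops can be walked in either order afterwards.
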
Walkers can run in "basic" or "exec" order. "basic" is useful
for the memory layout, since it contains the history; "exec" is more
useful for understanding the logic and program flow. The
L</B::Bytecode> section has an extensive example about the order.

=head1 OP Structure and Inheritance

The basic C<struct op> looks roughly like

C<{ OP* op_next, OP* op_sibling, OP* op_ppaddr, ..., int op_flags, int op_private } OP;>

See L</BASEOP> below.

Each op is defined in size, arguments, return values, class and
more in the F<opcode.pl> table. (See L</"OP Class Declarations in
opcode.pl"> below.)

The class of an OP determines its size and the number of
children. But the number and type of arguments are not as easy to
declare as in C. F<opcode.pl> tries to declare some XS-prototype-like
arguments, but in Lisp terms we would say most ops are "special"
functions, context-dependent, with special parsing and precedence rules.

F<B.pm> L<http://search.cpan.org/perldoc?B> contains these
classes and inheritance:

  @B::OP::ISA = 'B::OBJECT';
  @B::UNOP::ISA = 'B::OP';
  @B::BINOP::ISA = 'B::UNOP';
  @B::LOGOP::ISA = 'B::UNOP';
  @B::LISTOP::ISA = 'B::BINOP';
  @B::SVOP::ISA = 'B::OP';
  @B::PADOP::ISA = 'B::OP';
  @B::PVOP::ISA = 'B::OP';
  @B::LOOP::ISA = 'B::LISTOP';
  @B::PMOP::ISA = 'B::LISTOP';
  @B::COP::ISA = 'B::OP';
  @B::SPECIAL::ISA = 'B::OBJECT';
  @B::optype = qw(OP UNOP BINOP LOGOP LISTOP PMOP SVOP PADOP PVOP LOOP COP);

I<TODO: ascii graph from perlguts>

F<op.h> L<http://search.cpan.org/src/JESSE/perl-5.12.1/op.h>
contains all the gory details. Let's check it out:

=head2 OP Class Declarations in opcode.pl

The full list of op declarations is defined as C<DATA> in
F<opcode.pl>. It defines the class, the name, some flags, and
the argument types, the so-called "operands". C<make regen> (via
F<regen.pl>) recreates out of this DATA table the files
F<opcode.h>, F<opnames.h>, F<pp_proto.h> and F<pp.sym>.

The class signifiers in F<opcode.pl> are:

  baseop      - 0    unop     - 1             binop      - 2
  logop       - |    listop   - @             pmop       - /
  padop/svop  - $    padop    - # (unused)    loop       - {
  baseop/unop - %    loopexop - }             filestatop - -
  pvop/svop   - "    cop      - ;

Other options within F<opcode.pl> are:

  needs stack mark                    - m
  needs constant folding              - f
  produces a scalar                   - s
  produces an integer                 - i
  needs a target                      - t
  target can be in a pad              - T
  has a corresponding integer version - I
  has side effects                    - d
  uses $_ if no argument given        - u

Values for the operands are:

  scalar - S     list     - L     array     - A
  hash   - H     sub (CV) - C     file      - F
  socket - Fs    filetest - F-    reference - R

"?" denotes an optional operand.

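Reading a flags field such as C<IfsT2> can be sketched as follows. The sketch assumes, as the one-liners later in this document do, that the class signifier is the last character of the field and that the characters before it are option letters. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Map a class signifier from opcode.pl to its class name. */
static const char *toy_class_name(char signifier) {
    switch (signifier) {
    case '0': return "baseop";
    case '1': return "unop";
    case '2': return "binop";
    case '|': return "logop";
    case '@': return "listop";
    case '/': return "pmop";
    case '$': return "padop/svop";
    case '#': return "padop";
    case '{': return "loop";
    case '%': return "baseop/unop";
    case '}': return "loopexop";
    case '-': return "filestatop";
    case '"': return "pvop/svop";
    case ';': return "cop";
    default:  return NULL;
    }
}

/* Class of an op, taken from the last character of its flags field. */
static const char *toy_flags_class(const char *flags) {
    assert(flags != NULL);
    size_t n = strlen(flags);
    return n ? toy_class_name(flags[n - 1]) : NULL;
}

/* Does the flags field carry the option letter opt, e.g. 'f' for
 * constant folding? The trailing class signifier is not an option. */
static int toy_flags_has(const char *flags, char opt) {
    assert(flags != NULL);
    size_t n = strlen(flags);
    for (size_t i = 0; i + 1 < n; i++)
        if (flags[i] == opt)
            return 1;
    return 0;
}
```

For example, C<toy_flags_class("IfsT2")> yields C<"binop">, and C<toy_flags_has("IfsT2", 'f')> is true because the op needs constant folding.
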
=head2 BASEOP

All op classes have a single character signifier for easier
definition in F<opcode.pl>. The BASEOP class signifier is B<0>,
for no children.

Below are the BASEOP fields, which reflect the object C<B::OP>,
since Perl 5.10. These are shared by all op classes. The parts
after C<op_type> and before C<op_flags> have changed over time.

=over

=item op_next

Pointer to next op to execute after this one.

Top level pre-grafted op points to first op, but this is replaced
when op is grafted in, when this op will point to the real next
op, and the new parent takes over role of remembering the
starting op. I<Now, who wrote this prose? Anyway, that is why it
is called guts.>

=item op_sibling

Pointer to connect the children's list.

The first child is L</op_first>, the last is L</op_last>, and the
children in between are interconnected by op_sibling. At run-time
this is only used for L</LISTOP>s.

So why is it carried around in the BASEOP struct for every op?
Because of the complicated Yacc parsing and later optimization
order as explained in L</"Compile pass 1: check routines and
constant folding"> the L</op_next> pointers are not enough, so
op_sibling pointers are required. The final and fast execution order by
just following the op_next chain is expensive to calculate.

See
L<http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2006-09/msg00082.html>
for a 20% space-reduction patch to get rid of it at run-time.

=item op_ppaddr

Pointer to current ppcode's function.
The so-called "opcode".

=item op_madprop

Pointer to the MADPROP struct. Only with -DMAD, and since
5.10. See L</MAD> (Misc Attribute Decoration) below.

=item op_targ

PADOFFSET to lexical vars or, when threaded, also to GVs. Mainly used
as index into the curpad to access lexical vars. When the op is
nullified the targ holds the previous type.

=item op_type

The type of the op. See F<opnames.h>

Since 5.10 we have the next five fields added, which replace
C<U16 op_seq>.

=item op_opt

"optimized"

Whether or not the op has been optimised, i.e. nullified, by the
peephole optimiser.

See the comments in C<S_clear_yystack()> in F<perly.c> for more
details on the following three flags. They are just for freeing
temporary ops on the stack. But we might have statically
allocated ops in the data segment, esp. with the perl compiler's
L<B::C> module. Then we are not allowed to free those static
ops. For a short time, from 5.9.0 until 5.9.4, until the B::C
module was removed from CORE, we had another field here for this
reason: B<op_static>. When set to 1, the static op was not freed. Before
5.9.0 the C<op_seq> field was used with the magic value B<-1> to
indicate a static op, not to be freed. Note: Trying to free a
static struct is considered harmful.

=item op_latefree

Tell C<op_free()> to clear this op (and free any kids) but not
yet deallocate the struct. This means that the op may be safely
C<op_free()>d multiple times.

On static ops you just set this to B<1>; after the first
C<op_free()> the C<op_latefreed> flag is automatically set and further
C<op_free()> calls are simply ignored.

=item op_latefreed

If 1, an C<op_latefree> op has been C<op_free()>d.

=item op_attached

This op (sub)tree has been attached to the CV C<PL_compcv> so it
doesn't need to be freed.

=item op_spare

Three spare bits in the bitfield above. At least they survived 5.10.

=item op_static

This op has been allocated statically, usually with the compiler or
within embedded applications. On destruction this op will not be
freed.

This bit came and went and came again in various perl versions. It
was defined until 5.10, and came back with 5.18, because by then
latefree was gone.

Those last two fields have been in all perls:

=item op_flags

Flags common to all operations.
See C<OPf_*> in F<op.h>, or, more verbosely, in L<B::Flags> or F<dump.c>

=item op_private

Flags peculiar to a particular operation (BUT, by default, set to
the number of children until the operation is privatized by a
check routine, which may or may not check number of children).

This flag is normally used to hold op specific context hints,
such as C<HINT_INTEGER>. This flag is directly attached to each
relevant op in the subtree of the context. Note that there's no
general context or class pointer for each op; a typical
functional language usually holds this in the op's arguments. So
we are limited to max 32 lexical pragma hints or less. See
L</Lexical Pragmas>.

=back

The exact op.h L</BASEOP> history for the parts after C<op_type> and
before C<op_flags> is:

  <=5.8:  U16 op_seq;
  5.9.4:  unsigned op_opt:1; unsigned op_static:1; unsigned op_spare:5;
  >=5.10: unsigned op_opt:1; unsigned op_latefree:1; unsigned op_latefreed:1;
          unsigned op_attached:1; unsigned op_spare:3;

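The interplay of C<op_latefree> and C<op_latefreed> described above can be sketched on the 5.10 bitfield layout. This is a toy reconstruction from the prose in this section, not the real C<op_free()>; C<FreeOp>, C<toy_op_free> and the two counter fields are hypothetical additions so the behaviour can be observed.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned op_opt:1;
    unsigned op_latefree:1;    /* clear, but keep the struct */
    unsigned op_latefreed:1;   /* set after the first late free */
    unsigned op_attached:1;
    unsigned op_spare:3;
    int cleared;               /* toy: how often the contents were cleared */
    int deallocated;           /* toy: 1 if the struct would be released */
} FreeOp;

/* Toy op_free: honours the latefree flags as described in the text. */
static void toy_op_free(FreeOp *o) {
    assert(o != NULL);
    if (o->op_latefreed)
        return;                 /* freed before: further calls are ignored */
    o->cleared++;               /* clear the op (and, for real ops, its kids) */
    if (o->op_latefree) {
        o->op_latefreed = 1;    /* struct stays, e.g. for a static op */
        return;
    }
    o->deallocated = 1;         /* real code would release the memory here */
}

/* A static op marked latefree survives repeated frees. */
static int toy_demo_latefree(void) {
    FreeOp o = { 0 };
    o.op_latefree = 1;
    toy_op_free(&o);
    toy_op_free(&o);
    toy_op_free(&o);
    return o.cleared == 1 && o.deallocated == 0 && o.op_latefreed == 1;
}
```
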
The L</BASEOP> class signifier is B<0>, for no children.

The full list of all BASEOP's is:

  $ perl -F"/\cI+/" -ane 'print if $F[3] =~ /0$/' opcode.pl

  null        null operation            ck_null     0
  stub        stub                      ck_null     0
  pushmark    pushmark                  ck_null     s0
  wantarray   wantarray                 ck_null     is0
  padsv       private variable          ck_null     ds0
  padav       private array             ck_null     d0
  padhv       private hash              ck_null     d0
  padany      private value             ck_null     d0
  sassign     scalar assignment         ck_sassign  s0
  unstack     iteration finalizer       ck_null     s0
  enter       block entry               ck_null     0
  iter        foreach loop iterator     ck_null     0
  break       break                     ck_null     0
  continue    continue                  ck_null     0
  fork        fork                      ck_null     ist0
  wait        wait                      ck_null     isT0
  getppid     getppid                   ck_null     isT0
  time        time                      ck_null     isT0
  tms         times                     ck_null     0
  ghostent    gethostent                ck_null     0
  gnetent     getnetent                 ck_null     0
  gprotoent   getprotoent               ck_null     0
  gservent    getservent                ck_null     0
  ehostent    endhostent                ck_null     is0
  enetent     endnetent                 ck_null     is0
  eprotoent   endprotoent               ck_null     is0
  eservent    endservent                ck_null     is0
  gpwent      getpwent                  ck_null     0
  spwent      setpwent                  ck_null     is0
  epwent      endpwent                  ck_null     is0
  ggrent      getgrent                  ck_null     0
  sgrent      setgrent                  ck_null     is0
  egrent      endgrent                  ck_null     is0
  getlogin    getlogin                  ck_null     st0
  custom      unknown custom operator   ck_null     0

=head3 null

null ops are skipped during the runloop, and are created by the peephole optimizer.

=head2 UNOP
X<op_first>

The unary op class signifier is B<1>, for one child, pointed to
by C<op_first>.

  struct unop {
      BASEOP
      OP * op_first;
  };

  $ perl -F"/\cI+/" -ane 'print if $F[3] =~ /1$/' opcode.pl

  rv2gv       ref-to-glob cast            ck_rvconst  ds1
  rv2sv       scalar dereference          ck_rvconst  ds1
  av2arylen   array length                ck_null     is1
  rv2cv       subroutine dereference      ck_rvconst  d1
  refgen      reference constructor       ck_spair    m1      L
  srefgen     single ref constructor      ck_null     fs1     S
  regcmaybe   regexp internal guard       ck_fun      s1      S
  regcreset   regexp internal reset       ck_fun      s1      S
  preinc      preincrement (++)           ck_lfun     dIs1    S
  i_preinc    integer preincrement (++)   ck_lfun     dis1    S
  predec      predecrement (--)           ck_lfun     dIs1    S
  i_predec    integer predecrement (--)   ck_lfun     dis1    S
  postinc     postincrement (++)          ck_lfun     dIst1   S
  i_postinc   integer postincrement (++)  ck_lfun     disT1   S
  postdec     postdecrement (--)          ck_lfun     dIst1   S
  i_postdec   integer postdecrement (--)  ck_lfun     disT1   S
  negate      negation (-)                ck_null     Ifst1   S
  i_negate    integer negation (-)        ck_null     ifsT1   S
  not         not                         ck_null     ifs1    S
  complement  1's complement (~)          ck_bitop    fst1    S
  rv2av       array dereference           ck_rvconst  dt1
  rv2hv       hash dereference            ck_rvconst  dt1
  flip        range (or flip)             ck_null     1       S S
  flop        range (or flop)             ck_null     1
  method      method lookup               ck_method   d1
  entersub    subroutine entry            ck_subr     dmt1    L
  leavesub    subroutine exit             ck_null     1
  leavesublv  lvalue subroutine return    ck_null     1
  leavegiven  leave given block           ck_null     1
  leavewhen   leave when block            ck_null     1
  leavewrite  write exit                  ck_null     1
  dofile      do "file"                   ck_fun      d1      S
  leaveeval   eval "string" exit          ck_null     1       S
  #evalonce   eval constant string        ck_null     d1      S

=head2 BINOP
X<op_last>

The BINOP class signifier is B<2>, for two children, pointed to by
C<op_first> and C<op_last>.

  struct binop {
      BASEOP
      OP * op_first;
      OP * op_last;
  };

  $ perl -F"/\cI+/" -ane 'print if $F[3] =~ /2$/' opcode.pl

  gelem        glob elem                       ck_null        d2      S S
  aassign      list assignment                 ck_null        t2      L L
  pow          exponentiation (**)             ck_null        fsT2    S S
  multiply     multiplication (*)              ck_null        IfsT2   S S
  i_multiply   integer multiplication (*)      ck_null        ifsT2   S S
  divide       division (/)                    ck_null        IfsT2   S S
  i_divide     integer division (/)            ck_null        ifsT2   S S
  modulo       modulus (%)                     ck_null        IifsT2  S S
  i_modulo     integer modulus (%)             ck_null        ifsT2   S S
  repeat       repeat (x)                      ck_repeat      mt2     L S
  add          addition (+)                    ck_null        IfsT2   S S
  i_add        integer addition (+)            ck_null        ifsT2   S S
  subtract     subtraction (-)                 ck_null        IfsT2   S S
  i_subtract   integer subtraction (-)         ck_null        ifsT2   S S
  concat       concatenation (.) or string     ck_concat      fsT2    S S
  left_shift   left bitshift (<<)              ck_bitop       fsT2    S S
  right_shift  right bitshift (>>)             ck_bitop       fsT2    S S
  lt           numeric lt (<)                  ck_null        Iifs2   S S
  i_lt         integer lt (<)                  ck_null        ifs2    S S
  gt           numeric gt (>)                  ck_null        Iifs2   S S
  i_gt         integer gt (>)                  ck_null        ifs2    S S
  le           numeric le (<=)                 ck_null        Iifs2   S S
  i_le         integer le (<=)                 ck_null        ifs2    S S
  ge           numeric ge (>=)                 ck_null        Iifs2   S S
  i_ge         integer ge (>=)                 ck_null        ifs2    S S
  eq           numeric eq (==)                 ck_null        Iifs2   S S
  i_eq         integer eq (==)                 ck_null        ifs2    S S
  ne           numeric ne (!=)                 ck_null        Iifs2   S S
  i_ne         integer ne (!=)                 ck_null        ifs2    S S
  ncmp         numeric comparison (<=>)        ck_null        Iifst2  S S
  i_ncmp       integer comparison (<=>)        ck_null        ifst2   S S
  slt          string lt                       ck_null        ifs2    S S
  sgt          string gt                       ck_null        ifs2    S S
  sle          string le                       ck_null        ifs2    S S
  sge          string ge                       ck_null        ifs2    S S
  seq          string eq                       ck_null        ifs2    S S
  sne          string ne                       ck_null        ifs2    S S
  scmp         string comparison (cmp)         ck_null        ifst2   S S
  bit_and      bitwise and (&)                 ck_bitop       fst2    S S
  bit_xor      bitwise xor (^)                 ck_bitop       fst2    S S
  bit_or       bitwise or (|)                  ck_bitop       fst2    S S
  smartmatch   smart match                     ck_smartmatch  s2
  aelem        array element                   ck_null        s2      A S
  helem        hash element                    ck_null        s2      H S
  lslice       list slice                      ck_null        2       H L L
  xor          logical xor                     ck_null        fs2     S S
  leaveloop    loop exit                       ck_null        2

=head2 LOGOP
X<op_other>

The LOGOP class signifier is B<|>.

A LOGOP has the same structure as a L</BINOP>, with two children; only the
second field has a different name, C<op_other> instead of C<op_last>.
But as the list below shows, the two arguments are optional and
not strictly required.

  struct logop {
      BASEOP
      OP * op_first;
      OP * op_other;
  };

  $ perl -F"/\cI+/" -ane 'print if $F[3] =~ /\|$/' opcode.pl

  regcomp     regexp compilation                ck_null  s|    S
  substcont   substitution iterator             ck_null  dis|
  grepwhile   grep iterator                     ck_null  dt|
  mapwhile    map iterator                      ck_null  dt|
  range       flipflop                          ck_null  |     S S
  and         logical and (&&)                  ck_null  |
  or          logical or (||)                   ck_null  |
  dor         defined or (//)                   ck_null  |
  cond_expr   conditional expression            ck_null  d|
  andassign   logical and assignment (&&=)      ck_null  s|
  orassign    logical or assignment (||=)       ck_null  s|
  dorassign   defined or assignment (//=)       ck_null  s|
  entergiven  given()                           ck_null  d|
  enterwhen   when()                            ck_null  d|
  entertry    eval {block}                      ck_null  |
  once        once                              ck_null  |

=head3 and

Checks for falseness on the first argument on the stack.
If false, returns immediately, keeping the false value on the stack.
If true, pops the stack and returns the op at C<op_other>.

Note: B<and> is also used for a simple B<if> without B<else>/B<elsif>.
The general B<if> is done with L</cond_expr>.

=head3 cond_expr

Checks for trueness on the first argument on the stack.
If true, returns the op at C<op_other>; if false, C<op_next>.

Note: A simple B<if> without else is done by L</and>.

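The branching of these two ops can be sketched as follows. This is a toy model with hypothetical names (C<LogOp>, C<log_stack>, C<toy_pp_and>, C<toy_pp_cond_expr>); integers stand in for SVs, and the pop in the C<cond_expr> case is an assumption about the real op that the text above does not state.

```c
#include <assert.h>
#include <stddef.h>

typedef struct log_op LogOp;
struct log_op {
    LogOp *op_next;    /* fall-through successor */
    LogOp *op_other;   /* alternative successor */
};

static int log_stack[8];
static int log_sp = 0;   /* number of values on the stack */

/* and: a false value stays on the stack and execution falls through;
 * a true value is popped and execution continues at op_other. */
static LogOp *toy_pp_and(LogOp *o) {
    assert(log_sp > 0);
    if (!log_stack[log_sp - 1])
        return o->op_next;
    log_sp--;
    return o->op_other;
}

/* cond_expr: pop the condition (assumed), then pick one of two branches. */
static LogOp *toy_pp_cond_expr(LogOp *o) {
    assert(log_sp > 0);
    int cond = log_stack[--log_sp];
    return cond ? o->op_other : o->op_next;
}

/* Exercise both ops with a false and a true value on the stack. */
static int toy_demo_logop(void) {
    LogOp next = { NULL, NULL }, other = { NULL, NULL };
    LogOp o = { &next, &other };
    int ok = 1;

    log_sp = 0; log_stack[log_sp++] = 0;
    ok &= (toy_pp_and(&o) == &next && log_sp == 1);

    log_sp = 0; log_stack[log_sp++] = 1;
    ok &= (toy_pp_and(&o) == &other && log_sp == 0);

    log_sp = 0; log_stack[log_sp++] = 1;
    ok &= (toy_pp_cond_expr(&o) == &other && log_sp == 0);

    log_sp = 0; log_stack[log_sp++] = 0;
    ok &= (toy_pp_cond_expr(&o) == &next && log_sp == 0);
    return ok;
}
```

The difference in stack handling is what lets C<and> double as an expression: the false value left on the stack is the result of C<$a && $b>.
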
=head2 LISTOP
X<op_last>

The LISTOP class signifier is B<@>.

  struct listop {
      BASEOP
      OP * op_first;
      OP * op_last;
  };

This is the most complex type; it may have any number of children. The
first child is pointed to by C<op_first> and the last child by
C<op_last>. The children in between can be found by iteratively
following the C<op_sibling> pointer from the first child to the last.

In all, 99 of the 366 ops are LISTOP's, because this is the least
restrictive format.

  $ perl -F"/\cI+/" -ane 'print if $F[3] =~ /\@$/' opcode.pl

  bless      bless                 ck_fun     s@      S S?
  glob       glob                  ck_glob    t@      S?
  stringify  string                ck_fun     fsT@    S
  atan2      atan2                 ck_fun     fsT@    S S
  substr     substr                ck_substr  st@     S S S? S?
  vec        vec                   ck_fun     ist@    S S S
  index      index                 ck_index   isT@    S S S?
  rindex     rindex                ck_index   isT@    S S S?
  sprintf    sprintf               ck_fun     fmst@   S L
  formline   formline              ck_fun     ms@     S L
  crypt      crypt                 ck_fun     fsT@    S S
  aslice     array slice           ck_null    m@      A L
  hslice     hash slice            ck_null    m@      H L
  unpack     unpack                ck_unpack  @       S S?
  pack       pack                  ck_fun     mst@    S L
  split      split                 ck_split   t@      S S S
  join       join or string        ck_join    mst@    S L
  list       list                  ck_null    m@      L
  anonlist   anonymous list ([])   ck_fun     ms@     L
  anonhash   anonymous hash ({})   ck_fun     ms@     L
  splice     splice                ck_fun     m@      A S? S? L
  ... and so on, until
  syscall    syscall               ck_fun     imst@   S L

  518. =head2 PMOP
  519. The PMOP "pattern matching" class signifier is B</> for matching.
  520. It inherits from the L</LISTOP>.
  521. The internal struct changed completely with 5.10, as the
  522. underlying engine. Starting with 5.11 the PMOP can even hold
  523. native L<"REGEX"/perlguts#REGEX> objects, not just SV's. So you
  524. have to use the C<PM> macros to stay compatible.
  525. Below is the current C<struct pmop>. You will not like it.
  526. struct pmop {
  527. BASEOP
  528. OP * op_first;
  529. OP * op_last;
  530. #ifdef USE_ITHREADS
  531. IV op_pmoffset;
  532. #else
  533. REGEXP * op_pmregexp; /* compiled expression */
  534. #endif
  535. U32 op_pmflags;
  536. union {
  537. OP * op_pmreplroot; /* For OP_SUBST */
  538. #ifdef USE_ITHREADS
  539. PADOFFSET op_pmtargetoff; /* For OP_PUSHRE */
  540. #else
  541. GV * op_pmtargetgv;
  542. #endif
  543. } op_pmreplrootu;
  544. union {
  545. OP * op_pmreplstart; /* Only used in OP_SUBST */
  546. #ifdef USE_ITHREADS
  547. char * op_pmstashpv; /* Only used in OP_MATCH, with PMf_ONCE set */
  548. #else
  549. HV * op_pmstash;
  550. #endif
  551. } op_pmstashstartu;
  552. };
  553. Before we had no union, but a C<op_pmnext>, which never worked.
  554. Maybe because of the typo in the comment.
  555. The old struct (up to 5.8.x) was as simple as:
  556. struct pmop {
  557. BASEOP
  558. OP * op_first;
  559. OP * op_last;
  560. U32 op_children;
  561. OP * op_pmreplroot;
  562. OP * op_pmreplstart;
  563. PMOP * op_pmnext; /* list of all scanpats */
  564. REGEXP * op_pmregexp; /* compiled expression */
  565. U16 op_pmflags;
  566. U16 op_pmpermflags;
  567. U8 op_pmdynflags;
  568. }
So C<op_pmnext>, C<op_pmpermflags> and C<op_pmdynflags> are gone.
The C<op_pmflags> are not the whole story: the new features live in
C<op_pmregexp.extflags>, which B interestingly exposes as C<B::PMOP::reflags>.
This is, by the way, the only inconsistency in the B mapping.
  573. $ perl -F"/\cI+/" -ane 'print if $F[3] =~ /\/$/' opcode.pl
  574. pushre push regexp ck_null d/
  575. match pattern match (m//) ck_match d/
  576. qr pattern quote (qr//) ck_match s/
  577. subst substitution (s///) ck_match dis/ S
  578. =head2 SVOP
The SVOP class is very special, and can even change dynamically.
Whole SVs are costly and are now used only as GV or RV.
The SVOP has no special signifier, as there are different subclasses.
See L</"SVOP_OR_PADOP">, L</"PVOP_OR_SVOP"> and L</"FILESTATOP">.
An SVOP holds an SV. For a FILESTATOP this is the GV of the
filehandle argument; for C<trans> (a L</PVOP>) under utf8 it is a
reference to a swash (i.e., an RV pointing to an HV).
  586. struct svop {
  587. BASEOP
  588. SV * op_sv;
  589. };
Most old SVOPs were changed to L</PADOP>s when threading was introduced, to
privatize the global SV area into thread-local scratchpads.
  592. =head3 SVOP_OR_PADOP
The op C<aelemfast> is a L</PADOP> with threading and a simple L</SVOP> without.
Thankfully, this is known at compile-time.
  595. aelemfast constant array element ck_null s$ A S
  596. =head3 PVOP_OR_SVOP
  597. The only op here is C<trans>, where the class is dynamically defined,
  598. dependent on the utf8 settings in the L</op_private> hints.
  599. case OA_PVOP_OR_SVOP:
  600. return (o->op_private & (OPpTRANS_TO_UTF|OPpTRANS_FROM_UTF))
  601. ? OPc_SVOP : OPc_PVOP;
trans transliteration (tr///) ck_match is" S
Character translations (C<tr///>) are usually a L</PVOP>, keeping a pointer
to a table of shorts used to look up translations. Under utf8,
however, a simple table isn't practical; instead, the OP is an L</SVOP>,
and the SV is a reference to a B<swash>, i.e. an RV pointing to an HV.
  607. =head2 PADOP
The PADOP class signifier is B<$> for temp. scalars.
A new C<PADOP> creates a new temporary scratchpad, a PADLIST array.
C<< padop->op_padix = pad_alloc(type, SVs_PADTMP); >>
C<SVs_PADTMP> are targets/GVs/constants with undef names.
A C<PADLIST> scratchpad is a special context stack, an array-of-arrays data structure
attached to a CV (i.e. a sub), which stores lexical variables, opcode temporaries and
per-thread values. See L<perlguts/Scratchpads>.
  615. Only my/our variable (C<SVs_PADMY>/C<SVs_PADOUR>) slots get valid names.
  616. The rest are op targets/GVs/constants which are statically allocated
  617. or resolved at compile time. These don't have names by which they
  618. can be looked up from Perl code at run time through eval "" like
  619. my/our variables can be. Since they can't be looked up by "name"
  620. but only by their index allocated at compile time (which is usually
  621. in C<op_targ>), wasting a name SV for them doesn't make sense.
  622. struct padop {
  623. BASEOP
  624. PADOFFSET op_padix;
  625. };
  626. $ perl -F"/\cI+/" -ane 'print if $F[3] =~ /\$$/' opcode.pl
  627. const constant item ck_svconst s$
  628. gvsv scalar variable ck_null ds$
  629. gv glob value ck_null ds$
  630. anoncode anonymous subroutine ck_anoncode $
  631. rcatline append I/O operator ck_null t$
  632. aelemfast constant array element ck_null s$ A S
  633. method_named method with known name ck_null d$
  634. hintseval eval hints ck_svconst s$
  635. =head2 PVOP
  636. This is a simple unary op, holding a string.
The only PVOP is the C<trans> op for C<tr///>.
  638. See above at L</PVOP_OR_SVOP> for the dynamic nature of trans with utf8.
  639. The PVOP class signifier is C<"> for strings.
  640. struct pvop {
  641. BASEOP
  642. char * op_pv;
  643. };
  644. $ perl -F"/\cI+/" -ane 'print if $F[3] =~ /\"$/' opcode.pl
  645. trans transliteration (tr///) ck_match is" S
  646. =head2 LOOP
  647. The LOOP class signifier is B<{>.
  648. It inherits from the L</LISTOP>.
  649. struct loop {
  650. BASEOP
  651. OP * op_first;
  652. OP * op_last;
  653. OP * op_redoop;
  654. OP * op_nextop;
  655. OP * op_lastop;
  656. };
  657. $ perl -F"/\cI+/" -ane 'print if $F[3] =~ /\{$/' opcode.pl
  658. enteriter foreach loop entry ck_null d{
  659. enterloop loop entry ck_null d{
  660. =head2 COP
The C<struct cop>, the "Control OP", has changed a lot recently, as has the L</BASEOP>.
Remember from perlguts what a COP is? Got you. A COP is nowhere described.
I would have naively called it "Context OP", but not "Control OP". So why?
We have a global C<PL_curcop> and then we have threads. So it cannot be global
anymore. A COP can be described as a helper context for debugging and error
reporting, which stores away file and line information. But since perl is a file-based
compiler, not block-based, file-based pragmata and hints are also stored in the
COP. So we have a separate COP for every source file. COPs are mostly not
really block-level contexts, just file and line information. The block-level
contexts are not controlled via COPs, but via global C<Cx> structs.
  671. F<cop.h> says:
  672. Control ops (cops) are one of the two ops OP_NEXTSTATE and OP_DBSTATE
  673. that (loosely speaking) are separate statements. They hold
  674. information for lexical state and error reporting. At run time, C<PL_curcop> is set
  675. to point to the most recently executed cop, and thus can be used to determine
  676. our file-level current state.
But we need block context, eval context, subroutine context, loop context, and
even format context. All these are separate structs defined in F<cop.h>.
So the COPs are not really as important as the actual C<Cx> context structs.
Only the C<CopSTASH>, the current package symbol table hash ("stash"), really matters.
Another famous COP is C<PL_compiling>, which holds the temporary compilation
environment.
  683. struct cop {
  684. BASEOP
  685. line_t cop_line; /* line # of this command */
  686. char * cop_label; /* label for this construct */
  687. #ifdef USE_ITHREADS
  688. char * cop_stashpv; /* package line was compiled in */
  689. char * cop_file; /* file name the following line # is from */
  690. #else
  691. HV * cop_stash; /* package line was compiled in */
  692. GV * cop_filegv; /* file the following line # is from */
  693. #endif
  694. U32 cop_hints; /* hints bits from pragmata */
  695. U32 cop_seq; /* parse sequence number */
  696. /* Beware. mg.c and warnings.pl assume the type of this is STRLEN *: */
  697. STRLEN * cop_warnings; /* lexical warnings bitmask */
  698. /* compile time state of %^H. See the comment in op.c for how this is
  699. used to recreate a hash to return from caller. */
  700. struct refcounted_he * cop_hints_hash;
  701. };
  702. The COP class signifier is B<;> and there are only two:
  703. $ perl -F"/\cI+/" -ane 'print if $F[3] =~ /;$/' opcode.pl
  704. nextstate next statement ck_null s;
  705. dbstate debug next statement ck_null s;
C<NEXTSTATE> is replaced by C<DBSTATE> when you call perl with C<-d>, the
debugger. You can even patch the C<NEXTSTATE> ops at runtime to
C<DBSTATE>, as done in the module C<Enbugger>.
For a short time there used to be three. C<SETSTATE> was
added in 1999 (pre Perl 5.6.0) to track line numbers correctly
in optimized blocks, disabled in 1999 with change 4309 for Perl
5.6.0, and removed with 5edb5b2abb in Perl 5.10.1.
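To make the role of C<PL_curcop> concrete, here is a minimal stand-alone sketch in C. It is not perl's code: the two-field C<COP>, the C<curcop> variable, C<pp_nextstate> and C<report_where> are simplified stand-ins invented for illustration. It only shows the idea that each statement op records itself as "current", so a later error message can name the file and line.

```c
#include <stdio.h>

/* Toy COP: only the two fields needed for error reporting. */
typedef struct {
    const char *cop_file;   /* file the statement came from */
    unsigned    cop_line;   /* line number of the statement */
} COP;

/* Stand-in for PL_curcop: the most recently executed statement op. */
static const COP *curcop;

/* What a nextstate op conceptually does first: remember itself. */
static void pp_nextstate(const COP *cop)
{
    curcop = cop;
}

/* Format a message the way die/warn append their location.
 * Returns the number of characters snprintf wanted to write. */
static int report_where(char *buf, size_t len, const char *msg)
{
    return snprintf(buf, len, "%s at %s line %u.",
                    msg, curcop->cop_file, curcop->cop_line);
}
```

Calling C<pp_nextstate> for two statements and then C<report_where> reports the location of the second one, which is all that "most recently executed cop" means here.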
  713. =head2 BASEOP_OR_UNOP
BASEOP_OR_UNOP has the class signifier B<%>. As the name says, it may
be a L</BASEOP> or a L</UNOP>; that is, the L</op_first> field is optional.
The list of B<%> ops is quite large: there are 84 of them.
Some of them are:
  718. $ perl -F"/\cI+/" -ane 'print if $F[3] =~ /%$/' opcode.pl
  719. ...
  720. quotemeta quotemeta ck_fun fstu% S?
  721. aeach each on array ck_each % A
  722. akeys keys on array ck_each t% A
  723. avalues values on array ck_each t% A
  724. each each ck_each % H
  725. values values ck_each t% H
  726. keys keys ck_each t% H
  727. delete delete ck_delete % S
  728. exists exists ck_exists is% S
  729. pop pop ck_shift s% A?
  730. shift shift ck_shift s% A?
  731. caller caller ck_fun t% S?
  732. reset symbol reset ck_fun is% S?
  733. exit exit ck_exit ds% S?
  734. ...
  735. =head2 FILESTATOP
  736. A FILESTATOP may be a L</UNOP>, L</PADOP>, L</BASEOP> or L</SVOP>.
  737. It has the class signifier B<->.
  738. The file stat OPs are created via UNI(OP_foo) in toke.c but use the
  739. C<OPf_REF> flag to distinguish between OP types instead of the usual
  740. C<OPf_SPECIAL> flag. As usual, if C<OPf_KIDS> is set, then we return
  741. C<OPc_UNOP> so that C<walkoptree> can find our children. If C<OPf_KIDS> is not
set then we check C<OPf_REF>. Without C<OPf_REF> set (no argument to the
operator) it's a plain OP; with C<OPf_REF> set it's an SVOP whose C<op_sv> field is the
GV for the filehandle argument (under ithreads, a PADOP instead).
  745. case OA_FILESTATOP:
  746. return ((o->op_flags & OPf_KIDS) ? OPc_UNOP :
  747. #ifdef USE_ITHREADS
  748. (o->op_flags & OPf_REF) ? OPc_PADOP : OPc_BASEOP);
  749. #else
  750. (o->op_flags & OPf_REF) ? OPc_SVOP : OPc_BASEOP);
  751. #endif
  752. lstat lstat ck_ftst u- F
  753. stat stat ck_ftst u- F
  754. ftrread -R ck_ftst isu- F-+
  755. ftrwrite -W ck_ftst isu- F-+
  756. ftrexec -X ck_ftst isu- F-+
  757. fteread -r ck_ftst isu- F-+
  758. ftewrite -w ck_ftst isu- F-+
  759. fteexec -x ck_ftst isu- F-+
  760. ftis -e ck_ftst isu- F-
  761. ftsize -s ck_ftst istu- F-
  762. ftmtime -M ck_ftst stu- F-
  763. ftatime -A ck_ftst stu- F-
  764. ftctime -C ck_ftst stu- F-
  765. ftrowned -O ck_ftst isu- F-
  766. fteowned -o ck_ftst isu- F-
  767. ftzero -z ck_ftst isu- F-
  768. ftsock -S ck_ftst isu- F-
  769. ftchr -c ck_ftst isu- F-
  770. ftblk -b ck_ftst isu- F-
  771. ftfile -f ck_ftst isu- F-
  772. ftdir -d ck_ftst isu- F-
  773. ftpipe -p ck_ftst isu- F-
  774. ftsuid -u ck_ftst isu- F-
  775. ftsgid -g ck_ftst isu- F-
  776. ftsvtx -k ck_ftst isu- F-
  777. ftlink -l ck_ftst isu- F-
  778. fttty -t ck_ftst is- F-
  779. fttext -T ck_ftst isu- F-
  780. ftbinary -B ck_ftst isu- F-
  781. =head2 LOOPEXOP
A LOOPEXOP is almost a L</BASEOP_OR_UNOP>. It is a L</UNOP> if stacked,
a L</BASEOP> if special, and a L</PVOP> otherwise.
  784. C<next>, C<last>, C<redo>, C<dump> and C<goto> use C<OPf_SPECIAL> to indicate that a
  785. label was omitted (in which case it's a L</BASEOP>) or else a term was
  786. seen. In this last case, all except goto are definitely L</PVOP> but
  787. goto is either a PVOP (with an ordinary constant label), an L</UNOP>
  788. with C<OPf_STACKED> (with a non-constant non-sub) or an L</UNOP> for
  789. C<OP_REFGEN> (with C<goto &sub>) in which case C<OPf_STACKED> also seems to
  790. get set.
  791. ...
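The rule above can be written down as a small decision function. This is a sketch in the style of the C<OA_FILESTATOP> case shown earlier, not a quote from the source. The flag values are stand-ins, and the order of the two tests (stacked first, then special) is an assumption taken from the wording of this section.

```c
/* Stand-in flag bits; the real definitions live in op.h. */
#define OPf_STACKED 64
#define OPf_SPECIAL 128

typedef enum { OPc_BASEOP, OPc_UNOP, OPc_PVOP } opclass;

/* Pick the concrete class of a next/last/redo/dump/goto op
 * from its op_flags. */
static opclass loopex_class(unsigned op_flags)
{
    if (op_flags & OPf_STACKED)
        return OPc_UNOP;    /* target is computed: goto $x, goto &sub */
    if (op_flags & OPf_SPECIAL)
        return OPc_BASEOP;  /* label omitted: a bare "next" or "last" */
    return OPc_PVOP;        /* constant label, kept as a C string */
}
```

Testing C<OPf_STACKED> first matters for C<goto &sub>, where the text says that flag is set in addition to whatever else the op carries.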
  792. =head2 OP Definition Example
Let's take a simple example of an opcode definition in F<opcode.pl>:
  794. left_shift left bitshift (<<) ck_bitop fsT2 S S
  795. The op C<left_shift> has a check function C<ck_bitop> (normally most ops
  796. have no check function, just C<ck_null>), and the options C<fsT2>.
The last two C<S S> describe the types of the two required operands:
SV or scalar. This is similar to XS prototypes.
The last C<2> in the options C<fsT2> denotes the class BINOP, with
two args on the stack.
Every binop takes two args and produces one scalar; see the C<s> flag.
The other remaining flags are C<f> and C<T>.
C<f> tells the compiler in the first pass to call C<fold_constants()>
on this op. See L</"Compile pass 1: check routines and constant folding">.
  805. If both args are constant, the result is constant also and the op will
  806. be nullified.
  807. Now let's inspect the simple definition of this op in F<pp.c>.
  808. C<pp_left_shift> is the C<op_ppaddr>, the function pointer, for every
  809. left_shift op.
  810. PP(pp_left_shift)
  811. {
  812. dVAR; dSP; dATARGET; tryAMAGICbin(lshift,opASSIGN);
  813. {
  814. const IV shift = POPi;
  815. if (PL_op->op_private & HINT_INTEGER) {
  816. const IV i = TOPi;
  817. SETi(i << shift);
  818. }
  819. else {
  820. const UV u = TOPu;
  821. SETu(u << shift);
  822. }
  823. RETURN;
  824. }
  825. }
The first IV arg is popped from the stack; the second arg is left on the stack (C<TOPi>/C<TOPu>),
because its slot is reused for the return value. (I<Todo: explain the opASSIGN magic check.>)
One IV or UV is produced, depending on C<HINT_INTEGER>, which is set by the C<use integer> pragma.
So the op has a special signed/unsigned integer behaviour, which is not defined in the opcode
declaration, because the API is indifferent to it, and it is also independent of the
argument type. Whether the result is an IV or UV depends entirely on the context at compile-time
( C<use integer at BEGIN> ) or run-time ( C<$^H |= 1> ), and is stored only in the op.
What is left is the C<T> flag, "target can be a pad". This is a useful optimization technique.
This is checked in the macro C<dATARGET>:
C<< SV *targ = (PL_op->op_flags & OPf_STACKED ? sp[-1] : PAD_SV(PL_op->op_targ)); >>
C<OPf_STACKED> means "Some arg is arriving on the stack." (see F<op.h>)
So this reads: if the op has C<OPf_STACKED> set, the magic C<targ> ("target argument")
is simply on the stack; if not, C<op_targ> points to an SV on a private scratchpad.
"target can be a pad", voila.
  840. For reference see L<perlguts/"Putting a C value on Perl stack">.
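The target selection done by C<dATARGET> can be tried out in isolation. The following is a toy model, not perl's macro: C<SV> and C<OP> are cut down to the fields needed, the pad is a plain array, and C<pick_target> is a hypothetical helper name.

```c
/* Toy scalar and op; only what the target lookup needs. */
typedef struct { long iv; } SV;

#define OPf_STACKED 64            /* stand-in for the op.h flag bit */

typedef struct {
    unsigned op_flags;
    unsigned op_targ;             /* index into the scratchpad */
} OP;

/* Mirror of the dATARGET expression: sp points at the top stack
 * entry, so sp[-1] is the entry just below it. */
static SV *pick_target(const OP *op, SV **sp, SV **pad)
{
    return (op->op_flags & OPf_STACKED) ? sp[-1] : pad[op->op_targ];
}
```

With the flag clear the result lands in the pad slot reserved at compile time; with the flag set (as for C<<< $x <<= 2 >>>) it lands in the left operand already sitting on the stack.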
  841. =head2 Check Functions
They are defined in F<op.c> and not in F<pp.c>, because they belong closely to the
ops and the newOP definitions, and not to the actual pp_ opcode. That's why
F<op.c> is bigger than F<pp.c>, where the real gore for each op begins.
The name of each op's check function is defined in F<opcode.pl>, as shown above.
The C<ck_null> check function is the most common.
  847. $ perl -F"/\cI+/" -ane 'print $F[2],"\n" if $F[2] =~ /ck_null/' opcode.pl|wc -l
  848. 128
  849. But we do have a lot of those check functions.
  850. $ perl -F"/\cI+/" -ane 'print $F[2],"\n" if $F[2] =~ /ck_/' opcode.pl|sort -u|wc -l
  851. 43
B<When are they called, what do they look like, what do they do?>
The macro C<CHECKOP(type,o)> used to call the ck_ function has a little bit of
common logic.
  855. #define CHECKOP(type,o) \
  856. ((PL_op_mask && PL_op_mask[type]) \
  857. ? ( op_free((OP*)o), \
  858. Perl_croak(aTHX_ "'%s' trapped by operation mask", PL_op_desc[type]), \
  859. (OP*)0 ) \
  860. : CALL_FPTR(PL_check[type])(aTHX_ (OP*)o))
So when the global B<PL_op_mask> has an entry set for this op type, the OP is freed at once
and perl croaks. If not, the type-specific check function is called via the C<PL_check> array,
which F<opcode.pl> generates into F<opcode.h>.
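The same dispatch can be modelled without the interpreter. In this sketch the op set, the check functions and the names C<check_table>, C<op_mask> and C<checkop> are all hypothetical; the real macro frees the op and croaks where the sketch just returns C<NULL>.

```c
#include <stddef.h>

/* A hypothetical three-op instruction set. */
enum { OP_NULL, OP_CONST, OP_SYSTEM, OP_MAX };

typedef struct op { int op_type; int checked; } OP;
typedef OP *(*check_t)(OP *);

static OP *ck_null(OP *o) { return o; }                  /* no-op check   */
static OP *ck_mark(OP *o) { o->checked = 1; return o; }  /* toy "real" ck */

/* One check function per op type, like the generated PL_check[]. */
static check_t check_table[OP_MAX] = { ck_null, ck_mark, ck_null };

/* Like PL_op_mask: NULL means no op is masked. */
static const char *op_mask = NULL;

/* Function version of CHECKOP: NULL if the op type is trapped by the
 * mask, otherwise whatever the type's check function returns. */
static OP *checkop(int type, OP *o)
{
    if (op_mask && op_mask[type])
        return NULL;              /* real code: op_free() and croak */
    return check_table[type](o);
}
```

A check function may return a different op than it was given, which is why the result of the dispatch, not the original pointer, is what the caller continues with.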
  864. =head2 Constant Folding
In theory this is pretty easy. If all of an op's arguments in a sequence are constant and the
op is side-effect free ("purely functional"), replace the op sequence with a
constant op holding the result.
We do it like this: We define the C<f> flag in F<opcode.pl>, which tells the
compiler in the first pass to call C<fold_constants()> on this op. See
L</"Compile pass 1: check routines and constant folding"> above. If all args are
constant, the result is constant also and the op sequence will be replaced by
the constant.
But take care, every C<f> op must be side-effect free.
  874. E.g. our C<newUNOP()> calls at the end:
  875. return fold_constants((OP *) unop);
  876. OA_FOLDCONST ...
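As an illustration of the principle only, here is a tiny folder over a toy expression tree. It is not how C<fold_constants()> is implemented; the node type, the two operators and the function C<fold> are made up for this sketch.

```c
#include <stddef.h>

typedef enum { N_CONST, N_VAR, N_ADD, N_SHL } kind;

typedef struct node {
    kind k;
    long value;                   /* valid when k == N_CONST */
    struct node *left, *right;    /* operands of N_ADD / N_SHL */
} node;

/* Fold bottom-up. Returns 1 if n is a constant afterwards. */
static int fold(node *n)
{
    if (n->k == N_CONST)
        return 1;
    if (n->k == N_VAR)
        return 0;                 /* value unknown at compile time */

    /* Visit both operands even if the first one is not constant,
     * so that constant subtrees further down still get folded. */
    int l = fold(n->left);
    int r = fold(n->right);
    if (!(l && r))
        return 0;

    n->value = (n->k == N_ADD) ? n->left->value + n->right->value
                               : n->left->value << n->right->value;
    n->k = N_CONST;               /* the op becomes its own result */
    n->left = n->right = NULL;
    return 1;
}
```

For C<<< (1 + 2) << 3 >>> the whole tree collapses to the constant 24, while C<$x + 1> is left alone because one operand is a variable.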
  877. =head2 Lexical Pragmas
To implement user lexical pragmas, there needs to be a way at run time to get
the compile time state of C<%^H> for that block. Storing C<%^H> in every
block (or even COP) would be very expensive, so a different approach is
  881. taken. The (running) state of C<%^H> is serialised into a tree of HE-like
  882. structs. Stores into C<%^H> are chained onto the current leaf as a struct
  883. refcounted_he * with the key and the value. Deletes from C<%^H> are saved
  884. with a value of C<PL_sv_placeholder>. The state of C<%^H> at any point can be
  885. turned back into a regular HV by walking back up the tree from that point's
  886. leaf, ignoring any key you've already seen (placeholder or not), storing
  887. the rest into the HV structure, then removing the placeholders. Hence
  888. memory is only used to store the C<%^H> deltas from the enclosing COP, rather
  889. than the entire C<%^H> on each COP.
  890. To cause actions on C<%^H> to write out the serialisation records, it has
  891. magic type 'H'. This magic (itself) does nothing, but its presence causes
  892. the values to gain magic type 'h', which has entries for set and clear.
  893. C<Perl_magic_sethint> updates C<PL_compiling.cop_hints_hash> with a store
  894. record, with deletes written by C<Perl_magic_clearhint>. C<SAVEHINTS>
  895. saves the current C<PL_compiling.cop_hints_hash> on the save stack, so that
  896. it will be correctly restored when any inner compiling scope is exited.
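The delta chain described in this section can be modelled with a singly linked list that points towards the root. This is a sketch under simplifying assumptions: C<hint_he> and C<hint_fetch> are invented names, keys and values are plain C strings, a C<NULL> value stands in for C<PL_sv_placeholder>, and refcounting is left out entirely.

```c
#include <stddef.h>
#include <string.h>

/* One store or delete record, chained to the state it was added to. */
typedef struct hint_he {
    const struct hint_he *parent; /* enclosing state, NULL at the root */
    const char *key;
    const char *value;            /* NULL marks a delete (placeholder) */
} hint_he;

/* Look up one key as seen from a given leaf. The first record found
 * while walking towards the root is the newest one, so it wins, even
 * if it is a delete. Returns NULL for "not set". */
static const char *hint_fetch(const hint_he *leaf, const char *key)
{
    for (const hint_he *he = leaf; he != NULL; he = he->parent) {
        if (strcmp(he->key, key) == 0)
            return he->value;
    }
    return NULL;
}
```

Each COP only needs to keep a pointer to its leaf record. Two COPs in the same scope share the whole chain, which is where the memory saving comes from.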
  897. =head1 Examples
  898. =head2 Call a subroutine
  899. subname(args...) =>
  900. pushmark
  901. args ...
  902. gv => subname
  903. entersub
  904. =head2 Call a method
Here we have several combinations to define the package and the method name, either
at compile-time (static, as a constant string), or dynamically as B<GV> (for the method name) or
B<PADSV> (package name).
B<method_named> holds the method name as C<sv> if it is known at compile time.
If not, B<gv> (of the name) and B<method> are used.
The package name is at the top of the stack.
A mark for the call's argument list is pushed with B<pushmark>.
  912. 1. Static compile time package ("class") and method:
  913. Class->subname(args...) =>
  914. pushmark
  915. const => PV "Class"
  916. args ...
  917. method_named => PV "subname"
  918. entersub
  919. 2. Run-time package ("object") and compile-time method:
  920. $obj->meth(args...) =>
  921. pushmark
  922. padsv => GV *packagename
  923. args ...
  924. method_named => PV "meth"
  925. entersub
  926. 3. Run-time package and run-time method:
  927. $obj->$meth(args...) =>
  928. pushmark
  929. padsv => GV *packagename
  930. args ...
  931. gvsv => GV *meth
  932. method
  933. entersub
  934. 4. Compile-time package ("class") and run-time method:
  935. Class->$meth(args...) =>
  936. pushmark
  937. const => PV "Class"
  938. args ...
  939. gvsv => GV *meth
  940. method
  941. entersub
  942. =head1 Hooks
  943. =head2 Special execution blocks BEGIN, CHECK, UNITCHECK, INIT, END
  944. Perl keeps special arrays of subroutines that are executed at the
  945. beginning and at the end of a running Perl program and its program
  946. units. These subroutines correspond to the special code blocks:
  947. C<BEGIN>, C<CHECK>, C<UNITCHECK>, C<INIT> and C<END>. (See basics at
  948. L<perlmod/basics>.)
  949. Such arrays belong to Perl's internals that you're not supposed to
  950. see. Entries in these arrays get consumed by the interpreter as it
  951. enters distinct compilation phases, triggered by statements like
C<require>, C<use>, C<do>, C<eval>, etc. To be as safe as
possible, the only allowed operations are adding entries to the start
and to the end of these arrays.
  955. BEGIN, UNITCHECK and INIT are FIFO (first-in, first-out) blocks while
  956. CHECK and END are LIFO (last-in, first-out).
L<Devel::Hook> allows adding code to the start or end of these
blocks. L<Manip::END> even tries to remove certain entries.
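The difference between a FIFO and a LIFO block array comes down to which end new entries are added at, since the arrays are always run front to back. The sketch below models that with a fixed-size array of block names; C<blockav>, C<queue_fifo> and C<queue_lifo> are hypothetical names, and real entries are CVs, not strings.

```c
#include <string.h>

#define MAX_BLOCKS 8

/* Toy version of PL_beginav / PL_endav: block names in run order. */
typedef struct {
    const char *name[MAX_BLOCKS];
    int n;
} blockav;

/* FIFO registration (as for BEGIN): append at the end. */
static void queue_fifo(blockav *av, const char *name)
{
    if (av->n < MAX_BLOCKS)
        av->name[av->n++] = name;
}

/* LIFO registration (as for END): insert at the front, so the block
 * seen last is the one that runs first. */
static void queue_lifo(blockav *av, const char *name)
{
    if (av->n >= MAX_BLOCKS)
        return;
    memmove(&av->name[1], &av->name[0], av->n * sizeof av->name[0]);
    av->name[0] = name;
    av->n++;
}
```

Registering two C<END> blocks in source order therefore leaves the second one at index 0, which is why the last C<END> block in a file is the first to run.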
  959. =head3 The BEGIN block
A special array of code at C<PL_beginav>, that is executed before
C<main_start>, the first op, which is defined to be an C<ENTER>.
  962. E.g. C<use module;> adds its require and importer code into the BEGIN
  963. block.
  964. =head3 The CHECK block
The B compiler starting block at C<PL_checkav>. This hooks into the
check function which is executed for every op created, in bottom-up,
basic order.
  968. =head3 The UNITCHECK block
A new block since Perl 5.10 at C<PL_unitcheckav>, which runs right after the
CHECK block, to separate possible B compilation hooks from other
checks.
  972. =head3 The INIT block
  973. At C<PL_initav>.
  974. =head3 The END block
  975. At C<PL_endav>.
  976. L<Manip::END> started to mess around with this block.
  977. The array contains an C<undef> for each block that has been
  978. encountered. It's not really an C<undef> though, it's a kind of raw
  979. coderef that's not wrapped in a scalar ref. This leads to funky error
  980. messages like C<Bizarre copy of CODE in sassign> when you try to assign
one of these values to another variable. See L<Manip::END> for how to
manipulate the values of this array.
  983. =head2 B and O module. The perl compiler.
Malcolm Beattie's B modules hooked into the early op tree stages to
  985. represent the internal ops as perl objects and added the perl compiler
  986. backends. See L<B> and L<perlcompile>.
  987. The three main compiler backends are still B<Bytecode>, B<C> and B<CC>.
  988. I<Todo: Describe B's object representation a little bit deeper, its
  989. CHECK hook, its internal transformers for Bytecode (asm and vars) and
  990. C (the sections).>
  991. =head2 MAD
  992. MAD stands for "Misc Attributed Data".
  993. Larry Wall worked on a new MAD compiler backend outside of the B
  994. approach, dumping the internal op tree representation as B<XML> or
  995. B<YAML>, not as tree of perl B objects.
The idea is that all the information needed to recreate the original source is
stored in the op tree. To do this, the tokens are associated with their ops as
madprops: a list of key-value pairs, where the key is a character as
listed at the end of F<op.h>, and the value normally is a string, but it might also be
an op, as in the case of an optimized op ('O'). The keys '_'
(whitespace before) and '#' (whitespace after) are special: they hold the whitespace or
comment before/after the previous key-value pair.
Also, things that are normally compiled out, like a BEGIN block, which normally does
not result in any ops, instead create a NULLOP with madprops used to recreate
the object.
  1006. I<Is there any documentation on this?>
  1007. Why this awful XML and not the rich tree of perl objects?
  1008. Well there's an advantage.
  1009. The MAD XML can be seen as some kind of XML Storable/Freeze of the B
op tree, and can therefore be converted outside of the CHECK block,
which means you can debug the conversion (= compilation)
process more easily. To debug the CHECK block in the B backends you have to
  1013. use the L<B::Debugger> B<Od> or B<Od_o> modules, which defer the
  1014. CHECK to INIT. Debugging the highly recursive data is not easy,
  1015. and often problems can not be reproduced in the B debugger because
  1016. the B debugger influences the optree.
  1017. B<kurila> L<http://search.cpan.org/dist/kurila/> uses MAD to convert
  1018. Perl 5 source to the kurila dialect.
  1019. To convert a file 'source.pm' from Perl 5.10 to Kurila you need to do:
  1020. kurilapath=/usr/src/perl/kurila-1.9
  1021. bleadpath=/usr/src/perl/blead
  1022. cd $kurilapath
  1023. madfrom='perl-5.10' madto='kurila-1.9' \
  1024. madconvert="/usr/bin/perl $kurilapath/mad/p5kurila.pl" \
  1025. madpath="$bleadpath/mad" \
  1026. mad/convert /path/to/source.pm
  1027. B<PPI> L<http://search.cpan.org/dist/PPI/>, a Perl 5 source level parser not
  1028. related to the op tree at all, could also have been used for that.
  1029. =head2 Pluggable runops
  1030. The compile tree is executed by one of two existing runops functions, in F<run.c>
  1031. or in F<dump.c>. C<Perl_runops_debug> is used with C<DEBUGGING> and the faster
C<Perl_runops_standard> is used otherwise (see below in L</"Walkers or runops">). For fine
  1033. control over the execution of the compile tree it is possible to provide your
  1034. own runops function.
  1035. It's probably best to copy one of the existing runops functions and
  1036. change it to suit your needs. Then, in the C<BOOT> section of your XS
  1037. file, add the line:
  1038. PL_runops = my_runops;
  1039. This function should be as efficient as possible to keep your programs
  1040. running as fast as possible. See L<Jit> for an even faster just-in-time
  1041. compilation runloop.
  1042. =head3 Walkers or runops
The standard op tree B<walker> or B<runops> is as simple as this fast
C<Perl_runops_standard()> in F<run.c>. It starts with C<main_start> and walks
the C<op_next> chain until the end. There is no need to check other fields; it goes
strictly linearly through the tree.
  1047. int
  1048. Perl_runops_standard(pTHX)
  1049. {
  1050. dVAR;
  1051. while ((PL_op = CALL_FPTR(PL_op->op_ppaddr)(aTHX))) {
  1052. PERL_ASYNC_CHECK(); /* until 5.13.2 */
  1053. }
  1054. TAINT_NOT;
  1055. return 0;
  1056. }
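The loop is easy to try outside of perl. In the following toy model every op carries a function pointer that does its work and returns the next op to run; the C<OP> layout, the accumulator C<acc>, the two pp functions and C<runops_counting> are all invented for this sketch. The counting variant shows the kind of change a pluggable runops function typically makes.

```c
#include <stddef.h>

typedef struct op OP;
struct op {
    OP  *(*op_ppaddr)(OP *self);  /* does the work, returns next op */
    OP   *op_next;
    long  arg;
};

static long acc;                  /* toy interpreter state */

static OP *pp_add(OP *o)    { acc += o->arg; return o->op_next; }
static OP *pp_double(OP *o) { acc *= 2;      return o->op_next; }

/* Same shape as Perl_runops_standard: run until an op returns NULL. */
static int runops_standard(OP *start)
{
    OP *op = start;
    while ((op = op->op_ppaddr(op)) != NULL)
        ;
    return 0;
}

/* A replacement run loop that also counts the ops it executes. */
static int runops_counting(OP *start)
{
    int n = 0;
    for (OP *op = start; op != NULL; op = op->op_ppaddr(op))
        n++;
    return n;
}
```

Because each pp function decides what comes next, the loop itself never looks at C<op_next>; branching ops simply return something other than their C<op_next>.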
  1057. To inspect the op tree within a perl program, you can also hook C<PL_runops> (see
  1058. above at L</"Pluggable runops">) to your own perl walker (see e.g. L<B::Utils>
  1059. for various useful walkers), but you cannot modify the tree from within the B
accessors, only via XS. Or via L<B::Generate> as explained in Simon Cozens'
  1061. "Hacking the Optree for Fun..." L<http://www.perl.com/pub/a/2002/05/07/optree.html>.
I<Todo: Show the other runloops, and esp. the B::Utils ones.>
  1063. I<Todo: Describe the dumper, the debugging and more extended walkers.>
  1064. =head1 SEE ALSO
  1065. =head2 Internal and external modifications
  1066. See the short description of the internal optimizer in the "Brief Summary".
  1067. I<Todo: Describe the exported variables and functions which can be
  1068. hooked, besides simply adding code to the blocks.>
  1069. Via L</"Pluggable runops"> you can provide your own walker function, as it
  1070. is done in most B modules. Best see L<B::Utils>.
  1071. You may also create custom ops at runtime (well, strictly speaking at
  1072. compile-time) via L<B::Generate>.
  1073. =head2 Modules
  1074. The most important op tree module is L<B::Concise> by Stephen McCamant.
  1075. L<B::Utils> provides abstract-enough op tree grep's and walkers with
  1076. callbacks from the perl level.
  1077. L<Devel::Hook> allows adding perl hooks into the BEGIN, CHECK,
  1078. UNITCHECK, INIT blocks.
  1079. L<Devel::TypeCheck> tries to verify possible static typing for
  1080. expressions and variables, a pretty hard problem for compilers,
  1081. esp. with such dynamic and untyped variables as Perl 5.
  1082. Reini Urban maintains the interactive op tree debugger L<B::Debugger>,
  1083. the Compiler suite (B::C, B::CC, B::Bytecode), L<B::Generate> and
  1084. is working on L<Jit>.
  1085. =head2 Various Articles
  1086. The best source of information is the source. It is very well documented.
  1087. There are some pod files from talks and workshops in F<ramblings/>.
  1088. From YAPC EU 2010 there is a good screencast at L<http://vimeo.com/14058377>.
  1089. Simon Cozens has posted the course material to NetThink's
  1090. L<http://books.simon-cozens.org/index.php/Perl_5_Internals#The_Lexer_and_the_Parser>
training course. This is currently the best available description of
that subject.
  1093. "Hacking the Optree for Fun..." at
  1094. L<http://www.perl.com/pub/a/2002/05/07/optree.html> is the next step by
  1095. Simon Cozens.
Scott Walters added more details at L<http://perldesignpatterns.com/?PerlAssembly>.
  1097. Joshua ben Jore wrote a 50 minute presentation on "Perl 5
  1098. VM guts" at L<http://diotalevi.isa-geek.net/~josh/Presentations/Perl%205%20VM/>
  1099. focusing on the op tree for SPUG, the Seattle Perl User's Group.
  1100. Eric Wilhelm wrote a brief tour through the perl compiler backends for
  1101. the impatient refactorerer. The perl_guts_tour as mp3
  1102. L<http://scratchcomputing.com/developers/perl_guts_tour.html> or as
  1103. pdf L<http://scratchcomputing.com/developers/perl_guts_tour.pdf>
  1104. This text was created in this wiki article:
  1105. L<http://www.perlfoundation.org/perl5/index.cgi?optree_guts>
The version released with B::C should be more up to date.
  1107. =head1 Conclusion
So this is about 30% of the basic op tree information so far, not to speak of
the guts. Simon Cozens and Scott Walters have another 30%, another
10% is in the source to copy&paste, and the rest is in the compilers and run-time
information. I hope that with the help of some hackers we'll get it done, so that some people will
begin poking around in the B backends, and write the wonderful new C<dump>/C<undump>
functionality (which actually worked in the early years on Solaris) to
save-image and load-image at runtime as in LISP, analyse and optimize the
output, output PIR (parrot code), emit LLVM or other JIT-optimized code, or
even write assemblers. I have a simple one at home. :)
  1117. Written 2008 on the perl5 wiki with socialtext and pod in parallel
  1118. by Reini Urban, CPAN ID C<rurban>.