Java 7 subset, syntax-directed Dalvik bytecode translator.

g.cze c99edb468c Method and field search structs. 1 week ago
.gitignore 237ce8f359 Improving method and field treatment. 3 months ago
BisonifyAlternatives.sed 48e1adcb84 Sets stuff for Bisonify alternatives in grammar. 2 years ago
BisonifyParenthesys.pl 17faa2a1b3 Sets stuff for grammar's transform the Bison's way. Adds transformed grammar. 2 years ago
LALRifying aef37abf5f Sets on LR grammar transform. 2 years ago
LALRifyingBrackets.pl e4448a1fb9 Sets stuff for LR grammar transform. 2 years ago
LICENSE 4c79995f3e Migrates program name, license and copyright years. 11 months ago
Makefile 3e6983a5df Creates method_id.c, prunes TODO's, updates Makefile. 7 months ago
README.md 027c534bd8 Writting project log into README.md. 1 month ago
build_wtname.sh 089b446122 Gets rid of a couple of parser conflicts. 2 years ago
class_def.c 11d72a0db3 Completing class_def's pack function (not finished). 6 months ago
class_def.h c99edb468c Method and field search structs. 1 week ago
composer.c 8b6708a878 Making a correction to use buf_offset when the actual buffer content's length is needed. The other way round, buf_len belongs on an internal state of the buffer (bugfix). 3 months ago
composer.h 933810d3c8 In making padding, fills with 0's in the buffers instead of directly in the output file. 3 months ago
context.c ea685c6db6 Fills context's operations. 3 months ago
context.h ea685c6db6 Fills context's operations. 3 months ago
convenience.h cfbe438850 Improves header names. 9 months ago
doc2jstub.pl 733b6d689b Adds doc2jstub.pl and fetch-api-doc.sh 11 months ago
fetch-api-doc.sh 733b6d689b Adds doc2jstub.pl and fetch-api-doc.sh 11 months ago
field_id.h c99edb468c Method and field search structs. 1 week ago
header.c 8b6708a878 Making a correction to use buf_offset when the actual buffer content's length is needed. The other way round, buf_len belongs on an internal state of the buffer (bugfix). 3 months ago
java7.y 39ee870e11 Writting notes. 3 weeks ago
lgwseq.c 22a5421422 Corrects compilation errors. 10 months ago
lgwseq.h cfbe438850 Improves header names. 9 months ago
main.c 933810d3c8 In making padding, fills with 0's in the buffers instead of directly in the output file. 3 months ago
map_list.c 4c79995f3e Migrates program name, license and copyright years. 11 months ago
method_id.c 815ecdb258 Completing pack_omethod_id function. 6 months ago
method_id.h c99edb468c Method and field search structs. 1 week ago
ongoing.sed 6ce52f3727 Sets points on LR grammar transform. 2 years ago
ongoing.txt 3a3cac4366 Adds ongoing.txt 2 years ago
pack_test.pl d1a2336512 Refined the line-up about addressing, packing and outstreaming process. 2 years ago
parse_file.c bce34f3e76 Introducing external declarations and fixing interaction with the context. 8 months ago
parse_file.h bce34f3e76 Introducing external declarations and fixing interaction with the context. 8 months ago
program.java d322ec2b4a Modifies program.java and adds referential stuff. 2 years ago
proto_id.c 47a6e161a1 Miscellaneous refactor. 6 months ago
proto_id.h 47a6e161a1 Miscellaneous refactor. 6 months ago
scandoc.sh 22b5a2b762 Starts with working on class_def. 1 year ago
search_entry.h b4ea609ae9 Putting methods and field refs. 1 week ago
search_entry_upstream.h c99edb468c Method and field search structs. 1 week ago
str_id.c 47a6e161a1 Miscellaneous refactor. 6 months ago
str_id.h 47a6e161a1 Miscellaneous refactor. 6 months ago
type_id.c 47a6e161a1 Miscellaneous refactor. 6 months ago
type_id.h 47a6e161a1 Miscellaneous refactor. 6 months ago
type_list.c 47a6e161a1 Miscellaneous refactor. 6 months ago
type_list.h 11d72a0db3 Completing class_def's pack function (not finished). 6 months ago
types.c 4c79995f3e Migrates program name, license and copyright years. 11 months ago
types.h b5f2a90980 Setting up copyright and license notes. Lifting up TODO's. 6 months ago
unresolved_type.c bce34f3e76 Introducing external declarations and fixing interaction with the context. 8 months ago
unresolved_type.h bce34f3e76 Introducing external declarations and fixing interaction with the context. 8 months ago
upgoing.pl 00c0548c50 Sets on LR grammar transform.! 2 years ago
upgoing.sed 6ce52f3727 Sets points on LR grammar transform. 2 years ago

README.md

Côtehaus

Côtehaus is a house on the coast. In the land of hopes, Côtehaus is to become a Java 7 subset, syntax-directed Dalvik bytecode translator.

License

It is dual-licensed under the MIT License and the GNU General Public License version 3, with the exception of the file types.h, which includes copied code and carries its own license notice. For the general case, see the LICENSE file.

Developer notes

A large part of the Côtehaus source code is fairly repetitive. The code tends to take a recognizable form according to the function it performs: broadly, input, processing, or output. Each of these functions defines a stage that the process is in at any given time, and the code pattern for each stage repeats once for every structure of the Dalvik format that the stage targets. At the input stage, a Bison grammar recognizes structures in the input text. The grammar's semantic actions then connect to the next stage, processing, by allocating objects that represent the Dalvik format's structures. The grammar is in the file java7.y, where the many instances of this construction look like this:

MethodOrFieldDecl:
  Type Identifier MethodOrFieldRest
  { /* TODO
    switch($3.type) {
      case "method":
      */
      add_method_id(context_peek()->enclosing_class, add_proto_id($1, $3->formal_params->type_list),$2);
      /*
      break;
      case "field":
        add_field_id(...);
      break;
    }
    */
  }
  ;

The C implementations of these calls allocate memory and chain each new object to the others of its type. When an object is expected to reference another one, the call that handles the referrer's input stores a pointer to the referee. For example:

struct proto_id_st *add_proto_id(struct type_id_st *return_type_idx, struct type_list_st *parameters_list) {
  struct proto_id_st *proto_id = malloc(sizeof(struct proto_id_st));
  proto_id->return_type_idx = return_type_idx;
  proto_id->parameters_list = parameters_list;
  wchar_t *shorty = malloc((1+parameters_list->nb_members)*sizeof(wchar_t));
  *shorty = *return_type_idx->str_id->str_dt->unpacked_data->data;
  int i=0;
  struct list_head *tmp;
  list_for_each(tmp, &parameters_list->type_member_head) {
    // Take a general TypeDescriptor into an array position
    shorty[++i] = *list_entry(tmp, struct type_member_st, type_member_list)->type_id->str_id->str_dt->unpacked_data->data;
    // Convert it to its ShortyDescriptor form
    if(shorty[i]==L'[')
      shorty[i]=L'L';
  }
  lgwseq_t *shorty_2;
  for_lgwseq(&shorty_2, i+1, shorty);
  proto_id->shorty_idx = add_str_dt(shorty_2);
  list_add(&proto_id->proto_id_list, &proto_id_head);
  proto_id_list_size++;
  return proto_id;
}
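
The shorty construction in add_proto_id can be isolated as a small sketch. The version below is not the project's code: it assumes plain char strings instead of the wchar_t and lgwseq_t types used above, and the names shorty_char and build_shorty are hypothetical. It only illustrates the rule that the return type comes first, each parameter follows, and array descriptors collapse to 'L'.

```c
#include <assert.h>
#include <stddef.h>

/* Map one TypeDescriptor to its ShortyDescriptor character.
 * Class descriptors already start with 'L'; array descriptors ('[')
 * are reference types too, so they collapse to 'L' as well.
 * Primitives and 'V' keep their own letter. */
char shorty_char(const char *descriptor) {
  assert(descriptor != NULL && descriptor[0] != '\0');
  return descriptor[0] == '[' ? 'L' : descriptor[0];
}

/* Write the NUL-terminated shorty for a prototype into out.
 * Returns the shorty length, or 0 if out_cap is too small.
 * (Hypothetical helper, not part of Côtehaus.) */
size_t build_shorty(const char *return_type, const char *const *params,
                    size_t nb_params, char *out, size_t out_cap) {
  if (out_cap < nb_params + 2)
    return 0;
  out[0] = shorty_char(return_type);
  for (size_t i = 0; i < nb_params; i++)
    out[i + 1] = shorty_char(params[i]);
  out[nb_params + 1] = '\0';
  return nb_params + 1;
}
```

For a method returning void and taking an int, a byte array and a String, this sketch produces the shorty VILL.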

At the processing stage, the program performs permutations. It takes the objects as they came out of the input stage and permutes them as much as necessary to reorder them in accordance with the Dalvik format. For example:

int oproto_id_st_compar(struct oproto_id_st *oproto_id_1, struct oproto_id_st *oproto_id_2) {
  if(oproto_id_1->proto_id->return_type_idx!=oproto_id_2->proto_id->return_type_idx) {
    return oproto_id_1->proto_id->return_type_idx-oproto_id_2->proto_id->return_type_idx;
  }
  if(oproto_id_1->proto_id->parameters_list->relative_offset!=oproto_id_2->proto_id->parameters_list->relative_offset) {
    return oproto_id_1->proto_id->parameters_list->relative_offset-oproto_id_2->proto_id->parameters_list->relative_offset;
  }
  return 0;
}

void
build_oproto_id() {
  oproto_id_ary = malloc(proto_id_list_size*sizeof(struct oproto_id_st));
  memset(oproto_id_ary, 0, proto_id_list_size*sizeof(struct oproto_id_st));
  struct list_head *tmp;
  unsigned int i=0;
  list_for_each(tmp, &proto_id_head) {
    (oproto_id_ary+i)->proto_id = (struct proto_id_st *)list_entry(tmp, struct proto_id_st, proto_id_list);
    i++;
  }
  qsort(oproto_id_ary, proto_id_list_size, sizeof(struct oproto_id_st), (int (*)(const void *, const void *))oproto_id_st_compar);
  i=1;
  unsigned int representative_i = 0, representative_idx = 0;
  while(representative_i<proto_id_list_size) {
    (oproto_id_ary+representative_i)->proto_id->idx = representative_idx;
    while(i!=proto_id_list_size && oproto_id_st_compar(oproto_id_ary+representative_i, oproto_id_ary+i)==0) {
      (oproto_id_ary+i)->proto_id->idx = (oproto_id_ary+representative_i)->proto_id->idx;
      i++;
    }
    if(i!=proto_id_list_size)
      (oproto_id_ary+representative_i)->next_representative = oproto_id_ary+i;
    representative_i=i++;
    representative_idx++;
  }
  bounds_move(NH_PROTO_ID_IDX, 3*sizeof(uint32_t)/*Storage Designators: ((struct str_id_st)(struct proto_id_st).shorty_idx).idx,
  ((struct type_id_st)(struct proto_id_st).return_type_idx).idx,
  ((struct type_list_st)(struct proto_id_st).parameters_list).relative_offset;
  This makes for a total of 3 uint32_t.
  */ *representative_idx);
  majors_size_ary[NH_PROTO_ID_IDX] = representative_idx;
}
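
The core of build_oproto_id (sort, then give equal entries a shared index) can be shown in a reduced form. This is a sketch under assumptions, not the project's implementation: struct item, item_compar and assign_indices are hypothetical names, and the two sort keys are plain integers rather than pointers into other structures.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record: two sort keys plus the index to be assigned. */
struct item {
  uint32_t return_type_idx;
  uint32_t params_offset;
  uint32_t idx;
};

/* Order by return type first, then by parameter list offset. */
int item_compar(const void *a, const void *b) {
  const struct item *x = a, *y = b;
  if (x->return_type_idx != y->return_type_idx)
    return x->return_type_idx < y->return_type_idx ? -1 : 1;
  if (x->params_offset != y->params_offset)
    return x->params_offset < y->params_offset ? -1 : 1;
  return 0;
}

/* Sort the items, give items that compare equal the same idx,
 * and return the number of distinct items (the representatives). */
size_t assign_indices(struct item *items, size_t n) {
  assert(items != NULL || n == 0);
  if (n == 0)
    return 0;
  qsort(items, n, sizeof *items, item_compar);
  size_t distinct = 0;
  for (size_t i = 0; i < n; i++) {
    /* A change between neighbours starts a new representative. */
    if (i > 0 && item_compar(&items[i - 1], &items[i]) != 0)
      distinct++;
    items[i].idx = (uint32_t)distinct;
  }
  return distinct + 1;
}
```

The comparator in this sketch returns -1, 0 or 1 from explicit comparisons instead of a subtraction, which avoids wrap-around with unsigned keys. The returned count plays the role that representative_idx plays above when sizing the output section.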

Finally, at the output stage the program emits an external representation that complies with the Dalvik format. Here, fitting the data into a spatial reference system takes precedence over the meaning of the data itself. Once that is done, library calls are used that invert the format-meaning relation, turning the format into the meaning. For example:

void pack_oproto_id() {
  for(struct oproto_id_st *oproto_id_representative = oproto_id_ary;oproto_id_representative!=NULL; oproto_id_representative=oproto_id_representative->next_representative){
    uint32_t packed = htole32(oproto_id_representative->proto_id->shorty_idx->idx);
    BUFFER_WRITE(buffer[NH_PROTO_ID_IDX],buf_len[NH_PROTO_ID_IDX],&packed,buf_offset[NH_PROTO_ID_IDX],sizeof(uint32_t))
    packed = htole32(oproto_id_representative->proto_id->return_type_idx->idx);
    BUFFER_WRITE(buffer[NH_PROTO_ID_IDX],buf_len[NH_PROTO_ID_IDX],&packed,buf_offset[NH_PROTO_ID_IDX],sizeof(uint32_t))
    packed = htole32(oproto_id_representative->proto_id->parameters_list->relative_offset+up_bounds_ary[NH_TYPE_LIST_IDX-1]);
    BUFFER_WRITE(buffer[NH_PROTO_ID_IDX],buf_len[NH_PROTO_ID_IDX],&packed,buf_offset[NH_PROTO_ID_IDX],sizeof(uint32_t))
  }
}
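
What each htole32 and BUFFER_WRITE pair achieves can be sketched without either of them. The code below is an illustration under assumptions: struct byte_buf and buf_put_u32le are hypothetical and do not reproduce the actual BUFFER_WRITE macro. The sketch keeps the distinction between the allocated length of the buffer and the offset up to which it has been filled, and writes the bytes in little-endian order by shifting, so it does not depend on the host's byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable byte buffer. */
struct byte_buf {
  uint8_t *data;
  size_t len;    /* allocated capacity (internal state) */
  size_t offset; /* number of bytes actually written */
};

/* Append value as 4 little-endian bytes, growing the buffer if needed.
 * Returns 0 on success, -1 if the allocation fails. */
int buf_put_u32le(struct byte_buf *buf, uint32_t value) {
  assert(buf != NULL);
  if (buf->offset + 4 > buf->len) {
    size_t new_len = buf->len ? buf->len : 16;
    while (new_len < buf->offset + 4)
      new_len *= 2;
    uint8_t *grown = realloc(buf->data, new_len);
    if (grown == NULL)
      return -1;
    buf->data = grown;
    buf->len = new_len;
  }
  /* Least significant byte first. */
  for (int i = 0; i < 4; i++)
    buf->data[buf->offset++] = (uint8_t)(value >> (8 * i));
  return 0;
}
```

Starting from a zero-initialized struct byte_buf, three successive calls would lay out one 12-byte entry in the same field order as above: shorty index, return type index, parameters offset.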

Project updates

The current focus is on gathering samples of what could be Côtehaus input/output data, i.e. samples of any Android Java code and its corresponding .dex code, whether or not it is wrapped in an .apk. They come mainly from the F-Droid site, although they may come from anywhere. The gathered material is listed below:

Of course, all the apps listed above are unrelated to Côtehaus.