Java 7 subset syntax directed Dalvik bytecode translator.
Côtehaus is a house on the coast. In the land of hopes, Côtehaus is meant to become a syntax-directed translator from a subset of Java 7 to Dalvik bytecode.
It is dual-licensed under the MIT License and the GNU General Public License version 3, with the exception of the file types.h,
which includes copied code and states its own license. For the general case, see the LICENSE file.
A large part of the Côtehaus source code is fairly repetitive.
The source code tends to take a recognizable form according to the function it performs (broadly speaking: input, processing and output). Each of these general functions defines a stage, and at any given time the process is within one of them. The code pattern belonging to each stage is repeated once for every Dalvik format structure that the stage handles.
At the input stage, a Bison grammar recognizes structures in the input text. Bison's semantic actions then connect to the next stage, processing, by allocating objects that represent the structures of the Dalvik format.
The grammar lives in the file java7.y, where the many instances of this construction look like this:
```yacc
MethodOrFieldDecl:
    Type Identifier MethodOrFieldRest
    { /* TODO
        switch($3.type) {
        case "method":
        */
        add_method_id(context_peek()->enclosing_class, add_proto_id($1, $3->formal_params->type_list),$2);
        /*
        break;
        case "field":
        add_field_id(...);
        break;
        }
        */
    }
    ;
```
The C implementation of these calls allocates memory and chains each new object onto the list of objects of the same type. When an object is expected to reference another one, the call that handles the input of the referrer stores a pointer to the referee. For example:
```c
struct proto_id_st *add_proto_id(struct type_id_st *return_type_idx, struct type_list_st *parameters_list) {
    struct proto_id_st *proto_id = malloc(sizeof(struct proto_id_st));
    proto_id->return_type_idx = return_type_idx;
    proto_id->parameters_list = parameters_list;
    // One shorty character for the return type, plus one per parameter
    wchar_t *shorty = malloc((1+parameters_list->nb_members)*sizeof(wchar_t));
    *shorty = *return_type_idx->str_id->str_dt->unpacked_data->data;
    // An array return type also collapses to 'L' in a ShortyDescriptor
    if(*shorty==L'[')
        *shorty=L'L';
    int i=0;
    struct list_head *tmp;
    list_for_each(tmp, &parameters_list->type_member_head) {
        // Take a general TypeDescriptor into an array position
        shorty[++i] = *list_entry(tmp, struct type_member_st, type_member_list)->type_id->str_id->str_dt->unpacked_data->data;
        // Convert it to its ShortyDescriptor form
        if(shorty[i]==L'[')
            shorty[i]=L'L';
    }
    lgwseq_t *shorty_2;
    for_lgwseq(&shorty_2, i+1, shorty);
    proto_id->shorty_idx = add_str_dt(shorty_2);
    // Chain the new proto_id with the previously added ones
    list_add(&proto_id->proto_id_list, &proto_id_head);
    proto_id_list_size++;
    return proto_id;
}
```
At the processing stage, the program performs permutations: it takes the objects as they came out of the input stage and reorders them as required by the Dalvik format. For example:
```c
int oproto_id_st_compar(struct oproto_id_st *oproto_id_1, struct oproto_id_st *oproto_id_2) {
    struct proto_id_st *p1 = oproto_id_1->proto_id, *p2 = oproto_id_2->proto_id;
    // Compare instead of subtracting: a pointer or offset difference may not fit in an int
    if(p1->return_type_idx!=p2->return_type_idx)
        return p1->return_type_idx<p2->return_type_idx ? -1 : 1;
    if(p1->parameters_list->relative_offset!=p2->parameters_list->relative_offset)
        return p1->parameters_list->relative_offset<p2->parameters_list->relative_offset ? -1 : 1;
    return 0;
}

void
build_oproto_id() {
    // calloc zeroes the array, so every next_representative starts out NULL
    oproto_id_ary = calloc(proto_id_list_size, sizeof(struct oproto_id_st));
    struct list_head *tmp;
    unsigned int i=0;
    list_for_each(tmp, &proto_id_head) {
        (oproto_id_ary+i)->proto_id = list_entry(tmp, struct proto_id_st, proto_id_list);
        i++;
    }
    qsort(oproto_id_ary, proto_id_list_size, sizeof(struct oproto_id_st), (int (*)(const void *, const void *))oproto_id_st_compar);
    // Walk runs of equal protos: each run shares the idx of its first element (the representative)
    i=1;
    unsigned int representative_i = 0, representative_idx = 0;
    while(representative_i<proto_id_list_size) {
        (oproto_id_ary+representative_i)->proto_id->idx = representative_idx;
        while(i!=proto_id_list_size && oproto_id_st_compar(oproto_id_ary+representative_i, oproto_id_ary+i)==0) {
            (oproto_id_ary+i)->proto_id->idx = (oproto_id_ary+representative_i)->proto_id->idx;
            i++;
        }
        if(i!=proto_id_list_size)
            (oproto_id_ary+representative_i)->next_representative = oproto_id_ary+i;
        representative_i=i++;
        representative_idx++;
    }
    /* Storage designators per proto_id_item:
       ((struct str_id_st)(struct proto_id_st).shorty_idx).idx,
       ((struct type_id_st)(struct proto_id_st).return_type_idx).idx,
       ((struct type_list_st)(struct proto_id_st).parameters_list).relative_offset.
       This makes for a total of 3 uint32_t. */
    bounds_move(NH_PROTO_ID_IDX, 3*sizeof(uint32_t)*representative_idx);
    majors_size_ary[NH_PROTO_ID_IDX] = representative_idx;
}
```
Finally, at the output stage, the program emits an external representation that complies with the Dalvik format. Here, fitting the data into a spatial frame of reference (byte positions in the output file) matters more than the meaning of the data itself. Accordingly, library calls such as htole32() are used that invert the format-meaning relation, turning the format into the meaning. For example:
```c
void pack_oproto_id() {
    // Only representatives are emitted; duplicates were merged in build_oproto_id()
    for(struct oproto_id_st *oproto_id_representative = proto_id_list_size ? oproto_id_ary : NULL; oproto_id_representative!=NULL; oproto_id_representative=oproto_id_representative->next_representative) {
        struct proto_id_st *proto_id = oproto_id_representative->proto_id;
        uint32_t packed = htole32(proto_id->shorty_idx->idx);
        BUFFER_WRITE(buffer[NH_PROTO_ID_IDX],buf_len[NH_PROTO_ID_IDX],&packed,buf_offset[NH_PROTO_ID_IDX],sizeof(uint32_t));
        packed = htole32(proto_id->return_type_idx->idx);
        BUFFER_WRITE(buffer[NH_PROTO_ID_IDX],buf_len[NH_PROTO_ID_IDX],&packed,buf_offset[NH_PROTO_ID_IDX],sizeof(uint32_t));
        // parameters_off must be 0 when the prototype has no parameters
        packed = htole32(proto_id->parameters_list->nb_members==0 ? 0 : proto_id->parameters_list->relative_offset+up_bounds_ary[NH_TYPE_LIST_IDX-1]);
        BUFFER_WRITE(buffer[NH_PROTO_ID_IDX],buf_len[NH_PROTO_ID_IDX],&packed,buf_offset[NH_PROTO_ID_IDX],sizeof(uint32_t));
    }
}
```
The current focus is on gathering samples of what could be Côtehaus input/output data, i.e. samples of any Android Java code and its corresponding .dex code, whether or not it is wrapped in an .apk. They come mainly from the F-Droid site, although they may come from anywhere. The gathered material is listed below:
Of course, all the apps listed above are unrelated to Côtehaus.