I'm not working on this but am still interested in the idea, in my hobby capacity. The emails below were exchanged; I haven't yet received a further reply.

---------- Forwarded message ---------
Date: Sun, Apr 24, 2022, 10:31 AM
Subject: Re: [coreboot] Re: Deprecation of the Intel Quark SoC
To: Karl Semich <0xloem@gmail.com>


I'm very interested in your ideas.

Yep, I'm interested, I've been trying to find someone interested in
this for 5 years now

---------- Forwarded message ---------
Date: Sun, Apr 24, 2022, 10:32 AM
Subject: Re: [coreboot] Re: Deprecation of the Intel Quark SoC
To: Karl Semich <0xloem@gmail.com>


I think we should start with a binary blob for an open system like the
allwinner d1. We have the blob, it's pretty simple code.

---------- Forwarded message ---------
From: Karl Semich <0xloem@gmail.com>
Date: Sun, Apr 24, 2022, 2:41 PM
Subject: Re: [coreboot] Re: Deprecation of the Intel Quark SoC


hi

is this too many words below? are you still interested? how do you feel about making the conversation more public?

my biggest issue is neurological and psychological, it's hard for me to do things, so the more people the better

the transformer model approach works on data. the models need a lot of examples of something to begin copying it well, thousands or millions or more. it seems fewer and fewer examples are needed the more similar data a model has already been trained on, down to finally needing none ("few-shot", "zero-shot"). so one could take a pretrained model and e.g. adjust it (via training) to predict comments from uncommented code, or predict source from binaries of an architecture where more binaries are available, or from a specific set of compilers for which many training binaries could be generated from commented code. then by combining these separate models one might be able to perform a task for which less data is available.
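
to make that concrete, here's a minimal sketch of the kind of fine-tuning I mean, using the huggingface transformers Trainer on a small pretrained causal model. the file name pairs.jsonl and the input/output prompt layout are placeholders I made up, not an existing dataset or convention, and I haven't run this:

```python
# minimal fine-tuning sketch (untested): adapt a pretrained causal LM to map
# one representation of code to another, e.g. uncommented -> commented source.
# "pairs.jsonl" and the input+output prompt layout are placeholders.
import json
import torch
from transformers import (AutoTokenizer, AutoModelForCausalLM,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

class PairDataset(torch.utils.data.Dataset):
    """each line of pairs.jsonl: {"input": "...", "output": "..."}"""
    def __init__(self, path):
        self.examples = [json.loads(line) for line in open(path)]
    def __len__(self):
        return len(self.examples)
    def __getitem__(self, i):
        ex = self.examples[i]
        text = ex["input"] + tokenizer.eos_token + ex["output"]
        enc = tokenizer(text, truncation=True, max_length=1024,
                        padding="max_length", return_tensors="pt")
        ids = enc["input_ids"].squeeze(0)
        mask = enc["attention_mask"].squeeze(0)
        labels = ids.clone()
        labels[mask == 0] = -100          # ignore padding in the loss
        return {"input_ids": ids, "attention_mask": mask, "labels": labels}

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=PairDataset("pairs.jsonl"),
)
trainer.train()
```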

for me at this point, finding or generating that data, and finding paths to fit the model better to the target tasks, seems about as challenging as setting up a basic training framework to fine-tune models. there are a few different infrastructures, and sometimes people make their own if doing a new kind of task.

most people are using jupyter notebooks via various online portals. there are also compute servers at places like vast.ai. but I am out of money and most of my devices are very low end.

I expect this to happen and am interested in helping it happen but i'll be pretty slow and flaky on my own :)

do you have any thoughts on generating or finding data similar to many pairings of embedded binary blobs and license-free interface specifications? does the allwinner d1 provide much data?

most of the similar models I know of out there continue streams of human text or source code, selecting up to 2048 words from a fixed vocabulary maybe 10k to 100k words large. There are many known techniques to extend this to more words, but finding and implementing them is part of the effort. This also informs how to pick or develop a training framework: something that makes it easier to rearchitect the models will be needed to work with or generate data longer than 2048 symbols.

the challenge of lengthening may also be postponable if the meaningful patterns of the data can be squeezed to fit within that size, as is done for human text in the mainstream. the model may learn appropriate patterns and then take only a little further adjustment to handle larger ones once rearchitected to do so.
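
as a toy illustration of that squeezing, here's a trivial windowing sketch (untested): slice a token stream that is far too long for the model into overlapping 2048-token chunks so an unmodified model can still train on local patterns. the 256-token overlap is an arbitrary choice:

```python
# sketch: break a token sequence much longer than the model's 2048-token
# context into overlapping windows, each usable as a training example.
def window(tokens, size=2048, overlap=256):
    step = size - overlap
    for start in range(0, max(len(tokens) - overlap, 1), step):
        yield tokens[start:start + size]

# example: a 10000-token "firmware" stream becomes 6 training chunks
chunks = list(window(list(range(10000))))
print(len(chunks), [len(c) for c in chunks])
```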


---------- Forwarded message ---------
From: Karl Semich <0xloem@gmail.com>
Date: Sun, Apr 24, 2022, 2:58 PM
Subject: Re: [coreboot] Re: Deprecation of the Intel Quark SoC


Basically, there's a class of neural networks called transformer models that have become mainstream for successfully solving problems across domains.

Common systems used to work with them that I'm aware of include pytorch, jax, huggingface transformers, keras, and tensorflow: all python libraries. A number of groups are offering hubs where people share pretrained models for others to use. I've heard https://fast.ai/ has a good starting course, but it didn't quite seem right for me; I haven't taken it.

I think most people are learning from example jupyter notebooks. I've mostly learned by skimming the huggingface transformers source code and occasional research papers.

Each transformer model is roughly made of "layers" of "self attention", which is just a small set of operations performed on data represented as matrices, the result then being passed to the next layer where the operations are performed again. Each layer has many "weights" -- coefficients -- which are how the model "learns", by backpropagating gradients to adjust their values.

A big question appears to have been how to organise the beginning and end of the chain. The models usually have something different on both ends, to translate the input or output data to and from the internal representation of huge matrices of floating point numbers. It seems important to do this in a way that preserves the meaning of the data well, and often a couple of other neural network parts are used -- very short algorithms that operate on large matrices with trainable coefficients.

The basic takeaway is that transformer models use an algorithm called "self attention" to combine mutating data matrices with trained matrices, in sequential layers like a pipeline, and these chunks of attention have been found able to adjust themselves to pretty much any task, given good data in sufficient quantity.
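
In case it helps, here is a bare-bones single-head version of that self-attention step in pytorch. Real transformer layers add multiple heads, layer normalisation, residual connections and a feed-forward sublayer, but the trainable "weights" I mentioned are just matrices like these:

```python
# bare-bones single-head self-attention (pytorch)
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)   # trainable coefficient matrices
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, x):                              # x: (sequence, dim)
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(-2, -1) / x.shape[-1] ** 0.5
        weights = scores.softmax(dim=-1)               # how much each position attends to the others
        return weights @ v                             # mix the value vectors accordingly

x = torch.randn(16, 64)                                # 16 tokens, 64 dimensions each
print(SelfAttention(64)(x).shape)                      # torch.Size([16, 64])
```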

As stated previously, most of the mainstream code out there gets unwieldy if there are more than 2048 input or output elements; many modifications have been explored to get around that, but I'm not aware of any one having been settled upon.

---------- Forwarded message ---------
Date: Sun, Apr 24, 2022, 2:49 PM
Subject: Re: [coreboot] Re: Deprecation of the Intel Quark SoC
To: Karl Semich <0xloem@gmail.com>


you don't want to RE a binary, that gets you into trouble.

But you are allowed to RE from the activity, basically memory reads
and writes. That's turning observation into code.

So, that's the first challenge.


---------- Forwarded message ---------
From: Karl Semich <0xloem@gmail.com>
Date: Sun, Apr 24, 2022, 3:01 PM
Subject: Re: [coreboot] Re: Deprecation of the Intel Quark SoC


I agree that's preferable, but I don't have a simulator or a circuit logger at this time. Do you have these things?

I understand it's legal to re-engineer if the reverse engineer keeps their work private, sharing only an interface specification. Is this your experience?

---------- Forwarded message ---------
From: Karl Semich <0xloem@gmail.com>
Date: Mon, Apr 25, 2022, 5:34 AM
Subject: Re: [coreboot] Re: Deprecation of the Intel Quark SoC




On Sun, Apr 24, 2022, 10:31 AM wrote:
I'm very interested in your ideas.

Yep, I'm interested, I've been trying to find someone interested in
this for 5 years now

I think I kind of glossed over this; what approaches have you pursued already? What challenges did you encounter?

On Sun, Apr 24, 2022, 2:49 PM  wrote:
you don't want to RE a binary, that gets you into trouble.

But you are allowed to RE from the activity, basically memory reads
and writes. That's turning observation into code.

So, that's the first challenge.

I didn't realize that serialice does this already.  It looks like serialice's qemu patches make it pretty easy to probe firmware behavior.

We could start training on normal open-source blobs and generate random working code to augment the data.

---------- Forwarded message ---------
From: Karl Semich <0xloem@gmail.com>
Date: Mon, Apr 25, 2022, 5:40 AM
Subject: Re: [coreboot] Re: Deprecation of the Intel Quark SoC





On Sun, Apr 24, 2022, 10:32 AM  wrote:
I think we should start with a binary blob for an open system like the
allwinner d1. We have the blob, it's pretty simple code.

I websearched this chip; it seems it's RISC-V. I think that means that serialice's qemu patches wouldn't work with it out of the box.

Do you know how to hook i/o accesses for RISC-V in qemu?

If the first step is patching qemu, would it make more sense to start with an x86 chip?

---------- Forwarded message ---------
Date: Mon, Apr 25, 2022, 10:02 AM
Subject: Re: [coreboot] Re: Deprecation of the Intel Quark SoC


The x86 chips are fiendishly complex.

There is in development an open source version of the memory setup code for the D1, so it would be possible to create a proof of concept by logging all the memory writes that are done to set up dram.

I'm pretty familiar with the many types of dram startup and I would prefer if possible to start with the Allwinner D1 startup.

---------- Forwarded message ---------
From: Karl Semich <0xloem@gmail.com>
Date: Mon, Apr 25, 2022, 10:30 AM
Subject: Re: [coreboot] Re: Deprecation of the Intel Quark SoC


I websearched but didn't find the open firmware yet. Where's it at?

You didn't mention an emulation environment, so I was thinking about how the logging would be done. I think the cpu or architecture actually wouldn't matter at all if you have enough example sources and instrument them to log their writes and reads.

The relevant task here seems to be collecting firmware examples that can be expressed in blocks smaller than 2048 tokens, and logging their runs on host hardware with instrumented reads and writes. I'm thinking on this a little, not quite sure yet how to approach it.

---------- Forwarded message ---------
Date: Mon, Apr 25, 2022, 10:34 AM
Subject: Re: [coreboot] Re: Deprecation of the Intel Quark SoC
To: Karl Semich <0xloem@gmail.com>


You are helping me refine my thinking.

We need a cheap platform, that has lots of people involved, that would
let us do this test.

I would still suggest something like the lichee r5 or similar
allwinner-based platform

---------- Forwarded message ---------
From: Karl Semich <0xloem@gmail.com>
Date: Mon, Apr 25, 2022, 10:40 AM
Subject: Re: [coreboot] Re: Deprecation of the Intel Quark SoC



the hits I get for lichee r5 seem to mostly be in chinese, a language I haven't learned

what do you think of the idea of setting up some scripts to instrument firmware sources for a lot of different platforms?

---------- Forwarded message ---------
From: Karl Semich <0xloem@gmail.com>
Date: Mon, Apr 25, 2022, 11:17 AM
Subject: Re: [coreboot] Re: Deprecation of the Intel Quark SoC


here is my present nascent idea, unsure:

- instrument sources to log their writes and reads, and use the logs as input data
- during compilation, use inotify to log which files are read from, and use these files as output data to train (a rough sketch of this is after the list)
- change to different source history points and shuffle configuration options to produce more data
- if needed, one larger-context model is bigbird: https://huggingface.co/docs/transformers/model_doc/bigbird_pegasus which can handle 4096 tokens of input+output
- when input data is too large, add further trainable models that shrink it to a smaller representation for the main model's input
- when output data is too large, parameterise the generation to generate only part at once
- run the code in qemu for each architecture so as to generate data for multiple architectures

- if timing is important and the instrumentation changes the timing, this could be measured and compensated for arithmetically
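
rough, untested sketch of the inotify bullet above: watch the source tree while the build runs and record every file that gets opened. inotify watches are per-directory rather than recursive, so each directory gets its own watch; the build command, tree path, and log name below are placeholders:

```python
# log which files the build reads, via inotify (untested sketch)
import os
import subprocess
from inotify_simple import INotify, flags

SRC = "coreboot"                                 # source tree to watch (placeholder)
inotify = INotify()
watch_dirs = {}
for root, dirs, files in os.walk(SRC):           # inotify is not recursive: watch each dir
    wd = inotify.add_watch(root, flags.OPEN | flags.ACCESS)
    watch_dirs[wd] = root

build = subprocess.Popen(["make", "-C", SRC])    # run the build concurrently
opened = set()
while build.poll() is None:
    for event in inotify.read(timeout=1000):     # timeout in milliseconds
        if event.name:
            opened.add(os.path.join(watch_dirs[event.wd], event.name))

with open("files_read.log", "w") as log:
    log.write("\n".join(sorted(opened)))
```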

i'm a little confused about how logging might work in qemu. i'm thinking of changing the bootblock assembly to log things. it seems this would best be done via a debugger attached to qemu that can act on the instructions themselves, not sure.
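
one route I haven't tried: qemu can emit built-in trace events for mmio reads and writes, so a small wrapper might capture the i/o log without touching the guest code at all. the event names, machine type, and blob path below are assumptions from memory and would need checking against the actual qemu build:

```python
# untested sketch: capture qemu's built-in mmio trace events for a firmware blob.
# event names, machine type, and paths are assumptions, not verified.
import subprocess

cmd = [
    "qemu-system-riscv64",
    "-M", "virt",                              # placeholder machine type
    "-bios", "blob.bin",                       # firmware blob under study (placeholder)
    "-nographic",
    "-trace", "memory_region_ops_*",           # mmio read/write trace events
]

with open("io_trace.log", "w") as log:
    proc = subprocess.Popen(cmd, stderr=subprocess.PIPE, text=True)
    for line in proc.stderr:                   # trace output is assumed to land on stderr
        if "memory_region_ops" in line:
            log.write(line)
```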

it seems it might also be good to measure code coverage so as to train only around code that is actually executed.

given there is a lot of hidden behavior (structures mutated without port writes), it could be good to err on the side of more data and more training expense.

or if things are limited to ordered port writes, it could be incredibly simple.

it seems to me x86 or arm makes more sense, so the code blocks could be executed on a normal consumer system.