I'm not working on this but am still interested in the idea, in my hobby capacity. The emails below were exchanged. I haven't yet received a further reply.
---------- Forwarded message --------- Date: Sun, Apr 24, 2022, 10:31 AM Subject: Re: [coreboot] Re: Deprecation of the Intel Quark SoC To: Karl Semich 0xloem@gmail.com
I'm very interested in your ideas.
Yep, I'm interested. I've been trying to find someone interested in this for 5 years now
---------- Forwarded message --------- Date: Sun, Apr 24, 2022, 10:32 AM Subject: Re: [coreboot] Re: Deprecation of the Intel Quark SoC To: Karl Semich 0xloem@gmail.com
I think we should start with a binary blob for an open system like the allwinner d1. We have the blob, it's pretty simple code.
---------- Forwarded message --------- From: *Karl Semich* 0xloem@gmail.com Date: Sun, Apr 24, 2022, 2:41 PM Subject: Re: [coreboot] Re: Deprecation of the Intel Quark SoC
hi
is this too many words below? are you still interested? how do you feel about making the conversation more public?
my biggest issue is neurological and psychological, it's hard for me to do things, so the more people the better
the transformer model approach works on data. they need a lot of examples of something to begin copying it well, thousands or millions or more. it seems fewer and fewer examples are needed the more similar data a model has already been trained on, down to finally needing none ("few-shot", "zero-shot"). so one could take a pretrained model and e.g. adjust it (via training) to predict comments from uncommented code, or predict source from binaries of an architecture where more binaries are available, or from a specific set of compilers for which many training binaries could be generated from commented code. then by combining these separate models one might be able to perform a task for which less data is available.
for me at this point, finding or generating that data, and finding paths to fit the model better to the target tasks, seems about as challenging as just setting up a basic training framework to fine-tune models. there are a few different infrastructures, and sometimes people make their own when doing a new kind of task.
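for concreteness, here's roughly what I mean by a basic training framework to fine-tune a pretrained model, using the huggingface transformers Trainer. this is only a sketch; the model name "gpt2" and the data file "firmware_examples.txt" are placeholders I made up, not things we have:

    # minimal fine-tuning sketch with huggingface transformers
    # assumes one training example per line in a plain text file
    from transformers import (AutoTokenizer, AutoModelForCausalLM, Trainer,
                              TrainingArguments, DataCollatorForLanguageModeling)
    from datasets import load_dataset

    model_name = "gpt2"  # placeholder small pretrained model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
    model = AutoModelForCausalLM.from_pretrained(model_name)

    raw = load_dataset("text", data_files={"train": "firmware_examples.txt"})

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, max_length=1024)

    tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])
    collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

    args = TrainingArguments(output_dir="finetuned-model",
                             per_device_train_batch_size=2,
                             num_train_epochs=1)

    Trainer(model=model, args=args, train_dataset=tokenized["train"],
            data_collator=collator).train()

swapping in a bigger pretrained model or a real dataset would mostly be a matter of changing those two names.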
most people are using jupyter notebooks via various online portals. there are also compute servers at places like vast.ai. but I am out of money and most of my devices are very low end.
I expect this to happen and am interested in helping it happen but i'll be pretty slow and flaky on my own :)
do you have any thoughts on generating or finding data similar to many pairings of embedded binary blobs and license-free interface specifications? does the allwinner d1 amount to much data?
most of the similar models I know of out there continue streams of human text or source code, selecting up to 2048 words from a fixed vocabulary maybe 10k to 100k words large. There are many known techniques to extend this to more words, but finding and implementing them is part of the effort. This also informs how to pick or develop a training framework: something that makes it easier to rearchitect the models will be needed to work with or generate data longer than 2048 symbols.
the challenge of lengthening may also be postponable if the meaningful patterns of the data can be squeezed to fit within that size, as is done for human text in the mainstream. the model may learn appropriate patterns and then need only a little further adjustment to handle larger ones once rearchitected to do so.
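to illustrate the squeezing idea, here's a tiny sketch of splitting a long token stream into overlapping windows that each fit a 2048-token model. plain python; the overlap size is an arbitrary guess:

    # split a long token-id sequence into overlapping windows of at most
    # 2048 tokens, so longer data can still be used for training
    def window_tokens(token_ids, window=2048, overlap=256):
        step = window - overlap
        return [token_ids[start:start + window]
                for start in range(0, max(len(token_ids) - overlap, 1), step)]

    # usage: chunks = window_tokens(tokenizer(long_text)["input_ids"])

the overlap gives each window a little context from the previous one; whether that is enough for firmware traces is an open question.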
---------- Forwarded message --------- From: *Karl Semich* 0xloem@gmail.com Date: Sun, Apr 24, 2022, 2:58 PM Subject: Re: [coreboot] Re: Deprecation of the Intel Quark SoC
Basically, there's a class of neural networks called transformer models that have become mainstream for successfully solving problems across domains.
Common systems used to work with them that I'm aware of include pytorch, jax, huggingface transformers, keras, and tensorflow: all python libraries. A number of groups offer hubs where people share pretrained models for others to use. I've heard https://fast.ai/ has a good starting course, but it didn't quite seem to be for me, so I haven't taken it.
I think most people are learning from example jupyter notebooks. I've mostly learned by skimming the huggingface transformers source code and occasional research papers.
Each transformer model is roughly made of "layers" of "self attention", which is just a small set of operations performed on data represented as matrices; the result is then passed to the next layer, where the operations are performed again. Each layer has many "weights" -- coefficients -- which are how the model "learns": gradients are backpropagated to adjust their values.
A big thing appears to have been how to organise the beginning and end of the chain. The models usually have something different on both ends, to convert the input or output data to and from the internal representation of huge matrices of floating point numbers. It seems important to do this in a way that preserves the meaning of the data well, and often a couple of other neural network parts are used -- very short algorithms that can operate on large matrices with trainable coefficients.
The basic takeaway is that transformer models use an algorithm called "self attention" to combine mutating data matrices with trained matrices, in sequential layers like a pipeline, and these chunks of attention have been found to be able to adjust themselves to pretty much any task, given good data in enough quantity.
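To make that concrete, here is a minimal sketch of a single self-attention layer in pytorch. Real transformer layers add multiple heads, residual connections, normalisation and a feed-forward block, so treat this as illustrative only:

    import torch
    import torch.nn as nn

    class SelfAttention(nn.Module):
        # the q/k/v/out projections are the trained weights;
        # the incoming x is the mutating data matrix (sequence x features)
        def __init__(self, dim):
            super().__init__()
            self.q = nn.Linear(dim, dim)
            self.k = nn.Linear(dim, dim)
            self.v = nn.Linear(dim, dim)
            self.out = nn.Linear(dim, dim)

        def forward(self, x):                      # x: (seq_len, dim)
            q, k, v = self.q(x), self.k(x), self.v(x)
            scores = q @ k.T / x.shape[-1] ** 0.5  # how much each position attends to each other one
            weights = torch.softmax(scores, dim=-1)
            return self.out(weights @ v)           # mix the values by those attention weights

    # x = torch.randn(10, 64); y = SelfAttention(64)(x)  # output has the same shape as the input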
As stated previously, most of the mainstream code out there gets unwieldy with more than 2048 input or output elements; many changes have been explored around that, but I'm not aware of one having been settled on.
---------- Forwarded message --------- Date: Sun, Apr 24, 2022, 2:49 PM Subject: Re: [coreboot] Re: Deprecation of the Intel Quark SoC To: Karl Semich 0xloem@gmail.com
you don't want to RE a binary, that gets you into trouble.
But you are allowed to RE from the activity, basically memory reads and writes. That's turning observation into code.
So, that's the first challenge.
---------- Forwarded message --------- From: *Karl Semich* 0xloem@gmail.com Date: Sun, Apr 24, 2022, 3:01 PM Subject: Re: [coreboot] Re: Deprecation of the Intel Quark SoC
I agree that's preferable, but I don't have a simulator or a circuit logger at this time. Do you have these things?
I understand it's legal to reverse engineer if the person doing it keeps their work private, sharing only an interface specification. Is this your experience?
---------- Forwarded message --------- From: *Karl Semich* 0xloem@gmail.com Date: Mon, Apr 25, 2022, 5:34 AM Subject: Re: [coreboot] Re: Deprecation of the Intel Quark SoC
On Sun, Apr 24, 2022, 10:31 AM wrote:
I'm very interested in your ideas.
Yep, I'm interested. I've been trying to find someone interested in this for 5 years now
I think I kind of glossed over this; what approaches have you pursued already? What challenges did you encounter?
On Sun, Apr 24, 2022, 2:49 PM wrote:
you don't want to RE a binary, that gets you into trouble.
But you are allowed to RE from the activity, basically memory reads and writes. That's turning observation into code.
So, that's the first challenge.
I didn't realize that serialice does this already. It looks like serialice's qemu patches make it pretty easy to probe firmware behavior.
We could start training on normal open-source blobs and generate random working code to augment the data.
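To sketch what I mean by generating random working code as augmentation: emit random but well-formed register-write sequences together with the access log they would produce, so the model sees many (log, source) pairs. The register names, base address, and write32() call below are invented for illustration, not taken from any real chip:

    import random

    # hypothetical device registers; real data would come from actual firmware
    REGISTERS = {"CTRL": 0x00, "STATUS": 0x04, "CLKDIV": 0x08, "TIMING0": 0x0c}
    BASE = 0x02000000  # made-up MMIO base address

    def random_example(n_writes=6):
        """Return one (log, source) pair for a random init sequence."""
        log_lines, src_lines = [], []
        for _ in range(n_writes):
            name, offset = random.choice(list(REGISTERS.items()))
            value = random.randrange(0, 1 << 32)
            log_lines.append("W 0x%08x 0x%08x" % (BASE + offset, value))
            src_lines.append("write32(BASE + %s, 0x%08x);" % (name, value))
        return "\n".join(log_lines), "\n".join(src_lines)

    # log, src = random_example()  # model input: the log, model target: the source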
---------- Forwarded message --------- From: *Karl Semich* 0xloem@gmail.com Date: Mon, Apr 25, 2022, 5:40 AM Subject: Re: [coreboot] Re: Deprecation of the Intel Quark SoC
On Sun, Apr 24, 2022, 10:32 AM wrote:
I think we should start with a binary blob for an open system like the allwinner d1. We have the blob, it's pretty simple code.
I websearched this chip; it seems it's RISC-V. I think that means that serialice's qemu patches wouldn't work with it out of the box.
Do you know how to hook i/o accesses for RISC-V in qemu?
If the first step is patching qemu it seems it might make more sense to start with an x86 chip?
---------- Forwarded message --------- Date: Mon, Apr 25, 2022, 10:02 AM Subject: Re: [coreboot] Re: Deprecation of the Intel Quark SoC
The x86 chips are fiendishly complex.
There is an open source version of the memory setup code for the D1 in development, so it would be possible to create a proof of concept by logging all the memory writes that are done to set up dram.
I'm pretty familiar with the many types of dram startup and I would prefer, if possible, to start with the Allwinner D1 startup.
---------- Forwarded message --------- From: *Karl Semich* 0xloem@gmail.com Date: Mon, Apr 25, 2022, 10:30 AM Subject: Re: [coreboot] Re: Deprecation of the Intel Quark SoC
I websearched but didn't find the open firmware yet. Where's it at?
You didn't mention an emulation environment, so I was thinking of how the logging would be done. I think the cpu or architecture actually wouldn't matter at all if you have enough example sources and instrument them to log their writes and reads.
The relevant task here seems to be collecting firmware examples that can be expressed in blocks smaller than 2048 tokens, and logging their runs on host hardware with instrumented reads and writes. I'm thinking about this a little, not quite sure yet how to approach it.
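As a first pass at the collecting step, something like this could walk a source tree and keep only files whose token count fits under the limit (the gpt2 tokenizer is just a stand-in; the real one would match whatever model gets trained):

    from pathlib import Path
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder tokenizer

    def small_enough_sources(root, limit=2048, suffixes=(".c", ".S", ".h")):
        """Yield source files whose token count fits within the model context."""
        for path in Path(root).rglob("*"):
            if path.is_file() and path.suffix in suffixes:
                text = path.read_text(errors="ignore")
                if len(tokenizer(text)["input_ids"]) <= limit:
                    yield path

    # for f in small_enough_sources("coreboot/src"): print(f)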
---------- Forwarded message --------- Date: Mon, Apr 25, 2022, 10:34 AM Subject: Re: [coreboot] Re: Deprecation of the Intel Quark SoC To: Karl Semich 0xloem@gmail.com
You are helping me refine my thinking.
We need a cheap platform that has lots of people involved and that would let us do this test.
I would still suggest something like the lichee r5 or similar allwinner-based platform
---------- Forwarded message --------- From: *Karl Semich* 0xloem@gmail.com Date: Mon, Apr 25, 2022, 10:40 AM Subject: Re: [coreboot] Re: Deprecation of the Intel Quark SoC
the hits I get for lichee r5 seem to mostly be in Chinese, a language I haven't learned
what do you think of the idea of setting up some scripts to instrument firmware sources for a lot of different platforms?
---------- Forwarded message --------- From: *Karl Semich* 0xloem@gmail.com Date: Mon, Apr 25, 2022, 11:17 AM Subject: Re: [coreboot] Re: Deprecation of the Intel Quark SoC
here is my present nascent idea, unsure:
- instrument sources to log their writes and reads, and use the logs as input data
- during compilation, use inotify to log which files are read, and use those files as output data to train (sketched below, after this list)
- change to different points in the source history and shuffle configuration options to produce more data
- if needed, a larger model is bigbird: https://huggingface.co/docs/transformers/model_doc/bigbird_pegasus which can handle 4096 tokens of input+output
- when input data is too large, add further trainable models that shrink it into fewer inputs as their output
- when output data is too large, parameterise the generation to produce only part at a time
- run the code in qemu for each architecture so as to generate data for multiple architectures
- if timing is important and the instrumentation changes the timing, this could be measured and compensated for arithmetically
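a sketch of the inotify bullet above, assuming the inotifywait tool from inotify-tools is installed: it records which source files are opened while the build runs, and those files could then be paired with the run-time read/write log as one training example.

    import subprocess

    def files_read_during(build_cmd, source_dir):
        """run the build while watching source_dir; return the files opened"""
        # monitor the tree recursively and print one opened path per line
        watcher = subprocess.Popen(
            ["inotifywait", "-m", "-r", "-e", "open", "--format", "%w%f", source_dir],
            stdout=subprocess.PIPE, text=True)
        subprocess.run(build_cmd, shell=True, check=True)
        watcher.terminate()
        out, _ = watcher.communicate()
        return sorted(set(line for line in out.splitlines() if line))

    # sources = files_read_during("make bootblock", "src/")
    # training pair: (read/write log from the run, contents of those sources)

this is simplified; a real version would need to start the watcher before any files are touched and cope with large amounts of output.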
i'm a little confused about how logging might work in qemu. i'm thinking of changing the bootblock assembly to log things; it seems this might best be done via a debugger attached to qemu that can act on the instructions themselves, but i'm not sure.
it seems it might also be good to measure code coverage so as to train only around code that is actually executed.
given there is a lot of hidden behavior, structures mutated without port writes, it could be good to err on the side of more data and more training expense.
or if things are limited to ordered port writes, it could be incredibly simple.
it seems to me x86 or arm makes more sense, so the code blocks could be executed on a normal consumer system.
yeah, I ran out of time for now.
This model removes the need for expanding the input length. I believe it can be trained to operate on input and output lengths limited only by available data: https://github.com/BlinkDL/RWKV-LM