set up code to capture new GPUs #6

Open
bb30994 opened this issue Sep 24, 2020 · 7 comments

Comments

@bb30994

bb30994 commented Sep 24, 2020

To capture the information when a new GPU shows up, you're going to need some new code somewhere. What are the chances that you can start capturing the PCIe ID codes and make a temporary entry somewhere, in the form 0xiiii:0xjjjj:N:1: [TBD]? Even though the assignment to beta may not happen until later, capturing the PCIe codes would help me a lot. The lookup table from iiii to N can be done at the same time, or not. Your choice.
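A minimal sketch of building such a temporary entry, assuming the field order shown above (vendor, device, N, species, name). The function name `temporary_entry` and the `VENDOR_TO_N` table are hypothetical, and the N values in it are placeholders rather than confirmed assignments.

```python
# Hypothetical lookup from PCI vendor prefix (iiii) to N.
# The N values below are placeholders, not confirmed assignments.
VENDOR_TO_N = {0x1002: 1, 0x10DE: 2, 0x8086: 3}


def temporary_entry(vendor_id: int, device_id: int,
                    species: int = 1, name: str = "[TBD]") -> str:
    """Format a provisional line as 0xiiii:0xjjjj:N:species: name."""
    n = VENDOR_TO_N[vendor_id]  # raises KeyError for an unknown vendor
    return f"0x{vendor_id:04x}:0x{device_id:04x}:{n}:{species}: {name}"


if __name__ == "__main__":
    # Device ID 0x0001 is an illustrative value only.
    print(temporary_entry(0x10DE, 0x0001))
```

A vendor prefix missing from the table raises `KeyError`, which may be a reasonable way to surface reports that are not GPUs at all.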

We can also start work on filtering out reports that do not represent a new GPU.

@jchodera
Member

This is a good question for @jcoffland : Can we capture these automatically?

Alternatively, since there are only three prefixes (0x1002, 0x10de, 0x8086), we could just enumerate 3 x 65536 = 196,608 (about 197K) lines in GPUs.txt for all possible cards and assign any unrecognized cards to device class 1 ("benchmark/test only").
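As a quick check on the size of that enumeration, here is a small sketch. It assumes only that a PCI device ID is 16 bits wide; the names `VENDOR_PREFIXES` and `all_possible_ids` are hypothetical.

```python
# The three vendor prefixes named in the comment above.
VENDOR_PREFIXES = (0x1002, 0x10DE, 0x8086)
# A PCI device ID is 16 bits wide, so each vendor has 65536 possible IDs.
DEVICE_IDS_PER_VENDOR = 0x10000


def all_possible_ids():
    """Yield every (vendor, device) pair for the listed vendor prefixes."""
    for vendor in VENDOR_PREFIXES:
        for device in range(DEVICE_IDS_PER_VENDOR):
            yield vendor, device


total = sum(1 for _ in all_possible_ids())

if __name__ == "__main__":
    print(total)  # 196608
```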

@bb30994
Author

bb30994 commented Sep 25, 2020

I think we have to filter the results. We already have a number of old GPUs that cannot be supported, but we have not systematically listed all the pre-Fermi GPUs, even looking just at NVIDIA.

When Intel asked us to support their GPUs, they specified Gen 9 or better. I happen to have a Gen 7. Should I manually enable it and run tests on it, or should we accept their recommendation? AMD has a lot of old devices acquired from ATI that they no longer support.

I think it's wrong to accept anything that happens to show up unless it's "modern".

@bb30994
Author

bb30994 commented Sep 25, 2020

The reason gpus.txt was originally established was to exclude those devices that couldn't be supported.

@bb30994
Author

bb30994 commented Sep 25, 2020

There are a lot of 0s already but it's not an all-inclusive list. The lists were built based on somebody asking for support.

Are we also going to look up the name from some external source to replace TBD with something meaningful or is that still going to require human support?

bb30994 changed the title from "set up code to capture new wus" to "set up code to capture new GPUs" on Sep 25, 2020
@bb30994
Author

bb30994 commented Sep 25, 2020

The following email discussion took place before this ticket was opened. It is included here for reference.

Joseph Coffland
We are going to need to make all non-supported GPUs species zero. This
will cause the client to report the GPU as unsupported. Otherwise, if say
GPU species 1 is unsupported, the client will still configure the GPU and
the user will be confused as to why they cannot get an assignment.

John Chodera
Do you have a suggestion for how we can test new GPUs (e.g. the Intel GPUs or other GPUs without having a species assignment yet) under this scheme?

Joseph Coffland
John, We could treat species 1 as "beta". We would put GPUs into beta to test them. And plan to remove GPUs from beta fairly quickly. The main thing is that we

Bruce Borden 
In a discussion with John, we've come up with the following plan, (1) assuming new GPUs might appear for a person who runs WebClient, AND (2) assuming changing WebClient is a lot easier than distributing a new FAHClient:

* Add a button to WebClient: "Request support for a new GPU".
* This will run lspci and identify the unsupported GPU. It can make whatever changes are necessary in config.xml to place a request to be added as a species=1 modification [this is in lieu of asking the donor to gather this information and set up INTERNAL and PROJECT-KEY or whatever Joseph deems appropriate].
* John will create a new benchmark project similar to 17100 which awards zero points. That way the Project-Key can be static, and there is no incentive to earn increased PPD by manually making those settings in case somebody peeks at config.xml.
* The appropriate number of benchmark WUs will be processed. Some kind of a reply will appear in WebControl informing them that the GPU will process test projects for a day or two and earn zero points before the GPU can be configured.

What do you think about reclassifying all Intel iGPUs as species 1 and telling folks who want to test them that they're allowed to gather benchmarking information for 0 points?  That'll conflict with Joseph's desire to have the client NOT create a slot for them. (You might suggest it to him.)
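The lspci step in the plan above could look roughly like the sketch below. It assumes the text comes from `lspci -nn`, which prints the class code and the `[vendor:device]` pair in square brackets; the sample lines, the helper name `find_gpus`, and the rule of keeping only display-class (03xx) devices from the three vendor prefixes are illustrative assumptions, not the client's actual detection logic.

```python
import re

# Matches lines shaped like:
#   01:00.0 VGA compatible controller [0300]: <name> [10de:0001] (rev a1)
LINE_RE = re.compile(
    r"^\S+ [^\[]+\[(?P<cls>[0-9a-f]{4})\]: (?P<name>.*) "
    r"\[(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})\]"
)

# Vendor prefixes mentioned earlier in this thread.
GPU_VENDORS = {0x1002, 0x10DE, 0x8086}

# Illustrative sample text; device IDs and names are made up.
SAMPLE = """\
00:1f.3 Audio device [0403]: Example Vendor Example Audio [8086:0002]
01:00.0 VGA compatible controller [0300]: Example Vendor Example GPU [Example Model] [10de:0001] (rev a1)
"""


def find_gpus(lspci_output: str):
    """Return (vendor_id, device_id, name) for display-class devices."""
    gpus = []
    for line in lspci_output.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        vendor = int(m["vendor"], 16)
        device = int(m["device"], 16)
        # PCI base class 03 is "display controller".
        if m["cls"].startswith("03") and vendor in GPU_VENDORS:
            gpus.append((vendor, device, m["name"]))
    return gpus


if __name__ == "__main__":
    for vendor, device, name in find_gpus(SAMPLE):
        print(f"0x{vendor:04x}:0x{device:04x} {name}")
```

The name that lspci prints comes from its local PCI ID database, so it may be missing or generic for very new hardware.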

@bb30994
Author

bb30994 commented Sep 25, 2020

The client should have already detected the lspci codes and checked GPUs.txt to see if the device is already known. If it's a 0, the message says it's unsupported. If it's a 1, the message can say this GPU is being benchmarked. If it is not listed, an entry can be added as mentioned at the top of this topic, and the message for species 1 posted using a future tense.
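The three cases above can be sketched as a lookup. This assumes GPUs.txt lines have the colon-separated form vendor:device:N:species:description used earlier in this thread; the function names, the message wording, and the sample lines are hypothetical.

```python
# Illustrative GPUs.txt content; IDs and descriptions are made up.
SAMPLE = """\
# vendor:device:N:species:description
0x10de:0x0001:2:0:Example unsupported GPU
0x8086:0x0002:3:1:[TBD]
0x1002:0x0003:1:5:Example supported GPU
"""


def parse_gpus_txt(text: str) -> dict:
    """Map (vendor_id, device_id) to (N, species, description)."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        vendor, device, n, species, name = line.split(":", 4)
        key = (int(vendor, 16), int(device, 16))
        table[key] = (int(n), int(species), name.strip())
    return table


def status_message(table: dict, vendor_id: int, device_id: int) -> str:
    """Pick the message for a detected GPU (wording is hypothetical)."""
    entry = table.get((vendor_id, device_id))
    if entry is None:
        # Not listed: a species 1 entry would be added, so use future tense.
        return "This GPU is not yet listed; it will be benchmarked."
    species = entry[1]
    if species == 0:
        return "This GPU is unsupported."
    if species == 1:
        return "This GPU is being benchmarked."
    return "This GPU is supported."


if __name__ == "__main__":
    table = parse_gpus_txt(SAMPLE)
    print(status_message(table, 0x10DE, 0x0001))
    print(status_message(table, 0x8086, 0x0002))
    print(status_message(table, 0x1002, 0xFFFF))
```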

This is not a huge problem. We may get a dozen requests for new GPUs per year. The most labor intensive step is obtaining the correct lspci code which, as noted above, the client will already have available to it.

@bb30994
Author

bb30994 commented Oct 2, 2020

We'll still need to add a human-readable name or devise a way to capture that information elsewhere.
