This document is available in pdf at: goo.gl/R9eUZh
QUESTION FROM CERN & PAVIA’s SCIENTISTS: How can your invention, with thousands of parallel-processing 3D-Flow OPRA processors in a 16 cm or 36 cm cube of electronics, compete with major supercomputer projects such as INFN’s APE and others, even though none of them claims to be able to save millions of human lives from cancer or to be the most powerful tool in the world for discovering subatomic particles?
CROSETTO: It is first necessary to clarify the claim of saving millions of human lives from cancer. The claim rests on experimental data demonstrating that early diagnosis of any type of cancer greatly increases the likelihood of saving the patient’s life. If colon cancer is diagnosed early, survival is 91%, but if diagnosed late, survival drops to 11%; for breast cancer, survival drops from 98% to 27% if diagnosed late; and lung cancer survival is 50% if diagnosed early, but drops dramatically to 2.8% if diagnosed late.
Since there are effective treatments that work when cancer is diagnosed early, it is necessary to look for the missing piece of the puzzle that will provide us with effective early diagnosis.
It is therefore necessary to observe what signals the body provides when normal cells mutate into cancerous cells. To put it simply, these cells “go crazy”: they lose their correct DNA characteristics and grow more rapidly than normal cells, generating other cancerous cells.
Among all the signs of change, such as odor, temperature, tissue conductivity, fluorescence, density, etc., the most reliable signal is the change in metabolism: the higher nutrient consumption by cancer cells, which increases from 5% to 70% because of their rapid growth.
The instrument that allows us to observe this difference in cellular consumption noninvasively is PET (Positron Emission Tomography), which attaches a radioisotope to nutrient molecules such as oxygen, glucose, etc., and then administers this radioactive nutrient to the patient by inhalation or injection.
The radioactive signals that allow us to follow the path of this nutrient inside the body, and to verify which organs or tissues consume it abnormally, are pairs of 511 keV photons emitted in opposite directions from the patient’s body. It is therefore necessary to capture as many of these signals as possible, accurately, both to reduce the radiation dose to the patient and to see the change from normal to cancerous cells even when the abnormal nutrient consumption is barely perceptible.
In current PET devices, the patient’s body is surrounded by a ring of crystals (or, in the case of my 3D-CBS, 3-D Complete Body Screening, a 1.6-meter-long cylinder of crystals) that converts the good 511 keV signals and the background radiation (noise) into electrical signals. As millions of signals per second are generated, it becomes necessary to process them in parallel with many processors, because a single processor would fail to filter the good 511 keV photon-pair signals from the background radiation noise.
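The filtering task described above can be sketched in software. The following Python fragment is only an illustrative, simplified model, not the actual 3D-CBS algorithm: the energy window and the 10 ns coincidence window are assumed values chosen for the example. It keeps only pairs of hits that look like two back-to-back 511 keV photons arriving at (nearly) the same time on different detectors.

```python
# Illustrative sketch of PET coincidence filtering (assumed parameters,
# not the 3D-CBS design): keep pairs of hits whose energies fall in a
# window around 511 keV and whose arrival times coincide.

COINCIDENCE_WINDOW_NS = 10.0          # assumed time window
E_MIN_KEV, E_MAX_KEV = 450.0, 600.0   # assumed energy window around 511 keV

def find_coincidences(hits):
    """hits: list of (time_ns, energy_keV, detector_id), sorted by time.
    Returns pairs of hits passing the energy and time-coincidence cuts."""
    good = [h for h in hits if E_MIN_KEV <= h[1] <= E_MAX_KEV]
    pairs = []
    for i in range(len(good) - 1):
        t1, _, d1 = good[i]
        t2, _, d2 = good[i + 1]
        if d1 != d2 and (t2 - t1) <= COINCIDENCE_WINDOW_NS:
            pairs.append((good[i], good[i + 1]))
    return pairs

hits = [
    (100.0, 511.0, 1), (103.0, 510.0, 42),   # a good back-to-back pair
    (250.0, 300.0, 7),                        # scattered photon: rejected
    (400.0, 511.0, 3),                        # single photon: no partner
]
print(len(find_coincidences(hits)))           # one coincidence pair survives
```

At millions of hits per second, this check must run on every hit in real time, which is why it is spread across many processors rather than handled by one.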
There are different types of processors and parallel-processing systems on the market. I have followed the evolution of processors since their beginning in the 1970s, thoroughly studying the most popular ones such as Intel’s 8080 and 8085, the Zilog Z80, the MOS Technology 6502 used in the first Apple computers, and the Motorola 6809.
I learned the fundamentals of these processors, the differences between their internal architectures, their interfacing, and their instruction sets, by attending CERN’s specialized evening classes held by Tor Linejarde and Adolfo Fucci in the 1970s.
Subsequently, I deepened my knowledge not only of generic processors and the two main streams, Berkeley’s “window register” architecture and Stanford’s “MIPS”, but also of processors for special applications, such as the Inmos Transputer and DSPs (Digital Signal Processors) based on the Harvard architecture, which separates program memory from data memory. Using them, I broadened and deepened my knowledge by learning from the results of my projects, from their realization, and during the testing phase.
For one month a year for ten years, from 1980 to 1990, I was part of a team of experts from CERN and from universities in the USA and other countries who taught some 120 PhD physicists and engineers from third-world countries, at a College on Microprocessors, how to build a computer, an operating system, hardware, software, and interfacing (with laboratory hours) for medical or other equipment, starting from the Motorola 6809 microprocessor, at a cost of just a few dollars.
In the 1980s, I designed and built applications using one of the first DSPs, the Texas Instruments 320C10, and I also studied the Motorola DSP96000. In 1988, when I was an employee of CERN, I designed the FDPP parallel-processing node (goo.gl/kjy4sm) in a TRAM standard form factor for the IBM PC and VME, using commercial processors: the Transputer and the 50 Mflops DSP32C from AT&T, which at that time had the lowest cost per Mflop. I then used that system for Q measurements, which serve to focus the beam in the SPS accelerator at CERN, and I designed a parallel-processing system for the Level-2 trigger of the DELPHI experiment using the Texas Instruments DSP320C10, when the Spokesperson of DELPHI was Ugo Amaldi.
Thanks to these projects and my knowledge of microprocessors and parallel-processing systems, I was recognized as a CERN expert in this field, and I was invited to give a lecture on the application of microprocessors in High Energy Physics at the prestigious “CERN School of Computing” in 1991. This lecture, available at “goo.gl/2mMRmD”, summarizes in one figure per processor type the different architectures of the Transputer, Berkeley’s “window register” architecture, Stanford’s “MIPS” architecture, and the microcontrollers, and compares them with the Harvard DSP architecture.
Because I was not satisfied with the performance of microprocessors and of the commercial systems (not even the DSP320C40 from Texas Instruments, which offered communication channels similar to the Transputer’s but faster) in identifying specific particles with different characteristics defined by theoretical physicists, I conceived the invention of the 3D-Flow processor architecture and the 3D-Flow parallel-processing system architecture (goo.gl/NQ8Cck), which would allow physicists to move full programmability from the Level-2 Trigger to the Level-1 Trigger.
In this way, efficiency in both physics and medical diagnostics would increase considerably. In the first case, it would eliminate the problem of analyzing only garbage data at the Level-2 and Level-3 triggers because the good data had already been lost at Level-1; in the second case, it would make effective early cancer diagnosis possible using 1% of the radiation of current PETs. This is the missing key to saving many lives, since the cure for cancer diagnosed at an early stage has existed for a long time, and it works.
My 3D-Flow invention has been recognized as valuable by top experts in the field for over twenty years (see excerpts from letters at “goo.gl/GIC5aR” and some complete letters at “goo.gl/VXBx33”), was approved in a formal, official review at FERMILAB on December 14, 1993 (goo.gl/zP76Tc), and was published in a 45-page peer-reviewed article in a prestigious scientific journal: Nuclear Instruments and Methods in Physics Research, Sec. A, vol. 436 (1999), pp. 341-385. The proof of concept of the 3D-Flow architecture was presented functioning in hardware, in two large FPGA chips each containing four 3D-Flow processors, at the IEEE-NSS-MIC conference in San Diego, California, in 2001. The small 8-processor 3D-Flow system enabled scientists who attended the conference to select the “shape” of an object by choosing the position of the input switches to the system. The LEDs showed whether the system had found the object, and the waveforms on the oscilloscope accurately visualized the execution time of the algorithm. The 3D-Flow system was then proven feasible and functional in hardware in two modular electronic boards, each with 68 3D-Flow processors, presented at the IEEE-NSS-MIC 2003 conference in Portland, Oregon (goo.gl/RiIn0B). The 3D-Flow OPRA processor can perform up to 26 operations (addition, subtraction, comparison, etc.) every three nanoseconds.
Because my 3D-Flow processor and system architecture is technology-independent, it remains valid and very competitive even today with respect to any other system in the world for applications that need to execute complex object-pattern-recognition algorithms in real time on data arriving from thousands of channels at very high speed.
In fact, after my former supervisor at the Superconducting Super Collider, Jim Siegrist (currently the Director of the Office of High Energy Physics at the US Department of Energy, managing over a billion dollars a year for research in physics), asked me to target my invention to current technology, 59 quotes from reputable companies showed that my new 3D-Flow OPRA system, targeted to 2015 technology, was superior to any other system in the world for this kind of application, which must process incoming data at very high speed.
The major parallel-processing systems commercially available do not have these features, although they are more powerful (but also more expensive) for solving other computing problems, ones that do not require data to be acquired at the several terabytes or petabytes per second that the 3D-Flow OPRA system can sustain.
Currently, the top parallel-processing systems in the world are in China, with the Sunway TaihuLight and Tianhe-2; followed by the systems in the United States: the Titan system (model Cray XK7) at DOE’s Oak Ridge National Lab, the Sequoia system (model IBM Blue Gene/Q), and the Cori system (model Cray XC40); followed by the Japanese Fujitsu Oakforest-PACS and K Computer systems; followed by the Piz Daint system (model Cray XC50) in Switzerland.
Peak performance is about 150 PetaFlops (i.e., 150 million billion floating-point operations per second, represented by the number 150 followed by 15 zeros).
Several of the supercomputers listed above use Graphics Processing Units (GPUs). For example, the NVIDIA K20X is used in the Titan supercomputer in the US, with 261,632 units providing 17.59 PetaFlops, while Piz Daint in Lugano, Switzerland, uses 73,808 units to achieve 6.27 PetaFlops.
The Italian INFN APEnext project (2000-2005) achieved a performance of 7 TeraFlops (i.e., 7 million million floating-point operations per second), with the prospect of reaching 1 PetaFlops in 2009.
The Italian oil and gas company ENI’s High Performance Computing system HPC3, together with the co-existing HPC2 system, both based on GPUs, will provide a sustained 5.8 PetaFlops and a peak computing capacity of 8.4 PetaFlops. HPC3 is an intermediate step towards the next evolution, HPC4, expected at the beginning of 2018 to break the barrier of 10 PetaFlops of computing power.
In 2015, President Obama signed an executive order authorizing the National Strategic Computing Initiative (NSCI) to create a new supercomputer of 1 ExaFlops (i.e., 1,000 PetaFlops, equivalent to 1 billion billion floating-point operations per second, represented by the number 1 followed by 18 zeros).
By comparison, 1,000 cubes of electronics, 36 cm per side, using the 3D-Flow OPRA would have a performance of 1 Exa operations per second on integer numbers, i.e., 1,000 Peta operations per second. (For real-time recognition of objects such as subatomic particles there is no need for floating-point computation; however, special operations, such as comparing a value with 24 variables in three nanoseconds, are required. These instructions are not available on the supercomputers listed above, while they are available in the instruction set of the 3D-Flow OPRA.)
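To illustrate the kind of operation in question, the sketch below expresses, in ordinary sequential Python, what comparing one incoming value against 24 programmable variables means. The threshold values are invented for the example; the point is that a conventional processor executes one comparison per instruction, whereas the 3D-Flow OPRA is described as performing this whole class of comparison in a single three-nanosecond instruction.

```python
# Sequential model (with made-up thresholds) of a single 3D-Flow OPRA-style
# operation: compare one input value against 24 programmable variables.
# A conventional CPU needs ~24 compare instructions for this loop.

thresholds = list(range(10, 250, 10))   # 24 example values: 10, 20, ..., 240
assert len(thresholds) == 24

def compare_24(value, thresholds):
    """Return a 24-bit mask: bit i is set if value exceeds thresholds[i]."""
    mask = 0
    for i, t in enumerate(thresholds):
        if value > t:
            mask |= 1 << i
    return mask

print(bin(compare_24(55, thresholds)))  # bits set for thresholds 10..50
```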
In addition, the 3D-Flow OPRA system in 1,000 cubes of electronics, 36 cm per side, would have the capability to sustain an input data rate of 20 petabytes per second. This is a feature unique in the world, one the other supercomputers listed above cannot even approach, not even the future supercomputer approved by President Obama in 2015.
Therefore, the answer to the question of why other supercomputers cannot claim to improve efficiency in filtering and capturing all signals from tumor markers, the signals that make it possible to diagnose cancer at an early stage and thereby save many lives, can be given in one simple analogy, without undermining the value of supercomputers.
For each application it is necessary to use the appropriate instrument: a bulldozer is not needed to move a clock gear, because even though it has far greater capabilities, it is not the tool suited to the task.
As a practical example, to solve the Level-1 Trigger of large-scale experiments at CERN, the 59 quotes from reputable companies prove the feasibility of building a 3D-Flow OPRA system in a 36 cm cube of electronics with 8,192 electronic channels, capable of acquiring a 1.3 terabytes/second input data rate and of identifying all subatomic particles that meet user-programmable characteristics, at a cost of $100,000 per unit per experiment.
(See the two-page summary at goo.gl/AoszvQ and details at goo.gl/w3XlZ1; see how it can replace 4,000 CERN CMS boards (goo.gl/mPHw5Y) with 9 boards (goo.gl/OTkH4z) while providing an enormous performance improvement at one thousandth the cost.)
A practical example of the solution to the problem of building the 3D-CBS (3-D Complete Body Screening; see the trifold at “goo.gl/YcAJDy” and more information at “goo.gl/JMKyek”), which makes it possible to diagnose cancer at an early, curable stage, is provided by the same 59 quotes from reputable companies, which prove the feasibility of building a 3D-Flow OPRA system contained in a 16 cm cube of electronics with 2,304 electronic channels, capable of sustaining an input data rate of 368 gigabytes per second and of capturing all possible signals from tumor markers at the lowest cost per valid signal captured. The cost of this unit, based on the 59 quotes, is $50,000. (See the 2000 book “400+ times improved PET efficiency for lower-dose radiation, lower-cost cancer screening” at “goo.gl/ggGGwF”, the 32-page 2013 article at “goo.gl/qpnNxd”, the one-page innovations at “goo.gl/3AFCWM”, the one-page benefits at “goo.gl/Zx1p9Q”, and the two-page 2016 summary and comparison with the Explorer at “goo.gl/QLuA1n”.) Sign the Petition at: goo.gl/dzmYCz
QUESTION FROM CERN & PAVIA’s SCIENTISTS: How can your 3D-Flow OPRA invention, in a 16 cm or 36 cm cube of electronics that receives signals through copper wires, compete with the high data rates that can be reached with optical fibers?
CROSETTO: For each application one needs to use the most suitable component to maximize performance, robustness, and reliability in the most cost-effective way.
In the case of the 3D-Flow OPRA project, CERN researchers have maximum freedom to use the most convenient means of communication to transfer the data from the detector of each experiment to the PRAI chassis (Patch Panel Regrouping Associates Ideas), located 50 cm from the 3D-Flow OPRA electronic boards, as illustrated in the center of Figure 1 on page 3, in Figure 2 (bottom left) on page 19, and in Figure 11 (center left) on page 36 of “goo.gl/w3XlZ1”. In fact, these figures include Fibre Channel connectors and cables, 100 Gbps Ethernet, PCI, Firewire, USB, SATA, InfiniBand, etc.
On the PRAI-B electronic board, shown on the first page and at the bottom left of the second page of “goo.gl/AoszvQ”, the information relative to each event generated at a specific time in the detector is decoded by electronic circuits according to the different transmission protocols, and events are aligned according to their generation time in a word 8,192 bits wide. This word is then sent at 1.28 Gbps per cable through 8,192 Micro Twinax cables carrying LVDS signals to the 8 3D-Flow OPRA boards of the same system, about 50 cm away.
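A quick back-of-the-envelope check shows that these numbers are consistent with the 1.3 terabytes/second figure quoted for the 36 cm system (assuming each of the 8,192 cables carries one 1.28 Gbps LVDS lane):

```python
# Aggregate bandwidth of 8,192 LVDS lanes at 1.28 Gbps each.
lanes = 8192
gbps_per_lane = 1.28                        # gigabits per second per cable
total_bits_per_s = lanes * gbps_per_lane * 1e9
total_terabytes_per_s = total_bits_per_s / 8 / 1e12

print(round(total_terabytes_per_s, 2))      # ~1.31 TB/s, matching the
                                            # stated 1.3 terabytes/second
```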
I did not use 100 Gbps optical links for several reasons. High-speed transmission needs a receiving circuit on the 3D-Flow OPRA board that would increase power consumption and take printed-circuit area away from the 3D-Flow OPRA processors, so that not all of them could fit in a 36 cm cube and more crates would have to be added. This would increase the distances between components, which would require more layers of processors to sustain the same input data rate without losing input data. In turn, this would require more processors, which would require still more crates.
In short, the increase in the number of components for 100 Gbps signal decoding would require an increase in other components to achieve the same performance, creating an avalanche effect that would increase dissipated power and costs and lower the overall performance of the system.
To maximize the performance of parallel-processing systems whose processors must exchange data with their neighbors to execute image pattern recognition algorithms, it is necessary to keep the system contained in a minimum space, because the greater the distance between components, the longer the signal takes to reach its destination (each centimeter adds about 33 picoseconds of delay).
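The delay figure can be applied directly. Using the document's stated 33 ps/cm, the sketch below shows how spreading the same system over a larger volume eats into the processors' three-nanosecond instruction time (the 36 cm and 1 m distances are chosen only for illustration):

```python
# Signal propagation delay at the stated 33 ps per centimeter of distance.
PS_PER_CM = 33.0

def delay_ps(distance_cm):
    """Propagation delay in picoseconds over the given distance."""
    return distance_cm * PS_PER_CM

# Crossing a 36 cm cube versus spreading the system over 1 m:
print(delay_ps(36))    # ~1.2 ns across the cube
print(delay_ps(100))   # 3.3 ns over 1 m: more than one full 3 ns instruction
```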
The Micro Twinax cables are robust (see page 184 of “goo.gl/w3XlZ1”), and reliability is increased by running the cables at a transmission speed ten times slower than the component’s rated performance. Likewise, using 400-pin connectors (see page 187 of “goo.gl/w3XlZ1”) to transmit 128 signals, with many of the pins between signals connected to ground to eliminate crosstalk between signals on adjacent pins, increases reliability. Likewise, the cable-and-connector assembly on a small 30 cm electronic board, laid out on page 195 of “goo.gl/w3XlZ1” for the 36 cm cube of electronics, and the assembly on a small 21 cm electronic board on page 193 of “goo.gl/w3XlZ1” for the 16 cm cube of electronics, guarantee both the robustness and the reliability of the system. Several of the Micro Twinax cable assemblies and the pin assignments of the 400-pin connectors have been simulated for signal integrity by Samtec.
To reduce costs and increase the robustness and reliability of the system, I used copper traces on a printed circuit to transfer signals between the electronic boards on a backplane of the 3D-Flow OPRA system, as shown on page 157 of “goo.gl/w3XlZ1”.
Transferring 1.3 terabytes per second over the roughly 50 cm between the PRAI crate and the 3D-Flow OPRA crate, using 8,192 Micro Twinax cables carrying LVDS signals, has great advantages in robustness, reliability, and cost compared with transferring the data through optical fibers, which require circuits for encoding and decoding a transmission protocol. These additional circuits increase power consumption, occupy printed-circuit area, and introduce delays in coding, decoding, and aligning on arrival the data transmitted over different optical fibers.
Sign the Petition at: goo.gl/dzmYCz