
Saturday, December 28, 2024

Debugging Spark Scala/ Java components

In continuation of the earlier post on debugging PySpark, here we show how to debug the Spark Scala/ Java side. Spark is a distributed processing environment with a Scala core running on the JVM, plus APIs through which other languages like Python & Java connect to it. The high-level PySpark architecture is shown here.

Since the Spark Scala/ Java components run within the JVM, it's easy to use the Java tooling options (the JDWP debug agent) for remote debugging from any compatible IDE such as Idea (Eclipse no longer supports Scala). A few points to remember:

  • Multiple JVMs in Spark: Since Spark is a distributed application, it involves several components like the Master, Driver, Slave/ Worker and Executor. In a real-world, truly distributed setting, each of these components runs in its own JVM, typically on separate physical machines. So be clear about exactly which component you want to debug, and set up the tooling options to target that specific JVM instance.

  • Two-way connectivity between IDE & JVM: There must also be two-way network connectivity between the IDE (debugger) and the running JVM instance.

  • Debugging locally: Debugging is mostly a dev-stage activity done locally, so it's usually easier to debug against Spark running locally. This could be either a Spark Standalone cluster running on the dev box or Spark run in local mode (master=local[n]/ local[*]).
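Since each JVM needs its own debug agent settings, it can help to generate the option strings per component. Below is a minimal sketch, assuming plain Python 3 and nothing from Spark: the helper jdwp_agent_option and the driver/executor port numbers (5005, 5006) are hypothetical, chosen only for illustration. Each port would need its own Remote JVM Debug configuration in the IDE listening on it.

```python
def _yn(flag):
    """Render a boolean as the 'y'/'n' value used by JDWP sub-options."""
    return "y" if flag else "n"


def jdwp_agent_option(port, host="localhost", server=False, suspend=False):
    """Build a -agentlib:jdwp option string for one JVM (hypothetical helper).

    server=False ("server=n"): the JVM connects out to a debugger that is
    already listening, i.e. the IDE in "Listen to remote JVM" mode.
    server=True ("server=y"): the JVM listens and the IDE attaches to it.
    suspend=True holds the JVM at startup until the debugger lets it run.
    """
    return ("-agentlib:jdwp=transport=dt_socket,"
            "server={},suspend={},address={}:{}".format(
                _yn(server), _yn(suspend), host, port))


# One port per JVM to be debugged (the port numbers are illustrative).
debug_ports = {"driver": 5005, "executor": 5006}
options = {role: jdwp_agent_option(port) for role, port in debug_ports.items()}

for role in sorted(options):
    # Printed as spark-defaults.conf lines: key, whitespace, value.
    print("spark.{}.extraJavaOptions  {}".format(role, options[role]))
```

Keeping the ports distinct means two JVMs on the same dev box never race to connect to the same IDE listener.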

Steps:

Environment: Ubuntu 20.04 with Java 8, Spark/PySpark (ver 2.1.0), Python 3.5, IntelliJ Idea (ver 2024.3), Maven 3.6

(I) Idea Remote JVM Debugger
In Idea, go to Run > Edit Configurations and add a Remote JVM Debug configuration:

  • Start the debugger in "Listen to remote JVM" mode on port 5005 (the address the JVM options below connect to)
  • Enable Auto Restart
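In listen mode the JVM started with server=n is the side that opens the TCP connection, so it must be able to reach the IDE's host and port. To check that path without the IDE, here is a minimal stdlib-only sketch that stands in for the listening debugger: it accepts one connection and performs the 14-byte handshake from the JDWP specification (the debugger side sends "JDWP-Handshake" and the VM echoes it back). The helpers listen_on and wait_for_jvm are hypothetical; the IDE's own listener must be stopped first since only one process can bind the port, and this is only a reachability probe that hangs up right after the handshake.

```python
import socket

# Per the JDWP specification, the debugger side sends these 14 ASCII bytes
# right after the connection is made and the target VM echoes them back.
HANDSHAKE = b"JDWP-Handshake"


def listen_on(host="localhost", port=5005):
    """Open a listening TCP socket, as the IDE does in listen mode (hypothetical helper)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    return srv


def recv_exactly(conn, n):
    """Read exactly n bytes, failing if the peer closes the socket early."""
    data = b""
    while len(data) < n:
        chunk = conn.recv(n - len(data))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-handshake")
        data += chunk
    return data


def wait_for_jvm(listener, timeout=60.0):
    """Accept one JVM connection, do the JDWP handshake, then hang up.

    Returns the (host, port) the JVM connected from. Raises socket.timeout
    if no JVM connects within `timeout` seconds.
    """
    listener.settimeout(timeout)
    conn, addr = listener.accept()
    with conn:
        conn.settimeout(timeout)
        conn.sendall(HANDSHAKE)  # the debugger side speaks first
        if recv_exactly(conn, len(HANDSHAKE)) != HANDSHAKE:
            raise ConnectionError("reply was not a JDWP handshake")
    return addr


# Example (stop the IDE's listener first, then start the JVM):
#   srv = listen_on("localhost", 5005)
#   try:
#       print("JVM connected from", wait_for_jvm(srv))
#   finally:
#       srv.close()
```

If the probe times out, the usual suspects are a wrong host/port in the agent options or a firewall between the JVM's machine and the IDE's.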

(II)(a) Debug Spark Standalone cluster
Key features of the Spark Standalone cluster are:

  • Separate JVMs for Master, Slave/ Worker, Executor
  • All could run on a single dev box, provided enough resources (Mem, CPU) are available
  • Scripts inside the SPARK_HOME/sbin folder, like start-master.sh and start-slave.sh (renamed start-worker.sh in newer Spark releases), start the services

To debug, let's say, an Executor, a Spark Standalone cluster could be started with 1 Master, 1 Worker and 1 Executor.

    # Start Master (Check http://localhost:8080/ to get Master URL/ PORT)
    ./sbin/start-master.sh 

    # Start Slave/ Worker
    ./sbin/start-slave.sh spark://<MASTER_HOST>:<MASTER_PORT>

    # Add the JVM debug agent to spark.executor.extraJavaOptions in conf/spark-defaults.conf
    spark.executor.extraJavaOptions  -agentlib:jdwp=transport=dt_socket,server=n,address=localhost:5005,suspend=n

    # The value could instead be passed as a conf to the SparkContext in the Python script
    # (suspend=y holds the Executor JVM at startup until the debugger lets it run):
    from pyspark import SparkContext
    from pyspark.conf import SparkConf
    confVals = SparkConf()
    confVals.set("spark.executor.extraJavaOptions","-agentlib:jdwp=transport=dt_socket,server=n,address=localhost:5005,suspend=y")
    sc = SparkContext(master="spark://localhost:7077",appName="PythonStreamingStatefulNetworkWordCount1",conf=confVals)
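Instead of hand-editing the file, the property can also be set by a small script. A minimal sketch, assuming the default conf/spark-defaults.conf location under SPARK_HOME and the file's one-property-per-line layout (key and value separated by whitespace); the helper set_spark_default is hypothetical and replaces an existing line for the key or appends a new one.

```python
import os
import re


def set_spark_default(conf_path, key, value):
    """Set one property in a spark-defaults.conf style file (hypothetical helper).

    An existing line for `key` is replaced wholesale, which drops any other
    options already in its value; otherwise the property is appended.
    Comment lines (starting with '#') never match and are kept as-is.
    """
    lines = []
    if os.path.exists(conf_path):
        with open(conf_path) as f:
            lines = f.read().splitlines()

    new_line = "{}  {}".format(key, value)
    # Key followed by whitespace, '=' or ':' (or end of line).
    pattern = re.compile(r"^\s*" + re.escape(key) + r"(\s|=|:|$)")
    for i, line in enumerate(lines):
        if pattern.match(line):
            lines[i] = new_line
            break
    else:
        lines.append(new_line)

    with open(conf_path, "w") as f:
        f.write("\n".join(lines) + "\n")


# Example (assumes SPARK_HOME is set):
#   set_spark_default(
#       os.path.join(os.environ["SPARK_HOME"], "conf", "spark-defaults.conf"),
#       "spark.executor.extraJavaOptions",
#       "-agentlib:jdwp=transport=dt_socket,server=n,address=localhost:5005,suspend=n")
```

Because the whole value is replaced, any other executor JVM flags already in that line need to be merged into `value` by hand.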

(II)(b) Debug locally with master="local[n]"

  • In this case Spark runs in local mode, launched via scripts like spark-shell, spark-submit, etc. located inside the bin/ folder
  • The components (driver & executor) all run within one JVM, with tasks executed on n threads (set n=2: a streaming receiver occupies one thread, so at least 2 are needed)
  • Export JAVA_TOOL_OPTIONS in the terminal from which the PySpark script will be run, before running it

        export JAVA_TOOL_OPTIONS="-agentlib:jdwp=transport=dt_socket,server=n,suspend=n,address=5005"
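Exporting JAVA_TOOL_OPTIONS affects every JVM started from that terminal afterwards. A minimal sketch, assuming only the Python 3 standard library, of scoping the same setting to a single run: the hypothetical helper run_with_jvm_debug copies the current environment, adds JAVA_TOOL_OPTIONS and starts the command with it, so only that process and its children (e.g. the JVM that PySpark launches) see the variable.

```python
import os
import subprocess

# Same value as the export above (server=n: the JVM connects to the IDE on port 5005).
JDWP_OPTS = "-agentlib:jdwp=transport=dt_socket,server=n,suspend=n,address=5005"


def run_with_jvm_debug(cmd, java_tool_options=JDWP_OPTS):
    """Run `cmd` with JAVA_TOOL_OPTIONS set for that process tree only.

    Child processes inherit the environment, so JVMs launched by `cmd`
    pick the variable up, while the calling shell's environment stays
    unchanged. Returns the command's exit code.
    """
    env = dict(os.environ, JAVA_TOOL_OPTIONS=java_tool_options)
    return subprocess.call(cmd, env=env)


# Example mirroring step (III) (assumes SPARK_HOME is set and pyspark is importable):
#   script = os.path.join(os.environ["SPARK_HOME"],
#                         "examples/src/main/python/streaming/network_wordcount.py")
#   run_with_jvm_debug(["python3.5", script, "localhost", "9999"])
```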

(III) Execute PySpark Python script
    python3.5 ${SPARK_HOME}/examples/src/main/python/streaming/network_wordcount.py localhost 9999

This should start PySpark and connect the Executor JVM (in local mode, the single Spark JVM) to the waiting Idea remote debugger instance for debugging.
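The network_wordcount.py example reads lines of text from a TCP socket, so something has to be serving on localhost:9999 before step (III); Spark's own instructions for this example use a Netcat server (nc -lk 9999). Where nc isn't handy, a minimal stdlib-only Python stand-in could look like the sketch below. The helpers make_listener and feed_lines are hypothetical, and unlike nc -lk this serves a single client and then stops.

```python
import socket
import time


def make_listener(host="localhost", port=9999):
    """Open a listening TCP socket for the Spark receiver to connect to."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    return srv


def feed_lines(listener, lines, interval=1.0):
    """Accept one client, send each line newline-terminated, then hang up.

    Sleeps `interval` seconds between lines to pace the input.
    Returns the number of lines sent.
    """
    conn, _ = listener.accept()
    sent = 0
    with conn:
        for line in lines:
            conn.sendall(line.encode("utf-8") + b"\n")
            sent += 1
            time.sleep(interval)
    return sent


# Example (start this before step III):
#   srv = make_listener("localhost", 9999)
#   try:
#       feed_lines(srv, ["hello spark", "hello debugger"] * 30)
#   finally:
#       srv.close()
```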

Sunday, February 24, 2019

Memory Gene

I have a conjecture that someone is soon going to discover a memory gene in the human genome. This doesn't seem to have been done or published in any scientific literature so far. The concept of Genetic Memory from an online game is close, but that's fiction.

The idea of the memory gene is that this gene in the human genome will act as a memory card. Whatever data an individual writes to the memory gene in their lifetime can later be retrieved by their progeny in their own lifetimes. The space available in the memory gene would be small compared to what is available in the brain. If the storage capacity of the human brain is a petabyte (= 10^12 kilobytes), the space in the memory gene would be about 10 kilobytes. So very little can be written to the memory gene.

Unlike the brain, to which every bit of information (visual, aural, textual, etc.) can be written at will, writing to the memory gene would require some serious intent & need. Writing to the memory gene would be more akin to etching on a brass plate: strenuous but permanent. The intent would largely be triggered by the individual's experiences, particularly ones that trigger strong emotions, perhaps ones beneficial to survival. Once written to the memory gene, this information would carry forward to the offspring.

The human genome is known to have about 2% coding DNA & the rest non-coding DNA. The coding portions carry the instructions (genetic instructions) to synthesize proteins, while the purpose of the non-coding portions is not clearly known so far. The memory gene is likely to have a memory addressing mechanism, followed by the actual memory data stored in the large non-coding portion.

At the early age of 2 or 3 years, when a good portion of the individual's brain development has happened, memory recovery would begin. The mRNA, ribosomes & the rest of the translation machinery would get to work translating the genetic code from the memory gene to synthesize the appropriate proteins & biomolecules of the brain cells. In the process the memory data would be restored block by block in the brain, perhaps during a period of low activity such as a night's sleep. The individual would later awaken with newly transferred knowledge about unknowns that would appear intuitive. Since memory recovery would take place at an early age, conflicts between the experiences of the individual and those of the ancestor wouldn't arise.
  
These are some basic features of the very complex memory gene. As mentioned earlier, this is purely a conjecture and shouldn't be taken otherwise. I look forward to exploring genuine scientific research in this space as it gets formalized & shared.

Update 1 (27-Aug-19):
For some real research take a look at the following:

 =>> Arc Gene:
 The neuronal gene Arc is required for synaptic plasticity and cognition. Arc resembles retroviruses/retrotransposons in its transfer between cells, followed by activity-dependent translation. These studies throw light on a completely new way through which neurons could send genetic information to one another. More details from the 2018 publication are available here:
  •    https://www.nih.gov/news-events/news-releases/memory-gene-goes-viral
  •    https://www.cell.com/cell/comments/S0092-8674(17)31504-0
  •    https://www.ncbi.nlm.nih.gov/pubmed/29328915/

  =>> Memories pass between generations (2013):
 When grandparent-generation mice are taught to fear an odor, their next two generations (children & grandchildren) retain the fear:
  •    https://www.bbc.com/news/health-25156510
  •    https://www.nature.com/articles/nn.3594
  •    https://www.nature.com/articles/nn.3603

 =>> Epigenetics & Cellular Memory (1970s onwards):
  •    https://en.wikipedia.org/w/index.php?title=Genetic_memory_(biology)&oldid=882561903
  •    https://en.wikipedia.org/w/index.php?title=Genomic_imprinting&oldid=908987981

  =>> Psychology - Genetic Memory (1940s onwards):
 Largely focused on the phenomenon of knowing things that weren't explicitly learned by an individual:
  •    https://blogs.scientificamerican.com/guest-blog/genetic-memory-how-we-know-things-we-never-learned/
  •    https://en.wikipedia.org/w/index.php?title=Genetic_memory_(psychology)&oldid=904552075
  •    http://www.bahaistudies.net/asma/The-Concept-of-the-Collective-Unconscious.pdf
  •    https://en.wikipedia.org/wiki/Collective_unconscious